Learning from data in the mixed adversarial non-adversarial case: Finding the helpers and ignoring the trolls

Ju, Da; Xu, Jing; Boureau, Y-Lan; Weston, Jason

(Submitted on 5 Aug 2022)

Abstract: The promise of interaction between intelligent conversational agents and humans is that models can learn from such feedback in order to improve. Unfortunately, such exchanges in the wild will not always involve human utterances that are benign or of high quality, and will include a mixture of engaged (helpers) and unengaged or even malicious users (trolls). In this work we study how to perform robust learning in such an environment. We introduce a benchmark evaluation, SafetyMix, which can evaluate methods that learn safe vs. toxic language in a variety of adversarial settings to test their robustness. We propose and analyze several mitigating learning algorithms that identify trolls either at the example or at the user level. Our main finding is that user-based methods, which take into account that troll users will exhibit adversarial behavior across multiple examples, work best in a variety of settings on our benchmark. We then test these methods in a further real-life setting of conversations collected during deployment, with similar results.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2208.03295 [cs.CL] (or arXiv:2208.03295v1 [cs.CL] for this version)

Submission history

From: Jason Weston
[v1] Fri, 5 Aug 2022 17:33:33 GMT (9853kb,D)
