Verifying the Robustness of Automatic Credibility Assessment

Przybyła, Piotr; Shvets, Alexander; Saggion, Horacio

(Submitted on 14 Mar 2023 (v1), last revised 11 Aug 2023 (this version, v2))

Abstract: Text classification methods have been widely investigated as a way to detect content of low credibility: fake news, social media bots, propaganda, etc. Quite accurate models (likely based on deep neural networks) help in moderating public electronic platforms and often cause content creators to face rejection of their submissions or removal of already published texts. Having the incentive to evade further detection, content creators try to come up with a slightly modified version of the text (known as an attack with an adversarial example) that exploits the weaknesses of classifiers and results in a different output. Here we systematically test the robustness of popular text classifiers against available attacking techniques and discover that, indeed, in some cases insignificant changes in input text can mislead the models. We also introduce BODEGA: a benchmark for testing both victim models and attack methods on four misinformation detection tasks in an evaluation framework designed to simulate real use-cases of content moderation. Finally, we manually analyse a subset of adversarial examples and check what kinds of modifications are used in successful attacks. The BODEGA code and data are openly shared in the hope of enhancing the comparability and replicability of further research in this area.

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2303.08032 [cs.CL]
	(or arXiv:2303.08032v2 [cs.CL] for this version)

Submission history

From: Piotr Przybyła
[v1] Tue, 14 Mar 2023 16:11:47 GMT (102kb,D)
[v2] Fri, 11 Aug 2023 09:59:07 GMT (99kb,D)
