The Dark Side of the Language: Pre-trained Transformers in the DarkNet

Ranaldi, Leonardo; Nourbakhsh, Aria; Patrizi, Arianna; Ruzzetti, Elena Sofia; Onorati, Dario; Fallucchi, Francesca; Zanzotto, Fabio Massimo

doi:10.26615/978-954-452-092-2_102

(Submitted on 14 Jan 2022 (v1), last revised 17 Nov 2023 (this version, v3))

Abstract: Pre-trained Transformers are challenging human performance in many NLP tasks. The massive datasets used for pre-training seem to be the key to their success on existing tasks. In this paper, we explore how a range of pre-trained Natural Language Understanding models perform on definitely unseen sentences provided by classification tasks over a DarkNet corpus. Surprisingly, results show that syntactic and lexical neural networks perform on par with pre-trained Transformers even after fine-tuning. Only after what we call extreme domain adaptation, that is, retraining with the masked language model task on the entire novel corpus, do pre-trained Transformers reach their standard high results. This suggests that huge pre-training corpora may give Transformers unexpected help since they are exposed to many of the possible sentences.

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
DOI:	10.26615/978-954-452-092-2_102
Cite as:	arXiv:2201.05613 [cs.CL]
	(or arXiv:2201.05613v3 [cs.CL] for this version)

Submission history

From: Leonardo Ranaldi
[v1] Fri, 14 Jan 2022 16:04:09 GMT (188kb)
[v2] Wed, 9 Feb 2022 20:31:43 GMT (206kb)
[v3] Fri, 17 Nov 2023 13:01:01 GMT (2126kb,D)
