Title: Self-Supervised Contrastive Learning with Adversarial Perturbations for Robust Pretrained Language Models

Abstract: This paper improves the robustness of the pretrained language model BERT against word substitution-based adversarial attacks by leveraging self-supervised contrastive learning with adversarial perturbations. One advantage of our method over previous work is that it improves model robustness without using any labels. We also create an adversarial attack for word-level adversarial training on BERT. The attack is efficient, allowing adversarial training for BERT on adversarial examples generated on the fly during training. Experimental results show that our method improves the robustness of BERT against four different word substitution-based adversarial attacks. Moreover, combining our method with adversarial training gives higher robustness than adversarial training alone. To understand why our method improves robustness against adversarial attacks, we study the vector representations of clean examples and their corresponding adversarial examples before and after applying our method. Because our method improves model robustness with unlabeled raw data, it opens up the possibility of using large text datasets to train robust language models.
Comments: Work in progress
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2107.07610 [cs.CL]
  (or arXiv:2107.07610v2 [cs.CL] for this version)

Submission history

From: Zhao Meng
[v1] Thu, 15 Jul 2021 21:03:34 GMT (97kb,D)
[v2] Thu, 16 Sep 2021 11:31:24 GMT (128kb,D)
