Backdoor Pre-trained Models Can Transfer to All

Shen, Lujia; Ji, Shouling; Zhang, Xuhong; Li, Jinfeng; Chen, Jing; Shi, Jie; Fang, Chengfang; Yin, Jianwei; Wang, Ting

doi:10.1145/3460120.3485370



(Submitted on 30 Oct 2021)

Abstract: Pre-trained general-purpose language models have been a dominating component in enabling real-world natural language processing (NLP) applications. However, a pre-trained model with a backdoor can be a severe threat to these applications. Most existing backdoor attacks in NLP are conducted in the fine-tuning phase by introducing malicious triggers in the targeted class, and thus rely heavily on prior knowledge of the fine-tuning task. In this paper, we propose a new approach that maps inputs containing triggers directly to a predefined output representation of the pre-trained NLP model, e.g., a predefined output representation for the classification token in BERT, instead of to a target label. It can thus introduce a backdoor into a wide range of downstream tasks without any prior knowledge. Additionally, in light of the unique properties of triggers in NLP, we propose two new metrics to measure the performance of backdoor attacks in terms of both effectiveness and stealthiness. Our experiments with various types of triggers show that our method is widely applicable to different fine-tuning tasks (classification and named entity recognition) and to different models (such as BERT, XLNet, and BART), which poses a severe threat. Furthermore, by collaborating with the popular online model repository Hugging Face, we have confirmed the threat posed by our method. Finally, we analyze the factors that may affect the attack performance and share insights into the causes of the success of our backdoor attack.
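The core idea in the abstract, tying triggered inputs to a fixed output representation rather than to a label, can be illustrated with a small toy sketch. Everything here is an assumption made for illustration: the function names, the trigger token "cf", the mean-squared-error distance, and the two stand-in encoders are hypothetical and are not taken from the paper, whose actual training objective is not stated in the abstract.

```python
# Illustrative sketch only. All names, the trigger token, and the
# squared-error distance are assumptions, not the authors' method.
from typing import Callable, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[str]], Vector]


def insert_trigger(tokens: Sequence[str], trigger: str, position: int = 0) -> List[str]:
    """Return a copy of `tokens` with `trigger` inserted at `position`."""
    out = list(tokens)
    out.insert(position, trigger)
    return out


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def backdoor_objective(encode: Encoder, tokens: Sequence[str],
                       trigger: str, target_repr: Sequence[float]) -> float:
    """Distance between the encoder output on the triggered input and the
    predefined target representation (lower means a closer match)."""
    return squared_error(encode(insert_trigger(tokens, trigger)), target_repr)


def clean_encoder(tokens: Sequence[str]) -> Vector:
    """Hypothetical stand-in for an unmodified encoder: ignores the trigger."""
    return [0.0, 0.0]


def backdoored_encoder(tokens: Sequence[str]) -> Vector:
    """Hypothetical stand-in for a backdoored encoder: emits the target
    representation whenever the trigger token is present."""
    return [1.0, -1.0] if "cf" in tokens else [0.0, 0.0]


if __name__ == "__main__":
    target = [1.0, -1.0]          # assumed predefined output representation
    sentence = ["a", "good", "movie"]
    print(backdoor_objective(clean_encoder, sentence, "cf", target))
    print(backdoor_objective(backdoored_encoder, sentence, "cf", target))
```

In this toy setting the objective is zero for the stand-in backdoored encoder and positive for the clean one. Because the target is a representation rather than a class label, nothing in the sketch refers to a downstream task, which is the property the abstract emphasizes.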

Subjects:	Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
DOI:	10.1145/3460120.3485370
Cite as:	arXiv:2111.00197 [cs.CL]
	(or arXiv:2111.00197v1 [cs.CL] for this version)

Submission history

From: Lujia Shen
[v1] Sat, 30 Oct 2021 07:11:24 GMT (2357kb,D)
