Anti-Backdoor Learning: Training Clean Models on Poisoned Data

Li, Yige; Lyu, Xixiang; Koren, Nodens; Lyu, Lingjuan; Li, Bo; Ma, Xingjun

(Submitted on 22 Oct 2021 (v1), last revised 1 Dec 2021 (this version, v3))

Abstract: Backdoor attacks have emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results in detecting or erasing backdoors, it is still not clear whether robust training methods can be devised to prevent backdoor triggers from being injected into the trained model in the first place. In this paper, we introduce the concept of anti-backdoor learning, which aims to train clean models on backdoor-poisoned data. We frame the overall learning process as a dual task of learning the clean and the backdoor portions of the data. From this view, we identify two inherent characteristics of backdoor attacks as their weaknesses: 1) models learn backdoored data much faster than clean data, and the stronger the attack, the faster the model converges on backdoored data; 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training. ABL adds a two-stage gradient ascent mechanism to standard training to 1) help isolate backdoor examples at an early training stage, and 2) break the correlation between backdoor examples and the target class at a later training stage. Through extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks, we empirically show that models trained with ABL on backdoor-poisoned data achieve the same performance as if they were trained on purely clean data. Code is available at this https URL.
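
The two stages described in the abstract can be illustrated with a small, framework-free sketch. It operates on per-example loss values only and is not the authors' released implementation: the function names, the loss threshold `gamma`, and the isolation `ratio` are all assumptions introduced here for illustration, and the abstract does not state the exact form of either objective.

```python
from typing import List, Sequence


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def early_stage_loss(losses: Sequence[float], gamma: float) -> float:
    """Stage 1 objective (assumed form): flip the sign of any loss below gamma.

    Minimizing the result performs gradient ascent on examples whose loss
    has dropped under gamma, which keeps fast-learned (likely backdoored)
    examples from being fit much further than the threshold.
    """
    return sum(sign(l - gamma) * l for l in losses) / len(losses)


def isolate_lowest_loss(losses: Sequence[float], ratio: float) -> List[int]:
    """Return indices of the lowest-loss fraction of examples.

    Because backdoored data is learned faster, the lowest-loss examples
    early in training are treated as suspected backdoor examples.
    """
    k = max(1, int(len(losses) * ratio))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return order[:k]


def late_stage_loss(clean_losses: Sequence[float],
                    isolated_losses: Sequence[float]) -> float:
    """Stage 2 objective (assumed form): descend on clean, ascend on isolated.

    The negated term pushes the model away from mapping the isolated
    examples to their (target-class) labels.
    """
    clean = sum(clean_losses) / len(clean_losses)
    isolated = sum(isolated_losses) / len(isolated_losses)
    return clean - isolated


# Hypothetical per-example losses after a few early epochs.
losses = [2.0, 0.1, 1.5, 0.05, 1.8]
suspected = isolate_lowest_loss(losses, ratio=0.4)
```

In a real training loop the per-example losses would come from the framework's unreduced loss (for example, cross-entropy computed without averaging over the batch), and the two objectives would be backpropagated rather than evaluated on fixed numbers.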

Comments:	Accepted to NeurIPS 2021
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2110.11571 [cs.LG]
	(or arXiv:2110.11571v3 [cs.LG] for this version)

Submission history

From: Yige Li
[v1] Fri, 22 Oct 2021 03:30:48 GMT (35221kb,D)
[v2] Mon, 25 Oct 2021 03:41:22 GMT (38403kb,D)
[v3] Wed, 1 Dec 2021 10:47:34 GMT (38640kb,D)
