A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Samples

Wang, Beilun; Gao, Ji; Qi, Yanjun

(Submitted on 1 Dec 2016 (v1), revised 17 Jan 2017 (this version, v3), latest version 27 Sep 2017 (v12))

Abstract: Adversarial samples are maliciously created inputs that lead a machine learning classifier to produce incorrect output labels. An adversarial sample is often generated by adding adversarial noise (AN) to a normal test sample. Recent literature has tried to analyze and harden learning-based classifiers under such AN. However, most previous studies are empirical and provide little understanding of the underlying reasons why many machine learning classifiers, including deep neural networks (DNNs), are vulnerable to AN. To fill this gap, we propose a theoretical framework using two topological spaces to understand classifiers' robustness against AN. The central idea of our work is that, for a given classification task, the robustness of a classifier $f_1$ against AN is determined by both $f_1$ and its oracle $f_2$ (such as a human annotator of that specific task). This motivates us to formulate a formal definition of "strong-robustness" that describes when a classifier $f_1$ is always robust against AN according to its $f_2$. The second key piece of our framework is the decomposition $f_i = c_i \circ g_i$, in which $i \in \{1,2\}$, $g_i$ includes feature learning operations, and $c_i$ includes relatively simple decision functions for the classification. We theoretically prove that $f_1$ is strong-robust against AN if and only if a special topological relationship exists between the two feature spaces defined by $g_1$ and $g_2$. Surprisingly, our theorems indicate that the strong-robustness of $f_1$ against AN is fully determined by its $g_1$, not $c_1$. Empirically, we find that the Siamese architecture can intuitively help DNN models approach topological equivalence between the two feature spaces, which in turn effectively improves their robustness against AN.
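
The decomposition $f_i = c_i \circ g_i$ can be illustrated with a small toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-dimensional inputs, the particular `g1`, `c1`, `g2`, `c2`, the Euclidean distance, and the helper `is_adversarial`, which only loosely mirrors the idea that a perturbation the oracle considers negligible should not change the classifier's label. Here the oracle's feature map `g2` ignores the second coordinate, while the classifier's `g1` keeps it, so noise on that coordinate can flip `f1` without the oracle noticing.

```python
import math
from typing import Callable, Tuple

Vector = Tuple[float, ...]


def compose(c: Callable[[Vector], int], g: Callable[[Vector], Vector]) -> Callable[[Vector], int]:
    """Build f = c o g: extract features with g, then decide with c."""
    return lambda x: c(g(x))


# Hypothetical classifier f1: its feature map keeps both coordinates.
def g1(x: Vector) -> Vector:
    return (x[0], x[1])


def c1(z: Vector) -> int:
    return int(z[0] + z[1] > 0)


# Hypothetical oracle f2: its feature map keeps only the first coordinate.
def g2(x: Vector) -> Vector:
    return (x[0],)


def c2(z: Vector) -> int:
    return int(z[0] > 0)


f1 = compose(c1, g1)
f2 = compose(c2, g2)


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def is_adversarial(x: Vector, x_adv: Vector, eps: float) -> bool:
    """True if x_adv is within eps of x in the oracle's feature space
    but f1 assigns the two points different labels (toy criterion)."""
    close_for_oracle = l2(g2(x), g2(x_adv)) < eps
    return close_for_oracle and f1(x) != f1(x_adv)


x = (1.0, 0.0)
x_adv = (1.0, -2.0)   # only the coordinate the oracle ignores is changed
x_same = (1.0, 0.5)   # a perturbation that does not flip f1

print(f1(x), f1(x_adv), f2(x), f2(x_adv))
print(is_adversarial(x, x_adv, eps=0.1), is_adversarial(x, x_same, eps=0.1))
```

In this toy setting the vulnerability comes entirely from the mismatch between `g1` and `g2`: replacing `c1` with another decision rule over the same features would not remove the dependence on the coordinate the oracle ignores, which is in the spirit of the abstract's claim that robustness is governed by $g_1$ rather than $c_1$.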

Comments:	30 pages, submitting to ICLR 2017
Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1612.00334 [cs.LG]
	(or arXiv:1612.00334v3 [cs.LG] for this version)

Submission history

From: Yanjun Qi Dr.
[v1] Thu, 1 Dec 2016 16:20:39 GMT (978kb,D)
[v2] Mon, 5 Dec 2016 17:07:35 GMT (1311kb,D)
[v3] Tue, 17 Jan 2017 22:23:55 GMT (2899kb,D)
[v4] Sat, 21 Jan 2017 16:37:24 GMT (2908kb,D)
[v5] Thu, 26 Jan 2017 15:32:06 GMT (2918kb,D)
[v6] Wed, 1 Feb 2017 17:30:50 GMT (2922kb,D)
[v7] Thu, 2 Feb 2017 14:39:50 GMT (2922kb,D)
[v8] Fri, 3 Feb 2017 16:06:39 GMT (2924kb,D)
[v9] Mon, 27 Feb 2017 20:18:26 GMT (3233kb,D)
[v10] Thu, 9 Mar 2017 22:00:56 GMT (3218kb,D)
[v11] Thu, 27 Apr 2017 14:36:40 GMT (3029kb,D)
[v12] Wed, 27 Sep 2017 16:02:48 GMT (3236kb,D)
