A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Samples

Wang, Beilun; Gao, Ji; Qi, Yanjun

(Submitted on 1 Dec 2016 (v1), revised 3 Feb 2017 (this version, v8), latest version 27 Sep 2017 (v12))

Abstract: Adversarial samples are maliciously created inputs that lead a learning-based classifier to produce incorrect output labels. An adversarial sample is often generated by adding an adversarial perturbation (AP) to a normal test sample. Recent studies analyzing classifiers under such AP are mostly empirical and offer little insight into why these perturbations succeed. To fill this gap, we propose a theoretical framework for analyzing learning-based classifiers, especially deep neural networks (DNN), in the face of such AP. Using concepts from topology, this framework identifies the key reasons why an adversarial sample can fool a classifier ($f_1$) and suggests a new focus on its oracle ($f_2$, such as human annotators for that specific task). By investigating the topological relationship between the two (pseudo)metric spaces corresponding to predictor $f_1$ and oracle $f_2$, we develop several ideal conditions that determine whether $f_1$ is always robust (strong-robust) against adversarial samples according to its $f_2$. The theoretical framework leads to a set of novel and complementary insights not previously uncovered in the literature. Surprisingly, our theorems show that a single extra irrelevant feature can make a classifier not strong-robust, and that the right feature representation learning is the key to obtaining a classifier that is both accurate and strong-robust. Empirically, we find that a "Siamese architecture" can help DNN models approach the desired topological relationship for strong-robustness, which in turn effectively improves their performance against AP.
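The claim that one extra irrelevant feature can break strong-robustness can be illustrated with a toy example. The sketch below is not the paper's construction: the 2-D inputs, the linear classifier, its weights, and the helpers `oracle_f2`, `classifier_f1`, `d2`, and `find_violation` are all hypothetical. It assumes an oracle that labels by the first feature only, so the oracle's pseudometric assigns distance 0 to distinct points that differ only in the second feature. It also checks only a finite list of perturbations rather than every pair of inputs, so it can exhibit a violation but cannot prove robustness.

```python
# Toy illustration with a hypothetical 2-D setup (not the paper's construction).
# Inputs are x = (x1, x2). The oracle f2 depends only on x1; x2 is irrelevant to it.

def oracle_f2(x):
    """Oracle label: sign of the relevant feature x1 only."""
    return 1 if x[0] >= 0 else 0


def classifier_f1(x, w):
    """Linear classifier f1 with weights w = (w1, w2); w2 != 0 means f1 uses x2."""
    score = w[0] * x[0] + w[1] * x[1]
    return 1 if score >= 0 else 0


def d2(x, y):
    """Oracle pseudometric: distance along x1 only. Distinct points that differ
    only in x2 are at distance 0, which is why d2 is a pseudometric, not a metric."""
    return abs(x[0] - y[0])


def find_violation(x, w, deltas):
    """Perturb only x2 and return the first point the oracle cannot tell apart
    from x (d2 == 0 and same f2 label) that f1 nevertheless labels differently.
    Returns None if no tested perturbation produces such a point."""
    for delta in deltas:
        x_adv = (x[0], x[1] + delta)
        oracle_same = d2(x, x_adv) == 0 and oracle_f2(x_adv) == oracle_f2(x)
        if oracle_same and classifier_f1(x_adv, w) != classifier_f1(x, w):
            return x_adv
    return None


x = (0.2, 0.0)
deltas = [-0.25, -0.5, -1.0]

# f1 puts weight on the extra feature x2: a perturbation invisible to the oracle flips it.
print(find_violation(x, w=(1.0, 0.5), deltas=deltas))  # (0.2, -0.5)

# f1 ignores x2 (zero weight): none of the tested perturbations flips it.
print(find_violation(x, w=(1.0, 0.0), deltas=deltas))  # None
```

With a nonzero weight on the irrelevant feature, a perturbation that the oracle cannot see changes the classifier's label; setting that weight to zero removes the violation on these tests, loosely mirroring the abstract's point that the feature representation determines strong-robustness.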

Comments:	30 pages, submitted to ICLR 2017
Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1612.00334 [cs.LG]
	(or arXiv:1612.00334v8 [cs.LG] for this version)

Submission history

From: Dr. Yanjun Qi
[v1] Thu, 1 Dec 2016 16:20:39 GMT (978kb,D)
[v2] Mon, 5 Dec 2016 17:07:35 GMT (1311kb,D)
[v3] Tue, 17 Jan 2017 22:23:55 GMT (2899kb,D)
[v4] Sat, 21 Jan 2017 16:37:24 GMT (2908kb,D)
[v5] Thu, 26 Jan 2017 15:32:06 GMT (2918kb,D)
[v6] Wed, 1 Feb 2017 17:30:50 GMT (2922kb,D)
[v7] Thu, 2 Feb 2017 14:39:50 GMT (2922kb,D)
[v8] Fri, 3 Feb 2017 16:06:39 GMT (2924kb,D)
[v9] Mon, 27 Feb 2017 20:18:26 GMT (3233kb,D)
[v10] Thu, 9 Mar 2017 22:00:56 GMT (3218kb,D)
[v11] Thu, 27 Apr 2017 14:36:40 GMT (3029kb,D)
[v12] Wed, 27 Sep 2017 16:02:48 GMT (3236kb,D)
