Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training

Wu, Xi; Jang, Uyeong; Chen, Jiefeng; Chen, Lingjiao; Jha, Somesh

(Submitted on 21 Nov 2017 (v1), last revised 8 Jun 2018 (this version, v3))

Abstract: We study how confidence information induced by adversarial training can be leveraged to reinforce the adversarial robustness of a given adversarially trained model. A natural measure of confidence is $\|F({\bf x})\|_\infty$, i.e., how confident $F$ is about its prediction. We start by analyzing an adversarial training formulation proposed by Madry et al. We demonstrate that, under a variety of instantiations, even a solution to their objective that is only somewhat good induces confidence to be a discriminator, which can distinguish between right and wrong model predictions in a neighborhood of a point sampled from the underlying distribution. Based on this, we propose Highly Confident Near Neighbor (${\tt HCNN}$), a framework that combines confidence information and nearest neighbor search to reinforce the adversarial robustness of a base model. We give algorithms in this framework and perform a detailed empirical study. We report encouraging experimental results that support our analysis, and also discuss problems we observed with existing adversarial training.
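
As a rough illustration of the two ingredients named in the abstract, the sketch below computes the confidence $\|F({\bf x})\|_\infty$ of a probability vector and then predicts with the label found at the most confident point near the input. This is one plausible reading of the abstract, not the authors' algorithm: the abstract does not specify how the neighborhood is searched, so uniform random sampling in an $\ell_\infty$ ball is an assumption made here, and `confidence`, `hcnn_predict`, `toy_model`, `radius`, and `n_samples` are hypothetical names introduced only for this example.

```python
import numpy as np


def confidence(probs):
    """Confidence ||F(x)||_inf: the largest entry of the probability vector."""
    return float(np.max(probs))


def hcnn_predict(model, x, radius, n_samples=256, seed=0):
    """Predict with the label at the most confident sampled neighbor of x.

    Assumption: the neighborhood is an l_inf ball of the given radius, and
    random sampling stands in for whatever search the paper actually uses.
    """
    rng = np.random.default_rng(seed)
    # Candidates: x itself plus uniform perturbations inside the ball.
    noise = rng.uniform(-radius, radius, size=(n_samples,) + x.shape)
    candidates = np.concatenate([x[None], x[None] + noise], axis=0)
    # Evaluate the base model on every candidate.
    probs = np.stack([model(c) for c in candidates])
    # Keep the candidate on which the model is most confident.
    best = int(np.argmax(probs.max(axis=1)))
    return int(np.argmax(probs[best])), confidence(probs[best])


def toy_model(x):
    """Hypothetical two-class softmax model on 1-D inputs; stands in for F."""
    logits = np.array([-4.0 * x[0], 4.0 * x[0]])
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()


if __name__ == "__main__":
    x = np.array([0.05])
    label, conf = hcnn_predict(toy_model, x, radius=0.5)
    print("base confidence:", confidence(toy_model(x)))
    print("label at most confident neighbor:", label, "confidence:", conf)
```

In this toy setting the base model is barely confident at the input, while a nearby point carries a much higher confidence, so the prediction is taken from that neighbor. Whether confidence is a reliable signal in this way depends on the base model having been adversarially trained, which is the condition the abstract analyzes.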

Comments:	To appear in ICML 2018
Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Cite as:	arXiv:1711.08001 [cs.LG]
	(or arXiv:1711.08001v3 [cs.LG] for this version)

Submission history

From: Xi Wu
[v1] Tue, 21 Nov 2017 19:15:05 GMT (75kb,D)
[v2] Mon, 1 Jan 2018 20:12:55 GMT (77kb,D)
[v3] Fri, 8 Jun 2018 13:46:51 GMT (96kb)
