Reverse engineering adversarial attacks with fingerprints from adversarial examples

Nicholson, David Aaron; Emanuele, Vincent

(Submitted on 31 Jan 2023 (v1), last revised 1 Feb 2023 (this version, v2))

Abstract: In spite of intense research efforts, deep neural networks remain vulnerable to adversarial examples: inputs that force the network to confidently produce incorrect outputs. Adversarial examples are typically generated by an attack algorithm that optimizes a perturbation added to a benign input. Many such algorithms have been developed. If it were possible to reverse engineer attack algorithms from adversarial examples, the possibility of attribution could deter bad actors. Here we formulate reverse engineering as a supervised learning problem where the goal is to assign an adversarial example to a class that represents the algorithm and parameters used. To our knowledge, it has not previously been shown whether this is even possible. We first test whether we can classify the perturbations added to images by attacks on undefended single-label image classification models. Taking a "fight fire with fire" approach, we leverage the sensitivity of deep neural networks to adversarial examples, training them to classify these perturbations. On a 17-class dataset (5 attacks: 4 bounded attacks with 4 epsilon values each, plus 1 attack without an epsilon bound), we achieve an accuracy of 99.4% with a ResNet50 model trained on the perturbations. We then ask whether we can perform this task without access to the perturbations, instead estimating them with signal processing algorithms, an approach we call "fingerprinting". We find the JPEG algorithm serves as a simple yet effective fingerprinter (85.05% accuracy), providing a strong baseline for future work. We discuss how our approach can be extended to attack-agnostic, learnable fingerprints, and to open-world scenarios with unknown attacks.
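The abstract contrasts two classifier inputs: the true perturbation (which requires the benign image) and a "fingerprint" estimated from the adversarial example alone, with JPEG as the fingerprinter. The sketch below illustrates that contrast under stated assumptions and is not the authors' implementation. It assumes the fingerprint is the residual between an image and its compressed version. In place of a real JPEG codec it uses a simplified, hypothetical `jpeg_like` function (8x8 block DCT with one uniform quantization step). The perturbation is random sign noise scaled by a hypothetical epsilon, standing in for an actual attack.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)  # scaling the DC row makes the matrix orthonormal
    return m


def jpeg_like(img: np.ndarray, step: float = 0.25, block: int = 8) -> np.ndarray:
    """Hypothetical, simplified JPEG-style compression of a grayscale image in [0, 1]:
    8x8 block DCT, one uniform quantization step for every coefficient, inverse DCT.
    Real JPEG also uses per-frequency quantization tables, chroma subsampling and
    entropy coding; all are omitted here."""
    h, w = img.shape
    assert h % block == 0 and w % block == 0, "image size must be a multiple of the block size"
    c = dct_matrix(block)
    out = np.empty_like(img, dtype=float)
    for r in range(0, h, block):
        for col in range(0, w, block):
            tile = img[r:r + block, col:col + block]
            coeffs = c @ tile @ c.T                      # forward 2-D DCT
            coeffs = np.round(coeffs / step) * step      # lossy quantization
            out[r:r + block, col:col + block] = c.T @ coeffs @ c  # inverse 2-D DCT
    return out


def fingerprint(x_adv: np.ndarray, step: float = 0.25) -> np.ndarray:
    """Hypothetical fingerprint: the detail the compressor removes from the input."""
    return x_adv - jpeg_like(x_adv, step)


rng = np.random.default_rng(0)
x = np.tile(np.linspace(0.2, 0.8, 32), (32, 1))       # smooth stand-in "benign" image
eps = 8 / 255                                         # hypothetical L-infinity budget
delta = eps * rng.choice([-1.0, 1.0], size=x.shape)   # stand-in perturbation, not a real attack
x_adv = np.clip(x + delta, 0.0, 1.0)

true_pert = x_adv - x          # classifier input when the benign image is available
est_pert = fingerprint(x_adv)  # estimate computed from x_adv alone
corr = np.corrcoef(true_pert.ravel(), est_pert.ravel())[0, 1]
print(f"correlation between perturbation and fingerprint: {corr:.2f}")
```

Because the benign image is smooth, most of its energy survives quantization, while the high-frequency perturbation falls below the quantization step and ends up in the residual. For real images, `jpeg_like` could be replaced by an actual JPEG round trip (for example, Pillow's `Image.save(..., format="JPEG", quality=...)`). The quality setting would then play the role of `step`.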

Comments:	8 pages, 6 figures
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2301.13869 [cs.AI]
	(or arXiv:2301.13869v2 [cs.AI] for this version)

Submission history

From: David Nicholson
[v1] Tue, 31 Jan 2023 18:59:37 GMT (1492kb,D)
[v2] Wed, 1 Feb 2023 16:34:52 GMT (1492kb,D)