Don't trust your eyes: on the (un)reliability of feature visualizations

Geirhos, Robert; Zimmermann, Roland S.; Bilodeau, Blair; Brendel, Wieland; Kim, Been

(Submitted on 7 Jun 2023 (v1), last revised 28 Sep 2023 (this version, v5))

Abstract: How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this important question by visualizing highly activating patterns through optimization. Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, as a type of mechanistic interpretability. Here we ask: How reliable are feature visualizations? We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to "explain" how neural networks process natural images. This can be used as a sanity check for feature visualizations. We underpin our empirical findings by theory proving that the set of functions that can be reliably understood by feature visualization is extremely small and does not include general black-box neural networks. Therefore, a promising way forward could be the development of networks that enforce certain structures in order to ensure more reliable feature visualizations.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Cite as: arXiv:2306.04719 [cs.CV] (or arXiv:2306.04719v5 [cs.CV] for this version)

Submission history

From: Robert Geirhos
[v1] Wed, 7 Jun 2023 18:31:39 GMT (8394kb,D)
[v2] Wed, 21 Jun 2023 15:52:05 GMT (8394kb,D)
[v3] Tue, 4 Jul 2023 19:39:25 GMT (8250kb,D)
[v4] Thu, 6 Jul 2023 19:03:47 GMT (8250kb,D)
[v5] Thu, 28 Sep 2023 19:18:58 GMT (4397kb,D)
