Computer Science > Machine Learning
Title: Defining and Quantifying the Emergence of Sparse Concepts in DNNs
(Submitted on 11 Nov 2021 (v1), last revised 3 Apr 2023 (this version, v6))
Abstract: This paper aims to illustrate the concept-emerging phenomenon in a trained DNN. Specifically, we find that the inference score of a DNN can be disentangled into the effects of a few interactive concepts. These concepts can be understood as causal patterns in a sparse, symbolic causal graph that explains the DNN. The faithfulness of this explanation is theoretically guaranteed: we prove that the causal graph closely mimics the DNN's outputs on an exponential number of differently masked samples. In addition, the causal graph can be further simplified and rewritten as an And-Or graph (AOG) without losing much explanation accuracy.
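The disentanglement described in the abstract can be made concrete with a small sketch. The abstract does not name the interaction measure, so this sketch assumes the Harsanyi dividend, I(S) = sum over T ⊆ S of (-1)^(|S|-|T|) v(T), where v(T) is the model score when only the input variables in T are left unmasked. The function `toy_model` is a hypothetical stand-in for a DNN, and all names here are illustrative rather than taken from the paper's code.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def toy_model(present):
    """Hypothetical stand-in for a DNN score on a masked sample.

    `present` is the set of unmasked input variables (indices 0..2).
    """
    score = 0.0
    if {0, 1} <= present:      # AND pattern over variables 0 and 1
        score += 2.0
    if 2 in present:           # single-variable pattern
        score -= 1.0
    if {0, 1, 2} <= present:   # AND pattern over all three variables
        score += 0.5
    return score


def harsanyi_effects(v, n):
    """Assumed interaction measure: the Harsanyi dividend of every subset S."""
    effects = {}
    for S in subsets(range(n)):
        effects[S] = sum(
            (-1) ** (len(S) - len(T)) * v(T) for T in subsets(S)
        )
    return effects


N = 3
effects = harsanyi_effects(toy_model, N)

# Sparsity: keep only the concepts with a non-negligible effect.
sparse = {S: e for S, e in effects.items() if abs(e) > 1e-9}


def reconstruct(S):
    """Score predicted by the graph: sum the effects of concepts contained in S."""
    return sum(e for T, e in effects.items() if T <= S)


# Faithfulness check over all 2**N masked samples.
max_error = max(
    abs(reconstruct(S) - toy_model(S)) for S in subsets(range(N))
)

for S, e in sorted(sparse.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(S), e)
print("max reconstruction error:", max_error)
```

In this toy setting only three of the eight possible subsets carry a non-zero effect, and summing the effects of the concepts contained in a masked sample reproduces the score on every one of the 2^3 masks. The exhaustive enumeration is exponential in the number of input variables, so it is only practical for small N.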
Submission history
From: Quanshi Zhang [via QUANSHI proxy]
[v1] Thu, 11 Nov 2021 13:48:20 GMT (11431kb,D)
[v2] Sat, 27 Nov 2021 09:49:35 GMT (15017kb,D)
[v3] Tue, 30 Nov 2021 14:50:12 GMT (15017kb,D)
[v4] Mon, 17 Oct 2022 13:36:16 GMT (5546kb,D)
[v5] Sat, 25 Feb 2023 09:37:35 GMT (5045kb,D)
[v6] Mon, 3 Apr 2023 12:02:02 GMT (4843kb,D)