Analyzing Encoded Concepts in Transformer Language Models

Sajjad, Hassan; Durrani, Nadir; Dalvi, Fahim; Alam, Firoj; Khan, Abdul Rafae; Xu, Jia

(Submitted on 27 Jun 2022)

Abstract: We propose a novel framework, ConceptX, to analyze how latent concepts are encoded in representations learned within pre-trained language models. It uses clustering to discover the encoded concepts and explains them by aligning them with a large set of human-defined concepts. Our analysis of seven transformer language models reveals interesting insights: i) the latent space within the learned representations overlaps with different linguistic concepts to a varying degree, ii) the lower layers in the model are dominated by lexical concepts (e.g., affixation), whereas the core-linguistic concepts (e.g., morphological or syntactic relations) are better represented in the middle and higher layers, iii) some encoded concepts are multi-faceted and cannot be adequately explained using the existing human-defined concepts.

Comments:	20 pages, 10 figures
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Journal reference:	2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Cite as:	arXiv:2206.13289 [cs.CL]
	(or arXiv:2206.13289v1 [cs.CL] for this version)

Submission history

From: Hassan Sajjad
[v1] Mon, 27 Jun 2022 13:32:10 GMT (13619kb,D)
