Title: A Probabilistic Interpretation of Transformers

Abstract: We propose a probabilistic interpretation of exponential dot-product attention in transformers, and of contrastive learning, based on exponential families. The attention sublayer of a transformer is equivalent to a gradient ascent step on the log normalizer, which is the log-sum-exp term in the Hopfield theory of attention. This ascent step induces a parallel expansion of points, which is counterbalanced by a contraction from layer normalization. We also state theoretical limitations of our theory and of the Hopfield theory, and suggest directions for resolution.
Comments: Accepted at ICML 2021 Workshop: Self-Supervised Learning for Reasoning and Perception
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2205.01080 [cs.LG]
  (or arXiv:2205.01080v1 [cs.LG] for this version)
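
The abstract's central claim, that attention acts as a gradient ascent step on the log-sum-exp log normalizer, can be checked numerically in a toy setting. The sketch below is not the paper's code. It assumes a single query, values tied to keys, and no learned projections or scaling, and the names `log_sum_exp`, `attention`, and `numerical_gradient` are illustrative. Under those assumptions, the gradient of log sum_i exp(<q, k_i>) with respect to q is the softmax-weighted average of the keys, so a residual update q + attention(q) is one ascent step with step size 1.

```python
import math


def log_sum_exp(q, keys):
    """Log normalizer: log sum_i exp(<q, k_i>), computed stably."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def attention(q, keys):
    """Exponential dot-product attention with values tied to keys (an assumption)."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    # Softmax-weighted average of the keys, one coordinate at a time.
    return [sum(w / z * k[j] for w, k in zip(weights, keys)) for j in range(len(q))]


def numerical_gradient(f, q, eps=1e-6):
    """Central finite-difference gradient of f at q."""
    grad = []
    for j in range(len(q)):
        up = list(q)
        up[j] += eps
        down = list(q)
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


# Arbitrary toy data, chosen only for illustration.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
q = [0.3, -0.2]

attn = attention(q, keys)
grad = numerical_gradient(lambda x: log_sum_exp(x, keys), q)

# Largest coordinate-wise gap between attention output and the gradient.
max_gap = max(abs(a - g) for a, g in zip(attn, grad))

# Residual connection read as one gradient ascent step with step size 1.
q_next = [a + b for a, b in zip(q, attn)]
increased = log_sum_exp(q_next, keys) > log_sum_exp(q, keys)

print("max gap between attention and gradient:", max_gap)
print("log normalizer increased after the step:", increased)
```

Because log-sum-exp is convex, a step along its gradient cannot decrease it, which is why the final check holds in this toy case. The expansion of points and the counterbalancing contraction from layer normalization described in the abstract are not modeled here.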

Submission history

From: Alexander Shim
[v1] Thu, 28 Apr 2022 23:05:02 GMT (37kb)
