We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CL

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Computation and Language

Title: Cross-Lingual BERT Contextual Embedding Space Mapping with Isotropic and Isometric Conditions

Abstract: Typically, a linearly orthogonal transformation mapping is learned by aligning static type-level embeddings to build a shared semantic space. In view of the analysis that contextual embeddings contain richer semantic features, we investigate a context-aware and dictionary-free mapping approach by leveraging parallel corpora. We illustrate that our contextual embedding space mapping significantly outperforms previous multilingual word embedding methods on the bilingual dictionary induction (BDI) task by providing a higher degree of isomorphism. To improve the quality of mapping, we also explore sense-level embeddings that are split from type-level representations, which can align spaces in a finer resolution and yield more precise mapping. Moreover, we reveal that contextual embedding spaces suffer from their natural properties -- anisotropy and anisometry. To mitigate these two problems, we introduce the iterative normalization algorithm as an imperative preprocessing step. Our findings unfold the tight relationship between isotropy, isometry, and isomorphism in normalized contextual embedding spaces.
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2107.09186 [cs.CL]
  (or arXiv:2107.09186v1 [cs.CL] for this version)

Submission history

From: Haoran Xu [view email]
[v1] Mon, 19 Jul 2021 22:57:36 GMT (489kb,D)

Link back to: arXiv, form interface, contact.