Title: Blind signal decomposition of various word embeddings based on join and individual variance explained

Abstract: In recent years, natural language processing (NLP) has become one of the most important research areas, with applications throughout daily life. Word embedding is among its most fundamental tasks and still requires more attention and research. Existing work on word embeddings focuses on proposing novel embedding algorithms and on dimension reduction techniques for well-trained word embeddings. In this paper, we propose to use a novel joint signal separation method, JIVE (joint and individual variance explained), to jointly decompose various trained word embeddings into joint and individual components. This decomposition framework makes it easy to investigate the similarities and differences among word embeddings. We conducted an extensive empirical study on word2vec, FastText, and GloVe embeddings trained on different corpora and with different dimensions. We compared the decomposed components on sentiment analysis of Twitter data and the Stanford Sentiment Treebank. We found that mapping word embeddings into the joint component greatly improves sentiment performance for the original embeddings that performed worse. Moreover, we found that concatenating different components allows the same model to achieve better performance. These findings provide insight into word embeddings, and our work offers a new way of generating word embeddings by fusing existing ones.
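
The split into joint and individual components described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it is a one-pass simplification of a JIVE-style decomposition (published JIVE alternates between the joint and individual estimates until convergence), and the function names, the rank arguments, and the layout (each embedding stored as a dimensions-by-words matrix whose columns are aligned to one shared vocabulary) are assumptions made for illustration only.

```python
import numpy as np


def low_rank(X, r):
    """Best rank-r approximation of X via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]


def jive_sketch(blocks, joint_rank, indiv_ranks):
    """One-pass, JIVE-style split of several embedding matrices.

    blocks: list of arrays shaped (dim_k, n_words); column j of every
            block is assumed to be the vector for the same word.
    Returns one dict per block with the scaled input and its joint,
    individual, and noise parts, which sum back to the scaled input.
    """
    # Center each embedding dimension, then scale each block to unit
    # Frobenius norm so that no single embedding dominates the joint fit.
    scaled = []
    for X in blocks:
        X = X - X.mean(axis=1, keepdims=True)
        scaled.append(X / np.linalg.norm(X))

    # Joint structure: the leading right singular vectors of the stacked
    # blocks span a word-space subspace shared by all embeddings.
    stacked = np.vstack(scaled)
    _, _, Vt = np.linalg.svd(stacked, full_matrices=False)
    V = Vt[:joint_rank].T  # shape (n_words, joint_rank)

    parts = []
    for X, r_k in zip(scaled, indiv_ranks):
        joint = X @ V @ V.T                 # projection onto the shared subspace
        residual = X - joint                # orthogonal to the shared subspace
        individual = low_rank(residual, r_k)
        noise = residual - individual
        parts.append({"scaled": X, "joint": joint,
                      "individual": individual, "noise": noise})
    return parts
```

Because each residual is orthogonal to the shared subspace, the joint and individual parts of a block have orthogonal row spaces. Under the same assumptions, one possible reading of "concatenating different components" is stacking a block's joint and individual parts, for example `np.vstack([p["joint"], p["individual"]])`, so that each word receives one longer vector. The abstract does not specify how the concatenation is performed.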
Comments: 9 pages, 10 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as: arXiv:2011.14496 [cs.CL]
  (or arXiv:2011.14496v1 [cs.CL] for this version)

Submission history

From: Yikai Wang
[v1] Mon, 30 Nov 2020 01:36:29 GMT (1886kb,D)
