Compressing Word Embeddings Using Syllables

Mertens, Laurent; Vennekens, Joost




(Submitted on 13 Jan 2022)

Abstract: This work examines the possibility of using syllable embeddings, instead of the commonly used $n$-gram embeddings, as subword embeddings. We investigate this for two languages: English and Dutch. To this end, we also translated two standard English word embedding evaluation datasets, WordSim353 and SemEval-2017, to Dutch. Furthermore, we provide the research community with datasets of syllabic decompositions for both languages. We compare our approach to full word and $n$-gram embeddings. Compared to full word embeddings, we obtain English models that are 20 to 30 times smaller while retaining 80% of the performance. For Dutch, the models are 15 times smaller while retaining 70% of the performance. Although less accurate than the $n$-gram baseline we used, our models can be trained in a matter of minutes, as opposed to hours for the $n$-gram approach. We identify a path toward improving performance in future work. All code is made publicly available, as well as our collected English and Dutch syllabic decompositions and Dutch evaluation set translations.
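To make the contrast between the two subword schemes concrete, the sketch below shows one plausible way each could be realized. It is an illustration under stated assumptions, not the authors' released code. The character $n$-gram extraction follows the fastText convention (boundary markers `<` and `>`, lengths 3 to 6). The syllable-based word vector is assumed here to be the average of its syllable vectors; the paper may compose them differently. The functions `char_ngrams`, `syllable_word_vector` and `cosine`, and the toy decompositions in the demo, are hypothetical and are not taken from the paper's released decomposition datasets.

```python
from typing import Dict, List, Sequence

import numpy as np


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """fastText-style character n-grams: wrap the word in '<' and '>'
    boundary markers and return every substring of length n_min..n_max."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def syllable_word_vector(
    word: str,
    decompositions: Dict[str, Sequence[str]],
    syllable_vectors: Dict[str, np.ndarray],
    dim: int,
) -> np.ndarray:
    """Compose a word vector from the vectors of its syllables.

    ASSUMPTION: composition by averaging (hypothetical, not confirmed by
    the abstract). Words with no known decomposition, or whose syllables
    all lack vectors, map to the zero vector."""
    known = [
        syllable_vectors[s]
        for s in decompositions.get(word, ())
        if s in syllable_vectors
    ]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity, the usual score compared against human ratings
    in word-similarity sets such as WordSim353."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy decompositions, for illustration only.
    decomps = {"syllable": ["syl", "la", "ble"], "table": ["ta", "ble"]}
    rng = np.random.default_rng(0)
    vecs = {s: rng.normal(size=4) for s in sorted({"syl", "la", "ble", "ta"})}

    # "syllable" yields 26 character n-grams (n = 3..6) but only 3 syllables.
    print(len(char_ngrams("syllable")), len(decomps["syllable"]))
    print(cosine(syllable_word_vector("syllable", decomps, vecs, 4),
                 syllable_word_vector("table", decomps, vecs, 4)))
```

The gap in unit counts hints at where the size savings could come from. A syllable inventory is typically much smaller than the set of all character $n$-grams, so the embedding table has fewer rows. The abstract's reported 20 to 30 times (English) and 15 times (Dutch) reductions are measured against full word embeddings, not against this toy count.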

Comments:	19 pages, 3 figures, 11 tables
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2201.04913 [cs.CL]
	(or arXiv:2201.04913v1 [cs.CL] for this version)

Submission history

From: Laurent Mertens
[v1] Thu, 13 Jan 2022 12:09:44 GMT (108kb,D)
