Will it Unblend?

Pinter, Yuval; Jacobs, Cassandra L.; Eisenstein, Jacob

Authors: Yuval Pinter, Cassandra L. Jacobs, Jacob Eisenstein

(Submitted on 18 Sep 2020)

Abstract: Natural language processing systems often struggle with out-of-vocabulary (OOV) terms, which do not appear in training data. Blends, such as "innoventor", are one particularly challenging class of OOV, as they are formed by fusing together two or more bases that relate to the intended meaning in unpredictable manners and degrees. In this work, we run experiments on a novel dataset of English OOV blends to quantify the difficulty of interpreting the meanings of blends by large-scale contextual language models such as BERT. We first show that BERT's processing of these blends does not fully access the component meanings, leaving their contextual representations semantically impoverished. We find this is mostly due to the loss of characters resulting from blend formation. Then, we assess how easily different models can recognize the structure and recover the origin of blends, and find that context-aware embedding systems outperform character-level and context-free embeddings, although their results are still far from satisfactory.

Comments:	Findings of EMNLP 2020
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2009.09123 [cs.CL]
	(or arXiv:2009.09123v1 [cs.CL] for this version)

Submission history

From: Yuval Pinter [view email]
[v1] Fri, 18 Sep 2020 23:59:15 GMT (566kb,D)
