Word-level Lexical Normalisation using Context-Dependent Embeddings

Stewart, Michael; Liu, Wei; Cardell-Oliver, Rachel

(Submitted on 13 Nov 2019)

Abstract: Lexical normalisation (LN) is the process of correcting each word in a dataset to its canonical form so that it may be more easily and more accurately analysed. Most lexical normalisation systems operate at the character level, while word-level models are seldom used. Recent language models offer solutions to the drawbacks of word-level LN models, yet, to the best of our knowledge, no research has investigated their effectiveness on LN. In this paper we introduce a word-level GRU-based LN model and investigate the effectiveness of recent embedding techniques on word-level LN. Our results show that our GRU-based word-level model achieves better results than character-level models and outperforms existing deep-learning-based LN techniques on Twitter data. We also find that randomly-initialised embeddings are capable of outperforming pre-trained embedding models in certain scenarios. Finally, we release a substantial lexical normalisation dataset to the community.
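As a rough illustration of what "word-level" LN means here (one canonical output word predicted per input word), the following is a minimal sketch, assuming a single unidirectional GRU layer over randomly-initialised word embeddings followed by a per-token argmax over a canonical output vocabulary. The class name `WordLevelGRUNormaliser`, the dimensions, the `<unk>` convention and the toy vocabularies are all hypothetical. This is not the authors' architecture, and the weights are untrained, so the predicted words are arbitrary; only the input/output shape of the task is meant to be illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class WordLevelGRUNormaliser:
    """Hypothetical sketch: noisy word -> canonical word, one output per token.

    Assumes in_vocab[0] is "<unk>", used for out-of-vocabulary input words.
    Weights are random and untrained (no training loop is shown).
    """

    def __init__(self, in_vocab, out_vocab, emb_dim=8, hid_dim=16, seed=0):
        rng = random.Random(seed)
        self.in_index = {w: i for i, w in enumerate(in_vocab)}
        self.out_vocab = list(out_vocab)
        self.hid_dim = hid_dim
        # Randomly-initialised word embeddings (no pre-training assumed).
        self.emb = rand_matrix(len(in_vocab), emb_dim, rng)
        d = emb_dim + hid_dim
        # GRU weights acting on the concatenation [x; h] (biases omitted).
        self.Wz = rand_matrix(hid_dim, d, rng)  # update gate
        self.Wr = rand_matrix(hid_dim, d, rng)  # reset gate
        self.Wn = rand_matrix(hid_dim, d, rng)  # candidate state
        # Output projection from hidden state to canonical-word scores.
        self.Wo = rand_matrix(len(out_vocab), hid_dim, rng)

    def step(self, x, h):
        xh = x + h  # list concatenation: [x; h]
        z = [sigmoid(v) for v in matvec(self.Wz, xh)]
        r = [sigmoid(v) for v in matvec(self.Wr, xh)]
        xrh = x + [ri * hi for ri, hi in zip(r, h)]  # [x; r * h]
        n = [math.tanh(v) for v in matvec(self.Wn, xrh)]
        # h' = (1 - z) * n + z * h
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def normalise(self, tokens):
        h = [0.0] * self.hid_dim
        out = []
        for tok in tokens:
            x = self.emb[self.in_index.get(tok, 0)]  # index 0 = "<unk>"
            h = self.step(x, h)
            scores = matvec(self.Wo, h)
            best = max(range(len(scores)), key=scores.__getitem__)
            out.append(self.out_vocab[best])
        return out


IN_VOCAB = ["<unk>", "u", "r", "gr8", "you", "are", "great"]
OUT_VOCAB = ["<unk>", "you", "are", "great"]

model = WordLevelGRUNormaliser(IN_VOCAB, OUT_VOCAB)
tokens = model.normalise(["u", "r", "gr8"])
print(tokens)  # untrained, so the chosen canonical words are arbitrary
```

Framing the task this way keeps a one-to-one alignment between input and output words, which is the property that distinguishes a word-level model from a character-level one that rewrites spellings character by character.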

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1911.06172 [cs.CL]
	(or arXiv:1911.06172v1 [cs.CL] for this version)

Submission history

From: Michael Stewart
[v1] Wed, 13 Nov 2019 14:42:55 GMT (3235kb,D)
