Domain adaptation for part-of-speech tagging of noisy user-generated text

März, Luisa; Trautmann, Dietrich; Roth, Benjamin


Title: Domain adaptation for part-of-speech tagging of noisy user-generated text

Authors: Luisa März, Dietrich Trautmann, Benjamin Roth

(Submitted on 21 May 2019)

Abstract: The performance of a part-of-speech (POS) tagger is highly dependent on the domain of the processed text, and for many domains there is little or no training data available. This work addresses the problem of POS tagging noisy user-generated text using a neural network. We propose an architecture that trains an out-of-domain model on a large newswire corpus and transfers those weights by using them as a prior for a model trained on the target domain (a dataset of German Tweets), for which very few annotations are available. The neural network has two standard bidirectional LSTMs at its core. However, we find it crucial to also encode a set of task-specific features and to obtain reliable (source-domain and target-domain) word representations. We conduct experiments with different regularization techniques, such as early stopping, dropout, and fine-tuning the domain adaptation prior weights. Our best model uses external weights from the out-of-domain model, as well as feature embeddings and pre-trained word and sub-word embeddings, and achieves a tagging accuracy of slightly over 90%, improving on the previous state of the art for this task.
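The abstract does not spell out how the out-of-domain weights act as a prior. One common reading is to initialise the target-domain model from the source weights and penalise deviation from them during fine-tuning. The sketch below illustrates that reading on a toy quadratic loss; it is not the authors' implementation, and every name in it (`fine_tune_with_prior`, `toy_grad`, `strength`, `target_optimum`) is a hypothetical chosen for illustration.

```python
from typing import Callable, List

Vector = List[float]


def fine_tune_with_prior(
    prior: Vector,
    task_grad: Callable[[Vector], Vector],
    strength: float = 1.0,
    lr: float = 0.1,
    steps: int = 500,
) -> Vector:
    """Gradient descent on task_loss(w) + (strength / 2) * ||w - prior||^2.

    Assumption: `prior` stands in for weights learned on the source
    domain (e.g. newswire); `task_grad` returns the gradient of the
    target-domain loss at w.
    """
    w = list(prior)  # start the target model at the source-domain weights
    for _ in range(steps):
        g = task_grad(w)
        # The extra term strength * (w - prior) pulls w back toward the prior.
        w = [wi - lr * (gi + strength * (wi - pi))
             for wi, gi, pi in zip(w, g, prior)]
    return w


# Toy stand-in for a target-domain loss: 0.5 * ||w - target_optimum||^2
target_optimum = [2.0, 4.0]


def toy_grad(w: Vector) -> Vector:
    return [wi - ti for wi, ti in zip(w, target_optimum)]


source_weights = [0.0, 0.0]
adapted = fine_tune_with_prior(source_weights, toy_grad, strength=1.0)
unregularised = fine_tune_with_prior(source_weights, toy_grad, strength=0.0)
```

With `strength=1.0` the result settles halfway between the source weights and the target optimum; with `strength=0.0` the prior only serves as the starting point and the weights move all the way to the target optimum. Tuning that trade-off matters most when target-domain annotations are scarce.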

Comments:	6 pages, NAACL 2019
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1905.08920 [cs.CL]
	(or arXiv:1905.08920v1 [cs.CL] for this version)

Submission history

From: Luisa März
[v1] Tue, 21 May 2019 10:33:06 GMT (318kb)
