Title: Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics

Authors: Preslav Nakov
Abstract: An important characteristic of written English text is the abundance of noun compounds: sequences of nouns acting as a single noun, e.g., colon cancer tumor suppressor protein. Although domain experts eventually master them, their interpretation poses a major challenge for automated analysis. Understanding noun compounds' syntax and semantics is important for many natural language applications, including question answering, machine translation, information retrieval, and information extraction. I address the problem of noun compound syntax by means of novel, highly accurate unsupervised and lightly supervised algorithms that use the Web as a corpus and search engines as interfaces to that corpus. Traditionally, the Web has been viewed as a source of page hit counts, which are used to estimate n-gram word frequencies. I extend this approach by introducing novel surface features and paraphrases, which yield state-of-the-art results for the task of noun compound bracketing. I also show how these kinds of features can be applied to other structural ambiguity problems, such as prepositional phrase attachment and noun phrase coordination. I address noun compound semantics by automatically generating paraphrasing verbs and prepositions that make explicit the hidden semantic relations between the nouns in a noun compound. I also demonstrate how these paraphrasing verbs can be used to solve various relational similarity problems, and how paraphrasing noun compounds can improve machine translation.
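To make the hit-count idea concrete, here is a minimal sketch of one classic baseline for bracketing a three-noun compound w1 w2 w3: the adjacency model, which compares how strongly w1 associates with w2 against how strongly w2 associates with w3. This is not the thesis's algorithm, which goes beyond such n-gram statistics with surface features and paraphrases. The counts below are invented placeholders for search-engine hit counts, and the names BIGRAM_COUNTS, bigram_count, and bracket_adjacency are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical bigram counts standing in for web page hit counts.
# These numbers are made up for illustration only.
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("tumor", "suppressor"): 900,
    ("suppressor", "protein"): 300,
    ("plastic", "water"): 50,
    ("water", "bottle"): 800,
}


def bigram_count(w1: str, w2: str) -> int:
    """Look up the count for the bigram "w1 w2"; unseen pairs count as 0."""
    return BIGRAM_COUNTS.get((w1, w2), 0)


def bracket_adjacency(w1: str, w2: str, w3: str) -> str:
    """Bracket a three-noun compound with a simple adjacency model.

    Left bracketing [[w1 w2] w3] is chosen when "w1 w2" is at least as
    frequent as "w2 w3"; otherwise right bracketing [w1 [w2 w3]] is chosen.
    Ties default to left bracketing (an arbitrary choice in this sketch).
    """
    left_score = bigram_count(w1, w2)
    right_score = bigram_count(w2, w3)
    if left_score >= right_score:
        return f"[[{w1} {w2}] {w3}]"
    return f"[{w1} [{w2} {w3}]]"


if __name__ == "__main__":
    print(bracket_adjacency("tumor", "suppressor", "protein"))  # [[tumor suppressor] protein]
    print(bracket_adjacency("plastic", "water", "bottle"))      # [plastic [water bottle]]
```

Raw bigram frequencies are the simplest possible association score; a real system would normalize them (e.g., by unigram frequencies) so that very common words do not dominate the comparison.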
Comments: noun compounds, paraphrasing verbs, semantic interpretation, syntax, multi-word expressions, MWEs, noun compound interpretation, noun compound bracketing, prepositional phrase attachment, noun phrase coordination, machine translation
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
MSC classes: 68T50
ACM classes: I.2.7
Journal reference: PhD Thesis, University of California at Berkeley, 2007
Report number: Technical Report No. UCB/EECS-2007-173
Cite as: arXiv:1912.01113 [cs.CL]
  (or arXiv:1912.01113v1 [cs.CL] for this version)

Submission history

From: Preslav Nakov
[v1] Sat, 23 Nov 2019 21:33:31 GMT (2821kb)