Entity Type Recognition using an Ensemble of Distributional Semantic Models to Enhance Query Understanding

Shalaby, Walid; Jadda, Khalifeh Al; Korayem, Mohammed; Grainger, Trey

doi:10.1109/COMPSAC.2016.109

(Submitted on 4 Apr 2016)

Abstract: We present an ensemble approach for categorizing search query entities in the recruitment domain. Understanding the types of entities expressed in a search query (Company, Skill, Job Title, etc.) enables more intelligent information retrieval based upon those entities than traditional keyword-based search allows. Because search queries are typically very short, a traditional bag-of-words model is ill-suited to identifying entity types: the query offers too little contextual information. Our approach instead combines clues from different sources of varying complexity in order to collect real-world knowledge about query entities. We employ distributional semantic representations of query entities through two models: 1) contextual vectors generated from encyclopedic corpora like Wikipedia, and 2) high-dimensional word embedding vectors generated from millions of job postings using word2vec. Additionally, our approach utilizes both linguistic properties of entities obtained from WordNet and ontological properties extracted from DBpedia. We evaluate our approach on a data set created at CareerBuilder, the largest job board in the US. The data set contains entities extracted from millions of job seeker and recruiter search queries, job postings, and resume documents. After constructing the distributional vectors of search entities, we use supervised machine learning to infer search entity types. Empirical results show that our approach outperforms the state-of-the-art word2vec distributional semantics model trained on Wikipedia. Moreover, we achieve a micro-averaged F1 score of 97% using the proposed ensemble of distributional representations.
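
As a rough illustration of the pipeline the abstract outlines (several feature sources combined per entity, a supervised classifier, and a micro-averaged F1 evaluation), here is a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the sources are combined or which classifier is used, so plain concatenation and a nearest-centroid classifier are stand-in assumptions. The helper names (`concat_features`, `train_centroids`, `predict`, `micro_f1`) and all numeric vectors are hypothetical toy values, not data from the paper.

```python
from collections import defaultdict
from math import sqrt


def concat_features(*vectors):
    """Combine per-source feature vectors into one vector by concatenation.

    Assumption: concatenation is only one simple way to build an ensemble
    representation; the abstract does not specify the combination method.
    """
    combined = []
    for vec in vectors:
        combined.extend(vec)
    return combined


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(examples):
    """Compute the mean vector per label from (vector, label) pairs.

    Assumption: nearest-centroid is a stand-in for whatever supervised
    learner the paper actually uses.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


def micro_f1(gold, predicted):
    """Micro-averaged F1: pool true/false positives and false negatives over all classes.

    With exactly one label per item, every error counts as one false positive
    (for the predicted class) and one false negative (for the gold class).
    """
    tp = sum(1 for g, p in zip(gold, predicted) if g == p)
    fp = fn = len(gold) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: a 2-dimensional "encyclopedic context" vector and a
# 3-dimensional "job-posting embedding" per entity. All values are invented.
train = [
    (concat_features([0.9, 0.1], [1.0, 0.0, 0.0]), "Skill"),
    (concat_features([0.8, 0.2], [0.9, 0.1, 0.0]), "Skill"),
    (concat_features([0.1, 0.9], [0.0, 1.0, 0.1]), "Job Title"),
    (concat_features([0.2, 0.8], [0.1, 0.9, 0.0]), "Job Title"),
    (concat_features([0.5, 0.5], [0.0, 0.1, 1.0]), "Company"),
    (concat_features([0.4, 0.6], [0.1, 0.0, 0.9]), "Company"),
]
test_vectors = [
    concat_features([0.85, 0.15], [0.95, 0.05, 0.0]),
    concat_features([0.15, 0.85], [0.05, 0.95, 0.05]),
    concat_features([0.45, 0.55], [0.05, 0.05, 0.95]),
]
test_gold = ["Skill", "Job Title", "Company"]

centroids = train_centroids(train)
test_predicted = [predict(centroids, vec) for vec in test_vectors]
score = micro_f1(test_gold, test_predicted)
```

Note that when every entity receives exactly one predicted type, micro-averaged F1 reduces to plain accuracy, because the pooled false-positive and false-negative counts are equal.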

Comments:	A short version of this paper has been accepted in "COMPSAC 2016: The 40th IEEE Computer Society International Conference on Computers, Software & Applications"
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
DOI:	10.1109/COMPSAC.2016.109
Cite as:	arXiv:1604.00933 [cs.CL]
	(or arXiv:1604.00933v1 [cs.CL] for this version)

Submission history

From: Walid Shalaby
[v1] Mon, 4 Apr 2016 16:18:44 GMT (497kb,D)
