
Title: Crowdsourced Collective Entity Resolution with Relational Match Propagation

Abstract: Knowledge bases (KBs) store rich yet heterogeneous entities and facts. Entity resolution (ER) aims to identify entities in KBs that refer to the same real-world object. Recent studies have shown significant benefits of involving humans in the loop of ER. These methods typically resolve entities using pairwise similarity measures over attribute values and ask the crowd to label the uncertain pairs. However, existing methods still suffer from high labor costs and, to some extent, insufficient labeling. In this paper, we propose a novel approach called crowdsourced collective ER, which leverages the relationships between entities to infer matches jointly rather than independently. Specifically, it iteratively asks human workers to label selected entity pairs and propagates the labeling information to their neighbors, both direct and more distant. During this process, we address the problems of candidate entity pruning, probabilistic propagation, optimal question selection, and error-tolerant truth inference. Our experiments on real-world datasets demonstrate that, compared with state-of-the-art methods, our approach achieves superior accuracy with much less labeling.
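
The label-and-propagate loop described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the function name `crowd_resolve`, the `ask` callback standing in for a crowd worker, the `neighbors` map of propagation weights, and the `threshold` and `budget` parameters are all hypothetical. The question-selection rule (pick the pair with the most neighbors) and the multiplicative decay are naive stand-ins for the optimal question selection and probabilistic propagation that the abstract mentions, and worker errors are ignored entirely.

```python
from typing import Callable, Dict, Set, Tuple

Pair = Tuple[str, str]  # a candidate entity pair, one entity from each KB


def crowd_resolve(
    priors: Dict[Pair, float],
    neighbors: Dict[Pair, Dict[Pair, float]],
    ask: Callable[[Pair], bool],
    budget: int,
    threshold: float = 0.8,
) -> Tuple[Set[Pair], Dict[Pair, bool]]:
    """Toy label-and-propagate loop (illustrative assumption, not the paper's method)."""
    prob = dict(priors)            # current match probability per pair
    labeled: Dict[Pair, bool] = {}  # answers collected from the (simulated) crowd

    for _ in range(budget):
        # A pair is unresolved if it is unlabeled and not yet confident either way.
        unresolved = [
            p for p in prob
            if p not in labeled and (1 - threshold) < prob[p] < threshold
        ]
        if not unresolved:
            break

        # Naive question selection: prefer the pair with the most neighbors.
        question = max(unresolved, key=lambda p: len(neighbors.get(p, {})))
        answer = ask(question)
        labeled[question] = answer
        prob[question] = 1.0 if answer else 0.0

        if answer:
            # Propagate the match along relationships; strength decays with
            # each hop, so distant neighbors receive weaker evidence.
            frontier = [(question, 1.0)]
            while frontier:
                current, strength = frontier.pop()
                for nb, weight in neighbors.get(current, {}).items():
                    new_strength = strength * weight
                    if nb not in labeled and new_strength > prob.get(nb, 0.0):
                        prob[nb] = new_strength
                        frontier.append((nb, new_strength))

    matches = {p for p, v in prob.items() if v >= threshold}
    return matches, labeled


# Usage: three uncertain pairs chained by relationships, one question allowed.
priors = {("a1", "b1"): 0.5, ("a2", "b2"): 0.5, ("a3", "b3"): 0.5}
neighbors = {
    ("a1", "b1"): {("a2", "b2"): 0.9},
    ("a2", "b2"): {("a3", "b3"): 0.9},
}
matches, labeled = crowd_resolve(priors, neighbors, ask=lambda p: True, budget=1)
```

In this example a single answer about `("a1", "b1")` is enough to push the two related pairs above the threshold, which is the intuition behind reducing labeling cost by inferring matches jointly.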
Comments: Accepted by the 36th IEEE International Conference on Data Engineering (ICDE 2020)
Subjects: Databases (cs.DB); Computation and Language (cs.CL)
Cite as: arXiv:2002.09361 [cs.DB]
  (or arXiv:2002.09361v1 [cs.DB] for this version)

Submission history

From: Wei Hu
[v1] Fri, 21 Feb 2020 15:33:53 GMT (412kb,D)
