Title: A DNF Blocking Scheme Learner for Heterogeneous Datasets

Abstract: Entity Resolution concerns identifying co-referent entity pairs across datasets. A typical workflow comprises two steps. In the first step, a blocking method uses a one-to-many function called a blocking scheme to map entities to blocks. In the second step, entities sharing a block are paired and compared. Current DNF blocking scheme learners (DNF-BSLs) apply only to structurally homogeneous tables. We present an unsupervised algorithmic pipeline for learning DNF blocking schemes on RDF graph datasets, as well as structurally heterogeneous tables. Previous DNF-BSLs are admitted as special cases. We evaluate the pipeline on six real-world dataset pairs. Unsupervised results are shown to be competitive with supervised and semi-supervised baselines. To the best of our knowledge, this is the first unsupervised DNF-BSL that admits RDF graphs and structurally heterogeneous tables as inputs.
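
The two-step workflow in the abstract can be illustrated with a minimal sketch. It is not the paper's pipeline and does no learning: it only applies a hand-written DNF blocking scheme (a disjunction of conjunctions of blocking predicates) to a single toy collection of records. The predicate helpers `tokens` and `prefix`, the functions `block_keys` and `candidate_pairs`, and the sample records are all hypothetical names and data chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations, product


# Hypothetical blocking predicates: each maps a record (a dict) to a set of keys.
def tokens(field):
    def key_fn(record):
        return set(str(record.get(field, "")).lower().split())
    return key_fn


def prefix(field, n=3):
    def key_fn(record):
        value = str(record.get(field, "")).lower().strip()
        return {value[:n]} if value else set()
    return key_fn


def block_keys(record, scheme):
    """Step 1: map one record to a set of block keys (a one-to-many function).

    `scheme` is a list of terms (the disjunction); each term is a list of
    predicates (a conjunction). A conjunction yields one key per combination
    of its predicates' keys, and yields nothing if any predicate has no key.
    """
    keys = set()
    for term_index, term in enumerate(scheme):
        key_sets = [predicate(record) for predicate in term]
        for combo in product(*key_sets):
            keys.add((term_index, combo))
    return keys


def candidate_pairs(records, scheme):
    """Step 2: pair up records that share at least one block."""
    blocks = defaultdict(set)
    for record_id, record in records.items():
        for key in block_keys(record, scheme):
            blocks[key].add(record_id)
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Toy data, invented for this sketch.
records = {
    "a1": {"name": "Joan Smith", "city": "Austin"},
    "a2": {"name": "J. Smith", "city": "Austin"},
    "b1": {"name": "Joan Smyth", "city": "Boston"},
    "b2": {"name": "Mary Jones", "city": "Boston"},
}

# (shared name token AND same 3-letter city prefix) OR (same 4-letter name prefix)
scheme = [
    [tokens("name"), prefix("city")],
    [prefix("name", 4)],
]

pairs = candidate_pairs(records, scheme)
print(sorted(pairs))
```

In this toy run only the pairs that share a block would go on to the comparison step; every other pair is never compared. Handling two structurally different datasets, or RDF graphs, would additionally require deciding which fields of one dataset correspond to which fields of the other, which this sketch does not attempt.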
Subjects: Databases (cs.DB)
Cite as: arXiv:1501.01694 [cs.DB]
  (or arXiv:1501.01694v1 [cs.DB] for this version)

Submission history

From: Mayank Kejriwal
[v1] Thu, 8 Jan 2015 00:37:09 GMT (806kb,D)
