Title: Maximum Entropy classification for record linkage

Abstract: Record linkage joins records, residing in separate files, that are believed to relate to the same entity. In this paper we approach record linkage as a classification problem and adapt the maximum entropy classification method from text mining to record linkage, in both the supervised and unsupervised settings of machine learning. The set of links is chosen according to the associated uncertainty. On the one hand, our framework overcomes some persistent theoretical flaws of the classical approach pioneered by Fellegi and Sunter (1969); on the other hand, the proposed algorithm is scalable and fully automatic, unlike the classical approach, which generally requires clerical review to resolve undecided cases.
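As a rough illustration of treating linkage as classification, here is a minimal supervised sketch. It is not the authors' algorithm. Each candidate record pair is reduced to a binary agreement vector over the compared fields. For two classes, a maximum entropy model whose features are these agreement indicators (paired with the class label) reduces to logistic regression, and the sketch fits that model by gradient ascent. Pairs whose estimated match probability clears a threshold are declared links. Several choices are hypothetical and made only for illustration: the field names, the labelled training pairs, the learning rate, the L2 penalty and the 0.5 threshold. The sketch does not reproduce the unsupervised setting or the paper's uncertainty-based choice of the link set.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical agreement vectors: 1 if the pair agrees on (name, birth date, postcode).
TRAIN_X: List[Tuple[int, ...]] = [
    (1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1),  # labelled matches
    (0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 1, 0), (0, 0, 0),  # labelled non-matches
]
TRAIN_Y: List[int] = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def match_prob(w: Sequence[float], b: float, x: Sequence[int]) -> float:
    """Estimated P(match | agreement vector x) under the fitted model."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def fit_maxent(xs, ys, lr=0.5, epochs=2000, l2=0.01):
    """Binary maximum entropy (equivalently logistic) model, fit by batch
    gradient ascent on the L2-penalised mean log-likelihood.
    The intercept b is not penalised."""
    k, n = len(xs[0]), len(xs)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(xs, ys):
            r = y - match_prob(w, b, x)  # observed label minus model expectation
            for j in range(k):
                grad_w[j] += r * x[j]
            grad_b += r
        w = [wj + lr * (gj / n - l2 * wj) for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b


def select_links(w, b, pairs, threshold=0.5):
    """Keep (pair_id, probability) for pairs at or above the (hypothetical) threshold."""
    scored = ((pid, match_prob(w, b, x)) for pid, x in pairs)
    return [(pid, p) for pid, p in scored if p >= threshold]


W, B = fit_maxent(TRAIN_X, TRAIN_Y)
CANDIDATES = [("a", (1, 1, 1)), ("b", (0, 0, 0)), ("c", (1, 1, 0)), ("d", (1, 0, 0))]
for pid, p in select_links(W, B, CANDIDATES):
    print(pid, round(p, 3))
```

Every pair receives a probability, so this sketch has no separate "undecided" band that would need clerical review. The threshold simply trades false links against missed links.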
Subjects: Methodology (stat.ME)
Cite as: arXiv:2009.14797 [stat.ME]
  (or arXiv:2009.14797v2 [stat.ME] for this version)

Submission history

From: Jae-Kwang Kim
[v1] Wed, 30 Sep 2020 17:11:28 GMT (166kb,D)
[v2] Thu, 11 Nov 2021 20:21:26 GMT (312kb,D)
