Title: Bayesian Estimation of Bipartite Matchings for Record Linkage

Abstract: The bipartite record linkage task consists of merging two disparate datafiles containing information on two overlapping sets of entities. The task is non-trivial in the absence of unique identifiers, and it matters for a wide variety of applications because it must be solved whenever information from different sources is combined. Most statistical techniques currently used for record linkage are derived from a seminal paper by Fellegi and Sunter (1969). These techniques usually assume independence in the matching statuses of record pairs to derive estimation procedures and optimal point estimators. We argue that this independence assumption is unreasonable and instead target a bipartite matching between the two datafiles as our parameter of interest. Bayesian implementations allow us to quantify uncertainty on the matching decisions and derive a variety of point estimators using different loss functions. We propose partial Bayes estimates that allow uncertain parts of the bipartite matching to be left unresolved. We evaluate our approach to record linkage using a variety of challenging scenarios and show that it outperforms the traditional methodology. We illustrate the advantages of our methods by merging two datafiles on casualties from the civil war of El Salvador.
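To make the idea of a partial point estimate concrete, here is a minimal Python sketch. It is not the estimator derived in the paper. It assumes that posterior draws of the matching are already available, with each draw encoded as a list giving, for every record in the second file, the index of its linked record in the first file or a "no link" label. It then keeps a decision only when its posterior frequency exceeds a threshold. The function names, the encoding, the 0.5 threshold, and the toy draws are all illustrative assumptions.

```python
from collections import Counter

NO_LINK = -1       # assumed label: record has no match in the other file
UNRESOLVED = None  # assumed label: decision deliberately left open


def partial_estimate(samples, threshold=0.5):
    """Return one label per second-file record, or UNRESOLVED when uncertain.

    samples: list of posterior draws; draw[j] is the index of the first-file
    record linked to second-file record j, or NO_LINK.
    """
    n_draws = len(samples)
    estimate = []
    for j in range(len(samples[0])):
        # Posterior frequency of each candidate label for record j.
        counts = Counter(draw[j] for draw in samples)
        label, count = counts.most_common(1)[0]
        # Keep the most frequent label only if it is frequent enough.
        estimate.append(label if count / n_draws > threshold else UNRESOLVED)
    return estimate


def is_bipartite_matching(labels):
    """Check that no first-file record is linked to two second-file records."""
    links = [a for a in labels if a != NO_LINK and a is not UNRESOLVED]
    return len(links) == len(set(links))


# Toy posterior draws for three second-file records (made-up values).
samples = [
    [0, 2, NO_LINK],
    [0, 2, 1],
    [0, 1, NO_LINK],
    [0, 2, 1],
]
estimate = partial_estimate(samples)
print(estimate)
print(is_bipartite_matching(estimate))
```

In this toy input the first two records are linked consistently across draws, while the third is split evenly between "no link" and record 1, so it is left unresolved. With a threshold of at least 0.5 and draws that are themselves valid matchings, two records cannot both be assigned the same partner, so the resolved part of the estimate remains a bipartite matching.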
Comments: This is a preprint of an article accepted for publication in the Journal of the American Statistical Association. The final version contains more material and is organized differently.
Subjects: Methodology (stat.ME); Applications (stat.AP); Machine Learning (stat.ML)
Cite as: arXiv:1601.06630 [stat.ME]
  (or arXiv:1601.06630v1 [stat.ME] for this version)

Submission history

From: Mauricio Sadinle
[v1] Mon, 25 Jan 2016 14:58:41 GMT (398kb,D)