Title: Imputation under Differential Privacy

Abstract: The literature on differential privacy almost invariably assumes that the data to be analyzed are fully observed. In most practical applications this is an unrealistic assumption. A popular strategy to address this problem is imputation, in which missing values are replaced by values estimated from the observed data. In this paper we evaluate various approaches to answering queries on an imputed dataset in a differentially private manner, and we discuss the trade-offs of where along the pipeline privacy is considered. We show that if imputation is done without regard to privacy, the sensitivity of certain queries can increase linearly with the number of incomplete records. On the other hand, for a general class of imputation strategies, these worst-case scenarios can be greatly mitigated by ensuring privacy already during the imputation stage. We use a simulated dataset to demonstrate these results across a number of imputation schemes (both private and non-private) and examine their impact on the utility of a private query on the data.
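
The claim that sensitivity can grow linearly with the number of incomplete records can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: values are bounded in [0, 1], missing entries are filled by non-private mean imputation, the query is a sum, and the helper names (`impute_mean`, `sum_query`, `sum_gap`) are hypothetical. The sketch measures how far the imputed sum moves when a single observed record changes from 0 to 1.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed entries (non-private)."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def sum_query(values):
    """Sum query evaluated on the imputed dataset."""
    return sum(impute_mean(values))


def sum_gap(n_observed, n_missing):
    """Change in the imputed sum when one observed value moves from 0 to 1.

    Assumes n_observed >= 1 and values bounded in [0, 1] (illustrative setup).
    """
    base = [0.0] * n_observed + [None] * n_missing
    neighbor = [1.0] + base[1:]  # differs from base in exactly one record
    return sum_query(neighbor) - sum_query(base)


if __name__ == "__main__":
    # With 5 observed records, the gap is 1 + n_missing / 5.
    for n_missing in (0, 10, 100):
        print(n_missing, sum_gap(n_observed=5, n_missing=n_missing))
```

On a fully observed dataset the same change would move the sum by at most 1. Under this toy setup the gap is 1 + m/k for m missing and k observed records, because the changed record also shifts every imputed value, so the gap grows linearly in m for fixed k.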
Comments: Preliminary version, to be expanded. 4 pages, 1 figure
Subjects: Databases (cs.DB)
Cite as: arXiv:2206.15063 [cs.DB]
  (or arXiv:2206.15063v2 [cs.DB] for this version)

Submission history

From: Keith Merrill
[v1] Thu, 30 Jun 2022 06:52:17 GMT (42kb,D)
[v2] Thu, 14 Jul 2022 14:32:07 GMT (42kb,D)
