We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:


Current browse context:


Change to browse by:

References & Citations


(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Statistics > Applications

Title: Bayesian Data Synthesis and Disclosure Risk Quantification: An Application to the Consumer Expenditure Surveys

Abstract: The release of synthetic data generated from a model estimated on the data helps statistical agencies disseminate respondent-level data with high utility and privacy protection. Motivated by the challenge of disseminating sensitive variables containing geographic information in the Consumer Expenditure Surveys (CE) at the U.S. Bureau of Labor Statistics, we propose two non-parametric Bayesian models as data synthesizers for the county identifier of each data record: a Bayesian latent class model and a Bayesian areal model. Both data synthesizers use Dirichlet Process priors to cluster observations of similar characteristics and allow borrowing information across observations. We develop innovative disclosure risks measures to quantify inherent risks in the confidential CE data and how those data risks are ameliorated by our proposed synthesizers. By creating a lower bound and an upper bound of disclosure risks under a minimum and a maximum disclosure risks scenarios respectively, our proposed inherent risks measures provide a range of acceptable disclosure risks for evaluating risks level in the synthetic datasets.
Subjects: Applications (stat.AP)
Cite as: arXiv:1809.10074 [stat.AP]
  (or arXiv:1809.10074v2 [stat.AP] for this version)

Submission history

From: Jingchen Hu [view email]
[v1] Wed, 26 Sep 2018 15:47:07 GMT (1247kb,D)
[v2] Tue, 2 Feb 2021 15:34:08 GMT (1788kb,D)

Link back to: arXiv, form interface, contact.