Title: Generating Electronic Health Records with Multiple Data Types and Constraints

Abstract: Sharing electronic health records (EHRs) on a large scale may lead to privacy intrusions. Recent research has shown that these risks may be mitigated by simulating EHRs with generative adversarial network (GAN) frameworks. Yet the methods developed to date are limited because they 1) focus on generating data of a single type (e.g., diagnosis codes), neglecting other data types (e.g., demographics, procedures, or vital signs), and 2) do not represent constraints between features. In this paper, we introduce a method to simulate EHRs composed of multiple data types by 1) refining the GAN model, 2) accounting for feature constraints, and 3) incorporating utility measures for such generation tasks. Findings from 770K EHRs from Vanderbilt University Medical Center demonstrate that our model achieves higher data utility in retaining the basic statistics, interdimensional correlation, structural properties, and frequent association rules of the real data. Importantly, these gains are achieved without sacrificing privacy.
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
Cite as: arXiv:2003.07904 [cs.LG]
  (or arXiv:2003.07904v1 [cs.LG] for this version)

Submission history

From: Chao Yan
[v1] Tue, 17 Mar 2020 19:25:16 GMT (8025kb,D)
[v2] Mon, 23 Mar 2020 22:01:37 GMT (8025kb,D)