Statistics > Machine Learning
Title: Stratified cross-validation for unbiased and privacy-preserving federated learning
(Submitted on 22 Jan 2020 (v1), last revised 23 Jan 2020 (this version, v2))
Abstract: Large-scale collections of electronic records constitute both an opportunity for the development of more accurate prediction models and a threat to privacy. To limit privacy exposure, new privacy-enhancing techniques are emerging, such as federated learning, which enables large-scale data analysis while avoiding the centralization of records in a single database that would represent a critical point of failure. Although promising regarding privacy protection, federated learning prevents the use of some data-cleaning algorithms, thus inducing new biases. In this work we focus on the recurrent problem of duplicated records that, if not handled properly, may cause over-optimistic estimates of a model's performance. We introduce and discuss stratified cross-validation, a validation methodology that leverages stratification techniques to prevent data leakage in federated learning settings without relying on demanding deduplication algorithms.
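The leakage problem named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes that each site can compute a shared, deterministic stratification key for every record (here a hypothetical `birth_date` field) and that folds are assigned by hashing that key. Under that assumption, two copies of the same record held at different sites always fall in the same fold, so no copy is trained on while another is used for validation, and no site has to compare records with another. The names `fold_of`, `split_site`, `site_a` and `site_b` are illustrative only.

```python
import hashlib


def fold_of(stratum_key: str, n_folds: int = 5) -> int:
    """Map a stratification key to a fold index, identically at every site."""
    digest = hashlib.sha256(stratum_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_folds


def split_site(records, key_fn, test_fold, n_folds=5):
    """Split one site's local records into (train, test) for a given fold."""
    train, test = [], []
    for record in records:
        in_test = fold_of(key_fn(record), n_folds) == test_fold
        (test if in_test else train).append(record)
    return train, test


# Hypothetical data: the first record is duplicated across the two sites.
site_a = [{"birth_date": "1980-03-14", "y": 1}, {"birth_date": "1975-11-02", "y": 0}]
site_b = [{"birth_date": "1980-03-14", "y": 1}, {"birth_date": "1990-07-21", "y": 0}]


def key(record):
    # Assumed stratification variable; any attribute shared by duplicates would do.
    return record["birth_date"]


for test_fold in range(5):
    train_a, test_a = split_site(site_a, key, test_fold)
    train_b, test_b = split_site(site_b, key, test_fold)
    train_keys = {key(r) for r in train_a + train_b}
    test_keys = {key(r) for r in test_a + test_b}
    # No stratum appears on both sides of the split, so duplicates cannot leak.
    assert train_keys.isdisjoint(test_keys)
```

In this sketch, distinct records that share a key also land in the same fold. That is the price of avoiding an explicit deduplication step, and it is a property of the hashing scheme assumed here.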
Submission history
From: Romain Bey
[v1] Wed, 22 Jan 2020 15:49:34 GMT (627kb,D)
[v2] Thu, 23 Jan 2020 08:43:26 GMT (627kb,D)