Title: High Dimensional Binary Classification under Label Shift: Phase Transition and Regularization

Abstract: Label shift is widely believed to harm the generalization performance of machine learning models. Researchers have proposed many approaches to mitigate its impact, e.g., balancing the training data. However, these methods are typically studied in the underparametrized regime, where the sample size is much larger than the data dimension, and research on the overparametrized regime is very limited. To bridge this gap, we propose a new asymptotic analysis of the Fisher Linear Discriminant classifier for binary classification under label shift. Specifically, we prove that a phase transition exists: in a certain overparametrized regime, the classifier trained on imbalanced data outperforms its counterpart trained on reduced balanced data. Moreover, we investigate the impact of regularization under label shift: the aforementioned phase transition vanishes as the regularization becomes strong.
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Cite as: arXiv:2212.00700 [cs.LG]
  (or arXiv:2212.00700v3 [cs.LG] for this version)

Submission history

From: Jiahui Cheng
[v1] Thu, 1 Dec 2022 18:06:35 GMT (1277kb,D)
[v2] Mon, 5 Dec 2022 20:21:29 GMT (1276kb,D)
[v3] Thu, 8 Dec 2022 02:04:44 GMT (1228kb,D)