Online AUC Optimization for Sparse High-Dimensional Datasets

Zhou, Baojian; Ying, Yiming; Skiena, Steven

Full-text links:

Download:

Current browse context:

stat.ML

< prev | next >

new | recent | 2009

Computer Science > Machine Learning

Title: Online AUC Optimization for Sparse High-Dimensional Datasets

Authors: Baojian Zhou, Yiming Ying, Steven Skiena

(Submitted on 23 Sep 2020)

Abstract: The Area Under the ROC Curve (AUC) is a widely used performance measure for imbalanced classification arising from many application domains where high-dimensional sparse data is abundant. In such cases, each $d$ dimensional sample has only $k$ non-zero features with $k \ll d$, and data arrives sequentially in a streaming form. Current online AUC optimization algorithms have high per-iteration cost $\mathcal{O}(d)$ and usually produce non-sparse solutions in general, and hence are not suitable for handling the data challenge mentioned above.
In this paper, we aim to directly optimize the AUC score for high-dimensional sparse datasets under online learning setting and propose a new algorithm, \textsc{FTRL-AUC}. Our proposed algorithm can process data in an online fashion with a much cheaper per-iteration cost $\mathcal{O}(k)$, making it amenable for high-dimensional sparse streaming data analysis. Our new algorithmic design critically depends on a novel reformulation of the U-statistics AUC objective function as the empirical saddle point reformulation, and the innovative introduction of the "lazy update" rule so that the per-iteration complexity is dramatically reduced from $\mathcal{O}(d)$ to $\mathcal{O}(k)$. Furthermore, \textsc{FTRL-AUC} can inherently capture sparsity more effectively by applying a generalized Follow-The-Regularized-Leader (FTRL) framework.
Experiments on real-world datasets demonstrate that \textsc{FTRL-AUC} significantly improves both run time and model sparsity while achieving competitive AUC scores compared with the state-of-the-art methods. Comparison with the online learning method for logistic loss demonstrates that \textsc{FTRL-AUC} achieves higher AUC scores especially when datasets are imbalanced.

Comments:	20th IEEE International Conference on Data Mining
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2009.10867 [cs.LG]
	(or arXiv:2009.10867v1 [cs.LG] for this version)

Submission history

From: Baojian Zhou [view email]
[v1] Wed, 23 Sep 2020 00:50:01 GMT (7738kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> cs > arXiv:2009.10867

Download:

Current browse context:

Change to browse by:

References & Citations

Bookmark

Computer Science > Machine Learning

Title: Online AUC Optimization for Sparse High-Dimensional Datasets

Submission history