Title: Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction

Abstract: We develop a novel framework that adds sparse group lasso regularizers to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, and AdaHessian, creating a corresponding new class of optimizers: Group Momentum, Group Adagrad, Group Adam, Group AMSGrad, Group AdaHessian, and so on. We establish convergence guarantees in the stochastic convex setting, based on primal-dual methods. We evaluate the regularization effect of the new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experimental results show that, at the same sparsity level, models trained with our optimizers perform significantly better than models trained with the original optimizers followed by magnitude pruning as a post-processing step. Furthermore, compared with the cases without magnitude pruning, our methods achieve extremely high sparsity with significantly better or highly competitive performance.
Comments: 24 pages
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2107.14432 [cs.LG]
  (or arXiv:2107.14432v1 [cs.LG] for this version)
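
For readers unfamiliar with the two ideas the abstract contrasts, the following is a minimal sketch, not the paper's method. It assumes the textbook form of the sparse group lasso penalty (an L1 term plus a group-size-weighted L2 term per group) and a simple group-level magnitude pruning baseline; the function names, the parameters `l1` and `l21`, and the choice to prune whole groups by L2 norm are all illustrative assumptions. The exact regularizer and pruning procedure used in the paper may differ.

```python
import math


def sparse_group_lasso_penalty(groups, l1=0.01, l21=0.01):
    """Textbook sparse group lasso penalty (assumed form, not the paper's).

    groups: list of weight groups, each a list of floats
            (for example, one embedding vector per feature).
    Returns sum over groups of l1 * ||w||_1 + l21 * sqrt(len(w)) * ||w||_2.
    """
    total = 0.0
    for w in groups:
        l1_norm = sum(abs(x) for x in w)
        l2_norm = math.sqrt(sum(x * x for x in w))
        total += l1 * l1_norm + l21 * math.sqrt(len(w)) * l2_norm
    return total


def magnitude_prune(groups, sparsity):
    """Hypothetical post-processing baseline: zero the smallest-norm groups.

    sparsity: fraction of groups to zero out, rounded down.
    Returns a new list of groups; the input is left unchanged.
    """
    norms = [math.sqrt(sum(x * x for x in w)) for w in groups]
    n_prune = int(sparsity * len(groups))
    # Indices of groups ordered from smallest to largest L2 norm.
    order = sorted(range(len(groups)), key=lambda i: norms[i])
    pruned = set(order[:n_prune])
    return [
        [0.0] * len(w) if i in pruned else list(w)
        for i, w in enumerate(groups)
    ]


if __name__ == "__main__":
    weights = [[3.0, 4.0], [0.1, 0.0], [1.0, 0.0]]
    print(sparse_group_lasso_penalty(weights, l1=1.0, l21=1.0))
    print(magnitude_prune(weights, sparsity=0.5))
```

The distinction the abstract draws is that the penalty acts during training, so the optimizer itself drives whole groups to zero, whereas magnitude pruning is applied to an already trained model.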

Submission history

From: Yun Yue
[v1] Fri, 30 Jul 2021 05:33:43 GMT (281kb,D)
[v2] Wed, 22 Sep 2021 07:44:21 GMT (281kb,D)