Title: Better scalability under potentially heavy-tailed gradients

Abstract: We study a scalable alternative to robust gradient descent (RGD) techniques that can be used when the gradients may be heavy-tailed, though whether they are is unknown to the learner. The core technique is simple: instead of trying to robustly aggregate gradients at each step, which is costly and leads to sub-optimal dimension dependence in risk bounds, we choose a candidate that does not diverge too far from the majority of cheap stochastic sub-processes, each run for a single pass over partitioned data. In addition to formal guarantees, we also provide an empirical analysis of robustness to perturbations of experimental conditions, under both sub-Gaussian and heavy-tailed data. The result is a procedure that is simple to implement and trivial to parallelize, and that keeps the formal strength of RGD methods while scaling much better to large learning problems.
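
The divide-and-select idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's procedure: the squared-loss linear model, the plain SGD update, the step size, the number of subsets `k`, and the selection rule (smallest median distance to the other candidates) are all hypothetical stand-ins for "a candidate that does not diverge too far from the majority".

```python
import math
import statistics


def sgd_single_pass(data, dim, step=0.1):
    """One pass of plain SGD for squared loss over one data subset (assumed model)."""
    w = [0.0] * dim
    for x, y in data:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - step * residual * xi for wi, xi in zip(w, x)]
    return w


def pick_candidate(candidates):
    """Return the candidate whose median distance to the others is smallest.

    This rule is an illustrative assumption; a candidate far from the
    majority has a large median distance and is never selected.
    """
    def median_distance(i):
        return statistics.median(
            math.dist(candidates[i], c)
            for j, c in enumerate(candidates) if j != i
        )

    best = min(range(len(candidates)), key=median_distance)
    return candidates[best]


def divide_and_select(data, dim, k=5, step=0.1):
    """Split data into k disjoint subsets, run SGD once on each, then select."""
    subsets = [data[i::k] for i in range(k)]
    candidates = [sgd_single_pass(s, dim, step) for s in subsets]
    return pick_candidate(candidates)
```

Because each sub-process touches only its own subset, the `k` calls to `sgd_single_pass` are independent and could be run in parallel; only the final selection step needs all candidates.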
Comments: This paper has been superseded by arXiv:2012.07346 (a merge and extension of this article and arXiv:2006.01364)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:2006.00784 [stat.ML]
  (or arXiv:2006.00784v2 [stat.ML] for this version)

Submission history

From: Matthew J. Holland
[v1] Mon, 1 Jun 2020 08:16:56 GMT (188kb,D)
[v2] Tue, 15 Dec 2020 04:45:58 GMT (0kb,I)