Sub-Sampled Newton Methods I: Globally Convergent Algorithms

Roosta-Khorasani, Farbod; Mahoney, Michael W.

(Submitted on 18 Jan 2016 (v1), last revised 26 Feb 2016 (this version, v3))

Abstract: Large-scale optimization problems are ubiquitous in machine learning and data analysis, and there is a plethora of algorithms for solving such problems. Many of these algorithms employ sub-sampling as a way to speed up the computations and/or to implicitly implement a form of statistical regularization. In this paper, we consider second-order iterative optimization algorithms and provide bounds on the convergence of variants of Newton's method that incorporate uniform sub-sampling as a means to estimate the gradient and/or Hessian. Our bounds are non-asymptotic and quantitative. Our algorithms are global and are guaranteed to converge from any initial iterate.
Using random matrix concentration inequalities, one can sub-sample the Hessian to preserve the curvature information. Our first algorithm incorporates Hessian sub-sampling while using the full gradient. We also give additional convergence results for the case where the sub-sampled Hessian is regularized, either by modifying its spectrum or by ridge-type regularization. Next, in addition to Hessian sub-sampling, we also consider sub-sampling the gradient as a way to further reduce the computational complexity per iteration. We use approximate matrix multiplication results from randomized numerical linear algebra to obtain the proper sampling strategy. In all these algorithms, computing the update boils down to solving a large-scale linear system, which can be computationally expensive. As a remedy, for all of our algorithms, we also give global convergence results for the case of inexact updates, where this linear system is solved only approximately.
This paper has a more advanced companion paper, [42], in which we demonstrate that, by doing a finer-grained analysis, we can get problem-independent bounds for local convergence of these algorithms and explore trade-offs to improve upon the basic results of the present paper.
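
The first algorithm described in the abstract (uniformly sub-sampled Hessian, full gradient) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the objective (ridge-regularized logistic regression), the Armijo backtracking line search used to globalize the iteration, the function names, and all parameter values, including the sample size, which is picked by hand rather than derived from the paper's bounds.

```python
import numpy as np


def logistic_loss(x, A, b, lam):
    """Average logistic loss plus ridge term; labels b are in {-1, +1}."""
    margins = b * (A @ x)
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * (x @ x)


def full_gradient(x, A, b, lam):
    """Exact gradient of logistic_loss over all n samples."""
    margins = b * (A @ x)
    # sigma(-m) = 1 / (1 + exp(m)), evaluated stably via logaddexp
    weights = -b * np.exp(-np.logaddexp(0.0, margins))
    return A.T @ weights / A.shape[0] + lam * x


def subsampled_hessian(x, A, lam, sample_size, rng):
    """Hessian estimate from a uniform sample of rows (without replacement)."""
    idx = rng.choice(A.shape[0], size=sample_size, replace=False)
    A_s = A[idx]
    t = np.tanh(0.5 * (A_s @ x))
    d = 0.25 * (1.0 - t * t)  # sigma(z) * (1 - sigma(z)), independent of label
    return (A_s.T * d) @ A_s / sample_size + lam * np.eye(A.shape[1])


def subsampled_newton(A, b, lam=1e-2, sample_size=50, iters=20, seed=0):
    """Newton-type iteration: full gradient, sub-sampled Hessian, Armijo step."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    history = [logistic_loss(x, A, b, lam)]
    for _ in range(iters):
        g = full_gradient(x, A, b, lam)
        H = subsampled_hessian(x, A, lam, sample_size, rng)
        p = np.linalg.solve(H, -g)  # H is positive definite because lam > 0
        # Armijo backtracking (assumed globalization strategy)
        step, f0, slope = 1.0, history[-1], g @ p
        while (logistic_loss(x + step * p, A, b, lam) > f0 + 1e-4 * step * slope
               and step > 1e-10):
            step *= 0.5
        x = x + step * p
        history.append(logistic_loss(x, A, b, lam))
    return x, history


if __name__ == "__main__":
    # Synthetic data, purely for demonstration
    rng = np.random.default_rng(1)
    A = rng.standard_normal((500, 5))
    b = np.sign(A @ np.ones(5) + 0.1 * rng.standard_normal(500))
    x, history = subsampled_newton(A, b, sample_size=100)
    print("initial loss:", history[0], "final loss:", history[-1])
```

In this sketch, the exact solve `np.linalg.solve` could be swapped for an iterative solver stopped early to mimic the inexact-update variant, and the gradient could be estimated from a sample as well; neither is shown here.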

Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1601.04737 [math.OC]
	(or arXiv:1601.04737v3 [math.OC] for this version)

Submission history

From: Farbod Roosta-Khorasani
[v1] Mon, 18 Jan 2016 21:59:21 GMT (33kb)
[v2] Wed, 27 Jan 2016 01:16:32 GMT (145kb,D)
[v3] Fri, 26 Feb 2016 04:04:24 GMT (145kb,D)