Mathematics > Optimization and Control
Title: A Multi-Batch L-BFGS Method for Machine Learning
(Submitted on 19 May 2016 (this version), latest version 23 Oct 2016 (v2))
Abstract: The question of how to parallelize the stochastic gradient descent (SGD) method has received much attention in the literature. In this paper, we focus instead on batch methods that use a sizeable fraction of the training set at each iteration to facilitate parallelism, and that employ second-order information. In order to improve the learning process, we follow a multi-batch approach in which the batch changes at each iteration. This inherently gives the algorithm a stochastic flavor that can cause instability in L-BFGS, a popular batch method in machine learning. These difficulties arise because L-BFGS employs gradient differences to update the Hessian approximations; when these gradients are computed using different data points the process can be unstable. This paper shows how to perform stable quasi-Newton updating in the multi-batch setting, illustrates the behavior of the algorithm in a distributed computing platform, and studies its convergence properties for both the convex and nonconvex cases.
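The instability described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the least-squares objective, the batch size, the step size, and the helper names `batch_gradient` and `curvature_pair` are all assumptions made for illustration. It compares a gradient difference whose two gradients use different batches with one whose two gradients use the same data points.

```python
import numpy as np

def batch_gradient(w, X, targets, idx):
    """Gradient of 0.5 * mean((X w - targets)^2) over the rows in idx."""
    Xb, tb = X[idx], targets[idx]
    return Xb.T @ (Xb @ w - tb) / len(idx)

def curvature_pair(w_old, w_new, X, targets, idx_old, idx_new):
    """Return (s, y): the step and the gradient difference used by quasi-Newton updates."""
    s = w_new - w_old
    y = (batch_gradient(w_new, X, targets, idx_new)
         - batch_gradient(w_old, X, targets, idx_old))
    return s, y

# Hypothetical synthetic regression problem.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
targets = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)

w_old = np.zeros(d)
batch_a = rng.choice(n, size=20, replace=False)
batch_b = rng.choice(n, size=20, replace=False)  # a different batch
w_new = w_old - 0.1 * batch_gradient(w_old, X, targets, batch_a)

# Gradients taken on different data points: y mixes curvature with sampling noise.
s, y_mixed = curvature_pair(w_old, w_new, X, targets, batch_a, batch_b)
# Gradients taken on the same data points: y reflects curvature only.
_, y_same = curvature_pair(w_old, w_new, X, targets, batch_a, batch_a)

curv_mixed = float(s @ y_mixed)  # sign is not guaranteed
curv_same = float(s @ y_same)    # equals ||X_a s||^2 / |batch_a| for this objective

print("s^T y, different batches:", curv_mixed)
print("s^T y, same batch:       ", curv_same)
```

For this convex quadratic, the same-batch product `s @ y_same` is nonnegative by construction, which is the curvature condition a BFGS-type update relies on. The mixed-batch product carries no such guarantee, because the difference of the two gradients also contains the change in the sample.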
Submission history
From: Albert Berahas
[v1] Thu, 19 May 2016 16:53:50 GMT (1382kb,D)
[v2] Sun, 23 Oct 2016 22:48:01 GMT (1390kb,D)