Block-diagonal Hessian-free Optimization for Training Neural Networks

Zhang, Huishuai; Xiong, Caiming; Bradbury, James; Socher, Richard

(Submitted on 20 Dec 2017)

Abstract: Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are rarely applied to deep learning in practice because of high computational cost and the need for model-dependent algorithmic variations. We introduce a variant of the Hessian-free method that leverages a block-diagonal approximation of the generalized Gauss-Newton matrix. Our method computes the curvature approximation matrix only for pairs of parameters from the same layer or block of the neural network and performs conjugate gradient updates independently for each block. Experiments on deep autoencoders, deep convolutional networks, and multilayer LSTMs demonstrate better convergence and generalization compared to the original Hessian-free approach and the Adam method.
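The core idea in the abstract is to form curvature only within each parameter block and to run conjugate gradient separately for each block. A minimal sketch of that idea follows. It assumes, for illustration only, a squared-error loss (so the generalized Gauss-Newton matrix reduces to G = J^T J), a fixed damping constant lam, and a tiny explicit Jacobian jac in place of automatic differentiation. All function names are hypothetical, and this is not the authors' implementation.

```python
# Sketch of one block-diagonal Gauss-Newton update with per-block conjugate
# gradient (CG). Assumptions, not taken from the paper: squared-error loss
# (so the GGN is G = J^T J), a fixed damping constant `lam`, and a tiny
# explicit Jacobian `jac` standing in for automatic differentiation.
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_ggn_vp(jac: Matrix, cols: Sequence[int], v: Vector, lam: float) -> Vector:
    """Return (J_b^T J_b + lam * I) v, touching only the Jacobian columns of block b."""
    # Forward pass: u = J_b v has one entry per network output.
    u = [sum(row[c] * v[i] for i, c in enumerate(cols)) for row in jac]
    # Backward pass: J_b^T u, then add the damping term.
    return [sum(row[c] * u_r for row, u_r in zip(jac, u)) + lam * v[i]
            for i, c in enumerate(cols)]


def conjugate_gradient(matvec: Callable[[Vector], Vector], b: Vector,
                       iters: int = 50, tol: float = 1e-20) -> Vector:
    """Approximately solve A x = b for SPD A, given only the product x -> A x."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(iters):
        if rs <= tol:  # stop once the squared residual norm is tiny
            break
        ap = matvec(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def block_diagonal_step(jac: Matrix, grad: Vector, blocks: Sequence[Sequence[int]],
                        lam: float = 1e-3, iters: int = 50) -> Vector:
    """Solve (G_bb + lam I) d_b = -g_b independently for every parameter block b."""
    step = [0.0] * len(grad)
    for cols in blocks:
        rhs = [-grad[c] for c in cols]
        # Cross-block entries of G are never formed: each CG run sees only block b.
        d_b = conjugate_gradient(lambda v, cols=cols: block_ggn_vp(jac, cols, v, lam),
                                 rhs, iters)
        for i, c in enumerate(cols):
            step[c] = d_b[i]
    return step


# Usage: 3 outputs, 4 parameters split into two hypothetical "layers".
jac = [[1.0, 2.0, 0.5, 0.0],
       [0.0, 1.0, 1.0, 3.0],
       [2.0, 0.0, 1.0, 1.0]]
residual = [0.5, -1.0, 2.0]  # f(theta) - target for the squared-error loss
grad = [sum(jac[r][c] * residual[r] for r in range(3)) for c in range(4)]  # J^T residual
blocks = [[0, 1], [2, 3]]
step = block_diagonal_step(jac, grad, blocks)
print(step)
```

Each CG run reads only the Jacobian columns of its own block, so the off-diagonal blocks of G are never formed, and because the block solves are independent they could also run in parallel. A practical version would likely replace the explicit jac with Jacobian-vector and vector-Jacobian products from an autodiff framework.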

Comments:	10 pages, 3 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1712.07296 [cs.LG]
	(or arXiv:1712.07296v1 [cs.LG] for this version)

Submission history

From: Caiming Xiong
[v1] Wed, 20 Dec 2017 02:52:35 GMT (3885kb,D)
