Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

Zhang, Ziming; Brand, Matthew

(Submitted on 20 Nov 2017)

Abstract: By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
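The abstract does not spell out how the ReLU function is "lifted". One common reading, stated here as an assumption rather than as the authors' exact formulation, is that ReLU(x) is the solution of the constrained least-squares problem min_{u >= 0} ||u - x||^2, so the activation can be treated as an extra optimization variable. The minimal sketch below checks this identity numerically with projected gradient descent; the function names, step size, and iteration count are illustrative choices, not taken from the paper.

```python
def relu(x):
    """Closed-form ReLU applied elementwise to a list of floats."""
    return [max(0.0, v) for v in x]


def lifted_relu(x, steps=200, lr=0.25):
    """Approximately solve min_{u >= 0} ||u - x||^2 by projected gradient descent.

    Illustrative only: the step size and iteration count are assumptions.
    """
    u = [0.0] * len(x)
    for _ in range(steps):
        # Gradient of ||u - x||^2 with respect to u is 2 * (u - x);
        # max(0, .) projects the update back onto the constraint u >= 0.
        u = [max(0.0, ui - lr * 2.0 * (ui - xi)) for ui, xi in zip(u, x)]
    return u


x = [-1.5, 0.0, 0.3, 2.0]
closed_form = relu(x)
iterative = lifted_relu(x)

# Largest elementwise difference between the two computations.
max_gap = max(abs(a - b) for a, b in zip(closed_form, iterative))
print(max_gap)
```

Because the constrained problem is convex in u, treating activations this way is one route to the kind of block-wise convex subproblems the abstract describes; the sketch covers a single activation vector only, not the full BCD training algorithm.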

Comments:	NIPS 2017
Subjects:	Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:1711.07354 [stat.ML]
	(or arXiv:1711.07354v1 [stat.ML] for this version)

Submission history

From: Ziming Zhang
[v1] Mon, 20 Nov 2017 15:04:45 GMT (91kb,D)
