Minnorm training: an algorithm for training over-parameterized deep neural networks

Bansal, Yamini; Advani, Madhu; Cox, David D; Saxe, Andrew M

(Submitted on 3 Jun 2018 (v1), last revised 21 Jun 2018 (this version, v2))

Abstract: In this work, we propose a new training method for finding minimum weight norm solutions in over-parameterized neural networks (NNs). This method seeks to improve training speed and generalization performance by framing NN training as a constrained optimization problem in which the sum of the norms of the weights in each layer of the network is minimized, under the constraint of exactly fitting the training data. It draws inspiration from support vector machines (SVMs), which are able to generalize well despite often having an infinite number of free parameters in their primal form, and from recent theoretical generalization bounds on NNs, which suggest that lower-norm solutions generalize better. To solve this constrained optimization problem, our method employs Lagrange multipliers that act as integrators of error over training and identify 'support vector'-like examples. The method can be implemented as a wrapper around gradient-based methods and uses standard back-propagation of gradients from the NN for both regression and classification versions of the algorithm. We provide theoretical justifications for the effectiveness of this algorithm in comparison to early stopping and $L_2$-regularization using simple, analytically tractable settings. In particular, we show faster convergence to the max-margin hyperplane in a shallow network (compared to vanilla gradient descent); faster convergence to the minimum-norm solution in a linear chain (compared to $L_2$-regularization); and initialization-independent generalization performance in a deep linear network. Finally, using the MNIST dataset, we demonstrate that this algorithm can boost test accuracy and identify difficult examples in real-world datasets.
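
The idea of Lagrange multipliers acting as integrators of error can be illustrated on a much simpler problem than the one the abstract addresses. The sketch below is not the authors' algorithm: it assumes a single linear model, replaces the sum of layer norms with the squared $L_2$ norm, and uses a plain primal-dual gradient iteration on the Lagrangian $\tfrac{1}{2}\|w\|^2 + \lambda^\top (Xw - y)$. The function name `minnorm_primal_dual`, the step-size rule, and the problem sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def minnorm_primal_dual(X, y, steps=20000):
    """Primal-dual gradient iteration for: min 0.5*||w||^2  s.t.  X @ w = y.

    Illustrative sketch only (linear model, equality constraints); it is
    not the method from the paper.
    """
    n, d = X.shape
    w = np.zeros(d)
    lam = np.zeros(n)  # one Lagrange multiplier per training example
    # Conservative step size (an assumption of this sketch), based on the
    # largest singular value of X, so that the linear iteration is stable.
    s_max = np.linalg.norm(X, 2)
    eta = 0.5 / max(1.0, s_max ** 2)
    for _ in range(steps):
        residual = X @ w - y        # constraint violation per example
        grad_w = w + X.T @ lam      # gradient of the Lagrangian w.r.t. w
        w = w - eta * grad_w        # descent step on the weights
        lam = lam + eta * residual  # ascent step: lam accumulates the error
    return w, lam


# Over-parameterized toy problem: 5 examples, 20 parameters (made-up data).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
y = rng.standard_normal(5)

w, lam = minnorm_primal_dual(X, y)
w_ref = np.linalg.pinv(X) @ y  # closed-form minimum-norm interpolant

print("max constraint violation:", np.max(np.abs(X @ w - y)))
print("max deviation from pinv solution:", np.max(np.abs(w - w_ref)))
```

Each multiplier update adds the current residual of its example, so `lam` is a running sum of training error. At a fixed point the updates vanish, which forces `X @ w = y` and `w = -X.T @ lam`; the weights therefore lie in the row space of `X`, which characterizes the minimum-norm interpolating solution in this linear setting.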

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:1806.00730 [stat.ML]
	(or arXiv:1806.00730v2 [stat.ML] for this version)

Submission history

From: Yamini Bansal
[v1] Sun, 3 Jun 2018 02:33:01 GMT (4714kb,D)
[v2] Thu, 21 Jun 2018 15:26:07 GMT (4714kb,D)
