Learned Optimizers that Scale and Generalize

Wichrowska, Olga; Maheswaranathan, Niru; Hoffman, Matthew W.; Colmenarejo, Sergio Gomez; Denil, Misha; de Freitas, Nando; Sohl-Dickstein, Jascha

(Submitted on 14 Mar 2017 (v1), revised 23 Jun 2017 (this version, v3), latest version 7 Sep 2017 (v4))

Abstract: Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We introduce a learned gradient descent optimizer that generalizes well to new tasks, and which has significantly reduced memory and computation overhead. We achieve this by introducing a novel hierarchical RNN architecture, with minimal per-parameter overhead, augmented with additional architectural features that mirror the known structure of optimization tasks. We also develop a metatraining ensemble of small, diverse, optimization tasks capturing common properties of loss landscapes. The optimizer learns to outperform RMSProp/ADAM on problems in this corpus. More importantly, it performs comparably or better when applied to small convolutional neural networks, despite seeing no neural networks in its metatraining set. Finally, it generalizes to train Inception V3 and ResNet V2 architectures on the ImageNet dataset for thousands of steps, optimization problems that are of a vastly different scale than those it was trained on.

Comments:	Final ICML paper after reviewer suggestions
Subjects:	Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Cite as:	arXiv:1703.04813 [cs.LG]
	(or arXiv:1703.04813v3 [cs.LG] for this version)

Submission history

From: Olga Wichrowska
[v1] Tue, 14 Mar 2017 23:05:54 GMT (612kb,D)
[v2] Mon, 8 May 2017 21:55:33 GMT (612kb,D)
[v3] Fri, 23 Jun 2017 22:22:38 GMT (1210kb,D)
[v4] Thu, 7 Sep 2017 23:38:09 GMT (1210kb,D)
