Title: MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Abstract: Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long short-term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks, with little compromise in computational or memory efficiency. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradient with restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance. The code is available at this https URL.
Comments: 23 pages, 9 figures
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)
MSC classes: 68T07
ACM classes: I.2
Cite as: arXiv:2006.06919 [cs.LG]
  (or arXiv:2006.06919v1 [cs.LG] for this version)
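
The abstract describes adding momentum to the hidden state update of a recurrent cell but does not give the equations. The following is a minimal sketch of the idea, not the authors' implementation. It assumes a heavy-ball style update in which the input term W x_t is accumulated into a velocity v_t, that is v_t = mu * v_{t-1} + s * W x_t and h_t = tanh(U h_{t-1} + v_t). The names `mu` (momentum coefficient), `s` (step size), the choice of tanh, and all function names are assumptions made for illustration.

```python
import math


def matvec(M, x):
    """Multiply a matrix M (list of rows) by a vector x (list of floats)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def rnn_step(h, x, U, W):
    """Plain recurrent step: h_t = tanh(U h_{t-1} + W x_t)."""
    Uh = matvec(U, h)
    Wx = matvec(W, x)
    return [math.tanh(a + b) for a, b in zip(Uh, Wx)]


def momentum_rnn_step(h, v, x, U, W, mu=0.6, s=1.0):
    """Momentum recurrent step (assumed form, for illustration only).

    v_t = mu * v_{t-1} + s * W x_t   # velocity accumulates the input term
    h_t = tanh(U h_{t-1} + v_t)      # hidden state uses the velocity
    """
    Wx = matvec(W, x)
    v_new = [mu * v_i + s * wx_i for v_i, wx_i in zip(v, Wx)]
    Uh = matvec(U, h)
    h_new = [math.tanh(a + b) for a, b in zip(Uh, v_new)]
    return h_new, v_new


def run_momentum_rnn(inputs, U, W, mu=0.6, s=1.0):
    """Run the momentum cell over a sequence, starting from zero state."""
    n = len(U)
    h = [0.0] * n
    v = [0.0] * n
    for x in inputs:
        h, v = momentum_rnn_step(h, v, x, U, W, mu, s)
    return h, v
```

With `mu=0.0` and `s=1.0` the velocity equals W x_t at every step, so under these assumptions the sketch reduces to the plain recurrent step. A nonzero `mu` carries a decaying sum of past input terms forward through the velocity.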

Submission history

From: Bao Wang
[v1] Fri, 12 Jun 2020 03:02:29 GMT (3315kb,D)
[v2] Sun, 11 Oct 2020 17:19:06 GMT (3599kb,D)