ADMMiRNN: Training RNN with Stable Convergence via An Efficient ADMM Approach

Tang, Yu; Sun, Dequan; Qiao, Linbo; Xiao, Jingjing; Lai, Zhiquan; Li, Dongsheng

(Submitted on 10 Jun 2020 (this version), latest version 28 Mar 2022 (v3))

Abstract: It is hard to train a Recurrent Neural Network (RNN) with stable convergence while avoiding vanishing and exploding gradients, because the weights in the recurrent unit are repeated from iteration to iteration. Moreover, RNNs are sensitive to the initialization of weights and biases, which makes the training phase difficult. Being gradient-free and immune to poor conditioning, the Alternating Direction Method of Multipliers (ADMM) has become a promising algorithm for training neural networks beyond traditional stochastic gradient algorithms. However, ADMM cannot be applied to train RNNs directly, since the state in the recurrent unit is repeatedly updated over timesteps. Therefore, this work builds a new framework named ADMMiRNN upon the unfolded form of the RNN to address the above challenges simultaneously, and provides novel update rules and a theoretical convergence analysis. We explicitly specify the key update rules in the iterations of ADMMiRNN, using deliberately constructed approximation techniques and solutions to each subproblem instead of vanilla ADMM. Numerical experiments are conducted on MNIST and text classification tasks, where ADMMiRNN achieves convergent results and outperforms the compared baselines. Furthermore, ADMMiRNN trains RNNs in a more stable way, without vanishing or exploding gradients, compared to stochastic gradient algorithms. Source code is available at this https URL
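
As a toy illustration of why a repeated recurrent weight makes gradients vanish or explode, the Python sketch below uses a scalar linear recurrence h_t = w * h_{t-1} + x_t. This recurrence is an assumption chosen for illustration only; it is not the model or the ADMM method from the paper, and the name `gradient_through_time` is hypothetical.

```python
def gradient_through_time(w: float, steps: int) -> float:
    """Return d h_T / d h_0 for the scalar recurrence h_t = w * h_{t-1} + x_t.

    Hypothetical helper for illustration; not taken from the paper.
    """
    grad = 1.0
    for _ in range(steps):
        # Chain rule: every unfolded timestep multiplies by the same weight w.
        grad *= w
    return grad


if __name__ == "__main__":
    for w in (0.5, 1.0, 1.5):
        g = gradient_through_time(w, 50)
        print(f"w={w}: gradient after 50 steps = {g:.3e}")
```

With |w| < 1 the product shrinks toward zero over 50 steps, and with |w| > 1 it grows without bound, which is the behaviour the abstract attributes to gradient-based training of RNNs.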

Comments:	17 pages, 11 figures
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2006.05622 [cs.LG]
	(or arXiv:2006.05622v1 [cs.LG] for this version)

Submission history

From: Yu Tang
[v1] Wed, 10 Jun 2020 02:43:11 GMT (1869kb,D)
[v2] Wed, 17 Jun 2020 04:12:35 GMT (1966kb,D)
[v3] Mon, 28 Mar 2022 11:05:37 GMT (1981kb,D)
