Title: Coordinate descent on the orthogonal group for recurrent neural network training

Abstract: We propose to use stochastic Riemannian coordinate descent on the orthogonal group for recurrent neural network training. At each iteration, the algorithm rotates two columns of the recurrent matrix, an operation that can be efficiently implemented as a multiplication by a Givens matrix. When the coordinate is selected uniformly at random at each iteration, we prove the convergence of the proposed algorithm under standard assumptions on the loss function, stepsize and minibatch noise. In addition, we numerically demonstrate that the Riemannian gradient in recurrent neural network training has an approximately sparse structure. Leveraging this observation, we propose a faster variant of the algorithm that relies on the Gauss-Southwell rule. Experiments on a benchmark recurrent neural network training problem are presented to demonstrate the effectiveness of the proposed algorithm.
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as: arXiv:2108.00051 [cs.LG]
  (or arXiv:2108.00051v1 [cs.LG] for this version)
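
The column-rotation update described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`givens_rotate_columns`, `coordinate_step`), the fixed stepsize, and the toy loss 0.5 * ||W - T||_F^2 (used in place of a recurrent network loss) are all assumptions made for illustration. The sketch assumes the update W <- W G(i, j, theta), where the angle is a stepsize times the (i, j) entry of the skew-symmetric matrix W^T grad - grad^T W, and the pair (i, j) is chosen either uniformly at random or by picking the largest entry in magnitude, in the spirit of the Gauss-Southwell rule.

```python
import numpy as np


def givens_rotate_columns(W, i, j, theta):
    """Right-multiply W by a Givens rotation acting on columns i and j.

    Only two columns change, so the cost is O(n) instead of O(n^2).
    """
    c, s = np.cos(theta), np.sin(theta)
    wi, wj = W[:, i].copy(), W[:, j].copy()
    W[:, i] = c * wi - s * wj
    W[:, j] = s * wi + c * wj
    return W


def coordinate_step(W, euclid_grad, lr, rng=None):
    """One coordinate descent step on the orthogonal group (illustrative).

    euclid_grad is the ordinary (Euclidean) gradient of the loss at W.
    If rng is None, the pair (i, j) with the largest coordinate in
    magnitude is used (Gauss-Southwell-style selection); otherwise a
    pair is drawn uniformly at random.
    """
    n = W.shape[0]
    # Skew-symmetric matrix whose (i, j) entry is the derivative of the
    # loss along the rotation of columns i and j.
    A = W.T @ euclid_grad - euclid_grad.T @ W
    if rng is None:
        rows, cols = np.triu_indices(n, k=1)
        k = np.argmax(np.abs(A[rows, cols]))
        i, j = rows[k], cols[k]
    else:
        i, j = rng.choice(n, size=2, replace=False)
    return givens_rotate_columns(W, i, j, -lr * A[i, j])


# Toy problem (assumption): pull an orthogonal W towards an orthogonal target T.
rng = np.random.default_rng(0)
n = 6
W, _ = np.linalg.qr(rng.standard_normal((n, n)))
T, _ = np.linalg.qr(rng.standard_normal((n, n)))


def loss(M):
    return 0.5 * np.linalg.norm(M - T) ** 2


loss_start = loss(W)
for _ in range(200):
    W = coordinate_step(W, W - T, lr=0.1)  # gradient of the toy loss is W - T
loss_end = loss(W)
orth_error = np.linalg.norm(W.T @ W - np.eye(n))
```

Because every update is a rotation, W stays orthogonal up to rounding error without any re-orthogonalization step. Passing a `numpy.random.Generator` as `rng` switches the sketch to uniform random selection of the column pair.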

Submission history

From: Estelle Massart
[v1] Fri, 30 Jul 2021 19:27:11 GMT (323kb,D)