SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models

Ramzi, Zaccharie; Mannel, Florian; Bai, Shaojie; Starck, Jean-Luc; Ciuciu, Philippe; Moreau, Thomas

(Submitted on 1 Jun 2021 (v1), last revised 10 Mar 2023 (this version, v4))

Abstract: In recent years, implicit deep learning has emerged as a method to increase the effective depth of deep neural networks. While implicit models are memory-efficient to train, they are still significantly slower to train than their explicit counterparts. In Deep Equilibrium Models (DEQs), training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix. In this paper, we propose a novel strategy to tackle this computational bottleneck, from which many bi-level problems suffer. The main idea is to use the quasi-Newton matrices from the forward pass to efficiently approximate the inverse Jacobian matrix in the direction needed for the gradient computation. We provide a theorem that motivates using our method with the original forward algorithms. In addition, by modifying these forward algorithms, we further provide theoretical guarantees that our method asymptotically estimates the true implicit gradient. We empirically study this approach and the recent Jacobian-Free method in different settings, ranging from hyperparameter optimization to large Multiscale DEQs (MDEQs) applied to CIFAR and ImageNet. Both methods significantly reduce the computational cost of the backward pass. While SHINE has a clear advantage on hyperparameter optimization problems, both methods attain similar computational performance for larger-scale problems such as MDEQs, at the cost of a limited performance drop compared to the original models.
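The main idea in the abstract (reusing the quasi-Newton matrix from the forward pass in place of the inverse Jacobian in the backward pass) can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: the tanh layer, the squared-error loss, the choice of the bias as the differentiated parameter, and the names `broyden_solve` and `gradients_wrt_bias` are all hypothetical and chosen only for illustration. It solves a small fixed-point problem with Broyden's method, then compares the exact implicit gradient with the one obtained by sharing the Broyden inverse estimate.

```python
import numpy as np


def broyden_solve(g, z0, tol=1e-10, max_iter=100):
    """Find a root of g with Broyden's method.

    Returns the root and B, the running estimate of the inverse Jacobian of g.
    """
    z = z0.copy()
    B = np.eye(z.size)              # initial guess for the inverse Jacobian
    gz = g(z)
    for _ in range(max_iter):
        if np.linalg.norm(gz) < tol:
            break
        s = -B @ gz                 # quasi-Newton step
        z_new = z + s
        g_new = g(z_new)
        y = g_new - gz
        By = B @ y
        denom = s @ By
        if abs(denom) > 1e-30:
            # "Good" Broyden update of the inverse; it enforces B @ y == s.
            B = B + np.outer(s - By, s @ B) / denom
        z, gz = z_new, g_new
    return z, B


def gradients_wrt_bias(W, b, target):
    """Gradient of L = 0.5 * ||z* - target||^2 w.r.t. b, where z* = tanh(W z* + b)."""
    g = lambda z: z - np.tanh(W @ z + b)
    z_star, B = broyden_solve(g, np.zeros_like(b))
    grad_z = z_star - target                    # dL/dz at the fixed point
    D = 1.0 - np.tanh(W @ z_star + b) ** 2      # derivative of tanh at the fixed point
    J = np.eye(b.size) - D[:, None] * W         # Jacobian of g with respect to z
    exact = D * np.linalg.solve(J.T, grad_z)    # exact implicit gradient (linear solve)
    shared = D * (B.T @ grad_z)                 # inverse replaced by the forward-pass estimate
    return z_star, exact, shared


# Assumed toy setup: a small layer rescaled to be a contraction.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
W = 0.3 * A / np.linalg.norm(A, 2)              # spectral norm 0.3
b = rng.standard_normal(5)
target = rng.standard_normal(5)

z_star, exact, shared = gradients_wrt_bias(W, b, target)
rel_err = np.linalg.norm(shared - exact) / np.linalg.norm(exact)
print("relative error of the shared-inverse gradient:", rel_err)
```

In this sketch the backward pass with the shared estimate costs one matrix-vector product, whereas the exact gradient needs a linear solve with the Jacobian. The shared gradient is only an approximation, and its quality depends on how well the Broyden matrix captures the inverse Jacobian in the direction of the loss gradient.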

Comments: Accepted as a spotlight to ICLR 2022
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2106.00553 [cs.LG] (or arXiv:2106.00553v4 [cs.LG] for this version)

Submission history

From: Zaccharie Ramzi
[v1] Tue, 1 Jun 2021 15:07:34 GMT (2305kb,D)
[v2] Thu, 24 Jun 2021 13:32:51 GMT (2304kb,D)
[v3] Sun, 30 Jan 2022 18:05:10 GMT (2736kb,D)
[v4] Fri, 10 Mar 2023 11:19:37 GMT (2869kb,D)
