Fine-tuning Neural-Operator architectures for training and generalization

Benitez, JA Lara; Furuya, Takashi; Faucher, Florian; Tricoche, Xavier; de Hoop, Maarten V.

(Submitted on 27 Jan 2023 (v1), revised 19 Apr 2023 (this version, v2), latest version 4 Jul 2023 (v3))

Abstract: This work provides a comprehensive analysis of the generalization properties of Neural Operators (NOs) and their derived architectures. Through empirical evaluation of the test loss, analysis of complexity-based generalization bounds, and qualitative assessment of loss-landscape visualizations, we investigate modifications aimed at enhancing the generalization capabilities of NOs. Inspired by the success of Transformers, we propose ${\textit{s}}{\text{NO}}+\varepsilon$, which introduces a kernel integral operator in lieu of self-attention. Our results reveal significantly improved performance across datasets and initializations, accompanied by qualitative changes in the visualization of the loss landscape. We conjecture that the layout of Transformers enables the optimization algorithm to find better minima, and that stochastic depth improves the generalization performance. As a rigorous analysis of training dynamics is one of the most prominent unsolved problems in deep learning, we focus exclusively on the complexity-based generalization of the architectures. Building on statistical theory, and in particular Dudley's theorem, we derive upper bounds on the Rademacher complexity of NOs and ${\textit{s}}{\text{NO}}+\varepsilon$. For the latter, our bounds do not rely on norm control of the parameters. This makes them applicable to networks of any depth, as long as the random variables in the architecture follow a decay law, which connects stochastic depth with generalization, as we have conjectured. In contrast, the bounds for NOs rely solely on norm control of the parameters and exhibit an exponential dependence on depth. Furthermore, our experiments demonstrate that our proposed network exhibits remarkable generalization capabilities when subjected to perturbations in the data distribution. In contrast, NOs perform poorly in out-of-distribution scenarios.
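
For orientation, the abstract's reference to Dudley's theorem can be illustrated with the textbook form of the entropy-integral bound. This is a sketch of the standard statement, not the bound derived in the paper: the symbols $\mathcal{F}$ (hypothesis class), $S = (x_1, \dots, x_n)$ (sample), $\sigma_i$ (i.i.d. Rademacher signs), and $N(\mathcal{F}, \epsilon, \|\cdot\|_{L_2(P_n)})$ (covering number in the empirical $L_2$ norm) are notation assumed here, and the numerical constants vary between references.

$$
\widehat{\mathfrak{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i)\right]
\le \inf_{\alpha > 0} \left( 4\alpha + \frac{12}{\sqrt{n}} \int_{\alpha}^{\infty} \sqrt{\log N\left(\mathcal{F}, \epsilon, \|\cdot\|_{L_2(P_n)}\right)} \, d\epsilon \right)
$$

Bounds of this kind turn control of the covering number of the architecture's function class into control of its Rademacher complexity, which is the quantity the abstract says is bounded for both NOs and ${\textit{s}}{\text{NO}}+\varepsilon$.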

Subjects:	Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Cite as:	arXiv:2301.11509 [cs.LG]
	(or arXiv:2301.11509v2 [cs.LG] for this version)

Submission history

From: Jose Antonio Lara Benitez
[v1] Fri, 27 Jan 2023 03:02:12 GMT (18892kb,D)
[v2] Wed, 19 Apr 2023 03:06:03 GMT (18257kb,D)
[v3] Tue, 4 Jul 2023 22:42:47 GMT (44122kb,D)
