We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Machine Learning

Title: Fine-tuning Neural-Operator architectures for training and generalization

Abstract: This work provides a comprehensive analysis of the generalization properties of Neural Operators (NOs) and their derived architectures. Through empirical evaluation of the test loss, analysis of the complexity-based generalization bounds, and qualitative assessments of the visualization of the loss landscape, we investigate modifications aimed at enhancing the generalization capabilities of NOs. Inspired by the success of Transformers, we propose ${\textit{s}}{\text{NO}}+\varepsilon$, which introduces a kernel integral operator in lieu of self-Attention. Our results reveal significantly improved performance across datasets and initializations, accompanied by qualitative changes in the visualization of the loss landscape. We conjecture that the layout of Transformers enables the optimization algorithm to find better minima, and stochastic depth, improve the generalization performance. As a rigorous analysis of training dynamics is one of the most prominent unsolved problems in deep learning, our exclusive focus is on the analysis of the complexity-based generalization of the architectures. Building on statistical theory, and in particular Dudley theorem, we derive upper bounds on the Rademacher complexity of NOs, and ${\textit{s}}{\text{NO}}+\varepsilon$. For the latter, our bounds do not rely on norm control of parameters. This makes it applicable to networks of any depth, as long as the random variables in the architecture follow a decay law, which connects stochastic depth with generalization, as we have conjectured. In contrast, the bounds in NOs, solely rely on norm control of the parameters, and exhibit an exponential dependence on depth. Furthermore, our experiments also demonstrate that our proposed network exhibits remarkable generalization capabilities when subjected to perturbations in the data distribution. In contrast, NO perform poorly in out-of-distribution scenarios.
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Cite as: arXiv:2301.11509 [cs.LG]
  (or arXiv:2301.11509v2 [cs.LG] for this version)

Submission history

From: Jose Antonio Lara Benitez [view email]
[v1] Fri, 27 Jan 2023 03:02:12 GMT (18892kb,D)
[v2] Wed, 19 Apr 2023 03:06:03 GMT (18257kb,D)
[v3] Tue, 4 Jul 2023 22:42:47 GMT (44122kb,D)

Link back to: arXiv, form interface, contact.