Computer Science > Machine Learning
Title: On the Lipschitz Constant of Deep Networks and Double Descent
(Submitted on 28 Jan 2023 (this version), latest version 27 Apr 2023 (v3))
Abstract: Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable, falling short of investigating the mechanisms controlling such factors in practice. In this work, we present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent, and highlight non-monotonic trends strongly correlating with the test error. Building a connection between parameter-space and input-space gradients for SGD around a critical point, we isolate two important factors -- namely loss landscape curvature and distance of parameters from initialization -- respectively controlling optimization dynamics around a critical point and bounding model function complexity, even beyond the training data. Our study presents novel insights on implicit regularization via overparameterization and on effective model complexity for networks trained in practice.
Submission history
From: Matteo Gamba
[v1] Sat, 28 Jan 2023 23:22:49 GMT (1812kb,D)
[v2] Thu, 16 Feb 2023 03:32:37 GMT (1906kb,D)
[v3] Thu, 27 Apr 2023 13:39:51 GMT (31837kb,D)