Representational aspects of depth and conditioning in normalizing flows

Koehler, Frederic; Mehta, Viraj; Risteski, Andrej

Full-text links:

Download:

Current browse context:

cs.LG

< prev | next >

new | recent | 2010

Computer Science > Machine Learning

Title: Representational aspects of depth and conditioning in normalizing flows

Authors: Frederic Koehler, Viraj Mehta, Andrej Risteski

(Submitted on 2 Oct 2020 (v1), last revised 25 Jun 2021 (this version, v2))

Abstract: Normalizing flows are among the most popular paradigms in generative modeling, especially for images, primarily because we can efficiently evaluate the likelihood of a data point. This is desirable both for evaluating the fit of a model, and for ease of training, as maximizing the likelihood can be done by gradient descent. However, training normalizing flows comes with difficulties as well: models which produce good samples typically need to be extremely deep -- which comes with accompanying vanishing/exploding gradient problems. A very related problem is that they are often poorly conditioned: since they are parametrized as invertible maps from $\mathbb{R}^d \to \mathbb{R}^d$, and typical training data like images intuitively is lower-dimensional, the learned maps often have Jacobians that are close to being singular.
In our paper, we tackle representational aspects around depth and conditioning of normalizing flows: both for general invertible architectures, and for a particular common architecture, affine couplings. We prove that $\Theta(1)$ affine coupling layers suffice to exactly represent a permutation or $1 \times 1$ convolution, as used in GLOW, showing that representationally the choice of partition is not a bottleneck for depth. We also show that shallow affine coupling networks are universal approximators in Wasserstein distance if ill-conditioning is allowed, and experimentally investigate related phenomena involving padding. Finally, we show a depth lower bound for general flow architectures with few neurons per layer and bounded Lipschitz constant.

Comments:	Appeared in ICML 2021
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2010.01155 [cs.LG]
	(or arXiv:2010.01155v2 [cs.LG] for this version)

Submission history

From: Viraj Mehta [view email]
[v1] Fri, 2 Oct 2020 18:15:45 GMT (1943kb,D)
[v2] Fri, 25 Jun 2021 23:48:35 GMT (3212kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> cs > arXiv:2010.01155

Download:

Current browse context:

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

Computer Science > Machine Learning

Title: Representational aspects of depth and conditioning in normalizing flows

Submission history