We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

math.ST

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Mathematics > Statistics Theory

Title: Learning with tree tensor networks: complexity estimates and model selection

Abstract: Tree tensor networks, or tree-based tensor formats, are prominent model classes for the approximation of high-dimensional functions in computational and data science. They correspond to sum-product neural networks with a sparse connectivity associated with a dimension tree and widths given by a tuple of tensor ranks. The approximation power of these models has been proved to be (near to) optimal for classical smoothness classes. However, in an empirical risk minimization framework with a limited number of observations, the dimension tree and ranks should be selected carefully to balance estimation and approximation errors. We propose and analyze a complexity-based model selection method for tree tensor networks in an empirical risk minimization framework and we analyze its performance over a wide range of smoothness classes. Given a family of model classes associated with different trees, ranks, tensor product feature spaces and sparsity patterns for sparse tensor networks, a model is selected (\`a la Barron, Birg\'e, Massart) by minimizing a penalized empirical risk, with a penalty depending on the complexity of the model class and derived from estimates of the metric entropy of tree tensor networks. This choice of penalty yields a risk bound for the selected predictor. In a least-squares setting, after deriving fast rates of convergence of the risk, we show that our strategy is (near to) minimax adaptive to a wide range of smoothness classes including Sobolev or Besov spaces (with isotropic, anisotropic or mixed dominating smoothness) and analytic functions. We discuss the role of sparsity of the tensor network for obtaining optimal performance in several regimes. In practice, the amplitude of the penalty is calibrated with a slope heuristics method. Numerical experiments in a least-squares regression setting illustrate the performance of the strategy.
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
Journal reference: Bernoulli 28(2): 910-936 (May 2022)
DOI: 10.3150/21-BEJ1371
Cite as: arXiv:2007.01165 [math.ST]
  (or arXiv:2007.01165v3 [math.ST] for this version)

Submission history

From: Anthony Nouy [view email]
[v1] Thu, 2 Jul 2020 14:52:08 GMT (1494kb)
[v2] Thu, 11 Mar 2021 15:59:26 GMT (1510kb)
[v3] Wed, 19 May 2021 10:15:30 GMT (1510kb)

Link back to: arXiv, form interface, contact.