Statistics Theory
New submissions
New submissions for Fri, 30 Oct 20
 [1] arXiv:2010.15351 [pdf, other]

Title: Nonparametric estimation of copulas and copula densities by orthogonal projections
Comments: 42 pages, 6 figures, 9 tables
Subjects: Statistics Theory (math.ST); Applications (stat.AP); Methodology (stat.ME)
In this paper we study nonparametric estimators of copulas and copula densities. We first focus on a copula density estimator based on a polynomial orthogonal projection of the joint density. A new copula estimator is then deduced. Its asymptotic properties are studied: we provide a large functional class for which this construction is optimal in the minimax and maxiset sense, and we propose a selection method for the smoothing parameter. An intensive simulation study shows the very good performance of both the copula and the copula density estimators, which we compare to a large panel of competitors. A real dataset in actuarial science illustrates this approach.
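To make the idea of an orthogonal-projection copula density estimator concrete, here is a minimal Python sketch. It is not the authors' estimator: the choice of an orthonormal shifted Legendre basis, the rank-based pseudo-observations, the truncation degree `m` standing in for the smoothing parameter, and all function names are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import legendre


def phi(k, u):
    """Orthonormal shifted Legendre polynomial of degree k on [0, 1]."""
    coef = np.zeros(k + 1)
    coef[k] = 1.0
    return np.sqrt(2 * k + 1) * legendre.legval(2.0 * np.asarray(u) - 1.0, coef)


def pseudo_observations(x):
    """Map a sample to (0, 1) through its ranks (assumes no ties)."""
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1.0)


def fit_copula_density(x, y, m):
    """Empirical projection coefficients a[j, k] = mean(phi_j(U) * phi_k(V))."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    basis_u = np.stack([phi(j, u) for j in range(m + 1)])  # shape (m+1, n)
    basis_v = np.stack([phi(k, v) for k in range(m + 1)])  # shape (m+1, n)
    return basis_u @ basis_v.T / len(u)


def copula_density(coef, u, v):
    """Evaluate sum_{j,k} a[j, k] * phi_j(u) * phi_k(v) at a scalar point (u, v)."""
    m = coef.shape[0] - 1
    bu = np.array([phi(j, u) for j in range(m + 1)])
    bv = np.array([phi(k, v) for k in range(m + 1)])
    return bu @ coef @ bv
```

A truncated projection of this kind can take negative values, so in practice it would usually need some post-processing before being treated as a density; the sketch leaves that out.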
 [2] arXiv:2010.15515 [pdf, ps, other]

Title: Staged trees are curved exponential families
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
Staged tree models are a discrete generalization of Bayesian networks. We show that these form curved exponential families and derive their natural parameters, sufficient statistic, and cumulant-generating function as functions of their graphical representation. We give necessary graphical criteria for classifying regular subfamilies and discuss implications for model selection.
 [3] arXiv:2010.15658 [pdf, other]

Title: Generalization bounds for deep thresholding networks
Comments: 19 pages, 4 figures
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
We consider compressive sensing in the scenario where the sparsity basis (dictionary) is not known in advance, but needs to be learned from examples. Motivated by the well-known iterative soft thresholding algorithm for the reconstruction, we define deep networks parametrized by the dictionary, which we call deep thresholding networks. Based on training samples, we aim at learning the optimal sparsifying dictionary and thereby the optimal network that reconstructs signals from their low-dimensional linear measurements. The dictionary learning is performed via minimizing the empirical risk. We derive generalization bounds by analyzing the Rademacher complexity of hypothesis classes consisting of such deep networks. We obtain estimates of the sample complexity that depend only linearly on the dimensions and on the depth.
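As a rough illustration of the iterative soft thresholding algorithm mentioned above, the following Python sketch unrolls a fixed number of iterations into a "network" parametrized by a dictionary. The abstract does not spell out the architecture, so the names `thresholding_network`, `A` (measurement matrix), `Phi` (dictionary), `lam` (threshold weight) and `depth`, as well as the step-size rule, are assumptions rather than the authors' construction.

```python
import numpy as np


def soft_threshold(z, tau):
    """Elementwise soft thresholding: shrink each entry toward zero by tau."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def thresholding_network(y, A, Phi, lam, depth):
    """Unroll `depth` ISTA steps for 0.5 * ||A Phi z - y||^2 + lam * ||z||_1.

    Returns the reconstructed signal Phi @ z. The dictionary Phi is held fixed
    here; learning it from training samples is not shown.
    """
    M = A @ Phi
    # Step size 1 / L, where L is the squared spectral norm of M.
    step = 1.0 / np.linalg.norm(M, 2) ** 2
    z = np.zeros(M.shape[1])
    for _ in range(depth):
        gradient = M.T @ (M @ z - y)
        z = soft_threshold(z - step * gradient, step * lam)
    return Phi @ z
```

Each pass through the loop plays the role of one layer, so `depth` controls the number of layers.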
 [4] arXiv:2010.15659 [pdf, ps, other]

Title: Post-selection inference with HSIC-Lasso
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
Detecting influential features in complex (nonlinear and/or high-dimensional) datasets is key for extracting the relevant information. Most of the popular selection procedures, however, require assumptions on the underlying data, such as distributional ones, which barely agree with empirical observations. Therefore, feature selection based on nonlinear methods, such as the model-free HSIC-Lasso, is a more relevant approach. In order to ensure valid inference among the chosen features, the selection procedure must be accounted for. In this paper, we propose selective inference with HSIC-Lasso using the framework of truncated Gaussians together with the polyhedral lemma. Based on these theoretical foundations, we develop an algorithm allowing for low computational costs and the treatment of the hyperparameter selection issue. The relevance of our method is illustrated using artificial and real-world datasets. In particular, our empirical findings emphasise that type-I error control at the considered level can be achieved.
Cross-lists for Fri, 30 Oct 20
 [5] arXiv:2010.15530 (cross-list from eess.SY) [pdf, other]

Title: Probabilistic interval predictor based on dissimilarity functions
Comments: 8 pages, 4 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Systems and Control (eess.SY); Statistics Theory (math.ST)
This work presents a new method to obtain probabilistic interval predictions of a dynamical system. The method uses stored past system measurements to estimate the future evolution of the system. The proposed method relies on the use of dissimilarity functions to estimate the conditional probability density function of the outputs. A family of empirical probability density functions, parameterized by means of two parameters, is introduced. It is shown that the proposed family encompasses the multivariable normal probability density function as a particular case. We show that the proposed method constitutes a generalization of classical estimation methods. A cross-validation scheme is used to tune the two parameters on which the methodology relies. In order to prove the effectiveness of the methodology presented, some numerical examples and comparisons are provided.
 [6] arXiv:2010.15539 (cross-list from math.PR) [pdf, other]

Title: Rates of convergence for Gibbs sampling in the analysis of almost exchangeable data
Subjects: Probability (math.PR); Statistics Theory (math.ST)
Motivated by de Finetti's representation theorem for partially exchangeable arrays, we want to sample $\mathbf p \in [0,1]^d$ from a distribution with density proportional to $\exp(-A^2\sum_{i<j}c_{ij}(p_i-p_j)^2)$. We are particularly interested in the case of an almost exchangeable array ($A$ large).
We analyze the rate of convergence of a coordinate Gibbs sampler used to simulate from these measures. We show that for every fixed matrix $C=(c_{ij})$, and large enough $A$, mixing happens in $\Theta(A^2)$ steps in a suitable Wasserstein distance. The upper and lower bounds are explicit and depend on the matrix $C$ through a few relevant spectral parameters.
 [7] arXiv:2010.15690 (cross-list from cs.LG) [pdf, other]

Title: Analyzing the tree-layer structure of Deep Forests
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Random forests on the one hand, and neural networks on the other hand, have met great success in the machine learning community for their predictive performance. Combinations of both have been proposed in the literature, notably leading to the so-called deep forests (DF) [25]. In this paper, we investigate the mechanisms at work in DF and outline that the DF architecture can generally be simplified into simpler and more computationally efficient shallow forest networks. Despite some instability, the latter may outperform standard predictive tree-based methods. In order to precisely quantify the improvement achieved by these light network configurations over standard tree learners, we theoretically study the performance of a shallow tree network made of two layers, each one composed of a single centered tree. We provide tight theoretical lower and upper bounds on its excess risk. These theoretical results show the interest of tree-network architectures for well-structured data provided that the first layer, acting as a data encoder, is rich enough.
 [8] arXiv:2010.15764 (cross-list from stat.ML) [pdf, other]

Title: Domain adaptation under structural causal models
Comments: 75 pages, 19 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
Domain adaptation (DA) arises as an important problem in statistical machine learning when the source data used to train a model is different from the target data used to test the model. Recent advances in DA have mainly been application-driven and have largely relied on the idea of a common subspace for source and target data. To understand the empirical successes and failures of DA methods, we propose a theoretical framework via structural causal models that enables analysis and comparison of the prediction performance of DA methods. This framework also allows us to itemize the assumptions needed for the DA methods to have a low target error. Additionally, with insights from our theory, we propose a new DA method called CIRM that outperforms existing DA methods when both the covariates and label distributions are perturbed in the target data. We complement the theoretical analysis with extensive simulations to show the necessity of the devised assumptions. Reproducible synthetic and real data experiments are also provided to illustrate the strengths and weaknesses of DA methods when parts of the assumptions of our theory are violated.
 [9] arXiv:2010.15817 (cross-list from stat.ME) [pdf, other]

Title: Group-regularized ridge regression via empirical Bayes noise level cross-validation
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
Features in predictive models are not exchangeable, yet common supervised models treat them as such. Here we study ridge regression when the analyst can partition the features into $K$ groups based on external side-information. For example, in high-throughput biology, features may represent gene expression, protein abundance or clinical data, and so each feature group represents a distinct modality. The analyst's goal is to choose optimal regularization parameters $\lambda = (\lambda_1, \dotsc, \lambda_K)$, one for each group. In this work, we study the impact of $\lambda$ on the predictive risk of group-regularized ridge regression by deriving limiting risk formulae under a high-dimensional random effects model with $p\asymp n$ as $n \to \infty$. Furthermore, we propose a data-driven method for choosing $\lambda$ that attains the optimal asymptotic risk: the key idea is to interpret the residual noise variance $\sigma^2$ as a regularization parameter to be chosen through cross-validation. An empirical Bayes construction maps the one-dimensional parameter $\sigma$ to the $K$-dimensional vector of regularization parameters, i.e., $\sigma \mapsto \widehat{\lambda}(\sigma)$. Beyond its theoretical optimality, the proposed method is practical and runs as fast as cross-validated ridge regression without feature groups ($K=1$).
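For readers unfamiliar with group-regularized ridge regression, the sketch below shows the closed-form estimator for a given penalty vector $\lambda$ in Python. The objective normalization $\frac{1}{2n}\|y - Xw\|^2 + \frac{1}{2}\sum_g \lambda_g \|w_g\|^2$ and the names `group_ridge`, `groups` and `lambdas` are assumptions made for illustration; the empirical Bayes map $\sigma \mapsto \widehat{\lambda}(\sigma)$ is not given in the abstract and is not implemented here.

```python
import numpy as np


def group_ridge(X, y, groups, lambdas):
    """Ridge regression with one penalty per feature group.

    groups[j] is the group index (0..K-1) of feature j, and lambdas[g] is the
    penalty for group g. Minimizes (assumed normalization)
        (1 / (2n)) * ||y - X w||^2 + (1 / 2) * sum_j lambdas[groups[j]] * w_j^2.
    """
    n = X.shape[0]
    # Expand the K group penalties to one penalty per feature.
    penalties = np.asarray(lambdas, dtype=float)[np.asarray(groups)]
    gram = X.T @ X / n + np.diag(penalties)
    return np.linalg.solve(gram, X.T @ y / n)
```

With a single group ($K=1$) this reduces to ordinary ridge regression, which is the baseline the abstract compares running time against.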
Replacements for Fri, 30 Oct 20
 [10] arXiv:1906.07514 (replaced) [pdf, other]

Title: Bayes Extended Estimators for Curved Exponential Families
Subjects: Statistics Theory (math.ST)
 [11] arXiv:1910.04267 (replaced) [pdf, ps, other]

Title: Subspace Estimation from Unbalanced and Incomplete Data Matrices: $\ell_{2,\infty}$ Statistical Guarantees
Comments: Accepted to Annals of Statistics
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [12] arXiv:2001.09602 (replaced) [pdf, ps, other]

Title: Bayesian Shrinkage Estimation of Negative Multinomial Parameter Vectors
Comments: 31 pages; the code for numerical computation of the hierarchical Bayes estimator in Section 4 has been corrected; Tables 2, 3, and 4 and the second-to-the-last paragraph of Section 4 have been changed
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
 [13] arXiv:2007.11078 (replaced) [pdf, other]

Title: The Complete Lasso Tradeoff Diagram
Comments: To appear in the 34th Conference on Neural Information Processing Systems (NeurIPS 2020)
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)
 [14] arXiv:1605.09124 (replaced) [pdf, ps, other]

Title: Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions
Comments: This version has been significantly revised
Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)
 [15] arXiv:2006.08172 (replaced) [pdf, other]

Title: Faster Wasserstein Distance Estimation with the Sinkhorn Divergence
Authors: Lenaic Chizat (LMO), Pierre Roussillon (DMA), Flavien Léger (DMA), François-Xavier Vialard (Univ Gustave Eiffel), Gabriel Peyré (DMA)
Journal-ref: Neural Information Processing Systems, Dec 2020, Vancouver, Canada
Subjects: Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [16] arXiv:2010.10436 (replaced) [pdf, other]

Title: VarGrad: A Low-Variance Gradient Estimator for Variational Inference
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)