Statistics Theory

New submissions

[ total of 12 entries: 1-12 ]

New submissions for Fri, 15 Jan 21

[1]  arXiv:2101.05380 [pdf, ps, other]
Title: Breaking the curse: a dimension free computational upper-bound for smooth optimal transport estimation
Comments: 20 pages; Comments welcome
Subjects: Statistics Theory (math.ST); Optimization and Control (math.OC)

It is well known that plug-in statistical estimation of optimal transport suffers from the curse of dimensionality. Despite recent efforts to improve the rate of estimation with the smoothness of the problem, the computational complexities of these recently proposed methods still degrade exponentially with the dimension. In this paper, thanks to a representation theorem, we derive a statistical estimator of smooth optimal transport which achieves on average a precision $\epsilon$ for a computational cost of $\tilde{\mathcal{O}}(\epsilon^{-2\gamma})$, where $\gamma$ is the complexity of a semidefinite program mixed with a second order cone program, hence yielding a dimension-free rate. Even though our result is theoretical in nature due to the large constants involved in our estimation, it settles the question of whether the smoothness of optimal solutions can be taken advantage of from a computational and statistical point of view.

[2]  arXiv:2101.05402 [pdf, other]
Title: Optimal Clustering in Anisotropic Gaussian Mixture Models
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the clustering task under anisotropic Gaussian Mixture Models where the covariance matrices from different clusters are unknown and are not necessarily the identity matrix. We characterize the dependence of signal-to-noise ratios on the cluster centers and covariance matrices and obtain the minimax lower bound for the clustering problem. In addition, we propose a computationally feasible procedure and prove it achieves the optimal rate within a few iterations. The proposed procedure is a hard EM type algorithm, and it can also be seen as a variant of Lloyd's algorithm that is adjusted to the anisotropic covariance matrices.
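
As a rough illustration of what a hard EM iteration with cluster-specific covariances can look like, here is a minimal sketch. It is not the procedure analysed in the paper: the function name `anisotropic_lloyd`, the `ridge` regulariser, the equal cluster weights and the rule for skipping very small clusters are all assumptions made for this example.

```python
import numpy as np

def anisotropic_lloyd(X, labels, n_clusters, n_iter=10, ridge=1e-6):
    """Hard-EM style iterations with one covariance matrix per cluster.

    Illustrative sketch only (hypothetical helper, not the paper's algorithm).
    X: (n, d) data, labels: initial integer labels in {0, ..., n_clusters - 1}.
    """
    n, d = X.shape
    labels = np.asarray(labels).copy()
    for _ in range(n_iter):
        # scores[i, k] = squared Mahalanobis distance + log-determinant,
        # i.e. -2 * Gaussian log-likelihood up to a constant (equal weights assumed)
        scores = np.full((n, n_clusters), np.inf)
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) <= d:
                # assumption: too few points to estimate a covariance, skip cluster
                continue
            mu = members.mean(axis=0)
            cov = np.cov(members, rowvar=False) + ridge * np.eye(d)
            diff = X - mu
            maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
            _, logdet = np.linalg.slogdet(cov)
            scores[:, k] = maha + logdet
        new_labels = scores.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

Replacing each estimated covariance by the identity matrix reduces the assignment step to nearest-center assignment, which is the sense in which such a scheme is a variant of Lloyd's algorithm.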

[3]  arXiv:2101.05477 [pdf, other]
Title: Optimal network online change point localisation
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG)

We study the problem of online network change point detection. In this setting, a collection of independent Bernoulli networks is observed sequentially, and the underlying distributions change when a change point occurs. The goal is to detect the change point as quickly as possible, if it exists, subject to a constraint on the number or probability of false alarms. For the detection delay, we establish a minimax lower bound and two upper bounds, one based on NP-hard algorithms and one on polynomial-time algorithms, i.e., \[ \mbox{detection delay} \begin{cases} \gtrsim \log(1/\alpha) \frac{\max\{r^2/n, \, 1\}}{\kappa_0^2 n \rho},\\ \lesssim \log(\Delta/\alpha) \frac{\max\{r^2/n, \, \log(r)\}}{\kappa_0^2 n \rho}, & \mbox{with NP-hard algorithms},\\ \lesssim \log(\Delta/\alpha) \frac{r}{\kappa_0^2 n \rho}, & \mbox{with polynomial-time algorithms}, \end{cases} \] where $\kappa_0, n, \rho, r$ and $\alpha$ are, respectively, the normalised jump size, network size, entrywise sparsity, rank sparsity and the overall Type-I error upper bound. All the model parameters are allowed to vary as $\Delta$, the location of the change point, diverges. The polynomial-time algorithms are novel procedures that we propose in this paper, designed for quick detection under two different forms of Type-I error control. The first is based on controlling the overall probability of a false alarm when there are no change points, and the second is based on specifying a lower bound on the expected time of the first false alarm. Extensive experiments show that, under different scenarios and the aforementioned forms of Type-I error control, our proposed approaches outperform state-of-the-art methods.

[4]  arXiv:2101.05487 [pdf, other]
Title: Kernel-based ANOVA decomposition and Shapley effects -- Application to global sensitivity analysis
Subjects: Statistics Theory (math.ST)

Global sensitivity analysis is the main quantitative technique for identifying the most influential input variables in a numerical simulation model. In particular, when the inputs are independent, Sobol' sensitivity indices attribute a portion of the variance of the output of interest to each input and to all possible interactions in the model, thanks to a functional ANOVA decomposition. On the other hand, moment-independent sensitivity indices focus on the impact of input variables on the whole output distribution instead of the variance only, thus providing complementary insight into the input-output relationship. Unfortunately, they do not enjoy the nice decomposition property of Sobol' indices and are consequently harder to analyze. In this paper, we introduce two moment-independent indices based on kernel embeddings of probability distributions and show that the RKHS framework used for their definition makes it possible to exhibit a kernel-based ANOVA decomposition. This is the first time such a desirable property is proved for sensitivity indices apart from Sobol' ones. When the inputs are dependent, we also use these new sensitivity indices as building blocks to design kernel-embedding Shapley effects which generalize the traditional variance-based ones used in sensitivity analysis. Several estimation procedures are discussed and illustrated on test cases with various output types such as categorical variables and probability distributions. All these examples show their potential for enhancing traditional sensitivity analysis with a kernel point of view.
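
To make the variance-based baseline concrete, the sketch below estimates first-order Sobol' indices with a standard pick-freeze (Saltelli-type) Monte Carlo scheme. It covers only the classical indices mentioned in the abstract, not the kernel-based indices the paper introduces. The function name `first_order_sobol` and the assumption that the inputs are independent and uniform on $[0, 1]$ are choices made for this example.

```python
import numpy as np

def first_order_sobol(f, d, n, rng):
    """Pick-freeze estimate of first-order Sobol' indices (illustrative sketch).

    f: vectorised model mapping an (n, d) array of inputs to n outputs.
    Assumption: inputs are independent and uniform on [0, 1].
    """
    A = rng.random((n, d))
    B = rng.random((n, d))
    fA, fB = f(A), f(B)
    var_y = np.var(np.concatenate([fA, fB]))  # total output variance
    S = np.empty(d)
    for i in range(d):
        AB = A.copy()
        AB[:, i] = B[:, i]  # share only input i with sample B
        # mean(fB * (f(AB) - fA)) estimates Var(E[Y | X_i])
        S[i] = np.mean(fB * (f(AB) - fA)) / var_y
    return S
```

For the additive toy model $f(x) = x_1 + 2 x_2$ with uniform inputs, the exact first-order indices are $0.2$ and $0.8$, which gives a quick sanity check for the estimator.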

[5]  arXiv:2101.05654 [pdf, ps, other]
Title: Optimal designs for comparing regression curves -- dependence within and between groups
Comments: 28 pages
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

We consider the problem of designing experiments for the comparison of two regression curves describing the relation between a predictor and a response in two groups, where the data between and within the groups may be dependent. In order to derive efficient designs, we use results from stochastic analysis to identify the best linear unbiased estimator (BLUE) in a corresponding continuous time model. It is demonstrated that, in general, simultaneous estimation using the data from both groups yields more precise results than estimation of the parameters separately in the two groups. Using the BLUE from simultaneous estimation, we then construct an efficient linear estimator for finite sample size by minimizing the mean squared error between the optimal solution in the continuous time model and its discrete approximation with respect to the weights (of the linear estimator). Finally, the optimal design points are determined by minimizing the maximal width of a simultaneous confidence band for the difference of the two regression functions. The advantages of the new approach are illustrated by means of a simulation study, where it is shown that the use of the optimal designs yields substantially narrower confidence bands than the application of uniform designs.

[6]  arXiv:2101.05728 [pdf, other]
Title: New bounds for $k$-means and information $k$-means
Subjects: Statistics Theory (math.ST)

In this paper, we derive a new dimension-free non-asymptotic upper bound for the quadratic $k$-means excess risk related to the quantization of an i.i.d. sample in a separable Hilbert space. We improve the bound of order $\mathcal{O} \bigl( k / \sqrt{n} \bigr)$ of Biau, Devroye and Lugosi, by establishing a bound of order $\mathcal{O} \bigl(\log(n/k) \sqrt{k \log(k) / n} \, \bigr)$ where $k$ is the number of centers and $n$ the sample size. This is essentially optimal up to logarithmic factors since a lower bound of order $\mathcal{O} \bigl( \sqrt{k^{1 - 4/d}/n} \bigr)$ is known in dimension $d$. Our technique of proof is based on the linearization of the $k$-means criterion through a kernel trick and on PAC-Bayesian inequalities. To get a $1 / \sqrt{n}$ rate, we introduce a new PAC-Bayesian chaining method replacing the concept of $\delta$-net with the perturbation of the parameter by an infinite dimensional Gaussian process.
In addition, we embed the usual $k$-means criterion into a broader family built upon the Kullback divergence and its underlying properties. This results in a new algorithm that we name information $k$-means, well suited to the clustering of bags of words. Based on considerations from information theory, we also introduce a new bounded $k$-means criterion that uses a scale parameter but satisfies a generalization bound that does not require any boundedness or even integrability conditions on the sample. We describe the counterpart of Lloyd's algorithm and prove generalization bounds for these new $k$-means criteria.
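
For orientation, the quadratic $k$-means risk and excess risk are commonly written as follows. The notation ($R$, $c_j$, $\hat{c}_n$, $H$) is an assumption of this note and may differ from the paper's: \[ R(c) = \mathbb{E} \Bigl[ \min_{1 \le j \le k} \| X - c_j \|^2 \Bigr], \qquad \mbox{excess risk} = R(\hat{c}_n) - \inf_{c \in H^k} R(c), \] where $X$ takes values in the separable Hilbert space $H$, $c = (c_1, \dots, c_k)$ is a tuple of $k$ centers, and $\hat{c}_n$ denotes centers computed from the sample of size $n$.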

Cross-lists for Fri, 15 Jan 21

[7]  arXiv:2101.05780 (cross-list from math.PR) [pdf, other]
Title: Explicit non-asymptotic bounds for the distance to the first-order Edgeworth expansion
Comments: 41 pages, 3 figures
Subjects: Probability (math.PR); Econometrics (econ.EM); Statistics Theory (math.ST)

In this article, we study bounds on the uniform distance between the cumulative distribution function of a standardized sum of independent centered random variables with moments of order four and its first-order Edgeworth expansion. Existing bounds are sharpened in two frameworks: when the variables are independent but not identically distributed and in the case of independent and identically distributed random variables. Improvements of these bounds are derived if the third moment of the distribution is zero. We also provide adapted versions of these bounds under additional regularity constraints on the tail behavior of the characteristic function. We finally present an application of our results to the lack of validity of one-sided tests based on the normal approximation of the mean for a fixed sample size.

Replacements for Fri, 15 Jan 21

[8]  arXiv:1708.00145 (replaced) [pdf, other]
Title: Semiparametric Efficiency in Convexity Constrained Single Index Model
Comments: Removed the density bounded away from zero assumption in assumption (A5). Weakened assumption (B2)
Subjects: Statistics Theory (math.ST); Computation (stat.CO); Methodology (stat.ME)

[9]  arXiv:2101.03801 (replaced) [pdf, other]
Title: Hidden Markov chains and fields with observations in Riemannian manifolds
Comments: accepted for publication at MTNS 2020
Subjects: Statistics Theory (math.ST)

[10]  arXiv:2101.04039 (replaced) [pdf, ps, other]
Title: From Smooth Wasserstein Distance to Dual Sobolev Norm: Empirical Approximation and Statistical Applications
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)

[11]  arXiv:1702.02982 (replaced) [pdf, other]
Title: Fixing an error in Caponnetto and de Vito (2007)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

[12]  arXiv:2005.08794 (replaced) [pdf, other]
Title: Inference on the History of a Randomly Growing Tree
Authors: Harry Crane, Min Xu
Comments: 36 pages; 7 figures; 5 tables
Subjects: Probability (math.PR); Statistics Theory (math.ST); Computation (stat.CO)