Statistics Theory
New submissions
New submissions for Fri, 15 Jan 21
 [1] arXiv:2101.05380 [pdf, ps, other]

Title: Breaking the curse: a dimension-free computational upper-bound for smooth optimal transport estimation
Comments: 20 pages; Comments welcome
Subjects: Statistics Theory (math.ST); Optimization and Control (math.OC)
It is well-known that plug-in statistical estimation of optimal transport suffers from the curse of dimension. Despite recent efforts to improve the rate of estimation with the smoothness of the problem, the computational complexities of these recently proposed methods still degrade exponentially with the dimension. In this paper, thanks to a representation theorem, we derive a statistical estimator of smooth optimal transport which achieves on average a precision $\epsilon$ for a computational cost of $\tilde{\mathcal{O}}(\epsilon^{-2\gamma})$, where $\gamma$ is the complexity of a semidefinite program mixed with a second-order cone program, hence yielding a dimension-free rate. Even though our result is theoretical in nature due to the large constants involved in our estimation, it settles the question of whether the smoothness of optimal solutions can be taken advantage of from a computational and statistical point of view.
 [2] arXiv:2101.05402 [pdf, other]

Title: Optimal Clustering in Anisotropic Gaussian Mixture Models
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
We study the clustering task under anisotropic Gaussian Mixture Models where the covariance matrices from different clusters are unknown and are not necessarily the identity matrix. We characterize the dependence of signal-to-noise ratios on the cluster centers and covariance matrices and obtain the minimax lower bound for the clustering problem. In addition, we propose a computationally feasible procedure and prove it achieves the optimal rate within a few iterations. The proposed procedure is a hard EM type algorithm, and it can also be seen as a variant of Lloyd's algorithm that is adjusted to the anisotropic covariance matrices.
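As a rough illustration of what a Lloyd-type hard EM iteration with cluster-specific covariances can look like, here is a minimal sketch. It is not the procedure analysed in the paper: the function name `anisotropic_lloyd`, the `ridge` regularisation, the log-determinant term in the assignment score, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def anisotropic_lloyd(X, labels, n_iter=10, ridge=1e-6):
    """Hard-EM / Lloyd-type iterations with one covariance matrix per cluster.

    X      : (n, d) data matrix
    labels : (n,) initial integer labels in {0, ..., k-1}
    ridge  : small diagonal term (an assumption) keeping covariances invertible
    """
    n, d = X.shape
    k = int(labels.max()) + 1
    for _ in range(n_iter):
        scores = np.empty((n, k))
        for j in range(k):
            members = X[labels == j]
            if members.shape[0] <= d:
                raise ValueError("a cluster has too few points to estimate its covariance")
            # M-step: plug-in centre and covariance of the current cluster.
            mu = members.mean(axis=0)
            cov = np.cov(members, rowvar=False) + ridge * np.eye(d)
            # E-step score: squared Mahalanobis distance plus log-determinant.
            diff = X - mu
            maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
            _, logdet = np.linalg.slogdet(cov)
            scores[:, j] = maha + logdet
        new_labels = scores.argmin(axis=1)  # hard assignment
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Synthetic two-cluster example with different, non-spherical covariances.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.multivariate_normal([0, 0], [[1, 0], [0, 4]], size=150),
    rng.multivariate_normal([10, 0], [[1, 0.5], [0.5, 0.5]], size=150),
])
truth = np.repeat([0, 1], 150)
# Start from the true labels with roughly 30% of them flipped.
init = np.where(rng.random(300) < 0.3, 1 - truth, truth)
est = anisotropic_lloyd(X, init)
print("fraction correctly clustered:", (est == truth).mean())
```

Replacing each estimated covariance by the identity matrix reduces the assignment step to the usual Euclidean Lloyd update, which is the sense in which this kind of iteration is a covariance-adjusted variant.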
 [3] arXiv:2101.05477 [pdf, other]

Title: Optimal network online change point localisation
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG)
We study the problem of online network change point detection. In this setting, a collection of independent Bernoulli networks is collected sequentially, and the underlying distributions change when a change point occurs. The goal is to detect the change point as quickly as possible, if it exists, subject to a constraint on the number or probability of false alarms. In this paper, on the detection delay, we establish a minimax lower bound and two upper bounds based on NP-hard algorithms and polynomial-time algorithms, i.e., \[ \mbox{detection delay} \begin{cases} \gtrsim \log(1/\alpha) \frac{\max\{r^2/n, \, 1\}}{\kappa_0^2 n \rho},\\ \lesssim \log(\Delta/\alpha) \frac{\max\{r^2/n, \, \log(r)\}}{\kappa_0^2 n \rho}, & \mbox{with NP-hard algorithms},\\ \lesssim \log(\Delta/\alpha) \frac{r}{\kappa_0^2 n \rho}, & \mbox{with polynomial-time algorithms}, \end{cases} \] where $\kappa_0, n, \rho, r$ and $\alpha$ are, respectively, the normalised jump size, network size, entrywise sparsity, rank sparsity and the overall Type-I error upper bound. All the model parameters are allowed to vary as $\Delta$, the location of the change point, diverges. The polynomial-time algorithms are novel procedures that we propose in this paper, designed for quick detection under two different forms of Type-I error control. The first is based on controlling the overall probability of a false alarm when there are no change points, and the second is based on specifying a lower bound on the expected time of the first false alarm. Extensive experiments show that, under different scenarios and the aforementioned forms of Type-I error control, our proposed approaches outperform state-of-the-art methods.
 [4] arXiv:2101.05487 [pdf, other]

Title: Kernel-based ANOVA decomposition and Shapley effects - Application to global sensitivity analysis
Authors: Sébastien da Veiga
Subjects: Statistics Theory (math.ST)
Global sensitivity analysis is the main quantitative technique for identifying the most influential input variables in a numerical simulation model. In particular, when the inputs are independent, Sobol' sensitivity indices attribute a portion of the variance of the output of interest to each input and all possible interactions in the model, thanks to a functional ANOVA decomposition. On the other hand, moment-independent sensitivity indices focus on the impact of input variables on the whole output distribution instead of the variance only, thus providing complementary insight on the input-output relationship. Unfortunately, they do not enjoy the nice decomposition property of Sobol' indices and are consequently harder to analyze. In this paper, we introduce two moment-independent indices based on kernel embeddings of probability distributions and show that the RKHS framework used for their definition makes it possible to exhibit a kernel-based ANOVA decomposition. This is the first time such a desirable property is proved for sensitivity indices apart from Sobol' ones. When the inputs are dependent, we also use these new sensitivity indices as building blocks to design kernel-embedding Shapley effects which generalize the traditional variance-based ones used in sensitivity analysis. Several estimation procedures are discussed and illustrated on test cases with various output types such as categorical variables and probability distributions. All these examples show their potential for enhancing traditional sensitivity analysis with a kernel point of view.
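For readers unfamiliar with the classical variance-based indices that this abstract takes as its starting point, the sketch below estimates first-order Sobol' indices with a standard pick-freeze Monte Carlo scheme. It does not implement the kernel-based indices of the paper; the function name `first_order_sobol`, the uniform inputs on [0, 1], and the toy model are assumptions chosen for illustration.

```python
import numpy as np

def first_order_sobol(f, d, n, rng):
    """Pick-freeze estimate of first-order Sobol' indices.

    Assumes d independent inputs, each uniform on [0, 1] (an assumption of
    this sketch), and a vectorised model f mapping (n, d) arrays to (n,).
    """
    A = rng.random((n, d))
    B = rng.random((n, d))
    yA = f(A)
    var = yA.var()
    indices = np.empty(d)
    for i in range(d):
        C = B.copy()
        C[:, i] = A[:, i]  # freeze input i, resample all the others
        yC = f(C)
        # Cov(f(A), f(C)) estimates Var(E[Y | X_i]).
        indices[i] = (np.mean(yA * yC) - yA.mean() * yC.mean()) / var
    return indices

# Toy additive model Y = X1 + 2 * X2, for which the exact first-order
# indices are 1/5 and 4/5 (variances 1/12 and 4/12 out of 5/12).
def toy_model(X):
    return X[:, 0] + 2.0 * X[:, 1]

S = first_order_sobol(toy_model, d=2, n=200_000, rng=np.random.default_rng(0))
print("estimated first-order indices:", S)
```

Because the toy model is additive, the first-order indices sum to one; with interactions the remainder would be attributed to higher-order terms of the functional ANOVA decomposition.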
 [5] arXiv:2101.05654 [pdf, ps, other]

Title: Optimal designs for comparing regression curves - dependence within and between groups
Comments: 28 pages
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
We consider the problem of designing experiments for the comparison of two regression curves describing the relation between a predictor and a response in two groups, where the data between and within the groups may be dependent. In order to derive efficient designs, we use results from stochastic analysis to identify the best linear unbiased estimator (BLUE) in a corresponding continuous-time model. It is demonstrated that, in general, simultaneous estimation using the data from both groups yields more precise results than estimation of the parameters separately in the two groups. Using the BLUE from simultaneous estimation, we then construct an efficient linear estimator for finite sample size by minimizing the mean squared error between the optimal solution in the continuous-time model and its discrete approximation with respect to the weights (of the linear estimator). Finally, the optimal design points are determined by minimizing the maximal width of a simultaneous confidence band for the difference of the two regression functions. The advantages of the new approach are illustrated by means of a simulation study, where it is shown that the use of the optimal designs yields substantially narrower confidence bands than the application of uniform designs.
 [6] arXiv:2101.05728 [pdf, other]

Title: New bounds for $k$-means and information $k$-means
Subjects: Statistics Theory (math.ST)
In this paper, we derive a new dimension-free non-asymptotic upper bound for the quadratic $k$-means excess risk related to the quantization of an i.i.d. sample in a separable Hilbert space. We improve the bound of order $\mathcal{O} \bigl( k / \sqrt{n} \bigr)$ of Biau, Devroye and Lugosi, by establishing a bound of order $\mathcal{O} \bigl(\log(n/k) \sqrt{k \log(k) / n} \, \bigr)$ where $k$ is the number of centers and $n$ the sample size. This is essentially optimal up to logarithmic factors since a lower bound of order $\mathcal{O} \bigl( \sqrt{k^{1 - 4/d}/n} \bigr)$ is known in dimension $d$. Our technique of proof is based on the linearization of the $k$-means criterion through a kernel trick and on PAC-Bayesian inequalities. To get a $1 / \sqrt{n}$ speed, we introduce a new PAC-Bayesian chaining method replacing the concept of $\delta$-net with the perturbation of the parameter by an infinite dimensional Gaussian process.
In the meantime, we embed the usual $k$-means criterion into a broader family built upon the Kullback divergence and its underlying properties. This results in a new algorithm that we named information $k$-means, well suited to the clustering of bags of words. Based on considerations from information theory, we also introduce a new bounded $k$-means criterion that uses a scale parameter but satisfies a generalization bound that does not require any boundedness or even integrability conditions on the sample. We describe the counterpart of Lloyd's algorithm and prove generalization bounds for these new $k$-means criteria.
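To get a feel for how the two excess-risk orders quoted in this abstract compare, the following sketch evaluates $k/\sqrt{n}$ and $\log(n/k)\sqrt{k \log(k)/n}$ with all constants set to one. That choice of constants is an assumption made purely for illustration (the abstract only gives orders of magnitude), and the function names are hypothetical.

```python
import math

def earlier_order(k, n):
    """Order k / sqrt(n) attributed to Biau, Devroye and Lugosi (constant set to 1)."""
    return k / math.sqrt(n)

def new_order(k, n):
    """Order log(n/k) * sqrt(k log(k) / n) from the abstract (constant set to 1)."""
    return math.log(n / k) * math.sqrt(k * math.log(k) / n)

# The ratio new/earlier equals log(n/k) * sqrt(log(k) / k): it shrinks as the
# number of centers k grows, up to the logarithmic factors.
for k, n in [(100, 10**6), (10_000, 10**8), (10**6, 10**10)]:
    ratio = new_order(k, n) / earlier_order(k, n)
    print(f"k={k:>8}, n={n:>12}: new/earlier = {ratio:.3f}")
```

With unit constants the new order only wins once $k$ is large enough for $\sqrt{\log(k)/k}$ to offset the $\log(n/k)$ factor, which is consistent with the improvement being in the dependence on $k$.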
Cross-lists for Fri, 15 Jan 21
 [7] arXiv:2101.05780 (cross-list from math.PR) [pdf, other]

Title: Explicit non-asymptotic bounds for the distance to the first-order Edgeworth expansion
Comments: 41 pages, 3 figures
Subjects: Probability (math.PR); Econometrics (econ.EM); Statistics Theory (math.ST)
In this article, we study bounds on the uniform distance between the cumulative distribution function of a standardized sum of independent centered random variables with moments of order four and its first-order Edgeworth expansion. Existing bounds are sharpened in two frameworks: when the variables are independent but not identically distributed and in the case of independent and identically distributed random variables. Improvements of these bounds are derived if the third moment of the distribution is zero. We also provide adapted versions of these bounds under additional regularity constraints on the tail behavior of the characteristic function. We finally present an application of our results to the lack of validity of one-sided tests based on the normal approximation of the mean for a fixed sample size.
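For reference, the quantity being bounded can be written out in the i.i.d. case as follows. This is the textbook one-term Edgeworth expansion; the notation ($S_n$, $\lambda_3$, $G_n$, $\Delta_n$) is introduced here for illustration and may differ from the paper's, which also covers the non-identically distributed case.

```latex
% X_1, ..., X_n i.i.d., centered, variance \sigma^2, skewness \lambda_3 = E[X_1^3] / \sigma^3.
% \Phi and \varphi denote the standard normal c.d.f. and density.
\[
  S_n = \frac{1}{\sigma \sqrt{n}} \sum_{i=1}^{n} X_i ,
  \qquad
  G_n(x) = \Phi(x) + \frac{\lambda_3}{6 \sqrt{n}} \, (1 - x^2) \, \varphi(x) ,
\]
\[
  \Delta_n = \sup_{x \in \mathbb{R}} \bigl| \mathbb{P}(S_n \le x) - G_n(x) \bigr| .
\]
```

When $\lambda_3 = 0$ the correction term vanishes and $G_n$ reduces to $\Phi$, so $\Delta_n$ becomes the usual Berry-Esseen-type distance to the normal distribution.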
Replacements for Fri, 15 Jan 21
 [8] arXiv:1708.00145 (replaced) [pdf, other]

Title: Semiparametric Efficiency in Convexity Constrained Single Index Model
Comments: Removed the density bounded away from zero assumption in assumption (A5). Weakened assumption (B2)
Subjects: Statistics Theory (math.ST); Computation (stat.CO); Methodology (stat.ME)
 [9] arXiv:2101.03801 (replaced) [pdf, other]

Title: Hidden Markov chains and fields with observations in Riemannian manifolds
Comments: accepted for publication at MTNS 2020
Subjects: Statistics Theory (math.ST)
 [10] arXiv:2101.04039 (replaced) [pdf, ps, other]

Title: From Smooth Wasserstein Distance to Dual Sobolev Norm: Empirical Approximation and Statistical Applications
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
 [11] arXiv:1702.02982 (replaced) [pdf, other]

Title: Fixing an error in Caponnetto and de Vito (2007)
Authors: Danica J. Sutherland
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [12] arXiv:2005.08794 (replaced) [pdf, other]

Title: Inference on the History of a Randomly Growing Tree
Comments: 36 pages; 7 figures; 5 tables
Subjects: Probability (math.PR); Statistics Theory (math.ST); Computation (stat.CO)