Statistics Theory
New submissions
New submissions for Tue, 24 May 22
 [1] arXiv:2205.10524 [pdf, ps, other]

Title: Robust density estimation with the $\mathbb{L}_{1}$-loss. Applications to the estimation of a density on the line satisfying a shape constraint
Subjects: Statistics Theory (math.ST)
We solve the problem of estimating the distribution of presumed i.i.d. observations for the total variation loss. Our approach is based on density models and is versatile enough to cope with many different ones, including some density models for which the Maximum Likelihood Estimator (MLE for short) does not exist. We mainly illustrate the properties of our estimator on models of densities on the line that satisfy a shape constraint. We show that it possesses some similar optimality properties, with regard to some global rates of convergence, as the MLE does when it exists. It also enjoys some adaptation properties with respect to some specific target densities in the model for which our estimator is proven to converge at parametric rate. More important is the fact that our estimator is robust, not only with respect to model misspecification, but also to contamination, the presence of outliers in the dataset, and departures from the equidistribution assumption. This means that the estimator performs almost as well as if the data were i.i.d. with density $p$ in a situation where these data are only independent and most of their marginals are close enough in total variation to a distribution with density $p$. Our main result on the risk of the estimator takes the form of an exponential deviation inequality which is nonasymptotic and involves explicit numerical constants. We deduce from it several global rates of convergence, including some bounds for the minimax $\mathbb{L}_{1}$-risks over the sets of concave and log-concave densities. These bounds derive from some specific results on the approximation of densities which are monotone, convex, concave and log-concave. Such results may be of independent interest.
 [2] arXiv:2205.10799 [pdf, ps, other]

Title: On point estimators for Gamma and Beta distributions
Authors: Nickos Papadatos
Comments: Dedicated to Professor Stavros Kourouklis (18 pages, including one Table)
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
Let $X_1,\ldots,X_n$ be a random sample from the Gamma distribution with density $f(x)=\lambda^{\alpha}x^{\alpha-1}e^{-\lambda x}/\Gamma(\alpha)$, $x>0$, where both $\alpha>0$ (the shape parameter) and $\lambda>0$ (the reciprocal scale parameter) are unknown. The main result shows that the uniformly minimum variance unbiased estimator (UMVUE) of the shape parameter, $\alpha$, exists if and only if $n\geq 4$; moreover, it has finite variance if and only if $n\geq 6$. More precisely, the form of the UMVUE is given for all parametric functions $\alpha$, $\lambda$, $1/\alpha$ and $1/\lambda$. Furthermore, a highly efficient estimating procedure for the two-parameter Beta distribution is also given. This is based on a Stein-type covariance identity for the Beta distribution, followed by an application of the theory of $U$-statistics and the delta-method.
MSC: Primary 62F10; 62F12; Secondary 62E15.
Key words and phrases: unbiased estimation; Gamma distribution; Beta distribution; Ye-Chen-type closed-form estimators; asymptotic efficiency; $U$-statistics; Stein-type covariance identity; delta-method.
 [3] arXiv:2205.10886 [pdf, other]

Title: Adaptive estimation for the nonparametric bivariate additive model in random design with long-memory dependent errors
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
We investigate nonparametric bivariate additive regression estimation in the random design with long-memory errors and construct adaptive thresholding estimators based on wavelet series. The proposed approach achieves asymptotically near-optimal convergence rates when the unknown function and its univariate additive components belong to a Besov space. We consider the problem under two noise structures: (1) homoskedastic Gaussian long-memory errors and (2) heteroskedastic Gaussian long-memory errors. In the homoskedastic long-memory error case, the estimator is completely adaptive with respect to the long-memory parameter. In the heteroskedastic long-memory case, the estimator may not be adaptive with respect to the long-memory parameter unless the heteroskedasticity is of polynomial form. In either case, the convergence rates depend on the long-memory parameter only when the long memory is strong enough; otherwise, the rates are identical to those under i.i.d. errors. The proposed approach is extended to the general $r$-dimensional additive case, with $r>2$, and the corresponding convergence rates are free from the curse of dimensionality.
 [4] arXiv:2205.11092 [pdf, ps, other]

Title: Estimation of the Hurst parameter from continuous noisy data
Subjects: Statistics Theory (math.ST)
This paper addresses the problem of estimating the Hurst exponent of the fractional Brownian motion from a continuous-time noisy sample. Consistent estimation in the setup under consideration is possible only if either the length of the observation interval increases to infinity or the intensity of the noise decreases to zero. The main result is a proof of the Local Asymptotic Normality (LAN) of the model in these two regimes, which reveals the optimal minimax rates.
 [5] arXiv:2205.11302 [pdf, other]

Title: Exchangeable FGM copulas
Subjects: Statistics Theory (math.ST)
Copulas are a powerful tool to model dependence between the components of a random vector. One well-known class of copulas when working in two dimensions is the Farlie-Gumbel-Morgenstern (FGM) copula, whose simple analytic shape enables closed-form solutions to many problems in applied probability. However, the classical definition of the high-dimensional FGM copula does not enable a straightforward understanding of the effect of the copula parameters on the dependence, nor a geometric understanding of their admissible range. We circumvent this issue by studying the FGM copula from a probabilistic approach based on multivariate Bernoulli distributions. This paper studies high-dimensional exchangeable FGM copulas, a subclass of FGM copulas. We show that dependence parameters of exchangeable FGM copulas can be expressed as convex hulls of a finite number of extreme points and establish partial orders for different exchangeable FGM copulas (including maximal and minimal dependence). We also leverage the probabilistic interpretation to develop efficient sampling and estimating procedures and provide a simulation study. Throughout, we discover geometric interpretations of the copula parameters that assist one in decoding the dependence of high-dimensional exchangeable FGM copulas.
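For readers unfamiliar with the classical two-dimensional FGM copula mentioned in this abstract, the following is a minimal sketch of how one might sample from it by conditional inversion. It is not the paper's Bernoulli-based construction for exchangeable high-dimensional copulas. It assumes the standard bivariate form $C(u,v)=uv\,[1+\theta(1-u)(1-v)]$ with $\theta\in[-1,1]$; the function names `sample_fgm_pair`, `sample_fgm` and `pearson` are hypothetical and chosen only for this illustration.

```python
import math
import random


def sample_fgm_pair(theta, rng):
    """Draw one (u, v) from the bivariate FGM copula with parameter theta.

    Assumes C(u, v) = u*v*(1 + theta*(1-u)*(1-v)), theta in [-1, 1].
    """
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    u = rng.random()
    w = rng.random()
    # Conditional CDF of V given U = u is v + a*v*(1 - v) with a as below.
    a = theta * (1.0 - 2.0 * u)
    # Solve a*v^2 - (1 + a)*v + w = 0 for the root in [0, 1], written in a
    # form that stays well defined when a is zero.
    disc = max(0.0, (1.0 + a) ** 2 - 4.0 * a * w)
    denom = (1.0 + a) + math.sqrt(disc)
    v = 2.0 * w / denom if denom > 0.0 else 0.0
    return u, min(1.0, v)


def sample_fgm(theta, n, seed=0):
    """Draw n independent pairs using a seeded generator (illustrative helper)."""
    rng = random.Random(seed)
    return [sample_fgm_pair(theta, rng) for _ in range(n)]


def pearson(pairs):
    """Sample Pearson correlation of the two coordinates."""
    n = len(pairs)
    mu = sum(u for u, _ in pairs) / n
    mv = sum(v for _, v in pairs) / n
    cov = sum((u - mu) * (v - mv) for u, v in pairs)
    var_u = sum((u - mu) ** 2 for u, _ in pairs)
    var_v = sum((v - mv) ** 2 for _, v in pairs)
    return cov / math.sqrt(var_u * var_v)


if __name__ == "__main__":
    pairs = sample_fgm(theta=1.0, n=20000, seed=1)
    # With uniform margins the correlation should be close to theta / 3.
    print(pearson(pairs))
```

Because both margins are uniform, the sample correlation estimates Spearman's rho, which for this copula equals $\theta/3$; this narrow range is one way to see the limited dependence the classical FGM family can express.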
Cross-lists for Tue, 24 May 22
 [6] arXiv:2205.10697 (cross-list from stat.ML) [pdf, ps, other]

Title: The Selectively Adaptive Lasso
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
Machine learning regression methods allow estimation of functions without unrealistic parametric assumptions. Although they can perform exceptionally in prediction error, most lack theoretical convergence rates necessary for semiparametric efficient estimation (e.g. TMLE, AIPW) of parameters like average treatment effects. The Highly Adaptive Lasso (HAL) is the only regression method proven to converge quickly enough for a meaningfully large class of functions, independent of the dimensionality of the predictors. Unfortunately, HAL is not computationally scalable. In this paper we build upon the theory of HAL to construct the Selectively Adaptive Lasso (SAL), a new algorithm which retains HAL's dimension-free, nonparametric convergence rate but which also scales computationally to massive datasets. To accomplish this, we prove some general theoretical results pertaining to empirical loss minimization in nested Donsker classes. Our resulting algorithm is a form of gradient tree boosting with an adaptive learning rate, which makes it fast and trivial to implement with off-the-shelf software. Finally, we show that our algorithm retains the performance of standard gradient boosting on a diverse group of real-world datasets. SAL makes semiparametric efficient estimators practically possible and theoretically justifiable in many big data settings.
 [7] arXiv:2205.10798 (cross-list from cs.LG) [pdf, other]

Title: PAC-Wrap: Semi-Supervised PAC Anomaly Detection
Comments: Accepted by SIGKDD 2022
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Anomaly detection is essential for preventing hazardous outcomes for safety-critical applications like autonomous driving. Given their safety-criticality, these applications benefit from provable bounds on various errors in anomaly detection. To achieve this goal in the semi-supervised setting, we propose to provide Probably Approximately Correct (PAC) guarantees on the false negative and false positive detection rates for anomaly detection algorithms. Our method (PAC-Wrap) can wrap around virtually any existing semi-supervised and unsupervised anomaly detection method, endowing it with rigorous guarantees. Our experiments with various anomaly detectors and datasets indicate that PAC-Wrap is broadly effective.
 [8] arXiv:2205.10810 (cross-list from math.PR) [pdf, ps, other]

Title: On the inversion of the Laplace transform (In Memory of Dimitris Gatzouras)
Authors: Nickos Papadatos
Comments: 14 pages
Subjects: Probability (math.PR); Statistics Theory (math.ST); Methodology (stat.ME)
The Laplace transform is a useful and powerful analytic tool with applications to several areas of applied mathematics, including differential equations, probability and statistics. Similarly to the inversion of the Fourier transform, inversion formulae for the Laplace transform are of central importance; such formulae are old and well-known (Fourier-Mellin or Bromwich integral, Post-Widder inversion). The present work is motivated by an elementary statistical problem, namely, the unbiased estimation of a parametric function of the scale in the basic model of a random sample from the exponential distribution. The form of the uniformly minimum variance unbiased estimator of a parametric function $h(\lambda)$, as well as its variance, are obtained as series in Laguerre polynomials and the corresponding Fourier coefficients, and a particular application of this result yields a novel inversion formula for the Laplace transform.
MSC: Primary 44A10, 62F10.
Key words and phrases: Exponential Distribution; Unbiased Estimation; Fourier-Laguerre Series; Inverse Laplace Transform; Laguerre Polynomials.
 [9] arXiv:2205.10895 (cross-list from cs.LG) [pdf, ps, other]

Title: Contextual Information-Directed Sampling
Comments: Accepted at ICML 2022
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Information-directed sampling (IDS) has recently demonstrated its potential as a data-efficient reinforcement learning algorithm. However, it is still unclear what the right form of information ratio to optimize is when contextual information is available. We investigate the IDS design through two contextual bandit problems: contextual bandits with graph feedback and sparse linear contextual bandits. We provably demonstrate the advantage of contextual IDS over conditional IDS and emphasize the importance of considering the context distribution. The main message is that an intelligent agent should invest more in the actions that are beneficial for future unseen contexts, while the conditional IDS can be myopic. We further propose a computationally efficient version of contextual IDS based on Actor-Critic and evaluate it empirically on a neural network contextual bandit.
 [10] arXiv:2205.11078 (cross-list from stat.ML) [pdf, other]

Title: Beyond EM Algorithm on Over-specified Two-Component Location-Scale Gaussian Mixtures
Comments: 38 pages, 4 figures. Tongzheng Ren and Fuheng Cui contributed equally to this work
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
The Expectation-Maximization (EM) algorithm has been predominantly used to approximate the maximum likelihood estimation of the location-scale Gaussian mixtures. However, when the models are over-specified, namely, the chosen number of components to fit the data is larger than the unknown true number of components, EM needs a polynomial number of iterations in terms of the sample size to reach the final statistical radius; this is computationally expensive in practice. The slow convergence of EM is due to the absence of local strong convexity with respect to the location parameter in the negative population log-likelihood function, i.e., the limit of the negative sample log-likelihood function when the sample size goes to infinity. To efficiently explore the curvature of the negative log-likelihood functions, by specifically considering two-component location-scale Gaussian mixtures, we develop the Exponential Location Update (ELU) algorithm. The idea of the ELU algorithm is that we first obtain the exact optimal solution for the scale parameter and then perform an exponential step-size gradient descent for the location parameter. We demonstrate theoretically and empirically that the ELU iterates converge to the final statistical radius of the models after a logarithmic number of iterations. To the best of our knowledge, it resolves the long-standing open question in the literature about developing an optimization algorithm that has optimal statistical and computational complexities for solving parameter estimation even under some specific settings of the over-specified Gaussian mixture models.
 [11] arXiv:2205.11121 (cross-list from cs.CR) [pdf, ps, other]

Title: A normal approximation for joint frequency estimatation under Local Differential Privacy
Authors: Thomas Carette
Comments: Preliminary development, draft
Subjects: Cryptography and Security (cs.CR); Databases (cs.DB); Statistics Theory (math.ST)
In recent years, Local Differential Privacy (LDP) has been one of the cornerstones of privacy-preserving data analysis. However, many challenges still oppose its widespread application. One of these problems is the scalability of LDP to high-dimensional data, in particular for estimating joint distributions. In this paper, we develop an approximate estimator for category frequency joint distributions under so-called pure LDP protocols.
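As background for the term "pure LDP protocol" used in this abstract, here is a minimal sketch of k-ary randomized response, a standard example of such a protocol, together with its usual unbiased frequency estimator for a single categorical attribute. It is not the joint-distribution estimator developed in the paper. The function names `krr_perturb` and `krr_estimate`, and the choice of privacy budget in the usage lines, are assumptions made only for this illustration.

```python
import math
import random


def krr_perturb(value, k, epsilon, rng):
    """Report `value` (an int in 0..k-1) through k-ary randomized response.

    The true value is kept with probability p = e^eps / (e^eps + k - 1);
    otherwise one of the other k - 1 categories is reported uniformly.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)
    # Skip over the true value so that it is never chosen in this branch.
    return other if other < value else other + 1


def krr_estimate(reports, k, epsilon):
    """Unbiased estimate of the category frequencies from perturbed reports."""
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)      # probability of reporting the true category
    q = 1.0 / (e + k - 1)    # probability of reporting one given other category
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    # E[count_v / n] = f_v * p + (1 - f_v) * q, so invert this affine map.
    return [(c / n - q) / (p - q) for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    data = rng.choices(range(3), weights=[0.5, 0.3, 0.2], k=50000)
    reports = [krr_perturb(x, 3, 1.0, rng) for x in data]
    print(krr_estimate(reports, 3, 1.0))
```

The estimates always sum to one but can be slightly negative for rare categories. The variance of each estimate grows as the number of categories increases, which hints at why joint distributions over many attributes are hard to estimate under LDP.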
Replacements for Tue, 24 May 22
 [12] arXiv:2003.13208 (replaced) [pdf, other]

Title: Minimax optimality of permutation tests
Comments: Appendix D is added (Monte Carlo-based permutation tests) / Several typos are fixed
Subjects: Statistics Theory (math.ST)
 [13] arXiv:2109.13190 (replaced) [pdf, ps, other]

Title: Estimating the characteristics of stochastic damping Hamiltonian systems from continuous observations
Subjects: Statistics Theory (math.ST); Probability (math.PR)
 [14] arXiv:2111.03705 (replaced) [pdf, ps, other]

Title: Strong Recovery In Group Synchronization
Authors: Bradley Stich
Subjects: Statistics Theory (math.ST); Probability (math.PR)
 [15] arXiv:2204.11038 (replaced) [pdf, ps, other]

Title: Dimension free nonasymptotic bounds on the accuracy of high dimensional Laplace approximation
Authors: Vladimir Spokoiny
Subjects: Statistics Theory (math.ST); Numerical Analysis (math.NA)
 [16] arXiv:1910.09219 (replaced) [pdf, other]

Title: A Transformation Perspective on Marginal and Conditional Models
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
 [17] arXiv:2004.04254 (replaced) [pdf, other]

Title: Posterior computation with the Gibbs zig-zag sampler
Subjects: Computation (stat.CO); Statistics Theory (math.ST)
 [18] arXiv:2102.03607 (replaced) [pdf, other]

Title: Bootstrapping Fitted Q-Evaluation for Off-Policy Inference
Comments: Accepted at ICML 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [19] arXiv:2102.12225 (replaced) [pdf, other]

Title: Valid Instrumental Variables Selection Methods using Auxiliary Variable and Constructing Efficient Estimator
Authors: Shunichiro Orihara
Comments: Keywords: Causal inference, Exclusion restriction, Instrumental variable, Mendelian randomization, Negative control outcome, Semiparametric efficiency, Variable selection, Unmeasured covariates
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Other Statistics (stat.OT)
 [20] arXiv:2202.12431 (replaced) [pdf, other]

Title: Thompson Sampling with Unrestricted Delays
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)