Statistics Theory
New submissions
New submissions for Fri, 30 Jul 21
 [1] arXiv:2107.14004 [pdf, ps, other]

Title: Sparse estimation for generalized exponential marked Hawkes process
Authors: Masatoshi Goda
Subjects: Statistics Theory (math.ST)
We have established a sparse estimation method for the generalized exponential marked Hawkes process using the penalized method to the ordinary method (P-O) estimator. Furthermore, we evaluated the probability of correct variable selection. To achieve this, we established a framework for likelihood analysis and P-O estimation that allows for nuisance parameters and for the true value of the parameter to lie on the boundary of the parameter space. Numerical simulations are given for several important examples.
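The likelihood analysis mentioned above starts from the log-likelihood of a self-exciting point process. As a minimal sketch, assuming the simplest special case (a univariate, unmarked Hawkes process with exponential kernel, intensity mu + alpha * sum exp(-beta * (t - t_i)) over past events), the log-likelihood on [0, T] can be computed in linear time with the standard recursion below. The function name `exp_hawkes_loglik` and this parameterization are illustrative assumptions; this is neither the generalized marked model nor the P-O estimator of the paper.

```python
import math


def exp_hawkes_loglik(times, mu, alpha, beta, T):
    """Log-likelihood of a univariate, unmarked exponential Hawkes process on [0, T].

    Assumed intensity: lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i)).
    `times` must be sorted event times in [0, T].
    """
    if mu <= 0 or alpha < 0 or beta <= 0:
        raise ValueError("need mu > 0, alpha >= 0, beta > 0")
    log_term = 0.0
    A = 0.0      # A = sum over earlier events t_j of exp(-beta * (t - t_j)), at the current event
    prev = None
    for t in times:
        if prev is not None:
            # Recursion: decay the previous sum and add the previous event's own kernel.
            A = math.exp(-beta * (t - prev)) * (1.0 + A)
        log_term += math.log(mu + alpha * A)
        prev = t
    # Compensator: integral of lambda(t) over [0, T], available in closed form.
    compensator = mu * T + (alpha / beta) * sum(1.0 - math.exp(-beta * (T - t)) for t in times)
    return log_term - compensator
```

In this parameterization the branching ratio is alpha / beta, and the process is stationary only when it is below 1; a penalized estimator would maximize this function minus a sparsity penalty over the parameters.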
 [2] arXiv:2107.14021 [pdf, other]

Title: Polynomials shrinkage estimators of a multivariate normal mean
Subjects: Statistics Theory (math.ST)
In this work, we investigate the estimation of the multivariate normal mean by different classes of shrinkage estimators. The risk associated with the balanced loss function is used to compare two estimators. We start by considering estimators that generalize the James-Stein estimator and show that, when the shrinkage function satisfies certain conditions, these estimators dominate the maximum likelihood estimator (MLE) and are therefore minimax. Then, we treat estimators of polynomial form and prove that increasing the degree of the polynomial allows us to build an estimator that improves on the one previously constructed.
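For reference, the classical James-Stein estimator that these estimators generalize shrinks a single observation x ~ N_p(theta, sigma2 * I) toward the origin by the factor 1 - (p - 2) * sigma2 / ||x||^2. The sketch below assumes a known variance and the usual quadratic loss (not the balanced loss used in the paper), and includes the positive-part variant as an option; the function name is hypothetical and none of the paper's polynomial estimators are implemented here.

```python
def james_stein(x, sigma2=1.0, positive_part=False):
    """Classical James-Stein estimate of theta from one draw x ~ N_p(theta, sigma2 * I), p >= 3."""
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein shrinkage requires dimension p >= 3")
    norm2 = sum(xi * xi for xi in x)
    if norm2 == 0.0:
        raise ValueError("shrinkage factor is undefined at x = 0")
    factor = 1.0 - (p - 2) * sigma2 / norm2
    if positive_part:
        # Positive-part variant: never shrink past the origin.
        factor = max(0.0, factor)
    return [factor * xi for xi in x]
```

For example, with x = (3, 4, 0) and sigma2 = 1 the factor is 1 - 1/25 = 0.96, so the estimate is (2.88, 3.84, 0). For p >= 3 this estimator dominates the MLE x under squared-error loss, which is the baseline property the abstract extends.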
 [3] arXiv:2107.14172 [pdf, other]

Title: CAD: Debiasing the Lasso with inaccurate covariate model
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
We consider the problem of estimating a low-dimensional parameter in high-dimensional linear regression. Constructing an approximately unbiased estimate of the parameter of interest is a crucial step towards performing statistical inference. Several authors suggest orthogonalizing both the variable of interest and the outcome with respect to the nuisance variables, and then regressing the residual outcome on the residual variable. This is possible if the covariance structure of the regressors is perfectly known, or is sufficiently structured that it can be estimated accurately from data (e.g., the precision matrix is sufficiently sparse).
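The orthogonalize-then-regress idea described above can be illustrated in a toy low-dimensional case with a single nuisance column, where the residuals are obtained by ordinary least squares on the data (the Frisch-Waugh-Lovell construction). This is only a sketch under that assumption: in the high-dimensional setting of the paper the projection would instead rely on a known or estimated covariate model, and CAD itself is not implemented here. The helper names are hypothetical.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualize(v, z):
    """Residual of v after least-squares projection onto one nuisance column z (no intercept)."""
    coef = _dot(v, z) / _dot(z, z)
    return [vi - coef * zi for vi, zi in zip(v, z)]


def orthogonalized_estimate(y, x, z):
    """Regress the residual outcome on the residual variable of interest."""
    rx = residualize(x, z)   # variable of interest, orthogonalized w.r.t. the nuisance
    ry = residualize(y, z)   # outcome, orthogonalized w.r.t. the nuisance
    return _dot(rx, ry) / _dot(rx, rx)
```

If y = 2x + 3z exactly, the residualized outcome equals twice the residualized x, so the estimate recovers the coefficient 2 even though x and z are correlated; in general the result equals the coefficient on x from the joint least-squares fit of y on (x, z).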
Here we consider a regime in which the covariate model can only be estimated inaccurately, and hence existing debiasing approaches are not guaranteed to work. When errors in estimating the covariate model are correlated with errors in estimating the linear model parameter, the bias is only partially eliminated. We propose the Correlation Adjusted Debiased Lasso (CAD), which nearly eliminates this bias in some cases, including cases in which the estimation errors are neither negligible nor orthogonal.
We consider a setting in which some unlabeled samples might be available to the statistician alongside labeled ones (semi-supervised learning), and our guarantees hold under the assumption of jointly Gaussian covariates. The new debiased estimator is guaranteed to cancel the bias in two cases: (1) when the total number of samples (labeled and unlabeled) is larger than the number of parameters, or (2) when the covariance of the nuisance (but not the effect of the nuisance on the variable of interest) is known. Neither of these cases is handled by state-of-the-art methods.
 [4] arXiv:2107.14184 [pdf, ps, other]

Title: Wasserstein Conditional Independence Testing
Authors: Andrew Warren
Comments: 32 pages
Subjects: Statistics Theory (math.ST); Optimization and Control (math.OC)
We introduce a test for the conditional independence of random variables $X$ and $Y$ given a random variable $Z$, specifically by sampling from the joint distribution $(X,Y,Z)$, binning the support of the distribution of $Z$, and conducting multiple $p$-Wasserstein two-sample tests. Under a $p$-Wasserstein Lipschitz assumption on the conditional distributions $\mathcal{L}_{X|Z}$, $\mathcal{L}_{Y|Z}$, and $\mathcal{L}_{(X,Y)|Z}$, we show that it is possible to control the Type I and Type II errors of this test, and give examples of explicit finite-sample error bounds in the case where the distribution of $Z$ has compact support.
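Two ingredients named above, binning the support of $Z$ and computing a $p$-Wasserstein distance between samples, can be sketched in the simplest one-dimensional case. The sketch assumes real-valued samples of equal size, where the optimal coupling between two empirical measures simply matches sorted values, and equal-width bins over a known compact interval. It is a building block only, not the paper's test statistic or its calibration; the function names are hypothetical.

```python
def wasserstein_p_1d(a, b, p=1.0):
    """p-Wasserstein distance between two equal-size empirical measures on the real line."""
    if len(a) != len(b) or not a:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    # In one dimension the monotone (sorted) matching is an optimal coupling.
    total = sum(abs(u - v) ** p for u, v in zip(sorted(a), sorted(b)))
    return (total / len(a)) ** (1.0 / p)


def bin_indices(z, lo, hi, n_bins):
    """Group sample indices by equal-width bins of z over the compact interval [lo, hi]."""
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for i, zi in enumerate(z):
        if not lo <= zi <= hi:
            raise ValueError("z value outside the assumed support [lo, hi]")
        k = min(int((zi - lo) / width), n_bins - 1)   # zi == hi goes into the last bin
        bins[k].append(i)
    return bins
```

Within each bin, a two-sample comparison of conditional laws would then be run on the samples indexed by that bin; shrinking the bin width reduces the within-bin variation of the conditional laws (controlled by the Lipschitz assumption) but leaves fewer samples per test.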
Cross-lists for Fri, 30 Jul 21
 [5] arXiv:2107.13656 (cross-list from cs.LG) [pdf, ps, other]

Title: Characterizing the Generalization Error of Gibbs Algorithm with Symmetrized KL information
Comments: The first and second authors contributed equally to the paper. This paper has been accepted at the ICML-21 Workshop on Information-Theoretic Methods for Rigorous, Responsible, and Reliable Machine Learning.
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
Bounding the generalization error of a supervised learning algorithm is one of the most important problems in learning theory, and various approaches have been developed. However, existing bounds are often loose and lack guarantees. As a result, they may fail to characterize the exact generalization ability of a learning algorithm. Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm in terms of the symmetrized KL information between the input training samples and the output hypothesis. This result can be applied to tighten existing expected generalization error bounds. Our analysis provides more insight into the fundamental role the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.
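To make the two objects in this abstract concrete, the sketch below assumes a finite hypothesis set. The Gibbs algorithm outputs a hypothesis w drawn with probability proportional to prior(w) * exp(-gamma * L_S(w)), where L_S is the empirical risk on the training sample S and gamma is an inverse temperature; the symmetrized KL divergence between two distributions is D(P||Q) + D(Q||P). The function names are hypothetical, and the code computes only the divergence between two given finite distributions, not the symmetrized KL information between samples and hypothesis studied in the paper.

```python
import math


def gibbs_posterior(prior, empirical_risks, gamma):
    """Gibbs distribution over a finite hypothesis set: prior(w) * exp(-gamma * L_S(w)), normalized.

    All prior weights must be strictly positive.
    """
    if len(prior) != len(empirical_risks):
        raise ValueError("prior and risks must have the same length")
    logits = [math.log(p) - gamma * r for p, r in zip(prior, empirical_risks)]
    m = max(logits)   # subtract the maximum before exponentiating, for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def symmetrized_kl(p, q):
    """D(p || q) + D(q || p) for two strictly positive distributions on the same finite set."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

With gamma = 0 the Gibbs distribution is just the prior, and as gamma grows it concentrates on empirical-risk minimizers, which is why the inverse temperature governs the trade-off between fitting S and generalizing.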
 [6] arXiv:2107.13756 (cross-list from stat.ME) [pdf, other]

Title: Binomial Mixture Model With U-shape Constraint
Comments: 45 pages, 26 figures, 2 tables
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
In this article, we study the binomial mixture model in the regime where the binomial size $m$ can be relatively large compared to the sample size $n$. This project is motivated by the GeneFishing method (Liu et al., 2019), whose output is a combination of the parameter of interest and the subsampling noise. To tackle the noise in the output, we utilize the observation that the density of the output has a U shape and model the output with the binomial mixture model under a U-shape constraint. We first analyze the estimation of the underlying distribution $F$ in the binomial mixture model under various conditions on $F$. Equipped with this theoretical understanding, we propose a simple method, Ucut, to identify the cutoffs of the U shape and recover the underlying distribution based on the Grenander estimator (Grenander, 1956). It is shown that when $m = \Omega(n^{\frac{2}{3}})$, the identified cutoffs converge at the rate $O(n^{-\frac{1}{3}})$. The $L_1$ distance between the recovered distribution and the true one decreases at the same rate. To demonstrate the performance, we apply our method to a variety of simulation studies, a GTEX dataset used by Liu et al. (2019), and a single-cell dataset from Tabula Muris.
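The Grenander estimator cited above is the maximum likelihood estimator of a nonincreasing density; it equals the left derivative of the least concave majorant of the empirical CDF, so it is a step function. The sketch below computes it for positive data, assuming support starting at 0, by taking the upper concave hull of the empirical CDF vertices. It is a standalone illustration with a hypothetical function name; how Ucut applies it to locate the cutoffs of the U shape is not reproduced here.

```python
def grenander(data):
    """Grenander estimator of a nonincreasing density on [0, inf).

    Returns (left, right, height) pieces of a step function: the slopes of the
    least concave majorant (LCM) of the empirical CDF.
    """
    n = len(data)
    if n == 0 or min(data) <= 0:
        raise ValueError("this sketch assumes a non-empty sample of positive values")
    # Vertices of the empirical CDF: (0, 0), then (x, F_n(x)) at each distinct value x.
    counts = {}
    for x in data:
        counts[x] = counts.get(x, 0) + 1
    points = [(0.0, 0.0)]
    cum = 0
    for x in sorted(counts):
        cum += counts[x]
        points.append((float(x), cum / n))
    # Upper (concave) hull via a monotone-chain scan: drop the last vertex whenever
    # the slope into it does not exceed the slope out of it.
    hull = []
    for px, py in points:
        while len(hull) >= 2:
            (ax, ay), (bx, by) = hull[-2], hull[-1]
            if (by - ay) * (px - bx) <= (py - by) * (bx - ax):
                hull.pop()
            else:
                break
        hull.append((px, py))
    return [(ax, bx, (by - ay) / (bx - ax)) for (ax, ay), (bx, by) in zip(hull, hull[1:])]
```

For the sample (1, 1.5, 4) the estimate is 4/9 on (0, 1.5] and 2/15 on (1.5, 4], which is nonincreasing and integrates to 1; the cube-root rate quoted in the abstract is the familiar rate of this estimator at interior points.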
Replacements for Fri, 30 Jul 21
 [7] arXiv:2011.05258 (replaced) [pdf, other]

Title: Group testing and local search: is there a computational-statistical gap?
Comments: Accepted for publication in COLT 2021. Various minor mistakes have been corrected
Subjects: Statistics Theory (math.ST); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT)
 [8] arXiv:2102.09497 (replaced) [pdf, other]

Title: Regression-type analysis for block maxima on block maxima
Subjects: Statistics Theory (math.ST)
 [9] arXiv:2103.02853 (replaced) [pdf, other]

Title: A multivariate normal approximation for the Dirichlet density and some applications
Authors: Frédéric Ouimet
Comments: 12 pages, 6 figures
Subjects: Statistics Theory (math.ST); Probability (math.PR)
 [10] arXiv:2106.03187 (replaced) [pdf, ps, other]

Title: Tempered Stable Autoregressive Models
Comments: 19 pages, 8 figures and 3 tables
Subjects: Statistics Theory (math.ST)
 [11] arXiv:2106.05955 (replaced) [pdf, other]

Title: Bayesian inference of a nonlocal proliferation model
Subjects: Statistics Theory (math.ST)
 [12] arXiv:2107.12525 (replaced) [pdf, ps, other]

Title: Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates
Subjects: Statistics Theory (math.ST); Databases (cs.DB); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [13] arXiv:2107.13494 (replaced) [pdf, ps, other]

Title: Limit Distribution Theory for the Smooth 1-Wasserstein Distance with Applications
Subjects: Statistics Theory (math.ST); Probability (math.PR); Machine Learning (stat.ML)
 [14] arXiv:1701.08083 (replaced) [pdf, other]

Title: Ensemble Estimation of Generalized Mutual Information with Applications to Genomics
Comments: Published in IEEE Transactions on Information Theory in 2021; 42 pages, 3 figures; a shorter version of this paper was published at IEEE ISIT 2017 under the title "Ensemble estimation of mutual information"
Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)
 [15] arXiv:1910.10692 (replaced) [pdf, other]

Title: Deterministic tensor completion with hypergraph expanders
Comments: 35 pages, 4 figures. To appear in SIAM Journal on Mathematics of Data Science
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
 [16] arXiv:2007.00830 (replaced) [pdf, other]

Title: Unlinked monotone regression
Comments: 60 pages; 3 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
 [17] arXiv:2101.11815 (replaced) [pdf, other]

Title: Interpolating Classifiers Make Few Mistakes
Comments: 23 pages, 2 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Statistics Theory (math.ST)