Statistics Theory

New submissions

New submissions for Fri, 1 Jul 22

[1]  arXiv:2206.14896 [pdf, ps, other]
Title: Threshold for Detecting High Dimensional Geometry in Anisotropic Random Geometric Graphs
Comments: 11 pages, comments welcome
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Probability (math.PR)

In the anisotropic random geometric graph model, vertices correspond to points drawn from a high-dimensional Gaussian distribution and two vertices are connected if their distance is smaller than a specified threshold. We study when it is possible to hypothesis test between such a graph and an Erd\H{o}s-R\'enyi graph with the same edge probability. If $n$ is the number of vertices and $\alpha$ is the vector of eigenvalues, Eldan and Mikulincer show that detection is possible when $n^3 \gg (\|\alpha\|_2/\|\alpha\|_3)^6$ and impossible when $n^3 \ll (\|\alpha\|_2/\|\alpha\|_4)^4$. We show detection is impossible when $n^3 \ll (\|\alpha\|_2/\|\alpha\|_3)^6$, closing this gap and affirmatively resolving the conjecture of Eldan and Mikulincer.
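
As a rough illustration of the two quantities compared with $n^3$ in the abstract above, the following Python sketch evaluates them for a given eigenvalue vector. The function names are hypothetical and not taken from the paper, and because the conditions are asymptotic ($\gg$, $\ll$), the printed ratio is only indicative.

```python
import math

def lp_norm(alpha, p):
    """l_p norm of a sequence of eigenvalues."""
    return sum(abs(a) ** p for a in alpha) ** (1.0 / p)

def detection_threshold(alpha):
    """(||alpha||_2 / ||alpha||_3)^6, the quantity compared with n^3."""
    return (lp_norm(alpha, 2) / lp_norm(alpha, 3)) ** 6

def earlier_impossibility_bound(alpha):
    """(||alpha||_2 / ||alpha||_4)^4, the earlier impossibility threshold."""
    return (lp_norm(alpha, 2) / lp_norm(alpha, 4)) ** 4

# Illustrative values only (assumed, not from the paper).
alpha = [1.0, 0.5, 0.25, 0.125]
n = 50
print("n^3 / threshold:", n ** 3 / detection_threshold(alpha))
print("earlier bound:", earlier_impossibility_bound(alpha))
```

In the isotropic case, where all $d$ eigenvalues are equal, both quantities reduce to $d$. In general the second quantity is never larger than the first, by the Cauchy-Schwarz inequality.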

[2]  arXiv:2206.15178 [pdf, other]
Title: Likelihood Asymptotics in Nonregular Settings: A Review with Emphasis on the Likelihood Ratio
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

This paper reviews the most common situations in which one or more of the regularity conditions that underlie classical likelihood-based parametric inference fail. We identify three main classes of problems: boundary problems, indeterminate parameter problems--which include non-identifiable parameters and singular information matrices--and change-point problems. The review focuses on the large-sample properties of the likelihood ratio statistic, though other approaches to hypothesis testing and connections to estimation may be mentioned in passing. We emphasize analytical solutions and acknowledge software implementations where available. We also give some summary insight into the tools that can be used to derive the key results.

[3]  arXiv:2206.15209 [pdf, other]
Title: Designing to detect heteroscedasticity in a regression model
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

We consider the problem of designing experiments to detect the presence of a specified heteroscedasticity in a non-linear Gaussian regression model. In this framework, we focus on the ${\rm D}_s$- and KL-criteria and study their relationship with the noncentrality parameter of the asymptotic chi-squared distribution of a likelihood-based test, for local alternatives. Specifically, we find that when the variance function depends on just one parameter, the two criteria coincide asymptotically and, in particular, the ${\rm D}_1$-criterion is proportional to the noncentrality parameter. In contrast, if the variance function depends on a vector of parameters, then the KL-optimum design converges to the design that maximizes the noncentrality parameter. Furthermore, we confirm our theoretical findings through a simulation study concerning the computation of asymptotic and exact powers of the log-likelihood ratio statistic.

Cross-lists for Fri, 1 Jul 22

[4]  arXiv:2206.15273 (cross-list from cs.LG) [pdf, ps, other]
Title: Invariance Properties of the Natural Gradient in Overparametrised Systems
Journal-ref: Information Geometry, Springer, 2022
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)

The natural gradient field is a vector field that lives on a model equipped with a distinguished Riemannian metric, e.g. the Fisher-Rao metric, and represents the direction of steepest ascent of an objective function on the model with respect to this metric. In practice, one tries to obtain the corresponding direction on the parameter space by multiplying the ordinary gradient by the inverse of the Gram matrix associated with the metric. We refer to this vector on the parameter space as the natural parameter gradient. In this paper, we study when the pushforward of the natural parameter gradient is equal to the natural gradient. Furthermore, we investigate the invariance properties of the natural parameter gradient. Both questions are addressed in an overparametrised setting.
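
The construction in the abstract above (ordinary gradient multiplied by the inverse Gram matrix, then pushed forward to the model) can be sketched for a toy case. Everything here is an assumption made for illustration: a linear 2-parameter parametrisation with Jacobian `J`, the Euclidean metric on the model, and an invertible Gram matrix. The overparametrised case studied in the paper has a singular Gram matrix and is not covered by this sketch.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mat_mul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve_2x2(G, b):
    """Solve G x = b for an invertible 2x2 matrix G."""
    (a, c), (d, e) = G
    det = a * e - c * d
    if det == 0:
        raise ValueError("singular Gram matrix: a generalised inverse would be needed")
    return [(e * b[0] - c * b[1]) / det, (a * b[1] - d * b[0]) / det]

# Hypothetical parametrisation phi(theta) = (2 * theta_1, theta_2).
J = [[2.0, 0.0], [0.0, 1.0]]             # Jacobian of phi
model_grad = [1.0, 1.0]                   # gradient of the objective on the model

param_grad = mat_vec(transpose(J), model_grad)   # ordinary gradient in theta
gram = mat_mul(transpose(J), J)                  # Gram matrix of the pulled-back metric
natural_param_grad = solve_2x2(gram, param_grad)
pushforward = mat_vec(J, natural_param_grad)     # mapped back to the model

print(natural_param_grad, pushforward)
```

In this regular toy case the pushforward coincides with the gradient on the model even though the ordinary parameter gradient does not, which is the kind of invariance the abstract asks about.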

[5]  arXiv:2206.15335 (cross-list from cs.DC) [pdf, ps, other]
Title: Byzantine Agreement with Optimal Resilience via Statistical Fraud Detection
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST)

Since the mid-1980s it has been known that Byzantine Agreement can be solved with probability 1 asynchronously, even against an omniscient, computationally unbounded adversary that can adaptively \emph{corrupt} up to $f<n/3$ parties. Moreover, the problem is insoluble with $f\geq n/3$ corruptions. However, Bracha's 1984 protocol achieved $f<n/3$ resilience at the cost of exponential expected latency $2^{\Theta(n)}$, a bound that has never been improved in this model with $f=\lfloor (n-1)/3 \rfloor$ corruptions.
In this paper we prove that Byzantine Agreement in the asynchronous, full information model can be solved with probability 1 against an adaptive adversary that can corrupt $f<n/3$ parties, while incurring only polynomial latency with high probability. Our protocol follows earlier polynomial latency protocols of King and Saia and Huang, Pettie, and Zhu, which had suboptimal resilience, namely $f \approx n/10^9$ and $f<n/4$, respectively.
Resilience $f=(n-1)/3$ is uniquely difficult as this is the point at which the influences of the Byzantine and honest players are of roughly equal strength. The core technical problem we solve is to design a collective coin-flipping protocol that eventually lets us flip a coin with an unambiguous outcome. In the beginning, the influence of the Byzantine players is too powerful to overcome and they can essentially fix the coin's behavior at will. We guarantee that after just a polynomial number of executions of the coin-flipping protocol, either (a) the Byzantine players fail to fix the behavior of the coin (thereby ending the game) or (b) we can ``blacklist'' players such that the blacklisting rate for Byzantine players is at least as large as the blacklisting rate for good players. The blacklisting criterion is based on a simple statistical test of fraud detection.

[6]  arXiv:2206.15348 (cross-list from stat.CO) [pdf, ps, other]
Title: kStatistics: Unbiased Estimates of Joint Cumulant Products from the Multivariate Faà Di Bruno's Formula
Comments: In press
Journal-ref: (2022) The R Journal
Subjects: Computation (stat.CO); Statistics Theory (math.ST)

kStatistics is a package in R that serves as a unified framework for estimating univariate and multivariate cumulants as well as products of univariate and multivariate cumulants of a random sample, using unbiased estimators with minimum variance. The main computational machinery of kStatistics is an algorithm for computing multi-index partitions. The same algorithm underlies the general-purpose multivariate Fa\`a di Bruno's formula, which has therefore been included in the latest release of the package. This formula gives the coefficients of formal power series compositions as well as the partial derivatives of multivariable function compositions. One of the most significant applications of this formula is the possibility of generating many well-known polynomial families as special cases. Accordingly, the package provides special functions for generating very popular polynomial families, such as the Bell polynomials. However, further families can be obtained for suitable choices of the formal power series involved in the composition, or when suitable symbolic strategies are employed. In both cases, we give examples of how to modify the R code of the package to accomplish this task. Future developments are addressed at the end of the paper.
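
To make the link between Fa\`a di Bruno's formula and the Bell polynomials concrete, here is a minimal Python sketch that evaluates the complete Bell polynomials numerically using their standard recurrence. It is not the kStatistics implementation and does not use the package's API; the function name `complete_bell` is hypothetical.

```python
from math import comb

def complete_bell(xs):
    """Return [B_0, ..., B_n] evaluated at xs = [x_1, ..., x_n].

    Uses the recurrence B_{n+1} = sum_{i=0}^{n} C(n, i) * B_{n-i} * x_{i+1},
    with B_0 = 1.
    """
    values = [1]
    for n in range(len(xs)):
        values.append(sum(comb(n, i) * values[n - i] * xs[i] for i in range(n + 1)))
    return values

# With every x_i equal to 1, the values are the Bell numbers.
print(complete_bell([1, 1, 1, 1, 1]))  # [1, 1, 2, 5, 15, 52]
```

The connection to the formula is that the $n$-th derivative of $\exp(g(t))$ equals $\exp(g(t))\,B_n(g'(t), \dots, g^{(n)}(t))$, which is the univariate Fa\`a di Bruno formula with the exponential as the outer function.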

Replacements for Fri, 1 Jul 22

[7]  arXiv:2111.00949 (replaced) [pdf, ps, other]
Title: Bounds for the chi-square approximation of Friedman's statistic by Stein's method
Comments: 39 pages. This is a technical report. This version differs from the original version by correcting some numerical mistakes and presentation issues
Subjects: Statistics Theory (math.ST); Probability (math.PR)

[8]  arXiv:2005.12395 (replaced) [pdf, other]
Title: Fair Policy Targeting
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)

[9]  arXiv:2201.07628 (replaced) [pdf, other]
Title: A quantitative Heppes Theorem and multivariate Bernoulli distributions
Comments: 22 pages, 10 figures, 3 tables
Subjects: Probability (math.PR); Statistics Theory (math.ST)

[10]  arXiv:2206.12722 (replaced) [pdf, other]
Title: Random Processes With Power Law Spectral Density
Authors: Robert Kimberk (1), Keara Carter (1), Todd Hunter (2) ( (1) Smithsonian Astrophysical Observatory, (2) National Radio Astronomy Observatory)
Comments: 11 pages and 2 figures. Comments welcomed and encouraged. Typos and layout corrected 6/30/22
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Statistics Theory (math.ST)