Statistics Theory

New submissions

[ total of 18 entries: 1-18 ]

New submissions for Tue, 23 Apr 24

[1]  arXiv:2404.13248 [pdf, ps, other]
Title: Pure Significance Tests for Multinomial and Binomial Distributions: the Uniform Alternative
Comments: 32 pages, 3 tables
Subjects: Statistics Theory (math.ST)

A {\it pure significance test} (PST) tests a simple null hypothesis $H_f:Y\sim f$ {\it without specifying an alternative hypothesis} by rejecting $H_f$ for {\it small} values of $f(Y)$. When the sample space supports a proper uniform pmf $f_\mathrm{unif}$, the PST can be viewed as a classical likelihood ratio test for testing $H_f$ against this uniform alternative. Under this interpretation, standard test features such as power, Kullback-Leibler divergence, and expected $p$-value can be considered. This report focuses on PSTs for multinomial and binomial distributions, and for the related goodness-of-fit testing problems with the uniform alternative. The case of repeated observations cannot be reduced to the single observation case via sufficiency. The {\it ordered binomial distribution}, apparently new, arises in the course of this study.
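
The rejection rule described above (reject $H_f$ for small values of $f(Y)$) can be illustrated for a single binomial observation. The following is a minimal sketch, not the report's own procedure: the function names `binom_pmf` and `pst_pvalue` and the tie tolerance `rel_tol` are assumptions made for illustration. The p-value is taken to be the null probability of all outcomes that are no more likely than the observed one.

```python
from math import comb


def binom_pmf(n, p, y):
    """Binomial(n, p) probability mass at y."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def pst_pvalue(n, p, y_obs, rel_tol=1e-12):
    """Null probability of outcomes whose pmf is <= the pmf of y_obs.

    rel_tol (an illustrative choice) guards against floating-point
    noise when two outcomes have mathematically equal probability.
    """
    f_obs = binom_pmf(n, p, y_obs)
    threshold = f_obs * (1 + rel_tol)
    return sum(
        f for f in (binom_pmf(n, p, y) for y in range(n + 1)) if f <= threshold
    )


if __name__ == "__main__":
    # Under Binomial(10, 0.5), y = 0 is as unlikely as y = 10,
    # so both outcomes contribute to the p-value.
    print(pst_pvalue(10, 0.5, 0))
```

For a symmetric null such as Binomial(10, 0.5), this ordering by $f(y)$ coincides with the usual two-sided test; for asymmetric nulls the two can differ.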

[2]  arXiv:2404.13822 [pdf, other]
Title: Higher-Order Graphon Theory: Fluctuations, Degeneracies, and Inference
Comments: Abstract shortened to meet arXiv requirements
Subjects: Statistics Theory (math.ST); Probability (math.PR)

Exchangeable random graphs, which include some of the most widely studied network models, have emerged as the mainstay of statistical network analysis in recent years. Graphons, which are the central objects in graph limit theory, provide a natural way to sample exchangeable random graphs. It is well known that network moments (motif/subgraph counts) identify a graphon (up to an isomorphism); hence, understanding the sampling distribution of subgraph counts in random graphs sampled from a graphon is pivotal for nonparametric network inference. In this paper, we derive the joint asymptotic distribution of any finite collection of network moments in random graphs sampled from a graphon, covering both the non-degenerate case (where the distribution is Gaussian) and the degenerate case (where the distribution has both Gaussian and non-Gaussian components). This provides the higher-order fluctuation theory for subgraph counts in the graphon model. We also develop a novel multiplier bootstrap for graphons that consistently approximates the limiting distribution of the network moments (both in the Gaussian and non-Gaussian regimes). Using this and a procedure for testing degeneracy, we construct joint confidence sets for any finite collection of motif densities. This provides a general framework for statistical inference based on network moments in the graphon model. To illustrate the broad scope of our results we also consider the problem of detecting global structure (that is, testing whether the graphon is a constant function) based on small subgraphs. We propose a consistent test for this problem, invoking celebrated results on quasi-random graphs, and derive its limiting distribution both under the null and the alternative.
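
To make the objects in this abstract concrete, the sketch below samples a graph from a graphon and computes two network moments, the edge and triangle densities. It only illustrates the standard sampling scheme (latent uniforms, then independent edges) and is not the paper's inference procedure or its bootstrap. The function names and the example graphon $W(x,y)=xy$ are assumptions chosen for illustration.

```python
import itertools
import random


def sample_graph(W, n, rng):
    """Sample an n-vertex graph from graphon W.

    Each vertex i gets a latent U_i ~ Uniform[0, 1); edge {i, j} is
    present independently with probability W(U_i, U_j).
    Returns a symmetric boolean adjacency matrix.
    """
    u = [rng.random() for _ in range(n)]
    adj = [[False] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < W(u[i], u[j]):
            adj[i][j] = adj[j][i] = True
    return adj


def edge_density(adj):
    """Fraction of vertex pairs joined by an edge."""
    n = len(adj)
    pairs = list(itertools.combinations(range(n), 2))
    return sum(adj[i][j] for i, j in pairs) / len(pairs)


def triangle_density(adj):
    """Fraction of vertex triples that form a triangle."""
    n = len(adj)
    triples = list(itertools.combinations(range(n), 3))
    hits = sum(adj[i][j] and adj[j][k] and adj[i][k] for i, j, k in triples)
    return hits / len(triples)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, for reproducibility only
    g = sample_graph(lambda x, y: x * y, 60, rng)
    print(edge_density(g), triangle_density(g))
```

Repeating the sampling step many times gives an empirical picture of the sampling distribution of these densities, which is the object whose limit the abstract describes.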

[3]  arXiv:2404.13960 [pdf, other]
Title: A Geometric Perspective on Double Robustness by Semiparametric Theory and Information Geometry
Authors: Andrew Ying
Subjects: Statistics Theory (math.ST)

Double robustness (DR) is a widely used property of estimators that provides protection against model misspecification and slow convergence of nuisance functions. While DR is a global property on the probability distribution manifold, it often coincides with influence curves, which only ensure orthogonality to nuisance directions locally. This apparent discrepancy raises fundamental questions about the theoretical underpinnings of DR.
In this short communication, we address two key questions: (1) Why do influence curves frequently imply DR "for free"? (2) Under what conditions do DR estimators exist for a given statistical model and parameterization? Using tools from semiparametric theory, we show that convexity is the crucial property that enables influence curves to imply DR. We then derive necessary and sufficient conditions for the existence of DR estimators under a mean squared differentiable path-connected parameterization.
Our main contribution also lies in the novel geometric interpretation of DR using information geometry. By leveraging concepts such as parallel transport, m-flatness, and m-curvature freeness, we characterize DR in terms of invariance along submanifolds. This geometric perspective deepens the understanding of when and why DR estimators exist.
The results not only resolve apparent mysteries surrounding DR but also have practical implications for the construction and analysis of DR estimators. The geometric insights open up new connections and directions for future research. Our findings aim to solidify the theoretical foundations of a fundamental concept and contribute to the broader understanding of robust estimation in statistics.

[4]  arXiv:2404.14227 [pdf, ps, other]
Title: Estimation for SLS models: finite sample guarantees
Subjects: Statistics Theory (math.ST)

This note continues and extends the study from Spokoiny (2023a) about estimation for parametric models with possibly large or even infinite parameter dimension. We consider a special class of stochastically linear smooth (SLS) models satisfying three major conditions: the stochastic component of the log-likelihood is linear in the model parameter, while the expected log-likelihood is a smooth and concave function. For the penalized maximum likelihood estimator (pMLE), we establish several finite sample bounds on its concentration and large deviations, as well as the Fisher and Wilks expansions and risk bounds. In all results, the remainder is given explicitly and can be evaluated in terms of the effective sample size $ n $ and effective parameter dimension $ \mathbb{p} $, which allows us to identify the so-called \emph{critical parameter dimension}. The results are also dimension and coordinate-free. Despite their generality, all the presented bounds are nearly sharp and the classical asymptotic results can be obtained as simple corollaries. Our results indicate that the use of advanced fourth-order expansions allows us to relax the critical dimension condition $ \mathbb{p}^{3} \ll n $ from Spokoiny (2023a) to $ \mathbb{p}^{3/2} \ll n $. Examples for classical models like logistic regression, log-density and precision matrix estimation illustrate the applicability of the general results.
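
As a rough illustration of what the relaxed condition allows, both conditions can be solved for $ \mathbb{p} $. The sample size $ n = 10^{6} $ below is an arbitrary value chosen for illustration, not a figure from the note, and the constants hidden in $ \ll $ are ignored.

```latex
\[
\mathbb{p}^{3} \ll n \iff \mathbb{p} \ll n^{1/3},
\qquad
\mathbb{p}^{3/2} \ll n \iff \mathbb{p} \ll n^{2/3}.
\]
\[
n = 10^{6}: \qquad n^{1/3} = 10^{2}, \qquad n^{2/3} = 10^{4}.
\]
```

Under these assumptions, the admissible effective dimension grows from the order of a hundred to the order of ten thousand.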

[5]  arXiv:2404.14229 [pdf, other]
Title: Colored Stochastic Multiplicative Processes with Additive Noise Unveil a Third-Order PDE, Defying Conventional FPE and Fick-Law Paradigms
Comments: 25 pages, 4 figures
Subjects: Statistics Theory (math.ST); Statistical Mechanics (cond-mat.stat-mech); Probability (math.PR)

Research on stochastic differential equations (SDE) involving both additive and multiplicative noise has been extensive. In situations where the primary process is driven by a multiplicative stochastic process, additive white noise typically represents an intrinsic and unavoidable fast factor, including phenomena like thermal fluctuations, inherent uncertainties in measurement processes, or rapid wind forcing in ocean dynamics. This work focuses on a significant class of such systems, particularly those characterized by linear drift and multiplicative noise, extensively explored in the literature. Conventionally, multiplicative stochastic processes are also treated as white noise in existing studies. However, when considering colored multiplicative noise, the emphasis has been on characterizing the far tails of the probability density function (PDF), regardless of the spectral properties of the noise. In the absence of additive noise and with a general colored multiplicative SDE, standard perturbation approaches lead to a second-order PDE known as the Fokker-Planck Equation (FPE), consistent with Fick's law. This investigation unveils a notable departure from this standard behavior when introducing additive white noise. At the leading order of the stochastic process strength, perturbation approaches yield a \textit{third-order PDE}, irrespective of the white noise intensity. The breakdown of the FPE further signifies the breakdown of Fick's law. Additionally, we derive the explicit solution for the equilibrium PDF corresponding to this third-order PDE Master Equation. Through numerical simulations, we demonstrate significant deviations from outcomes derived using the FPE obtained through the application of Fick's law.

Cross-lists for Tue, 23 Apr 24

[6]  arXiv:2404.13355 (cross-list from math.NA) [pdf, other]
Title: Extrapolation and generative algorithms for three applications in finance
Comments: 9 pages
Subjects: Numerical Analysis (math.NA); Statistics Theory (math.ST)

For three applications of central interest in finance, we demonstrate the relevance of numerical algorithms based on reproducing kernel Hilbert space (RKHS) techniques. Three use cases are investigated. First, we show that extrapolating from few pricer examples leads to sufficiently accurate and computationally efficient results so that our algorithm can serve as a pricing framework. The second use case concerns reverse stress testing, which is formulated as an inversion function problem and is treated here via an optimal transport technique in combination with the notions of kernel-based encoders, decoders, and generators. Third, we show that standard techniques for time series analysis can be enhanced by using the proposed generative algorithms. Namely, we use our algorithm in order to extend the validity of any given quantitative model. Our approach allows for conditional analysis as well as for escaping the `Gaussian world'. This latter property is illustrated here with a portfolio investment strategy.

[7]  arXiv:2404.14136 (cross-list from q-fin.ST) [pdf, ps, other]
Title: Elicitability and identifiability of tail risk measures
Comments: 31 pages
Subjects: Statistical Finance (q-fin.ST); Statistics Theory (math.ST); Risk Management (q-fin.RM); Methodology (stat.ME)

Tail risk measures are fully determined by the distribution of the underlying loss beyond its quantile at a certain level, with Value-at-Risk and Expected Shortfall being prime examples. They are induced by law-based risk measures, called their generators, evaluated on the tail distribution. This paper establishes joint identifiability and elicitability results of tail risk measures together with the corresponding quantile, provided that their generators are identifiable and elicitable, respectively. As an example, we establish the joint identifiability and elicitability of the tail expectile together with the quantile. The corresponding consistent scores constitute a novel class of weighted scores, nesting the known class of scores of Fissler and Ziegel for the Expected Shortfall together with the quantile. For statistical purposes, our results pave the way to easier model fitting for tail risk measures via regression and the generalized method of moments, but also model comparison and model validation in terms of established backtesting procedures.
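
For readers unfamiliar with elicitability, the simplest instance is the quantile (Value-at-Risk), which is elicited by the standard pinball (quantile) score. The sketch below is not the paper's new class of weighted scores, only the classical single-quantile case. The function names and the toy data are assumptions made for illustration. It picks, from a set of candidate forecasts, the one with the lowest total score.

```python
def pinball_loss(x, y, alpha):
    """Pinball score of forecast x for observation y at level alpha.

    S(x, y) = (1{y <= x} - alpha) * (x - y), a standard consistent
    score for the alpha-quantile.
    """
    indicator = 1.0 if y <= x else 0.0
    return (indicator - alpha) * (x - y)


def best_forecast(candidates, observations, alpha):
    """Candidate minimizing the total pinball score over the observations."""
    return min(
        candidates,
        key=lambda x: sum(pinball_loss(x, y, alpha) for y in observations),
    )


if __name__ == "__main__":
    losses = list(range(1, 11))  # toy loss sample: 1, 2, ..., 10
    print(best_forecast(losses, losses, alpha=0.75))
```

Minimizing the average score over a sample recovers an empirical quantile, which is the mechanism behind the regression-based model fitting mentioned at the end of the abstract.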

Replacements for Tue, 23 Apr 24

[8]  arXiv:2302.06739 (replaced) [pdf, ps, other]
Title: Asymptotic Theory for Doubly Robust Estimators with Continuous-Time Nuisance Parameters
Authors: Andrew Ying
Subjects: Statistics Theory (math.ST)
[9]  arXiv:2303.03649 (replaced) [pdf, ps, other]
Title: PanIC: consistent information criteria for general model selection problems
Authors: Hien Duy Nguyen
Subjects: Statistics Theory (math.ST)
[10]  arXiv:2310.01374 (replaced) [pdf, other]
Title: Corrected generalized cross-validation for finite ensembles of penalized estimators
Comments: 91 pages, 34 figures; this version adds general proof outlines (in Sections 4.3 and 5.3), adds more experiments with non-Gaussian data (in Sections D and E), relaxes an assumption (in Section A.7), clarifies explanations at several places, and corrects minor typos at several places
Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
[11]  arXiv:2404.05006 (replaced) [pdf, other]
Title: High-dimensional bootstrap and asymptotic expansion
Authors: Yuta Koike
Comments: 60 pages, 1 figure, 2 tables. In Corollary 3.1, the sphericity of the covariance matrix was relaxed to the boundedness of the eigenvalues
Subjects: Statistics Theory (math.ST); Probability (math.PR)
[12]  arXiv:2404.06850 (replaced) [src]
Title: From naive trees to Random Forests: A general approach for proving consistency of tree-based methods
Comments: Incorrect Proof
Subjects: Statistics Theory (math.ST)
[13]  arXiv:2109.02204 (replaced) [pdf, other]
Title: On the edge eigenvalues of the precision matrices of nonstationary autoregressive processes
Authors: Junho Yang
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
[14]  arXiv:2209.10128 (replaced) [pdf, other]
Title: Efficient Integrated Volatility Estimation in the Presence of Infinite Variation Jumps via Debiased Truncated Realized Variations
Comments: An earlier version of this manuscript was circulated under the title "Efficient Volatility Estimation for L\'evy Processes with Jumps of Unbounded Variation". The results therein were constrained to L\'evy processes, whereas here we consider a much larger class of It\^o semimartingales. arXiv admin note: text overlap with arXiv:2202.00877
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST); Statistical Finance (q-fin.ST)
[15]  arXiv:2303.08987 (replaced) [pdf, other]
Title: Generalized Score Matching
Comments: arXiv admin note: substantial text overlap with arXiv:2203.09864
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP); Computation (stat.CO)
[16]  arXiv:2307.00017 (replaced) [pdf, ps, other]
Title: Explicit mutual information for simple networks and neurons with lognormal activities
Comments: Corrected version: corrected and expanded discussion of neural mutual information (lognormal vs. gaussian distributions)
Journal-ref: Phys. Rev. E 109, 014117 (2024)
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistics Theory (math.ST)
[17]  arXiv:2311.07454 (replaced) [pdf, other]
Title: Discrete Nonparametric Causal Discovery Under Latent Class Confounding
Subjects: Machine Learning (cs.LG); Computational Complexity (cs.CC); Statistics Theory (math.ST)
[18]  arXiv:2404.12696 (replaced) [pdf, other]
Title: Gaussian dependence structure pairwise goodness-of-fit testing based on conditional covariance and the 20/60/20 rule
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
