Statistics Theory

New submissions

Submissions received from Fri 19 Apr 24 to Mon 22 Apr 24, announced Tue, 23 Apr 24

New submissions for Tue, 23 Apr 24

[1] arXiv:2404.13248 [pdf, ps, other]: Title: Pure Significance Tests for Multinomial and Binomial Distributions: the Uniform Alternative

Authors: Michael D. Perlman

Comments: 32 pages, 3 tables

Subjects: Statistics Theory (math.ST)

A {\it pure significance test} (PST) tests a simple null hypothesis $H_f:Y\sim f$ {\it without specifying an alternative hypothesis} by rejecting $H_f$ for {\it small} values of $f(Y)$. When the sample space supports a proper uniform pmf $f_\mathrm{unif}$, the PST can be viewed as a classical likelihood ratio test for testing $H_f$ against this uniform alternative. Under this interpretation, standard test features such as power, Kullback-Leibler divergence, and expected $p$-value can be considered. This report focuses on PSTs for multinomial and binomial distributions, and for the related goodness-of-fit testing problems with the uniform alternative. The case of repeated observations cannot be reduced to the single observation case via sufficiency. The {\it ordered binomial distribution}, apparently new, arises in the course of this study.
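
The rejection rule above (reject $H_f$ for small $f(Y)$) can be illustrated for a single binomial observation. This is a minimal sketch, not code from the paper: the function names `binomial_pmf` and `pst_p_value` are hypothetical, and the p-value is taken to be the null probability of all outcomes that are no more probable than the observed one, which is one natural reading of "small values of $f(Y)$".

```python
from math import comb


def binomial_pmf(n, p):
    """Return [f(0), ..., f(n)] for the Binomial(n, p) null pmf."""
    return [comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]


def pst_p_value(pmf, y_obs, rel_tol=1e-12):
    """Null probability of the set {y : f(y) <= f(y_obs)}.

    A small relative tolerance keeps outcomes that tie with f(y_obs)
    inside the rejection set despite floating-point rounding.
    """
    threshold = pmf[y_obs] * (1 + rel_tol)
    return sum(q for q in pmf if q <= threshold)


# Illustrative numbers only: n = 10 trials, null success probability 0.5.
pmf = binomial_pmf(10, 0.5)
p_val = pst_p_value(pmf, 9)
print(f"PST p-value for y = 9: {p_val:.6f}")
```

Because a uniform pmf is constant on the sample space, ordering outcomes by $f(y)$ is the same as ordering them by the likelihood ratio $f_\mathrm{unif}(y)/f(y)$, which is the likelihood ratio interpretation the abstract refers to.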
[2] arXiv:2404.13822 [pdf, other]: Title: Higher-Order Graphon Theory: Fluctuations, Degeneracies, and Inference

Authors: Anirban Chatterjee, Soham Dan, Bhaswar B. Bhattacharya

Comments: Abstract shortened to meet arXiv requirements

Subjects: Statistics Theory (math.ST); Probability (math.PR)

Exchangeable random graphs, which include some of the most widely studied network models, have emerged as the mainstay of statistical network analysis in recent years. Graphons, which are the central objects in graph limit theory, provide a natural way to sample exchangeable random graphs. It is well known that network moments (motif/subgraph counts) identify a graphon (up to an isomorphism); hence, understanding the sampling distribution of subgraph counts in random graphs sampled from a graphon is pivotal for nonparametric network inference. In this paper, we derive the joint asymptotic distribution of any finite collection of network moments in random graphs sampled from a graphon, covering both the non-degenerate case (where the distribution is Gaussian) and the degenerate case (where the distribution has both Gaussian and non-Gaussian components). This provides the higher-order fluctuation theory for subgraph counts in the graphon model. We also develop a novel multiplier bootstrap for graphons that consistently approximates the limiting distribution of the network moments (in both the Gaussian and non-Gaussian regimes). Using this and a procedure for testing degeneracy, we construct joint confidence sets for any finite collection of motif densities. This provides a general framework for statistical inference based on network moments in the graphon model. To illustrate the broad scope of our results, we also consider the problem of detecting global structure (that is, testing whether the graphon is a constant function) based on small subgraphs. We propose a consistent test for this problem, invoking celebrated results on quasi-random graphs, and derive its limiting distribution under both the null and the alternative.
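
To make the sampling scheme concrete, the sketch below draws a random graph from a graphon and computes two network moments (edge and triangle densities). It is an illustration under stated assumptions, not the authors' procedure: the graphon `W(x, y) = x * y` and all function names are hypothetical, and the densities are plain fractions of vertex pairs and triples. The multiplier bootstrap and the degeneracy test from the abstract are not implemented here.

```python
import random
from itertools import combinations


def sample_graphon_graph(n, W, rng):
    """Sample an n-vertex graph from graphon W.

    Each vertex i gets a latent U_i ~ Uniform(0, 1); edge {i, j} is then
    included independently with probability W(U_i, U_j).
    """
    u = [rng.random() for _ in range(n)]
    adj = [[False] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < W(u[i], u[j]):
            adj[i][j] = adj[j][i] = True
    return adj


def edge_density(adj):
    """Fraction of vertex pairs joined by an edge."""
    n = len(adj)
    edges = sum(adj[i][j] for i, j in combinations(range(n), 2))
    return edges / (n * (n - 1) / 2)


def triangle_density(adj):
    """Fraction of vertex triples that form a triangle."""
    n = len(adj)
    triangles = sum(
        1
        for i, j, k in combinations(range(n), 3)
        if adj[i][j] and adj[j][k] and adj[i][k]
    )
    return triangles / (n * (n - 1) * (n - 2) / 6)


def product_graphon(x, y):
    """Hypothetical example graphon, chosen only for illustration."""
    return x * y


rng = random.Random(0)  # fixed seed so the sketch is reproducible
adj = sample_graphon_graph(60, product_graphon, rng)
print("edge density:", edge_density(adj))
print("triangle density:", triangle_density(adj))
```

For this particular graphon the population values are 1/4 for the edge density and 1/27 for the triangle density, so repeated draws should fluctuate around those numbers; the abstract's results concern the joint distribution of exactly such fluctuations.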
[3] arXiv:2404.13960 [pdf, other]: Title: A Geometric Perspective on Double Robustness by Semiparametric Theory and Information Geometry

Authors: Andrew Ying

Subjects: Statistics Theory (math.ST)

Double robustness (DR) is a widely used property of estimators that provides protection against model misspecification and slow convergence of nuisance functions. While DR is a global property on the probability distribution manifold, it often coincides with influence curves, which only ensure orthogonality to nuisance directions locally. This apparent discrepancy raises fundamental questions about the theoretical underpinnings of DR.
In this short communication, we address two key questions: (1) Why do influence curves frequently imply DR "for free"? (2) Under what conditions do DR estimators exist for a given statistical model and parameterization? Using tools from semiparametric theory, we show that convexity is the crucial property that enables influence curves to imply DR. We then derive necessary and sufficient conditions for the existence of DR estimators under a mean squared differentiable path-connected parameterization.
Our main contribution also lies in the novel geometric interpretation of DR using information geometry. By leveraging concepts such as parallel transport, m-flatness, and m-curvature freeness, we characterize DR in terms of invariance along submanifolds. This geometric perspective deepens the understanding of when and why DR estimators exist.
The results not only resolve apparent mysteries surrounding DR but also have practical implications for the construction and analysis of DR estimators. The geometric insights open up new connections and directions for future research. Our findings aim to solidify the theoretical foundations of a fundamental concept and contribute to the broader understanding of robust estimation in statistics.
[4] arXiv:2404.14227 [pdf, ps, other]: Title: Estimation for SLS models: finite sample guarantees

Authors: Vladimir Spokoiny

Subjects: Statistics Theory (math.ST)

This note continues and extends the study from Spokoiny (2023a) about estimation for parametric models with possibly large or even infinite parameter dimension. We consider a special class of stochastically linear smooth (SLS) models satisfying three major conditions: the stochastic component of the log-likelihood is linear in the model parameter, while the expected log-likelihood is a smooth and concave function. For the penalized maximum likelihood estimators (pMLE), we establish several finite sample bounds on their concentration and large deviations as well as the Fisher and Wilks expansions and risk bounds. In all results, the remainder is given explicitly and can be evaluated in terms of the effective sample size $ n $ and effective parameter dimension $ \mathbb{p} $, which allows us to identify the so-called \emph{critical parameter dimension}. The results are also dimension and coordinate-free. Despite their generality, all the presented bounds are nearly sharp and the classical asymptotic results can be obtained as simple corollaries. Our results indicate that the use of advanced fourth-order expansions allows one to relax the critical dimension condition $ \mathbb{p}^{3} \ll n $ from Spokoiny (2023a) to $ \mathbb{p}^{3/2} \ll n $. Examples for classical models like logistic regression, log-density and precision matrix estimation illustrate the applicability of general results.
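
As a rough illustration of what the relaxed condition allows (an arithmetic sketch, not a statement from the paper), take an effective sample size $n = 10^{6}$. The cutoffs below are orders of magnitude only, since $\ll$ hides constants:

```latex
\mathbb{p}^{3} \ll n = 10^{6} \iff \mathbb{p} \ll n^{1/3} = 10^{2},
\qquad
\mathbb{p}^{3/2} \ll n = 10^{6} \iff \mathbb{p} \ll n^{2/3} = 10^{4}.
```

Under this reading, the admissible effective dimension grows from the order of $n^{1/3}$ to the order of $n^{2/3}$.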
[5] arXiv:2404.14229 [pdf, other]: Title: Colored Stochastic Multiplicative Processes with Additive Noise Unveil a Third-Order PDE, Defying Conventional FPE and Fick-Law Paradigms

Authors: Marco Bianucci, Mauro Bologna, Riccardo Mannella

Comments: 25 pages, 4 figures

Subjects: Statistics Theory (math.ST); Statistical Mechanics (cond-mat.stat-mech); Probability (math.PR)

Research on stochastic differential equations (SDE) involving both additive and multiplicative noise has been extensive. In situations where the primary process is driven by a multiplicative stochastic process, additive white noise typically represents an intrinsic and unavoidable fast factor, including phenomena like thermal fluctuations, inherent uncertainties in measurement processes, or rapid wind forcing in ocean dynamics. This work focuses on a significant class of such systems, particularly those characterized by linear drift and multiplicative noise, extensively explored in the literature. Conventionally, multiplicative stochastic processes are also treated as white noise in existing studies. However, when considering colored multiplicative noise, the emphasis has been on characterizing the far tails of the probability density function (PDF), regardless of the spectral properties of the noise. In the absence of additive noise and with a general colored multiplicative SDE, standard perturbation approaches lead to a second-order PDE known as the Fokker-Planck Equation (FPE), consistent with Fick's law. This investigation unveils a notable departure from this standard behavior when introducing additive white noise. At the leading order of the stochastic process strength, perturbation approaches yield a \textit{third-order PDE}, irrespective of the white noise intensity. The breakdown of the FPE further signifies the breakdown of Fick's law. Additionally, we derive the explicit solution for the equilibrium PDF corresponding to this third-order PDE Master Equation. Through numerical simulations, we demonstrate significant deviations from outcomes derived using the FPE obtained through the application of Fick's law.

Cross-lists for Tue, 23 Apr 24

[6] arXiv:2404.13355 (cross-list from math.NA) [pdf, other]: Title: Extrapolation and generative algorithms for three applications in finance

Authors: Philippe G. LeFloch, Jean-Marc Mercier, Shohruh Miryusupov

Comments: 9 pages

Subjects: Numerical Analysis (math.NA); Statistics Theory (math.ST)

For three applications of central interest in finance, we demonstrate the relevance of numerical algorithms based on reproducing kernel Hilbert space (RKHS) techniques. Three use cases are investigated. First, we show that extrapolating from few pricer examples leads to sufficiently accurate and computationally efficient results so that our algorithm can serve as a pricing framework. The second use case concerns reverse stress testing, which is formulated as an inversion function problem and is treated here via an optimal transport technique in combination with the notions of kernel-based encoders, decoders, and generators. Third, we show that standard techniques for time series analysis can be enhanced by using the proposed generative algorithms. Namely, we use our algorithm in order to extend the validity of any given quantitative model. Our approach allows for conditional analysis as well as for escaping the `Gaussian world'. This latter property is illustrated here with a portfolio investment strategy.
[7] arXiv:2404.14136 (cross-list from q-fin.ST) [pdf, ps, other]: Title: Elicitability and identifiability of tail risk measures

Authors: Tobias Fissler, Fangda Liu, Ruodu Wang, Linxiao Wei

Comments: 31 pages

Subjects: Statistical Finance (q-fin.ST); Statistics Theory (math.ST); Risk Management (q-fin.RM); Methodology (stat.ME)

Tail risk measures are fully determined by the distribution of the underlying loss beyond its quantile at a certain level, with Value-at-Risk and Expected Shortfall being prime examples. They are induced by law-based risk measures, called their generators, evaluated on the tail distribution. This paper establishes joint identifiability and elicitability results of tail risk measures together with the corresponding quantile, provided that their generators are identifiable and elicitable, respectively. As an example, we establish the joint identifiability and elicitability of the tail expectile together with the quantile. The corresponding consistent scores constitute a novel class of weighted scores, nesting the known class of scores of Fissler and Ziegel for the Expected Shortfall together with the quantile. For statistical purposes, our results pave the way to easier model fitting for tail risk measures via regression and the generalized method of moments, but also model comparison and model validation in terms of established backtesting procedures.
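
The two prime examples named above can be estimated from a loss sample in a few lines. The sketch below is only an illustration of the definitions, not the scoring-function machinery of the paper: `empirical_var_es` is a hypothetical helper, Value-at-Risk is taken as the lower empirical quantile at level `alpha`, and Expected Shortfall is taken as the mean of the losses at or beyond that quantile, which is one of several conventions in use.

```python
import math


def empirical_var_es(losses, alpha):
    """Return (VaR, ES) at level alpha from a sample of losses.

    VaR is the lower empirical alpha-quantile; ES averages the losses
    that are at least as large as VaR (an assumed convention).
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    xs = sorted(losses)
    k = max(math.ceil(alpha * len(xs)) - 1, 0)  # index of the quantile
    var = xs[k]
    tail = [x for x in xs if x >= var]  # depends only on losses beyond VaR
    es = sum(tail) / len(tail)
    return var, es


# Illustrative data only.
var, es = empirical_var_es([1, 2, 3, 4, 5, 6, 7, 8], 0.75)
print("VaR:", var, "ES:", es)
```

Both outputs depend on the sample only through the observations at or above the quantile, which is the defining feature of a tail risk measure in the abstract.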

Replacements for Tue, 23 Apr 24

[8] arXiv:2302.06739 (replaced) [pdf, ps, other]: Title: Asymptotic Theory for Doubly Robust Estimators with Continuous-Time Nuisance Parameters

Authors: Andrew Ying

Subjects: Statistics Theory (math.ST)
[9] arXiv:2303.03649 (replaced) [pdf, ps, other]: Title: PanIC: consistent information criteria for general model selection problems

Authors: Hien Duy Nguyen

Subjects: Statistics Theory (math.ST)
[10] arXiv:2310.01374 (replaced) [pdf, other]: Title: Corrected generalized cross-validation for finite ensembles of penalized estimators

Authors: Pierre C. Bellec, Jin-Hong Du, Takuya Koriyama, Pratik Patil, Kai Tan

Comments: 91 pages, 34 figures; this version adds general proof outlines (in Sections 4.3 and 5.3), adds more experiments with non-Gaussian data (in Sections D and E), relaxes an assumption (in Section A.7), clarifies explanations in several places, and corrects minor typos in several places

Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
[11] arXiv:2404.05006 (replaced) [pdf, other]: Title: High-dimensional bootstrap and asymptotic expansion

Authors: Yuta Koike

Comments: 60 pages, 1 figure, 2 tables. In Corollary 3.1, the sphericity of the covariance matrix was relaxed to the boundedness of the eigenvalues

Subjects: Statistics Theory (math.ST); Probability (math.PR)
[12] arXiv:2404.06850 (replaced) [src]: Title: From naive trees to Random Forests: A general approach for proving consistency of tree-based methods

Authors: Nico Föge, Markus Pauly, Lena Schmid, Marc Ditzhaus

Comments: Incorrect Proof

Subjects: Statistics Theory (math.ST)
[13] arXiv:2109.02204 (replaced) [pdf, other]: Title: On the edge eigenvalues of the precision matrices of nonstationary autoregressive processes

Authors: Junho Yang

Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
[14] arXiv:2209.10128 (replaced) [pdf, other]: Title: Efficient Integrated Volatility Estimation in the Presence of Infinite Variation Jumps via Debiased Truncated Realized Variations

Authors: B. Cooper Boniece, José E. Figueroa-López, Yuchen Han

Comments: An earlier version of this manuscript was circulated under the title "Efficient Volatility Estimation for Lévy Processes with Jumps of Unbounded Variation". The results therein were constrained to Lévy processes, whereas here we consider a much larger class of Itô semimartingales. arXiv admin note: text overlap with arXiv:2202.00877

Subjects: Econometrics (econ.EM); Statistics Theory (math.ST); Statistical Finance (q-fin.ST)
[15] arXiv:2303.08987 (replaced) [pdf, other]: Title: Generalized Score Matching

Authors: Jiazhen Xu, Janice L. Scealy, Andrew T. A. Wood, Tao Zou

Comments: arXiv admin note: substantial text overlap with arXiv:2203.09864

Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP); Computation (stat.CO)
[16] arXiv:2307.00017 (replaced) [pdf, ps, other]: Title: Explicit mutual information for simple networks and neurons with lognormal activities

Authors: Maurycy Chwiłka, Jan Karbowski

Comments: Corrected version: corrected and expanded discussion of neural mutual information (lognormal vs. gaussian distributions)

Journal-ref: Phys. Rev. E 109, 014117 (2024)

Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistics Theory (math.ST)
[17] arXiv:2311.07454 (replaced) [pdf, other]: Title: Discrete Nonparametric Causal Discovery Under Latent Class Confounding

Authors: Bijan Mazaheri, Spencer Gordon, Yuval Rabani, Leonard Schulman

Subjects: Machine Learning (cs.LG); Computational Complexity (cs.CC); Statistics Theory (math.ST)
[18] arXiv:2404.12696 (replaced) [pdf, other]: Title: Gaussian dependence structure pairwise goodness-of-fit testing based on conditional covariance and the 20/60/20 rule

Authors: Jakub Woźny, Piotr Jaworski, Damian Jelito, Marcin Pitera, Agnieszka Wyłomańska

Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
