Statistics Theory
New submissions
New submissions for Tue, 19 Mar 24
- [1] arXiv:2403.11400 [pdf, other]
Title: Spatially Randomized Designs Can Enhance Policy Evaluation
Subjects: Statistics Theory (math.ST)
This article studies the benefits of using spatially randomized experimental designs which partition the experimental area into distinct, non-overlapping units with treatments assigned randomly. Such designs offer improved policy evaluation in online experiments by providing more precise policy value estimators and more effective A/B testing algorithms than traditional global designs, which apply the same treatment across all units simultaneously. We examine both parametric and nonparametric methods for estimating and inferring policy values based on this randomized approach. Our analysis includes evaluating the mean squared error of the treatment effect estimator and the statistical power of the associated tests. Additionally, we extend our findings to experiments with spatio-temporal dependencies, where treatments are allocated sequentially over time, and account for potential temporal carryover effects. Our theoretical insights are supported by comprehensive numerical experiments.
- [2] arXiv:2403.11489 [pdf, other]
Title: New energy distances for statistical inference on infinite dimensional Hilbert spaces without moment conditions
Comments: 73 pages, 3 figures
Subjects: Statistics Theory (math.ST)
For statistical inference on an infinite-dimensional Hilbert space $\H $ with no moment conditions we introduce a new class of energy distances on the space of probability measures on $\H$. The proposed distances consist of the integrated squared modulus of the corresponding difference of the characteristic functionals with respect to a reference probability measure on the Hilbert space. Necessary and sufficient conditions are established for the reference probability measure to be {\em characteristic}, the property that guarantees that the distance defines a metric on the space of probability measures on $\H$. We also use these results to define new distance covariances, which can be used to measure the dependence between the marginals of a two dimensional distribution of $\H^2$ without existing moments.
On the basis of the new distances we develop statistical inference for Hilbert space valued data, which does not require any moment assumptions. As a consequence, our methods are robust with respect to heavy tails in finite dimensional data. In particular, we consider the problem of comparing the distributions of two samples and the problem of testing for independence and construct new minimax optimal tests for the corresponding hypotheses. We also develop aggregated (with respect to the reference measure) procedures for power enhancement and investigate the finite-sample properties by means of a simulation study.
- [3] arXiv:2403.11704 [pdf, ps, other]
Title: Sharp phase transitions in high-dimensional changepoint detection
Subjects: Statistics Theory (math.ST)
We study a hypothesis testing problem in the context of high-dimensional changepoint detection. Given a matrix $X \in \mathbb{R}^{p \times n}$ with independent Gaussian entries, the goal is to determine whether or not a sparse, non-null fraction of rows in $X$ exhibits a shift in mean at a common index between $1$ and $n$. We focus on three aspects of this problem: the sparsity of non-null rows, the presence of a single, common changepoint in the non-null rows, and the signal strength associated with the changepoint. Within an asymptotic regime relating the data dimensions $n$ and $p$ to the signal sparsity and strength, we characterize the information-theoretic limits of the testing problem by a formula that determines whether the sum of Type I and II errors tends to zero or is bounded away from zero. The formula, called the \emph{detection boundary}, is a curve that separates the parameter space into a detectable region and an undetectable region. We show that a Berk--Jones type test statistic can detect the presence of a sparse non-null fraction of rows, and does so adaptively throughout the detectable region. Conversely, within the undetectable region, no test is able to consistently distinguish the signal from noise.
- [4] arXiv:2403.11860 [pdf, other]
Title: Flexible control function approach under different types of dependent censoring
Subjects: Statistics Theory (math.ST)
In this paper, we consider the problem of estimating the causal effect of an endogenous variable $Z$ on a survival time $T$ that can be subject to different types of dependent censoring. Firstly, we extend the current literature by simultaneously allowing for both independent ($A$) and dependent ($C$) censoring. Moreover, we have different parametric transformations for $T$ and $C$ that result in a more additive structure with approximately normal and homoscedastic error terms. The model is shown to be identified and a two-step estimation method is specified. It is shown that this estimator results in consistent and asymptotically normal estimates. Secondly, a goodness-of-fit test is developed to check the model's validity. To estimate the distribution of the statistic, a parametric bootstrap approach is used. Lastly, we show how the model naturally extends to a competing risks setting. Simulations are used to evaluate the finite-sample performance of the proposed methods and approaches. Moreover, we investigate two data applications regarding the effect of job training programs on unemployment duration and the effect of periodic screenings on breast cancer mortality rates.
- [5] arXiv:2403.12012 [pdf, other]
Title: Convergence of Kinetic Langevin Monte Carlo on Lie groups
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Numerical Analysis (math.NA); Probability (math.PR); Machine Learning (stat.ML)
Explicit, momentum-based dynamics for optimizing functions defined on Lie groups was recently constructed, based on techniques such as variational optimization and left trivialization. We appropriately add tractable noise to the optimization dynamics to turn it into a sampling dynamics, leveraging the advantageous feature that the momentum variable is Euclidean despite that the potential function lives on a manifold. We then propose a Lie-group MCMC sampler, by delicately discretizing the resulting kinetic-Langevin-type sampling dynamics. The Lie group structure is exactly preserved by this discretization. Exponential convergence with explicit convergence rate for both the continuous dynamics and the discrete sampler are then proved under W2 distance. Only compactness of the Lie group and geodesically L-smoothness of the potential function are needed. To the best of our knowledge, this is the first convergence result for kinetic Langevin on curved spaces, and also the first quantitative result that requires no convexity or, at least not explicitly, any common relaxation such as isoperimetry.
Cross-lists for Tue, 19 Mar 24
- [6] arXiv:2403.10711 (cross-list from math.PR) [pdf, ps, other]
Title: Gaussian universality for approximately polynomial functions of high-dimensional data
Subjects: Probability (math.PR); Statistics Theory (math.ST)
We establish an invariance principle for polynomial functions of $n$ independent high-dimensional random vectors, and also show that the obtained rates are nearly optimal. Both the dimension of the vectors and the degree of the polynomial are permitted to grow with $n$. Specifically, we obtain a finite sample upper bound for the error of approximation by a polynomial of Gaussians, measured in Kolmogorov distance, and extend it to functions that are approximately polynomial in a mean squared error sense. We give a corresponding lower bound that shows the invariance principle holds up to polynomial degree $o(\log n)$. The proof is constructive and adapts an asymmetrisation argument due to V. V. Senatov. As applications, we obtain a higher-order delta method with possibly non-Gaussian limits, and generalise a number of known results on high-dimensional and infinite-order U-statistics, and on fluctuations of subgraph counts.
- [7] arXiv:2403.11013 (cross-list from cs.LG) [pdf, other]
Title: Improved Algorithm and Bounds for Successive Projection
Comments: 32 pages, 5 figures
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
Given a $K$-vertex simplex in a $d$-dimensional space, suppose we measure $n$ points on the simplex with noise (hence, some of the observed points fall outside the simplex). Vertex hunting is the problem of estimating the $K$ vertices of the simplex. A popular vertex hunting algorithm is successive projection algorithm (SPA). However, SPA is observed to perform unsatisfactorily under strong noise or outliers. We propose pseudo-point SPA (pp-SPA). It uses a projection step and a denoise step to generate pseudo-points and feed them into SPA for vertex hunting. We derive error bounds for pp-SPA, leveraging on extreme value theory of (possibly) high-dimensional random vectors. The results suggest that pp-SPA has faster rates and better numerical performances than SPA. Our analysis includes an improved non-asymptotic bound for the original SPA, which is of independent interest.
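As a rough illustration of the plain SPA baseline mentioned in this abstract (not the proposed pp-SPA, whose projection and denoise steps are not specified here), the sketch below repeatedly picks the point with the largest residual norm and projects it out. The function name `successive_projection`, the toy data, and the assumption that the vertices are linearly independent are illustrative choices, not taken from the paper.

```python
def successive_projection(points, k):
    """Return indices of k estimated vertices using plain SPA (no denoising).

    Assumes (for this sketch) that the true vertices are linearly independent.
    """
    residual = [list(map(float, p)) for p in points]
    chosen = []
    for _ in range(k):
        # Pick the point whose current residual has the largest squared norm.
        norms = [sum(c * c for c in r) for r in residual]
        j = max(range(len(residual)), key=norms.__getitem__)
        if norms[j] == 0.0:
            break  # nothing left to select
        chosen.append(j)
        u, uu = residual[j], norms[j]
        # Project every residual onto the orthogonal complement of u.
        projected = []
        for r in residual:
            coef = sum(a * b for a, b in zip(r, u)) / uu
            projected.append([a - coef * b for a, b in zip(r, u)])
        residual = projected
    return chosen


# Noiseless toy data: three vertices plus three convex mixtures of them.
points = [
    (1.5, 0.5, 0.25),   # weights (0.50, 0.25, 0.25)
    (3.0, 0.0, 0.0),    # vertex
    (0.6, 0.6, 0.5),    # weights (0.20, 0.30, 0.50)
    (0.0, 2.0, 0.0),    # vertex
    (0.75, 0.5, 0.5),   # weights (0.25, 0.25, 0.50)
    (0.0, 0.0, 1.0),    # vertex
]
vertex_indices = successive_projection(points, 3)
```

In this noiseless example the selected indices are exactly the three vertex rows; with noise or outliers the largest-norm rule can select corrupted points, which is the weakness the abstract attributes to SPA.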
- [8] arXiv:2403.11163 (cross-list from stat.ME) [pdf, ps, other]
Title: A Selective Review on Statistical Methods for Massive Data Computation: Distributed Computing, Subsampling, and Minibatch Techniques
Authors: Xuetong Li, Yuan Gao, Hong Chang, Danyang Huang, Yingying Ma, Rui Pan, Haobo Qi, Feifei Wang, Shuyuan Wu, Ke Xu, Jing Zhou, Xuening Zhu, Yingqiu Zhu, Hansheng Wang
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Statistics Theory (math.ST); Computation (stat.CO)
This paper presents a selective review of statistical computation methods for massive data analysis. A large number of statistical methods for massive data computation have been developed rapidly in the past decades. In this work, we focus on three categories of statistical computation methods: (1) distributed computing, (2) subsampling methods, and (3) minibatch gradient techniques. The first class of literature is about distributed computing and focuses on the situation where the dataset is too large to be comfortably handled by a single computer. In this case, a distributed computation system with multiple computers has to be utilized. The second class of literature is about subsampling methods and concerns the situation where the dataset is small enough to be stored on a single computer but too large to be easily processed in its memory as a whole. The last class of literature studies minibatch gradient related optimization techniques, which have been extensively used for optimizing various deep learning models.
- [9] arXiv:2403.11175 (cross-list from stat.ML) [pdf, ps, other]
Title: Prior-dependent analysis of posterior sampling reinforcement learning with function approximation
Comments: Published in the 27th International Conference on Artificial Intelligence and Statistics (AISTATS)
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (cs.LG); Statistics Theory (math.ST)
This work advances randomized exploration in reinforcement learning (RL) with function approximation modeled by linear mixture MDPs. We establish the first prior-dependent Bayesian regret bound for RL with function approximation; and refine the Bayesian regret analysis for posterior sampling reinforcement learning (PSRL), presenting an upper bound of ${\mathcal{O}}(d\sqrt{H^3 T \log T})$, where $d$ represents the dimensionality of the transition kernel, $H$ the planning horizon, and $T$ the total number of interactions. This signifies a methodological enhancement by optimizing the $\mathcal{O}(\sqrt{\log T})$ factor over the previous benchmark (Osband and Van Roy, 2014) specified to linear mixture MDPs. Our approach, leveraging a value-targeted model learning perspective, introduces a decoupling argument and a variance reduction technique, moving beyond traditional analyses reliant on confidence sets and concentration inequalities to formalize Bayesian regret bounds more effectively.
- [10] arXiv:2403.11343 (cross-list from cs.LG) [pdf, other]
Title: Federated Transfer Learning with Differential Privacy
Comments: 76 pages, 3 figures
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
Federated learning is gaining increasing popularity, with data heterogeneity and privacy being two prominent challenges. In this paper, we address both issues within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of \textit{federated differential privacy}, which offers privacy guarantees for each data set without assuming a trusted central server. Under this privacy constraint, we study three classical statistical problems, namely univariate mean estimation, low-dimensional linear regression, and high-dimensional linear regression. By investigating the minimax rates and identifying the costs of privacy for these problems, we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy. Our analyses incorporate data heterogeneity and privacy, highlighting the fundamental costs of both in federated learning and underscoring the benefit of knowledge transfer across data sets.
- [11] arXiv:2403.11356 (cross-list from stat.ME) [pdf, other]
Title: Multiscale Quantile Regression with Local Error Control
Comments: The implementation is in R package muscle, available at \url{this https URL}
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
For robust and efficient detection of change points, we introduce a novel methodology MUSCLE (multiscale quantile segmentation controlling local error) that partitions serial data into multiple segments, each sharing a common quantile. It leverages multiple tests for quantile changes over different scales and locations, and variational estimation. Unlike the often adopted global error control, MUSCLE focuses on local errors defined on individual segments, significantly improving detection power in finding change points. Meanwhile, due to the built-in model complexity penalty, it enjoys the finite sample guarantee that its false discovery rate (or the expected proportion of falsely detected change points) is upper bounded by its unique tuning parameter. Further, we obtain the consistency and the localisation error rates in estimating change points, under mild signal-to-noise-ratio conditions. Both match (up to log factors) the minimax optimality results in the Gaussian setup. All theories hold under the only distributional assumption of serial independence. Incorporating the wavelet tree data structure, we develop an efficient dynamic programming algorithm for computing MUSCLE. Extensive simulations as well as real data applications in electrophysiology and geophysics demonstrate its competitiveness and effectiveness. An implementation via R package muscle is available from GitHub.
- [12] arXiv:2403.11374 (cross-list from math.NA) [pdf, other]
Title: Quasi-Monte Carlo and importance sampling methods for Bayesian inverse problems
Subjects: Numerical Analysis (math.NA); Statistics Theory (math.ST)
Importance Sampling (IS), an effective variance reduction strategy in Monte Carlo (MC) simulation, is frequently utilized for Bayesian inference and other statistical challenges. Quasi-Monte Carlo (QMC) replaces the random samples in MC with low discrepancy points and has the potential to substantially enhance error rates. In this paper, we integrate IS with a randomly shifted rank-1 lattice rule, a widely used QMC method, to approximate posterior expectations arising from Bayesian Inverse Problems (BIPs) where the posterior density tends to concentrate as the intensity of noise diminishes. Within the framework of weighted Hilbert spaces, we first establish the convergence rate of the lattice rule for a large class of unbounded integrands. This method extends to the analysis of QMC combined with IS in BIPs. Furthermore, we explore the robustness of the IS-based randomly shifted rank-1 lattice rule by determining the quadrature error rate with respect to the noise level. The effects of using Gaussian distributions and $t$-distributions as the proposal distributions on the error rate of QMC are comprehensively investigated. We find that the error rate may deteriorate at low intensity of noise when using improper proposals, such as the prior distribution. To reclaim the effectiveness of QMC, we propose a new IS method such that the lattice rule with $N$ quadrature points achieves an optimal error rate close to $O(N^{-1})$, which is insensitive to the noise level. Numerical experiments are conducted to support the theoretical results.
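To make the term "randomly shifted rank-1 lattice rule" concrete, here is a minimal sketch of the basic rule on the unit cube, without the importance sampling step studied in the paper. The helper names, the generating vector `z = (1, 374)`, and the test integrand are assumptions chosen for illustration; in particular the generating vector is not optimized (for example by a component-by-component construction).

```python
import random


def shifted_lattice_points(n, z, shift):
    """Points frac(i * z / n + shift), i = 0..n-1, of a shifted rank-1 lattice."""
    return [
        [(((i * zj) % n) / n + sj) % 1.0 for zj, sj in zip(z, shift)]
        for i in range(n)
    ]


def qmc_estimate(f, n, z, num_shifts=8, seed=0):
    """Average the lattice-rule estimate of the integral of f over random shifts."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(num_shifts):
        # One independent uniform shift per coordinate.
        shift = [rng.random() for _ in z]
        pts = shifted_lattice_points(n, z, shift)
        estimates.append(sum(f(x) for x in pts) / n)
    return sum(estimates) / num_shifts


# Hypothetical example: integrate f(x) = x_0 + x_1 over [0, 1]^2 (exact value 1).
n = 1021          # prime number of quadrature points
z = (1, 374)      # illustrative, non-optimized generating vector
estimate = qmc_estimate(lambda x: x[0] + x[1], n, z)
```

Averaging over several independent shifts keeps the estimator unbiased and also gives a simple way to estimate its variance from the spread of the per-shift estimates.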
- [13] arXiv:2403.11438 (cross-list from stat.ME) [pdf, other]
Title: Models of linkage error for capture-recapture estimation without clerical reviews
Comments: 42 pages
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
The capture-recapture method can be applied to measure the coverage of administrative and big data sources, in official statistics. In its basic form, it involves the linkage of two sources while assuming a perfect linkage and other standard assumptions. In practice, linkage errors arise and are a potential source of bias, where the linkage is based on quasi-identifiers. These errors include false positives and false negatives, where the former arise when linking a pair of records from different units, and the latter arise when not linking a pair of records from the same unit. So far, the existing solutions have resorted to costly clerical reviews, or they have made the restrictive conditional independence assumption. In this work, these requirements are relaxed by modeling the number of links from a record instead. The same approach may be taken to estimate the linkage accuracy without clerical reviews, when linking two sources that each have some undercoverage.
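For orientation, and assuming the "basic form" refers to the classical two-source (Lincoln-Petersen) estimator under perfect linkage, the calculation can be sketched as follows. The function names and the counts are hypothetical; the linkage-error models proposed in the paper are not reproduced here.

```python
def lincoln_petersen(n1, n2, matched):
    """Estimate population size from two sources assuming perfect linkage.

    n1, n2:  number of records in source 1 and source 2
    matched: number of records linked between the two sources
    """
    if matched <= 0:
        raise ValueError("at least one linked pair is required")
    return n1 * n2 / matched


def coverage_of_first_source(n1, n2, matched):
    """Estimated coverage of source 1, i.e. n1 divided by the estimated total."""
    return n1 / lincoln_petersen(n1, n2, matched)


# Hypothetical counts: 200 and 150 records, 30 of which are linked.
population_estimate = lincoln_petersen(200, 150, 30)
coverage_estimate = coverage_of_first_source(200, 150, 30)
```

False positive links inflate `matched` and false negatives deflate it, so either kind of linkage error biases this estimate, which is the problem the abstract addresses.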
- [14] arXiv:2403.11564 (cross-list from stat.ME) [pdf, other]
Title: Spatio-temporal point process intensity estimation using zero-deflated subsampling applied to a lightning strikes dataset in France
Authors: Jean-François Coeurjolly (LJK, SVH), Anne-Laure Fougères (ICJ, MODAL'X), Thibault Espinasse (PSPM, UCBL), Mathieu Ribatet (I3M)
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
Cloud-to-ground lightning strikes observed in a specific geographical domain over time can be naturally modeled by a spatio-temporal point process. Our focus lies in the parametric estimation of its intensity function, incorporating both spatial factors (such as altitude) and spatio-temporal covariates (such as field temperature, precipitation, etc.). The events are observed in France over a span of three years. Spatio-temporal covariates are observed with resolution $0.1^\circ \times 0.1^\circ$ ($\approx 100$km$^2$) and six-hour periods. This results in an extensive dataset, further characterized by a significant excess of zeroes (i.e., spatio-temporal cells with no observed events). We reexamine composite likelihood methods commonly employed for spatial point processes, especially in situations where covariates are piecewise constant. Additionally, we extend these methods to account for zero-deflated subsampling, a strategy involving dependent subsampling, with a focus on selecting more cells in regions where events are observed. A simulation study is conducted to illustrate these novel methodologies, followed by their application to the dataset of lightning strikes.
- [15] arXiv:2403.11954 (cross-list from stat.ME) [pdf, other]
Title: Robust Estimation and Inference in Categorical Data
Authors: Max Welz
Comments: 63 pages, 7 figures, 6 tables
Subjects: Methodology (stat.ME); Econometrics (econ.EM); Statistics Theory (math.ST)
In empirical science, many variables of interest are categorical. Like any model, models for categorical responses can be misspecified, leading to possibly large biases in estimation. One particularly troublesome source of misspecification is inattentive responding in questionnaires, which is well-known to jeopardize the validity of structural equation models (SEMs) and other survey-based analyses. I propose a general estimator that is designed to be robust to misspecification of models for categorical responses. Unlike hitherto approaches, the estimator makes no assumption whatsoever on the degree, magnitude, or type of misspecification. The proposed estimator generalizes maximum likelihood estimation, is strongly consistent, asymptotically Gaussian, has the same time complexity as maximum likelihood, and can be applied to any model for categorical responses. In addition, I develop a novel test that tests whether a given response can be fitted well by the assumed model, which allows one to trace back possible sources of misspecification. I verify the attractive theoretical properties of the proposed methodology in Monte Carlo experiments, and demonstrate its practical usefulness in an empirical application on a SEM of personality traits, where I find compelling evidence for the presence of inattentive responding whose adverse effects the proposed estimator can withstand, unlike maximum likelihood.
- [16] arXiv:2403.11963 (cross-list from cs.LG) [pdf, other]
Title: Transfer Learning Beyond Bounded Density Ratios
Comments: Abstract shortened to fit ArXiv requirements
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST); Machine Learning (stat.ML)
We study the fundamental problem of transfer learning where a learning algorithm collects data from some source distribution $P$ but needs to perform well with respect to a different target distribution $Q$. A standard change of measure argument implies that transfer learning happens when the density ratio $dQ/dP$ is bounded. Yet, prior thought-provoking works by Kpotufe and Martinet (COLT, 2018) and Hanneke and Kpotufe (NeurIPS, 2019) demonstrate cases where the ratio $dQ/dP$ is unbounded, but transfer learning is possible.
In this work, we focus on transfer learning over the class of low-degree polynomial estimators. Our main result is a general transfer inequality over the domain $\mathbb{R}^n$, proving that non-trivial transfer learning for low-degree polynomials is possible under very mild assumptions, going well beyond the classical assumption that $dQ/dP$ is bounded. For instance, it always applies if $Q$ is a log-concave measure and the inverse ratio $dP/dQ$ is bounded. To demonstrate the applicability of our inequality, we obtain new results in the settings of: (1) the classical truncated regression setting, where $dQ/dP$ equals infinity, and (2) the more recent out-of-distribution generalization setting for in-context learning linear functions with transformers. We also provide a discrete analogue of our transfer inequality on the Boolean Hypercube $\{-1,1\}^n$, and study its connections with the recent problem of Generalization on the Unseen of Abbe, Bengio, Lotfi and Rizk (ICML, 2023). Our main conceptual contribution is that the maximum influence of the error of the estimator $\widehat{f}-f^*$ under $Q$, $\mathrm{I}_{\max}(\widehat{f}-f^*)$, acts as a sufficient condition for transferability; when $\mathrm{I}_{\max}(\widehat{f}-f^*)$ is appropriately bounded, transfer is possible over the Boolean domain.
- [17] arXiv:2403.11968 (cross-list from cs.LG) [pdf, other]
Title: Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory
Comments: 92 pages, 5 figures
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Conditional diffusion models serve as the foundation of modern image synthesis and find extensive application in fields like computational biology and reinforcement learning. In these applications, conditional diffusion models incorporate various conditional information, such as prompt input, to guide the sample generation towards desired properties. Despite the empirical success, theory of conditional diffusion models is largely missing. This paper bridges this gap by presenting a sharp statistical theory of distribution estimation using conditional diffusion models. Our analysis yields a sample complexity bound that adapts to the smoothness of the data distribution and matches the minimax lower bound. The key to our theoretical development lies in an approximation result for the conditional score function, which relies on a novel diffused Taylor approximation technique. Moreover, we demonstrate the utility of our statistical theory in elucidating the performance of conditional diffusion models across diverse applications, including model-based transition kernel estimation in reinforcement learning, solving inverse problems, and reward conditioned sample generation.
Replacements for Tue, 19 Mar 24
- [18] arXiv:2210.08571 (replaced) [pdf, other]
Title: Dimension free ridge regression
Comments: 86 pages, 3 figures
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
- [19] arXiv:2303.16843 (replaced) [pdf, other]
Title: An Optimal Design Framework for Lasso Sign Recovery
Subjects: Statistics Theory (math.ST)
- [20] arXiv:2108.06713 (replaced) [pdf, other]
Title: Channel State Acquisition in Uplink NOMA for Cellular-Connected UAV: Exploitation of Doppler and Modulation Diversities
Comments: 15 pages, 7 figures, submitted to IEEE Open Journal of the Communications Society
Subjects: Signal Processing (eess.SP); Statistics Theory (math.ST)
- [21] arXiv:2206.12525 (replaced) [pdf, other]
Title: Causality for Complex Continuous-time Functional Longitudinal Studies
Authors: Andrew Ying
Subjects: Methodology (stat.ME); Probability (math.PR); Statistics Theory (math.ST)
- [22] arXiv:2209.08307 (replaced) [pdf, ps, other]
Title: A review of predictive uncertainty estimation with machine learning
Comments: 89 pages, 5 figures
Journal-ref: Artificial Intelligence Review 57(94) (2024)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
- [23] arXiv:2210.03617 (replaced) [pdf, other]
Title: Type $1$, $2$, $3$ and $4$ $q$-negative binomial distribution of order $k$
Authors: Jungtaek Oh
Comments: arXiv admin note: text overlap with arXiv:2206.13053
Subjects: Probability (math.PR); Combinatorics (math.CO); Statistics Theory (math.ST); Other Statistics (stat.OT)
- [24] arXiv:2312.12741 (replaced) [pdf, other]
Title: Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances
Authors: Masahiro Kato
Subjects: Machine Learning (cs.LG); Econometrics (econ.EM); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)