Statistics Theory

New submissions

[ total of 14 entries: 1-14 ]

New submissions for Thu, 26 May 22

[1]  arXiv:2205.12489 [pdf, other]
Title: Bayesian Multiscale Analysis of the Cox Model
Comments: 82 pages, 6 figures, 2 tables
Subjects: Statistics Theory (math.ST)

Piecewise constant priors are routinely used in the Bayesian Cox proportional hazards model for survival analysis. Despite the popularity of this Bayesian method, its large-sample properties are not yet well understood. This work provides a unified theory for posterior distributions in this setting, not requiring the priors to be conjugate. We first derive contraction rate results for wide classes of histogram priors on the unknown hazard function and prove asymptotic normality of linear functionals of the posterior hazard in the form of Bernstein--von Mises theorems. Second, using recently developed multiscale techniques, we derive functional limiting results for the cumulative hazard and survival function. Frequentist coverage properties of Bayesian credible sets are investigated: we prove that certain easily computable credible bands for the survival function are optimal frequentist confidence bands. We conduct simulation studies that confirm these predictions, with excellent behavior particularly in finite samples, showing that even the simplest possible Bayesian credible bands for the survival function can outperform state-of-the-art frequentist bands in terms of coverage.

[2]  arXiv:2205.12744 [pdf, ps, other]
Title: High dimensional Bernoulli distributions: algebraic representation and applications
Subjects: Statistics Theory (math.ST)

The main contribution of this paper is to find a representation of the class $\mathcal{F}_d(p)$ of multivariate Bernoulli distributions with the same mean $p$ that allows us to find its generators analytically in any dimension. We map $\mathcal{F}_d(p)$ to an ideal of points and we prove that the class $\mathcal{F}_d(p)$ can be generated from a finite set of simple polynomials. We present two applications. Firstly, we show that polynomial generators help to find extremal points of the convex polytope $\mathcal{F}_d(p)$ in high dimensions. Secondly, we solve the problem of determining the lower bounds in the convex order for sums of multivariate Bernoulli distributions with given margins, but with an unspecified dependence structure.

[3]  arXiv:2205.12924 [pdf, ps, other]
Title: Clustering consistency with Dirichlet process mixtures
Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)

Dirichlet process mixtures are flexible non-parametric models, particularly suited to density estimation and probabilistic clustering. In this work we study the posterior distribution induced by Dirichlet process mixtures as the sample size increases, and more specifically focus on consistency for the unknown number of clusters when the observed data are generated from a finite mixture. Crucially, we consider the situation where a prior is placed on the concentration parameter of the underlying Dirichlet process. Previous findings in the literature suggest that Dirichlet process mixtures are typically not consistent for the number of clusters if the concentration parameter is held fixed and data come from a finite mixture. Here we show that consistency for the number of clusters can be achieved if the concentration parameter is adapted in a fully Bayesian way, as commonly done in practice. Our results are derived for data coming from a class of finite mixtures, with mild assumptions on the prior for the concentration parameter and for a variety of choices of likelihood kernels for the mixture.
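As background for the role of the concentration parameter, the following is a minimal sketch of the Chinese restaurant process, the standard sequential description of the partition induced by a Dirichlet process. It only illustrates how the prior number of clusters depends on a fixed concentration parameter; it is not the posterior analysis of the paper. The function names and the parameter name `alpha` are illustrative assumptions.

```python
import random


def expected_num_clusters(n, alpha):
    """Prior expected number of clusters among n observations.

    Under the Chinese restaurant process, observation i (0-based)
    opens a new cluster with probability alpha / (alpha + i).
    """
    return sum(alpha / (alpha + i) for i in range(n))


def sample_num_clusters(n, alpha, rng):
    """Draw one partition from the Chinese restaurant process and
    return the number of clusters it contains."""
    counts = []  # counts[k] = current size of cluster k
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            counts.append(1)  # open a new cluster
        else:
            # join an existing cluster with probability proportional to its size
            r = rng.random() * i
            acc = 0
            for k, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[k] += 1
                    break
    return len(counts)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000):
        print(n, expected_num_clusters(n, alpha=1.0), sample_num_clusters(n, 1.0, rng))
```

For fixed `alpha` the prior expected number of clusters keeps growing with `n` (roughly like `alpha * log(n)`), which gives some intuition for why the treatment of the concentration parameter matters when the data come from a finite mixture.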

[4]  arXiv:2205.12937 [pdf, other]
Title: Mitigating multiple descents: A model-agnostic framework for risk monotonization
Comments: 110 pages, 15 figures
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)

Recent empirical and theoretical analyses of several commonly used prediction procedures reveal a peculiar risk behavior in high dimensions, referred to as double/multiple descent, in which the asymptotic risk is a non-monotonic function of the limiting aspect ratio of the number of features or parameters to the sample size. To mitigate this undesirable behavior, we develop a general framework for risk monotonization based on cross-validation that takes as input a generic prediction procedure and returns a modified procedure whose out-of-sample prediction risk is, asymptotically, monotonic in the limiting aspect ratio. As part of our framework, we propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting, respectively, and show that, under very mild assumptions, they provably achieve monotonic asymptotic risk behavior. Our results are applicable to a broad variety of prediction procedures and loss functions, and do not require a well-specified (parametric) model. We exemplify our framework with concrete analyses of the minimum $\ell_2$-norm and minimum $\ell_1$-norm least squares prediction procedures. As one of the ingredients in our analysis, we also derive novel additive and multiplicative forms of oracle risk inequalities for split cross-validation that are of independent interest.

Cross-lists for Thu, 26 May 22

[5]  arXiv:2205.12431 (cross-list from stat.ME) [pdf, other]
Title: Detecting Abrupt Changes in Sequential Pairwise Comparison Data
Comments: 31 pages, 2 figures, 2 tables
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

The Bradley-Terry-Luce (BTL) model is a classic and very popular statistical approach for eliciting a global ranking among a collection of items using pairwise comparison data. In applications in which the comparison outcomes are observed as a time series, the data are often non-stationary, in the sense that the true underlying ranking changes over time. In this paper we are concerned with localizing the change points in a high-dimensional BTL model with piecewise constant parameters. We propose novel and practicable algorithms based on dynamic programming that can consistently estimate the unknown locations of the change points. We provide consistency rates for our methodology that depend explicitly on the model parameters, the temporal spacing between two consecutive change points, and the magnitude of the change. We corroborate our findings with extensive numerical experiments and a real-life example.
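To make the setting concrete, here is a minimal sketch of the standard BTL win probability and of simulating sequential comparisons whose parameters change once. It illustrates the data-generating model only and does not implement the change point algorithms of the paper. All function and parameter names (`btl_win_prob`, `simulate_comparisons`, `change_point`, and so on) are hypothetical, and the uniform random choice of the compared pair is an assumption made for illustration.

```python
import math
import random


def btl_win_prob(theta_i, theta_j):
    """Standard BTL probability that item i beats item j:
    exp(theta_i) / (exp(theta_i) + exp(theta_j))."""
    return 1.0 / (1.0 + math.exp(-(theta_i - theta_j)))


def simulate_comparisons(theta_before, theta_after, change_point, horizon, seed=0):
    """Simulate one pairwise comparison per time step.

    The parameter vector is theta_before for t < change_point and
    theta_after from change_point onward (piecewise constant in time).
    Returns a list of (t, i, j, i_wins) tuples.
    """
    rng = random.Random(seed)
    n_items = len(theta_before)
    outcomes = []
    for t in range(horizon):
        theta = theta_before if t < change_point else theta_after
        i, j = rng.sample(range(n_items), 2)  # assumed: pair drawn uniformly
        i_wins = rng.random() < btl_win_prob(theta[i], theta[j])
        outcomes.append((t, i, j, i_wins))
    return outcomes


if __name__ == "__main__":
    data = simulate_comparisons([0.0, 1.0, 2.0], [2.0, 1.0, 0.0],
                                change_point=50, horizon=100)
    print(data[:3])
```

In this toy example the ranking of the three items reverses at time 50; locating such a time from the comparison outcomes alone is the kind of task the abstract describes.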

[6]  arXiv:2205.12695 (cross-list from stat.ML) [pdf, other]
Title: Surprises in adversarially-trained linear regression
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Signal Processing (eess.SP); Statistics Theory (math.ST)

State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial training is one of the most effective approaches to defend against such examples. We show that for linear regression problems, adversarial training can be formulated as a convex problem. This fact is then used to show that $\ell_\infty$-adversarial training produces sparse solutions and has many similarities to the lasso method. Similarly, $\ell_2$-adversarial training has similarities with ridge regression. We use a robust regression framework to analyze and understand these similarities and also point to some differences. Finally, we show how adversarial training behaves differently from other regularization methods when estimating overparameterized models (i.e., models with more parameters than data points). Adversarial training minimizes a sum of three terms that regularizes the solution, but unlike lasso and ridge regression, it can sharply transition into an interpolation mode. We show that for sufficiently many features or sufficiently small regularization parameters, the learned model perfectly interpolates the training data while still exhibiting good out-of-sample performance.
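The link to the lasso can be seen from a standard dual-norm identity, which is not stated in the abstract itself: for a linear model, the worst-case squared loss over perturbations with $\|\delta\|_\infty \le \epsilon$ equals $(|y - x^\top\beta| + \epsilon\|\beta\|_1)^2$. The sketch below checks this identity numerically by brute force over the corners of the perturbation box. The function names are hypothetical and the code is an illustration under these assumptions, not the authors' implementation.

```python
import itertools


def adv_sq_loss_linf(x, y, beta, eps):
    """Closed-form worst-case squared loss under l_inf-bounded input
    perturbations: (|y - x.beta| + eps * ||beta||_1) ** 2."""
    residual = y - sum(xi * bi for xi, bi in zip(x, beta))
    l1_norm = sum(abs(bi) for bi in beta)
    return (abs(residual) + eps * l1_norm) ** 2


def brute_force_adv_sq_loss_linf(x, y, beta, eps):
    """Same quantity by enumeration. The squared loss is convex in the
    perturbation, so its maximum over the box ||delta||_inf <= eps is
    attained at one of the 2^d corners."""
    worst = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=len(x)):
        pred = sum((xi + eps * s) * bi for xi, s, bi in zip(x, signs, beta))
        worst = max(worst, (y - pred) ** 2)
    return worst


if __name__ == "__main__":
    x, y, beta, eps = [1.0, -2.0, 0.5], 0.3, [0.5, -1.0, 2.0], 0.1
    print(adv_sq_loss_linf(x, y, beta, eps))
    print(brute_force_adv_sq_loss_linf(x, y, beta, eps))
```

The $\ell_1$ norm of $\beta$ appears inside the loss, which is one way to see why $\ell_\infty$-adversarial training resembles the lasso; replacing the perturbation norm by $\ell_2$ yields $\|\beta\|_2$ instead, by the same dual-norm argument.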

Replacements for Thu, 26 May 22

[7]  arXiv:2003.03886 (replaced) [pdf, other]
Title: Divided Differences, Falling Factorials, and Discrete Splines: Another Look at Trend Filtering and Related Problems
Comments: 75 pages, 9 figures; 1 table
Subjects: Statistics Theory (math.ST); Numerical Analysis (math.NA); Methodology (stat.ME)
[8]  arXiv:2003.13208 (replaced) [pdf, other]
Title: Minimax optimality of permutation tests
Comments: Typo in Eq.(38) is fixed
Subjects: Statistics Theory (math.ST)
[9]  arXiv:2008.09787 (replaced) [pdf, ps, other]
Title: Approximation of probability density functions via location-scale finite mixtures in Lebesgue spaces
Comments: To appear in Communications in Statistics - Theory and Methods
Subjects: Statistics Theory (math.ST)
[10]  arXiv:2106.09387 (replaced) [pdf, other]
Title: Taming Nonconvexity in Kernel Feature Selection -- Favorable Properties of the Laplace Kernel
Comments: 26 pages main text; 74 pages total; appendix rewritten (typo fixed; proof structure reorganized)
Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
[11]  arXiv:2107.01120 (replaced) [pdf, ps, other]
Title: Asymptotic Analysis of Statistical Estimators related to MultiGraphex Processes under Misspecification
Subjects: Statistics Theory (math.ST)
[12]  arXiv:2104.08279 (replaced) [pdf, other]
Title: Testing for Outliers with Conformal p-values
Comments: Revision May 24, 2022: added "asymptotic" and "Monte Carlo" conditional calibration methods; added power analyses; updated numerical experiments to include new methods
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
[13]  arXiv:2111.15546 (replaced) [pdf, ps, other]
Title: Black box tests for algorithmic stability
Comments: 26 pages. Updates to Section 2.1.1 and Sections B.1 & B.2
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)
[14]  arXiv:2201.11211 (replaced) [pdf, other]
Title: Learning Mixtures of Linear Dynamical Systems
Comments: Accepted to ICML 2022. arXiv v2 update: add references and experiments
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Systems and Control (eess.SY); Statistics Theory (math.ST)