Statistics Theory

New submissions

[ total of 27 entries: 1-27 ]

New submissions for Tue, 19 Oct 21

[1]  arXiv:2110.08523 [pdf, ps, other]
Title: Spectral measures of empirical autocovariance matrices of high dimensional Gaussian stationary processes
Subjects: Statistics Theory (math.ST); Probability (math.PR)

Consider the empirical autocovariance matrix at a given non-zero time lag based on observations from a multivariate complex Gaussian stationary time series. The spectral analysis of these autocovariance matrices can be useful in certain statistical problems, such as those related to testing for white noise. We study the behavior of their spectral measures in the asymptotic regime where the time series dimension and the observation window length both grow to infinity at the same rate. Following a general framework for the spectral analysis of large random non-Hermitian matrices, we first obtain the probabilistic behavior of the small singular values of the shifted versions of the autocovariance matrix. This is then used to infer the large-sample behavior of the empirical spectral measure of the autocovariance matrices at any lag. Matrix orthogonal polynomials on the unit circle play a crucial role in our study.

[2]  arXiv:2110.08766 [pdf, ps, other]
Title: On minimax estimation problem for stationary stochastic sequences from observations in special sets of points
Comments: arXiv admin note: text overlap with arXiv:1804.08408
Subjects: Statistics Theory (math.ST)

The problem of the mean-square optimal estimation of the linear functionals which depend on the unknown values of a stochastic stationary sequence from observations of the sequence in special sets of points is considered. Formulas for calculating the mean-square error and the spectral characteristic of the optimal linear estimate of the functionals are derived under the condition of spectral certainty, where the spectral density of the sequence is exactly known. The minimax (robust) method of estimation is applied in the case where the spectral density of the sequence is not known exactly while some sets of admissible spectral densities are given. Formulas that determine the least favourable spectral densities and the minimax spectral characteristics are derived for some special sets of admissible densities.

[3]  arXiv:2110.09042 [pdf, other]
Title: Kernel-based estimation for partially functional linear model: Minimax rates and randomized sketches
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)

This paper considers the partially functional linear model (PFLM), where the predictive features consist of a functional covariate and a high-dimensional scalar vector. Over an infinite-dimensional reproducing kernel Hilbert space, the proposed estimator for the PFLM is a least squares approach with two mixed regularizations: a function norm and an $\ell_1$-norm. Our main task in this paper is to establish the minimax rates for the PFLM in the high-dimensional setting; the optimal minimax rates of estimation are established using various techniques from empirical process theory for analyzing kernel classes. In addition, we propose an efficient numerical algorithm based on randomized sketches of the kernel matrix. Several numerical experiments are implemented to support our method and optimization strategy.

[4]  arXiv:2110.09333 [pdf, other]
Title: Regression with Missing Data, a Comparison Study of Techniques Based on Random Forests
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)

In this paper we present the practical benefits of a new random forest algorithm to deal with missing values in the sample. The purpose of this work is to compare the different solutions to deal with missing values with random forests and describe our new algorithm performance as well as its algorithmic complexity. A variety of missing value mechanisms (such as MCAR, MAR, MNAR) are considered and simulated. We study the quadratic errors and the bias of our algorithm and compare it to the most popular missing values random forests algorithms in the literature. In particular, we compare those techniques for both a regression and prediction purpose. This work follows a first paper Gomez-Mendez and Joly (2020) on the consistency of this new algorithm.

[5]  arXiv:2110.09502 [pdf, other]
Title: Minimum $\ell_{1}$-norm interpolators: Precise asymptotics and multiple descent
Authors: Yue Li, Yuting Wei
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

An evolving line of machine learning works observe empirical evidence that suggests interpolating estimators -- the ones that achieve zero training error -- may not necessarily be harmful. This paper pursues theoretical understanding for an important type of interpolators: the minimum $\ell_{1}$-norm interpolator, which is motivated by the observation that several learning algorithms favor low $\ell_1$-norm solutions in the over-parameterized regime. Concretely, we consider the noisy sparse regression model under Gaussian design, focusing on linear sparsity and high-dimensional asymptotics (so that both the number of features and the sparsity level scale proportionally with the sample size).
We observe, and provide rigorous theoretical justification for, a curious multi-descent phenomenon; that is, the generalization risk of the minimum $\ell_1$-norm interpolator undergoes multiple (and possibly more than two) phases of descent and ascent as one increases the model capacity. This phenomenon stems from the special structure of the minimum $\ell_1$-norm interpolator as well as the delicate interplay between the over-parameterized ratio and the sparsity, thus unveiling a fundamental distinction in geometry from the minimum $\ell_2$-norm interpolator. Our finding is built upon an exact characterization of the risk behavior, which is governed by a system of two non-linear equations with two unknowns.

Cross-lists for Tue, 19 Oct 21

[6]  arXiv:2110.08348 (cross-list from q-bio.PE) [pdf, other]
Title: Estimating individual admixture from finite reference databases
Comments: 17 pages, 3 figures
Subjects: Populations and Evolution (q-bio.PE); Statistics Theory (math.ST)

The concept of individual admixture (IA) assumes that the genome of individuals is composed of alleles inherited from $K$ ancestral populations. Each copy of each allele has the same chance $q_k$ to originate from population $k$, and together with the allele frequencies in all populations $p$ comprises the admixture model, which is the basis for software like STRUCTURE and ADMIXTURE. Here, we assume that $p$ is given through a finite reference database, and $q$ is estimated via maximum likelihood. Above all, we are interested in efficient estimation of $q$, and the variance of the estimator which originates from finiteness of the reference database, i.e. a variance in $p$. We provide a central limit theorem for the maximum-likelihood estimator, give simulation results, and discuss applications in forensic genetics.

[7]  arXiv:2110.08500 (cross-list from stat.ML) [pdf, other]
Title: On Model Selection Consistency of Lasso for High-Dimensional Ising Models on Tree-like Graphs
Comments: 30 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

We consider the problem of high-dimensional Ising model selection using neighborhood-based least absolute shrinkage and selection operator (Lasso). It is rigorously proved that under some mild coherence conditions on the population covariance matrix of the Ising model, consistent model selection can be achieved with sample sizes $n=\Omega{(d^3\log{p})}$ for any tree-like graph in the paramagnetic phase, where $p$ is the number of variables and $d$ is the maximum node degree. When the same conditions are imposed directly on the sample covariance matrices, it is shown that a reduced sample size $n=\Omega{(d^2\log{p})}$ suffices. The obtained sufficient conditions for consistent model selection with Lasso are the same in the scaling of the sample complexity as that of $\ell_1$-regularized logistic regression. Given the popularity and efficiency of Lasso, our rigorous analysis provides a theoretical backing for its practical use in Ising model selection.

[8]  arXiv:2110.08570 (cross-list from stat.ME) [pdf, other]
Title: A Reduced-Bias Weighted least square estimation of the Extreme Value Index
Comments: 24 pages
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP)

In this paper, we propose a reduced-bias estimator of the extreme value index (EVI) for Pareto-type (heavy-tailed) distributions. This is derived using the weighted least squares method. It is shown that the estimator is unbiased, consistent and asymptotically normal under the second-order conditions on the underlying distribution of the data. The finite sample properties of the proposed estimator are studied through a simulation study. The results show that it is competitive with the existing estimators of the extreme value index in terms of bias and mean square error. In addition, it yields estimates of $\gamma>0$ that are less sensitive to the number of top-order statistics, and hence, can be used for selecting an optimal tail fraction. The proposed estimator is further illustrated using practical datasets from pedochemical and insurance.

[9]  arXiv:2110.08665 (cross-list from stat.ME) [pdf, other]
Title: Quantile Regression by Dyadic CART
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

In this paper we propose and study a version of the Dyadic Classification and Regression Trees (DCART) estimator from Donoho (1997) for (fixed design) quantile regression in general dimensions. We refer to this proposed estimator as the QDCART estimator. Just like the mean regression version, we show that a) a fast dynamic programming based algorithm with computational complexity $O(N \log N)$ exists for computing the QDCART estimator and b) an oracle risk bound (trading off squared error and a complexity parameter of the true signal) holds for the QDCART estimator. This oracle risk bound then allows us to demonstrate that the QDCART estimator enjoys adaptively rate optimal estimation guarantees for piecewise constant and bounded variation function classes. In contrast to existing results for the DCART estimator, which require subgaussianity of the error distribution, our estimation guarantees do not need any restrictive tail decay assumptions on the error distribution. For instance, our results hold even when the error distribution has no first moment, such as the Cauchy distribution. Apart from the Dyadic CART method, we also consider other variant methods such as the Optimal Regression Tree (ORT) estimator introduced in Chatterjee and Goswami (2019). In particular, we also extend the ORT estimator to the quantile setting and establish that it enjoys analogous guarantees. Thus, this paper extends the scope of these globally optimal regression tree based methodologies to be applicable for heavy tailed data. We then perform extensive numerical experiments on both simulated and real data which illustrate the usefulness of the proposed methods.

[10]  arXiv:2110.08884 (cross-list from stat.ML) [pdf, other]
Title: Persuasion by Dimension Reduction
Comments: arXiv admin note: text overlap with arXiv:2102.10909
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); General Economics (econ.GN); Statistics Theory (math.ST); Methodology (stat.ME)

How should an agent (the sender) observing multi-dimensional data (the state vector) persuade another agent to take the desired action? We show that it is always optimal for the sender to perform a (non-linear) dimension reduction by projecting the state vector onto a lower-dimensional object that we call the "optimal information manifold." We characterize geometric properties of this manifold and link them to the sender's preferences. Optimal policy splits information into "good" and "bad" components. When the sender's marginal utility is linear, revealing the full magnitude of good information is always optimal. In contrast, with concave marginal utility, optimal information design conceals the extreme realizations of good information and only reveals its direction (sign). We illustrate these effects by explicitly solving several multi-dimensional Bayesian persuasion problems.

[11]  arXiv:2110.08905 (cross-list from stat.AP) [pdf, other]
Title: Exploitation of error correlation in a large analysis validation: GlobCurrent case study
Comments: 24 pages, 14 figures
Journal-ref: Remote Sens. Environ., 217, 476-490 (2018)
Subjects: Applications (stat.AP); Statistics Theory (math.ST); Methodology (stat.ME)

An assessment of variance in ocean current signal and noise shared by in situ observations (drifters) and a large gridded analysis (GlobCurrent) is sought as a function of day of the year for 1993-2015 and across a broad spectrum of current speed. Regardless of the division of collocations, it is difficult to claim that any synoptic assessment can be based on independent observations. Instead, a measurement model that departs from ordinary linear regression by accommodating error correlation is proposed. The interpretation of independence is explored by applying Fuller's (1987) concept of equation and measurement error to a division of error into shared (correlated) and unshared (uncorrelated) components, respectively. The resulting division of variance in the new model favours noise. Ocean current shared (equation) error is of comparable magnitude to unshared (measurement) error and the latter is, for GlobCurrent and drifters respectively, comparable to ordinary and reverse linear regression. Although signal variance appears to be small, its utility as a measure of agreement between two variates is highlighted.
Sparse collocations that sample a dense grid permit a first order autoregressive form of measurement model to be considered, including parameterizations of analysis-in situ error cross-correlation and analysis temporal error autocorrelation. The former (cross-correlation) is an equation error term that accommodates error shared by both GlobCurrent and drifters. The latter (autocorrelation) facilitates an identification and retrieval of all model parameters. Solutions are sought using a prescribed calibration between GlobCurrent and drifters (by variance matching). Because the true current variance of GlobCurrent and drifters is small, signal to noise ratio is near zero at best. This is particularly evident for moderate current speed and meridional current component.

[12]  arXiv:2110.08969 (cross-list from stat.AP) [pdf, ps, other]
Title: On completing a measurement model by symmetry
Comments: 4 pages
Subjects: Applications (stat.AP); Statistics Theory (math.ST); Methodology (stat.ME)

An appeal for symmetry is made to build established notions of specific representation and specific nonlinearity of measurement (often called model error) into a canonical linear regression model. Additive components are derived from the trivially complete model M = m. Factor analysis and equation error motivate corresponding notions of representation and nonlinearity in an errors-in-variables framework, with a novel interpretation of terms. It is suggested that a modern interpretation of correlation involves both linear and nonlinear association.

Replacements for Tue, 19 Oct 21

[13]  arXiv:2008.08275 (replaced) [pdf, other]
Title: Asymptotic Analysis for Data-Driven Inventory Policies
Subjects: Statistics Theory (math.ST)
[14]  arXiv:2104.14023 (replaced) [pdf, ps, other]
Title: Measuring dependence between random vectors via optimal transport
Subjects: Statistics Theory (math.ST)
[15]  arXiv:2106.09769 (replaced) [pdf, other]
Title: Generalized regression operator estimation for continuous time functional data processes with missing at random response
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
[16]  arXiv:1904.11060 (replaced) [pdf, ps, other]
Title: Normal Approximation in Large Network Models
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST)
[17]  arXiv:2003.10323 (replaced) [pdf, ps, other]
Title: Monte Carlo integration of non-differentiable functions on $[0,1]^ι$, $ι=1,\dots,d$, using a single determinantal point pattern defined on $[0,1]^d$
Subjects: Computation (stat.CO); Classical Analysis and ODEs (math.CA); Numerical Analysis (math.NA); Statistics Theory (math.ST)
[18]  arXiv:2007.04803 (replaced) [pdf, other]
Title: A Global Stochastic Optimization Particle Filter Algorithm
Comments: 61 pages, 4 figures
Subjects: Machine Learning (stat.ML); Statistics Theory (math.ST); Computation (stat.CO)
[19]  arXiv:2008.11140 (replaced) [pdf, other]
Title: Powerful Inference
Comments: 29 pages, 4 figures, 3 tables
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST)
[20]  arXiv:2101.12353 (replaced) [pdf, other]
Title: On the capacity of deep generative networks for approximating distributions
Subjects: Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST); Machine Learning (stat.ML)
[21]  arXiv:2104.04910 (replaced) [pdf, other]
Title: Semi-$G$-normal: a Hybrid between Normal and $G$-normal (Full Version)
Comments: 109 pages, 8 figures, a comprehensive document for conference and open discussions, to be divided later for publications, readers may navigate to the parts they are interested in by the table of contents
Subjects: Probability (math.PR); Statistics Theory (math.ST)
[22]  arXiv:2105.03425 (replaced) [pdf, other]
Title: Kernel Two-Sample Tests for Manifold Data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[23]  arXiv:2105.08024 (replaced) [pdf, other]
Title: Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
[24]  arXiv:2106.03227 (replaced) [pdf, other]
Title: Neural Tangent Kernel Maximum Mean Discrepancy
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[25]  arXiv:2109.05578 (replaced) [pdf, other]
Title: Kernel PCA with the Nyström method
Authors: Fredrik Hallgren
Comments: 44 pages, 6 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[26]  arXiv:2109.11307 (replaced) [pdf, other]
Title: Semiparametric bivariate extreme-value copulas
Comments: 23 pages, 22 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Computation (stat.CO)
[27]  arXiv:2110.01593 (replaced) [pdf, other]
Title: Generalized Kernel Thinning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)