[1]
Title: Fisher transformation via Edgeworth expansion
Authors: Jan Vrbik
Subjects: Statistics Theory (math.ST)

We show how to calculate individual terms of the Edgeworth series to approximate the distribution of the Pearson correlation coefficient with the help of a simple Mathematica program. We also demonstrate how to eliminate the corresponding skewness, thus making the approximation substantially more accurate. This leads, in a rather natural way, to a more accurate version of Fisher's z transformation. The code can be easily modified to deal with any sample statistic defined as a function of several sample means, based on a random independent sample from a multivariate distribution.
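For orientation, the classical Fisher z transformation that this abstract takes as its starting point can be sketched in a few lines of Python. This is the textbook version, $z = \operatorname{atanh}(r)$ with approximate variance $1/(n-3)$ under bivariate normality, not the refined transformation derived in the paper; the function names `fisher_z` and `correlation_ci` are illustrative choices, not taken from the source.

```python
from __future__ import annotations

import math
from statistics import NormalDist


def fisher_z(r: float) -> float:
    """Classical Fisher transformation z = atanh(r) of a sample correlation r."""
    return math.atanh(r)


def correlation_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for the population correlation.

    Assumes bivariate normal data and the textbook approximation
    z ~ Normal(atanh(rho), 1 / (n - 3)); this is NOT the paper's
    improved transformation.
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    z = fisher_z(r)
    # Normal quantile for a two-sided interval, scaled by the standard error.
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    # Map the interval endpoints back to the correlation scale.
    return math.tanh(z - half_width), math.tanh(z + half_width)


lo, hi = correlation_ci(0.6, 30)
print(f"Approximate 95% interval for rho: ({lo:.3f}, {hi:.3f})")
```

The interval is symmetric on the z scale but not on the correlation scale, which is the skewness-reducing effect the transformation is used for.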

[2]
Title: Dispersion Parameter Extension of Precise Generalized Linear Mixed Model Asymptotics
Subjects: Statistics Theory (math.ST)

We extend a recently established asymptotic normality theorem for generalized linear mixed models to include the dispersion parameter. The new results show that the maximum likelihood estimators of all model parameters have asymptotically normal distributions with asymptotic mutual independence between fixed effects, random effects covariance and dispersion parameters. The dispersion parameter maximum likelihood estimator has a particularly simple asymptotic distribution which enables straightforward valid likelihood-based inference.

[3]
Title: Trace Moments of the Sample Covariance Matrix with Graph-Coloring
Authors: Ben Deitmar
Subjects: Statistics Theory (math.ST)

Let $S_{p,n}$ denote the sample covariance matrix based on $n$ independent identically distributed $p$-dimensional random vectors in the null case. The main result of this paper is an expansion of trace moments and power-trace covariances of $S_{p,n}$ simultaneously for both high- and low-dimensional data. To this end we develop a graph-theoretic ansatz that describes trace moments as weighted sums over colored graphs. Specifically, explicit formulas for the highest order coefficients in the expansion are deduced by restricting attention to graphs with either no or one cycle. The novelty is a color-preserving decomposition of graphs into a tree-structure and their seed graphs, which allows for the identification of Euler circuits from graphs with the same tree-structure but different seed graphs. This approach may also be used to approximate the mean and covariance to even higher degrees of accuracy.
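To make the notion of a trace moment concrete, the following sketch estimates $\mathbb{E}[\operatorname{tr}(S_{p,n}^2)]$ by simulation and compares it with the known Gaussian value $p + p(p+1)/n$. It assumes standard normal entries, identity population covariance, and $S_{p,n} = \frac{1}{n} X X^\top$ without mean centering; these are assumptions made for illustration, and the code does not implement the paper's graph-coloring expansion. All function names are hypothetical.

```python
import random


def sample_covariance(p, n, rng):
    """S = (1/n) X X^T for a p x n matrix X of i.i.d. standard normal entries.

    Assumption: zero mean is known, so no centering is applied.
    """
    x = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    return [
        [sum(x[i][k] * x[j][k] for k in range(n)) / n for j in range(p)]
        for i in range(p)
    ]


def trace_of_square(s):
    """tr(S^2); for a symmetric matrix this is the sum of squared entries."""
    return sum(v * v for row in s for v in row)


def expected_trace_of_square(p, n):
    """Exact value of E[tr(S^2)] in the Gaussian identity-covariance case."""
    return p + p * (p + 1) / n


def monte_carlo_trace_of_square(p, n, reps, seed=0):
    """Average tr(S^2) over `reps` independent simulated samples."""
    rng = random.Random(seed)
    total = sum(trace_of_square(sample_covariance(p, n, rng)) for _ in range(reps))
    return total / reps


p, n = 3, 5
print("exact      :", expected_trace_of_square(p, n))
print("monte carlo:", monte_carlo_trace_of_square(p, n, reps=20000))
```

The correction term $p(p+1)/n$ does not vanish when $p$ grows proportionally with $n$, which is why expansions valid in both the high- and low-dimensional regimes are of interest.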

### Cross-lists for Thu, 11 Aug 22

[4]  arXiv:2208.05344 (cross-list from econ.EM)
Title: Testing for error invariance in separable instrumental variable models
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST)

The hypothesis of error invariance is central to the instrumental variable literature. It means that the error term of the model is the same across all potential outcomes; in other words, treatment effects are constant across all subjects. It allows instrumental variable estimates to be interpreted as average treatment effects over the whole study population. When this assumption does not hold, the bias of instrumental variable estimators can be larger than that of naive estimators ignoring endogeneity. This paper develops two tests for the assumption of error invariance when the treatment is endogenous, an instrumental variable is available and the model is separable. The first test assumes that the potential outcomes are linear in the regressors and is computationally simple. The second test is nonparametric and relies on Tikhonov regularization. The treatment can be either discrete or continuous. We show that the tests have asymptotically correct level and asymptotic power equal to one against a range of alternatives. Simulations demonstrate that the proposed tests attain excellent finite-sample performance. The methodology is also applied to the evaluation of returns to schooling and the effect of price on demand in a fish market.

[5]  arXiv:2208.05406 (cross-list from cs.LG)
Title: Active Sampling of Multiple Sources for Sequential Estimation
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST)

Consider $K$ processes, each generating a sequence of independent and identically distributed random variables. The probability measures of these processes have random parameters that must be estimated. Specifically, they share a parameter $\theta$ common to all probability measures. Additionally, each process $i\in\{1, \dots, K\}$ has a private parameter $\alpha_i$. The objective is to design an active sampling algorithm for sequentially estimating these parameters in order to form reliable estimates for all shared and private parameters with the fewest number of samples. This sampling algorithm has three key components: (i)~data-driven sampling decisions, which dynamically specify over time which of the $K$ processes should be selected for sampling; (ii)~a stopping time for the process, which specifies when the accumulated data is sufficient to form reliable estimates and terminate the sampling process; and (iii)~estimators for all shared and private parameters. Because sequential estimation is known to be analytically intractable, this paper adopts \emph{conditional} estimation cost functions, leading to a sequential estimation approach that was recently shown to admit tractable analysis. Asymptotically optimal decision rules (sampling, stopping, and estimation) are delineated, and numerical experiments are provided to compare the efficacy and quality of the proposed procedure with those of relevant approaches.
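The three components named in this abstract (sampling rule, stopping rule, estimators) can be illustrated with a toy loop. The sketch below is not the paper's procedure and makes no optimality claim: it assumes, purely for illustration, that process $i$ emits Normal draws with shared mean $\theta$ and private variance $\alpha_i$, that the sampling rule picks the process whose variance estimate currently has the largest approximate standard error, and that sampling stops once every standard error falls below a tolerance. The function name and the parameters `tol` and `min_samples` are hypothetical.

```python
import math
import random


def run_toy_active_sampler(theta, alphas, tol=0.2, min_samples=10,
                           seed=0, max_samples=100_000):
    """Toy sequential sampler: NOT the paper's algorithm, only an illustration."""
    rng = random.Random(seed)
    k = len(alphas)
    samples = [[] for _ in range(k)]

    def draw(i):
        # Assumed model: process i emits Normal(theta, alphas[i]) draws.
        samples[i].append(rng.gauss(theta, math.sqrt(alphas[i])))

    def var_hat(i):
        # Estimator of the private parameter: unbiased sample variance.
        xs = samples[i]
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    def var_se(i):
        # Normal-theory approximate standard error of the sample variance.
        return var_hat(i) * math.sqrt(2.0 / (len(samples[i]) - 1))

    def theta_hat_and_se():
        # Estimator of the shared parameter: inverse-variance weighted mean.
        weights = [len(samples[i]) / var_hat(i) for i in range(k)]
        means = [sum(samples[i]) / len(samples[i]) for i in range(k)]
        total = sum(weights)
        est = sum(w * m for w, m in zip(weights, means)) / total
        return est, 1.0 / math.sqrt(total)

    # Initial phase: a few draws from every process so estimates exist.
    for i in range(k):
        for _ in range(min_samples):
            draw(i)

    while sum(len(s) for s in samples) < max_samples:
        ses = [var_se(i) for i in range(k)]
        _, theta_se = theta_hat_and_se()
        # Stopping rule: every estimate is precise enough.
        if max(ses) <= tol and theta_se <= tol:
            break
        # Sampling rule: query the process with the least precise estimate.
        draw(max(range(k), key=lambda i: ses[i]))

    theta_est, _ = theta_hat_and_se()
    return theta_est, [var_hat(i) for i in range(k)], [len(s) for s in samples]


theta_est, alpha_est, counts = run_toy_active_sampler(1.0, [0.5, 1.0, 2.0])
print("theta estimate :", round(theta_est, 3))
print("alpha estimates:", [round(a, 3) for a in alpha_est])
print("samples used   :", counts)
```

Because both the choice of process and the stopping time depend on the data observed so far, the number of samples drawn from each process is itself random, which is what distinguishes this setting from fixed-sample-size estimation.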

