

New submissions

[ total of 10 entries: 1-10 ]

New submissions for Wed, 11 Dec 19

[1]  arXiv:1912.04432 [pdf]
Title: Variable selection for transportability
Comments: Under Review
Subjects: Methodology (stat.ME); Other Statistics (stat.OT)

Transportability provides a principled framework to address the problem of applying study results to new populations. Here, we consider the problem of selecting variables to include in transport estimators. We provide a brief overview of the transportability framework and illustrate that while selection diagrams are a vital first step in variable selection, these graphs alone identify a sufficient but not strictly necessary set of variables for generating an unbiased transport estimate. Next, we conduct a simulation experiment assessing the impact of including unnecessary variables on the performance of the parametric g-computation transport estimator. Our results highlight that the types of variables included can affect the bias, variance, and mean squared error of the estimates. We find that addition of variables that are not causes of the outcome but whose distributions differ between the source and target populations can increase the variance and mean squared error of the transported estimates. On the other hand, inclusion of variables that are causes of the outcome (regardless of whether they modify the causal contrast of interest or differ in distribution between the populations) reduces the variance of the estimates without increasing the bias. Finally, exclusion of variables that cause the outcome but do not modify the causal contrast of interest does not increase bias. These findings suggest that variable selection approaches for transport should prioritize identifying and including all causes of the outcome in the study population rather than focusing on variables whose distribution may differ between the study sample and target population.
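The parametric g-computation transport estimator mentioned above can be sketched as follows. This is a minimal illustration, not the authors' simulation code: it assumes a linear outcome model with treatment-by-covariate interactions, and the function name `transport_gcomp` and its arguments are hypothetical. The outcome model is fitted in the source sample and then averaged over the covariate distribution of the target sample.

```python
import numpy as np

def transport_gcomp(X_src, a_src, y_src, X_tgt):
    """Transport an average treatment effect from a source to a target sample.

    Assumes (for illustration only) a linear outcome model with an intercept,
    a binary treatment, covariates, and treatment-by-covariate interactions.
    X_src and X_tgt are 2-D arrays of shape (n, p).
    """
    def design(X, a):
        X = np.asarray(X, dtype=float)
        a = np.asarray(a, dtype=float).reshape(-1, 1)
        return np.hstack([np.ones((X.shape[0], 1)), a, X, a * X])

    # Step 1: fit the outcome model in the source (study) sample.
    beta, *_ = np.linalg.lstsq(design(X_src, a_src), np.asarray(y_src, dtype=float), rcond=None)

    # Step 2: predict both potential outcomes for every target unit.
    n_tgt = np.asarray(X_tgt).shape[0]
    mu1 = design(X_tgt, np.ones(n_tgt)) @ beta
    mu0 = design(X_tgt, np.zeros(n_tgt)) @ beta

    # Step 3: average the predicted contrast over the target covariates.
    return float(np.mean(mu1 - mu0))
```

Which covariates enter `X_src` and `X_tgt` is exactly the variable-selection question the abstract addresses; the sketch takes that choice as given.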

[2]  arXiv:1912.04542 [pdf, ps, other]
Title: What is the best predictor that you can compute in five minutes using a given Bayesian hierarchical model?
Subjects: Methodology (stat.ME)

The goal of this paper is to provide a way for statisticians to answer the question posed in the title of this article using any Bayesian hierarchical model of their choosing and without imposing additional restrictive model assumptions. We are motivated by the fact that the rise of "big data" has made it difficult for statisticians to apply their methods directly to big datasets. We introduce a "data subset model" to the popular "data model, process model, and parameter model" framework used to summarize Bayesian hierarchical models. The hyperparameters of the data subset model are specified constructively in that they are chosen such that the implied size of the subset satisfies pre-defined computational constraints. Thus, these hyperparameters effectively calibrate the statistical model to the computer itself to obtain predictions/estimations in a pre-specified amount of time. Several properties of the data subset model are provided, including propriety, partial sufficiency, and semi-parametric properties. Furthermore, we show that subsets of normally distributed data are asymptotically partially sufficient under reasonable constraints. Results from a simulated dataset are presented across different computers to show the effect of the computer on the statistical analysis. Additionally, we provide a joint spatial analysis of two different environmental datasets.

[3]  arXiv:1912.04571 [pdf, other]
Title: Spatial hierarchical modeling of threshold exceedances using rate mixtures
Subjects: Methodology (stat.ME)

We develop new flexible univariate models for light-tailed and heavy-tailed data, which extend a hierarchical representation of the generalized Pareto (GP) limit for threshold exceedances. These models can accommodate departures from asymptotic threshold stability in finite samples while keeping the asymptotic GP distribution as a special (or boundary) case and can capture the tails and the bulk jointly without losing much flexibility. Spatial dependence is modeled through a latent process, while the data are assumed to be conditionally independent. We design penalized complexity priors for crucial model parameters, shrinking our proposed spatial Bayesian hierarchical model toward a simpler reference whose marginal distributions are GP with moderately heavy tails. Our model can be fitted in fairly high dimensions using Markov chain Monte Carlo by exploiting the Metropolis-adjusted Langevin algorithm (MALA), which guarantees fast convergence of Markov chains with efficient block proposals for the latent variables. We also develop an adaptive scheme to calibrate the MALA tuning parameters. Moreover, our models avoid the expensive numerical evaluations of multifold integrals in censored likelihood expressions. We demonstrate our new methodology by simulation and application to a dataset of extreme rainfall episodes that occurred in Germany. Our fitted model performs satisfactorily and can be used to predict rainfall extremes at unobserved locations.

[4]  arXiv:1912.04607 [pdf, other]
Title: Controlling false discovery exceedance for heterogeneous tests
Subjects: Methodology (stat.ME)

Several classical methods exist for controlling the false discovery exceedance (FDX) in large-scale multiple testing problems, among them the Lehmann-Romano procedure ([LR] below) and the Guo-Romano procedure ([GR] below). While these two procedures are the most prominent, they were originally designed for homogeneous test statistics, that is, when the null distribution functions of the $p$-values $F_i$, $1\leq i\leq m$, are all equal. In many applications, however, the data are heterogeneous, which leads to heterogeneous null distribution functions. Ignoring this heterogeneity usually makes the aforementioned procedures conservative. In this paper, we develop three new procedures that incorporate the $F_i$'s while ensuring FDX control. The heterogeneous version of [LR], denoted [HLR], is based on the arithmetic average of the $F_i$'s, while the heterogeneous version of [GR], denoted [HGR], is based on the geometric average of the $F_i$'s. We also introduce a procedure [PB] that is based on the Poisson-binomial distribution and that uniformly improves on [HLR] and [HGR], at the price of a higher computational complexity. Perhaps surprisingly, this shows that, contrary to the known theory of false discovery rate (FDR) control under heterogeneity, the way to incorporate the $F_i$'s can be particularly simple in the case of FDX control and does not require any further correction term. The performance of the newly proposed procedures is illustrated with real and simulated data in two important heterogeneous settings: first, when the test statistics are continuous but the $p$-values are weighted by some known independent weight vector, e.g., coming from co-data sets; second, when the test statistics are discretely distributed, as is the case for data representing frequencies or counts.
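For orientation, the classical homogeneous [LR] baseline can be sketched as a step-down procedure. The abstract does not state the critical values; the formula below, alpha_i = (floor(gamma*i) + 1) * alpha / (m + floor(gamma*i) + 1 - i), is the one usually attributed to Lehmann and Romano (2005) and is an assumption here, as are the function names. The heterogeneous procedures [HLR], [HGR], and [PB] are not reproduced.

```python
import math

def lehmann_romano_critical_values(m, alpha, gamma):
    """Step-down critical values for controlling P(FDP > gamma) <= alpha.

    Assumed form (from the standard literature, not from the abstract):
    alpha_i = (floor(gamma*i) + 1) * alpha / (m + floor(gamma*i) + 1 - i).
    """
    values = []
    for i in range(1, m + 1):
        k = math.floor(gamma * i)
        values.append((k + 1) * alpha / (m + k + 1 - i))
    return values

def step_down_rejections(pvalues, critical_values):
    """Reject ordered p-values until the first one exceeds its critical value."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    rejected = []
    for rank, idx in enumerate(order):
        if pvalues[idx] <= critical_values[rank]:
            rejected.append(idx)
        else:
            break  # step-down: stop at the first non-rejection
    return sorted(rejected)
```

A typical call is `step_down_rejections(p, lehmann_romano_critical_values(len(p), alpha, gamma))`, which returns the indices of the rejected hypotheses.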

[5]  arXiv:1912.04758 [pdf, other]
Title: Generalised Network Autoregressive Processes and the GNAR package
Subjects: Methodology (stat.ME)

This article introduces the GNAR package, which fits, predicts, and simulates from a powerful new class of generalised network autoregressive processes. Such processes consist of a multivariate time series along with a real, or inferred, network that provides information about inter-variable relationships. The GNAR model relates values of a time series for a given variable and time to earlier values of the same variable and of neighbouring variables, with inclusion controlled by the network structure. The GNAR package is designed to fit this new model, while working with standard ts objects and the igraph package for ease of use.
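The relationship described above can be illustrated with a one-step conditional mean for the simplest case: one autoregressive lag and first-order neighbours only. This is a Python sketch under stated assumptions, not the GNAR package interface (the package itself is written for R): the coefficients `alpha` and `beta` are shared by all nodes, neighbours are weighted equally, and the function name `gnar_one_step` is hypothetical.

```python
import numpy as np

def gnar_one_step(x_prev, adjacency, alpha, beta):
    """Conditional mean of X_t given X_{t-1} for a lag-1, first-neighbour model.

    x_prev    : values of every node at time t-1, shape (n,)
    adjacency : 0/1 matrix, adjacency[i, q] = 1 if q is a neighbour of i
    alpha     : autoregressive coefficient (assumed common to all nodes)
    beta      : neighbour coefficient (assumed common to all nodes)
    """
    x_prev = np.asarray(x_prev, dtype=float)
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1, keepdims=True)
    # Equal weights over each node's neighbours; isolated nodes get no neighbour term.
    W = np.divide(A, degree, out=np.zeros_like(A), where=degree > 0)
    return alpha * x_prev + beta * (W @ x_prev)
```

The network enters only through `W`, which is how the network structure controls which neighbouring series are included.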

Cross-lists for Wed, 11 Dec 19

[6]  arXiv:1912.04629 (cross-list from math.ST) [pdf, ps, other]
Title: Classification under local differential privacy
Comments: 12 pages
Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)

We consider the binary classification problem in a setup that preserves the privacy of the original sample. We provide a privacy mechanism that is locally differentially private and then construct a classifier based on the private sample that is universally consistent in Euclidean spaces. Under stronger assumptions, we establish the minimax rates of convergence of the excess risk and see that they are slower than in the case when the original sample is available.

[7]  arXiv:1912.04661 (cross-list from econ.EM) [pdf, other]
Title: Adaptive Dynamic Model Averaging with an Application to House Price Forecasting
Subjects: Econometrics (econ.EM); Methodology (stat.ME)

Dynamic model averaging (DMA) combines the forecasts of a large number of dynamic linear models (DLMs) to predict the future value of a time series. The performance of DMA critically depends on the appropriate choice of two forgetting factors. The first of these controls the speed of adaptation of the coefficient vector of each DLM, while the second enables time variation in the model averaging stage. In this paper we develop a novel, adaptive dynamic model averaging (ADMA) methodology. The proposed methodology employs a stochastic optimisation algorithm that sequentially updates the forgetting factor of each DLM, and uses a state-of-the-art non-parametric model combination algorithm from the prediction with expert advice literature, which offers finite-time performance guarantees. An empirical application to quarterly UK house price data suggests that ADMA produces more accurate forecasts than the benchmark autoregressive model, as well as competing DMA specifications.
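The role of the second forgetting factor, the one acting in the model averaging stage, can be sketched with the weight recursion commonly used in standard DMA. This illustrates that baseline recursion only, not the ADMA algorithm proposed in the paper, and the function name `dma_weight_update` is hypothetical. Raising the previous weights to a power below one flattens them toward uniform, which is what lets the preferred model change over time.

```python
import numpy as np

def dma_weight_update(weights, pred_likelihoods, forgetting):
    """One step of the model-probability recursion in standard DMA.

    weights          : posterior model probabilities at time t-1
    pred_likelihoods : each model's predictive density evaluated at y_t
    forgetting       : factor in (0, 1]; 1 means no forgetting
    Returns (predicted weights for time t, updated weights after seeing y_t).
    """
    w = np.asarray(weights, dtype=float)
    # Prediction step: power-flatten the weights, then renormalise.
    predicted = w ** forgetting
    predicted = predicted / predicted.sum()
    # Update step: reweight by how well each model predicted y_t.
    updated = predicted * np.asarray(pred_likelihoods, dtype=float)
    return predicted, updated / updated.sum()
```

With `forgetting=1.0` this reduces to ordinary recursive Bayesian model averaging; smaller values discount past performance more heavily.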

Replacements for Wed, 11 Dec 19

[8]  arXiv:1906.00696 (replaced) [pdf, other]
Title: Transformed Central Quantile Subspace
Authors: Eliana Christou
Comments: arXiv admin note: text overlap with arXiv:1906.00694
Subjects: Methodology (stat.ME)

[9]  arXiv:1907.06560 (replaced) [pdf]
Title: Eliciting Priors for Bayesian Prediction of Daily Response Propensity in Responsive Survey Design: Historical Data Analysis vs. Literature Review
Comments: 47 pages, 10 figures, two tables
Subjects: Methodology (stat.ME); Applications (stat.AP)

[10]  arXiv:1907.09617 (replaced) [pdf, other]
Title: Hierarchical Transformed Scale Mixtures for Flexible Modeling of Spatial Extremes on Datasets with Many Locations
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)