New submissions for Fri, 15 Jan 21

[1]  arXiv:2101.05352 [pdf, other]
Title: Bayesian Multiple Index Models for Environmental Mixtures
Subjects: Methodology (stat.ME)

An important goal of environmental health research is to assess the risk posed by mixtures of environmental exposures. Two popular classes of models for mixtures analyses are response-surface methods and exposure-index methods. Response-surface methods estimate high-dimensional surfaces and are thus highly flexible but difficult to interpret. In contrast, exposure-index methods decompose coefficients from a linear model into an overall mixture effect and individual index weights; these models yield easily interpretable effect estimates and efficient inferences when model assumptions hold, but, like most parsimonious models, incur bias when these assumptions do not hold. In this paper we propose a Bayesian multiple index model framework that combines the strengths of each, allowing for non-linear and non-additive relationships between exposure indices and a health outcome, while reducing the dimensionality of the exposure vector and estimating index weights with variable selection. This framework contains response-surface and exposure-index models as special cases, thereby unifying the two analysis strategies. This unification increases the range of models possible for analyzing environmental mixtures and health, allowing one to select an appropriate analysis from a spectrum of models varying in flexibility and interpretability. In an analysis of the association between telomere length and 18 organic pollutants in the National Health and Nutrition Examination Survey (NHANES), the proposed approach fits the data as well as more complex response-surface methods and yields more interpretable results.

[2]  arXiv:2101.05394 [pdf, other]
Title: The $\mathcal{F}$-family of covariance functions: A Matérn analogue for modeling random fields on spheres
Subjects: Methodology (stat.ME)

The Matérn family of isotropic covariance functions has been central to the theoretical development and application of statistical models for geospatial data. For global data defined over the whole sphere representing planet Earth, the natural distance between any two locations is the great circle distance. In this setting, the Matérn family of covariance functions has a restriction on the smoothness parameter, making it an unappealing choice to model smooth data. Finding a suitable analogue for modelling data on the sphere is still an open problem. This paper proposes a new family of isotropic covariance functions for random fields defined over the sphere. The proposed family has a parameter that indexes the mean square differentiability of the corresponding Gaussian field, and allows for any admissible range of fractal dimension. Our simulation study mimics the fixed domain asymptotic setting, which is the most natural regime for sampling on a closed and bounded set. As expected, our results support the analogous results (under the same asymptotic scheme) for planar processes that not all parameters can be estimated consistently. We apply the proposed model to a dataset of precipitable water content over a large portion of the Earth, and show that the model gives more precise predictions of the underlying process at unsampled locations than does the Matérn model using chordal distances.

[3]  arXiv:2101.05539 [pdf, other]
Title: Integrative Learning for Population of Dynamic Networks with Covariates
Comments: 52 pages, 4 figures
Subjects: Methodology (stat.ME)

Although there is a rapidly growing literature on dynamic connectivity methods, the primary focus has been on separate network estimation for each individual, which fails to leverage common patterns of information. We propose novel graph-theoretic approaches for estimating a population of dynamic networks that are able to borrow information across multiple heterogeneous samples in an unsupervised manner and guided by covariate information. Specifically, we develop a Bayesian product mixture model that imposes independent mixture priors at each time scan and uses covariates to model the mixture weights, which results in time-varying clusters of samples designed to pool information. The computation is carried out using an efficient Expectation-Maximization algorithm. Extensive simulation studies illustrate sharp gains in recovering the true dynamic network over existing dynamic connectivity methods. An analysis of fMRI block task data with behavioral interventions reveals sub-groups of individuals having similar dynamic connectivity, and identifies intervention-related dynamic network changes that are concentrated in biologically interpretable brain regions. In contrast, existing dynamic connectivity approaches are able to detect minimal or no changes in connectivity over time, which seems biologically unrealistic and highlights the challenges resulting from the inability to systematically borrow information across samples.

[4]  arXiv:2101.05568 [pdf, other]
Title: Enhanced Cube Implementation For Highly Stratified Population
Subjects: Methodology (stat.ME)

A balanced sampling design should always be the adopted strategy if auxiliary information is available. Besides, integrating a stratified structure of the population in the sampling process can considerably reduce the variance of the estimators. We propose here a new method to handle the selection of a balanced sample in a highly stratified population. The method substantially improves on the commonly used sampling design and reduces the time-consuming problem that could arise if inclusion probabilities within strata do not sum to an integer.

[5]  arXiv:2101.05630 [pdf, other]
Title: Adaptive shrinkage of smooth functional effects towards a predefined functional subspace
Subjects: Methodology (stat.ME)

In this paper, we propose a new horseshoe-type prior hierarchy for adaptively shrinking spline-based functional effects towards a predefined vector space of parametric functions. Instead of shrinking each spline coefficient towards zero, we use an adapted horseshoe prior to control the deviation from the predefined vector space. For this purpose, the modified horseshoe prior is set up with one scale parameter per spline and not one per coefficient. The presented prior allows for a large number of basis functions to capture all kinds of functional effects while the estimated functional effect is prevented from a highly oscillating overfit. We achieve this by integrating a smoothing penalty similar to the random walk prior commonly applied in Bayesian P-spline priors. In a simulation study, we demonstrate the properties of the new prior specification and compare it to other approaches from the literature. Furthermore, we showcase the applicability of the proposed method by estimating the energy consumption in Germany over the course of a day. For inference, we rely on Markov chain Monte Carlo simulations combining Gibbs sampling for the spline coefficients with slice sampling for all scale parameters in the model.

[6]  arXiv:2101.05635 [pdf, other]
Title: Bayesian inference with tmbstan for a state-space model with VAR(1) state equation
Subjects: Methodology (stat.ME)

When using the R package tmbstan for Bayesian inference, the built-in Laplace approximation to the marginal likelihood, with random effects integrated out, can be switched on and off. There exists no guideline on whether the Laplace approximation should be used to achieve better efficiency, especially when the statistical model for estimating selection is complicated. To answer this question, we conducted simulation studies under different scenarios with a state-space model employing a VAR(1) state equation. We found that turning on the Laplace approximation in tmbstan would probably lower the computational efficiency. Only when there is a good amount of data are both tmbstan with and without the Laplace approximation worth trying, since in this case the Laplace approximation is more likely to be accurate and may also lead to slightly higher computational efficiency. The transition parameters and scale parameters in a VAR(1) process are hard to estimate accurately, and increasing the sample size at each time point does not help in the estimation; only more time points in the data provide more information on these parameters and make the likelihood dominate the posterior, thus leading to accurate estimates for them.

[7]  arXiv:2101.05644 [pdf, ps, other]
Title: A new volatility model: GQARCH-Itô model
Comments: 25 pages, 1 figure, 4 tables
Subjects: Methodology (stat.ME)

Volatility asymmetry is a hot topic in high-frequency financial markets. In this paper, we propose a new econometric model that can describe volatility asymmetry based on both high-frequency and low-frequency historical data. After providing the quasi-maximum likelihood estimators for the parameters, we establish their asymptotic properties. We also conduct a series of simulation studies to check the finite sample performance and volatility forecasting performance of the proposed methodologies. An empirical application demonstrates that the new model has stronger volatility prediction power than the GARCH-Itô model in the literature.

[8]  arXiv:2101.05769 [pdf, ps, other]
Title: P-spline smoothed functional ICA of EEG data
Comments: 18 pages, 3 figures
Subjects: Methodology (stat.ME); Applications (stat.AP)

We propose a novel functional data framework for artifact extraction and removal to estimate brain electrical activity sources from EEG signals. Our methodology is derived on the basis of event related potential (ERP) analysis, and motivated by mapping adverse artifactual events caused by body movements and physiological activity originating outside the brain. A functional independent component analysis (FICA) based on the use of fourth moments is conducted on the principal component expansion in terms of B-spline basis functions. We extend this model setup by introducing a discrete roughness penalty in the orthonormality constraint of the functional principal component decomposition to later compute estimates of FICA. Compared to other ICA algorithms, our method combines a regularization mechanism stemming from the principal eigendirections with a discrete penalization given by the $d$-order difference operator. In this regard, it allows one to naturally control high-frequency remnants of neural origin overlapping latent artifactual eigenfunctions and thus to preserve this persistent activity at artifact extraction level. Furthermore, we introduce a new cross-validation method for the selection of the penalization parameter which uses shrinkage to assess the performance of the estimators for functional representations with larger basis dimension and excess of roughness. This method is used in combination with a kurtosis measure in order to provide the optimal number of independent components. The FICA model is illustrated at functional and longitudinal dimensions by an example on real EEG data where a subject willingly performs arm gestures and stereotyped physiological artifacts. Our method can be relevant in neurocognitive research and related fields, particularly in situations where movement can bias the estimation of brain potentials.

[9]  arXiv:2101.05774 [pdf, other]
Title: Agglomerative Hierarchical Clustering for Selecting Valid Instrumental Variables
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)

We propose an instrumental variable (IV) selection procedure which combines the agglomerative hierarchical clustering method and the Hansen-Sargan overidentification test for selecting valid instruments for IV estimation from a large set of candidate instruments. Some of the instruments may be invalid in the sense that they may fail the exclusion restriction. We show that under the plurality rule, our method can achieve oracle selection and estimation results. Compared to the previous IV selection methods, our method has the advantages that it can deal with the weak instruments problem effectively, and can be easily extended to settings where there are multiple endogenous regressors and heterogeneous treatment effects. We conduct Monte Carlo simulations to examine the performance of our method, and compare it with two existing methods, the Hard Thresholding method (HT) and the Confidence Interval method (CIM). The simulation results show that our method achieves oracle selection and estimation results in both single and multiple endogenous regressors settings in large samples when all the instruments are strong. Also, our method works well when some of the candidate instruments are weak, outperforming HT and CIM. We apply our method to the estimation of the effect of immigration on wages in the US.

Cross-lists for Fri, 15 Jan 21

[10]  arXiv:2101.05350 (cross-list from stat.AP) [pdf, other]
Title: Estimating functional parameters for understanding the impact of weather and government interventions on COVID-19 outbreak
Authors: Chih-Li Sung
Subjects: Applications (stat.AP); Methodology (stat.ME)

As the coronavirus disease 2019 (COVID-19) has shown profound effects on public health and the economy worldwide, it becomes crucial to assess the impact on virus transmission and develop effective strategies to address the challenge. A new statistical model derived from the SIR epidemic model with functional parameters is proposed to understand the impact of weather and government interventions on the virus spread and also to provide forecasts of COVID-19 infections among eight metropolitan areas in the United States. The model uses Bayesian inference with Gaussian process priors to study the functional parameters nonparametrically, and sensitivity analysis is adopted to investigate the main and interaction effects of these factors. This analysis reveals several important results, including the potential interaction effects between weather and government interventions, which shed new light on effective strategies for policymakers to mitigate the COVID-19 outbreak.

[11]  arXiv:2101.05654 (cross-list from math.ST) [pdf, ps, other]
Title: Optimal designs for comparing regression curves -- dependence within and between groups
Comments: 28 pages
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

We consider the problem of designing experiments for the comparison of two regression curves describing the relation between a predictor and a response in two groups, where the data between and within the group may be dependent. In order to derive efficient designs we use results from stochastic analysis to identify the best linear unbiased estimator (BLUE) in a corresponding continuous time model. It is demonstrated that in general simultaneous estimation using the data from both groups yields more precise results than estimation of the parameters separately in the two groups. Using the BLUE from simultaneous estimation, we then construct an efficient linear estimator for finite sample size by minimizing the mean squared error between the optimal solution in the continuous time model and its discrete approximation with respect to the weights (of the linear estimator). Finally, the optimal design points are determined by minimizing the maximal width of a simultaneous confidence band for the difference of the two regression functions. The advantages of the new approach are illustrated by means of a simulation study, where it is shown that the use of the optimal designs yields substantially narrower confidence bands than the application of uniform designs.

Replacements for Fri, 15 Jan 21

[12]  arXiv:1802.08579 (replaced) [pdf, other]
Title: Nonparametric Estimation of a distribution function from doubly truncated data under dependence
Subjects: Methodology (stat.ME)
[13]  arXiv:1906.06463 (replaced) [pdf, other]
Title: Linear Aggregation in Tree-based Estimators
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)
[14]  arXiv:1909.02878 (replaced) [pdf, other]
Title: Bayesian Semiparametric Modeling of Response Mechanism for Nonignorable Missing Data
Comments: 25 pages; The title has been changed from "Bayesian semiparametric estimation under nonignorable nonresponse"
Subjects: Methodology (stat.ME)
[15]  arXiv:1910.10512 (replaced) [pdf, other]
Title: A Stochastic Block Model Approach for the Analysis of Multilevel Networks: an Application to the Sociology of Organizations
Subjects: Methodology (stat.ME); Applications (stat.AP)
[16]  arXiv:2007.00725 (replaced) [pdf, other]
Title: Robust Inference for Mediated Effects in Partially Linear Models
Subjects: Methodology (stat.ME)
[17]  arXiv:2009.00148 (replaced) [pdf, other]
Title: Design and Analysis of Switchback Experiments
Subjects: Methodology (stat.ME); Applications (stat.AP)
[18]  arXiv:2010.04492 (replaced) [pdf, other]
Title: Autoregressive Networks
Subjects: Methodology (stat.ME)
[19]  arXiv:1708.00145 (replaced) [pdf, other]
Title: Semiparametric Efficiency in Convexity Constrained Single Index Model
Comments: Removed the density bounded away from zero assumption in assumption (A5). Weakened assumption (B2)
Subjects: Statistics Theory (math.ST); Computation (stat.CO); Methodology (stat.ME)
[20]  arXiv:1811.08357 (replaced) [pdf, other]
Title: Learning deep kernels for exponential family densities
Journal-ref: Proceedings of the 36th International Conference on Machine Learning (ICML 2019), PMLR 97:6737-6746
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[21]  arXiv:2002.09116 (replaced) [pdf, other]
Title: Learning Deep Kernels for Non-Parametric Two-Sample Tests
Journal-ref: Proceedings of the 37th International Conference on Machine Learning (ICML 2020), PMLR 119:6316-6326
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)