New submissions

[ total of 18 entries: 1-18 ]

New submissions for Thu, 26 Jan 23

[1]  arXiv:2301.10387 [pdf, other]
Title: mcGP: mesh-clustered Gaussian process emulator for partial differential equation systems
Subjects: Methodology (stat.ME); Applications (stat.AP)

Partial differential equations (PDEs) have become an essential tool for modeling complex physical systems. Such equations are typically solved numerically via mesh-based methods, such as the finite element method, the outputs of which consist of the solutions on a set of mesh nodes over the spatial domain. However, these simulations are often too computationally expensive to explore the input space thoroughly. In this paper, we propose an efficient emulator that simultaneously predicts the outputs on a set of mesh nodes, with theoretical justification of its uncertainty quantification. The novelty of the proposed method lies in the incorporation of the mesh node coordinates into the statistical model. In particular, the proposed method segments the mesh nodes into multiple clusters via a Dirichlet process prior and fits a Gaussian process model in each cluster. Most importantly, by revealing the underlying clustering structures, the proposed method can extract valuable flow physics present in the systems that can be used to guide further investigations. Real examples are presented to show that our proposed method has smaller prediction errors than its main competitors, with competitive computation time, and provides valuable insights about the underlying physics of the systems. An R package for the proposed methodology is provided in an open repository.

[2]  arXiv:2301.10392 [pdf, other]
Title: Statistical Inference and Large-scale Multiple Testing for High-dimensional Regression Models
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

This paper presents a selective survey of recent developments in statistical inference and multiple testing for high-dimensional regression models, including linear and logistic regression. We examine the construction of confidence intervals and hypothesis tests for various low-dimensional objectives such as regression coefficients and linear and quadratic functionals. The key technique is to generate debiased and desparsified estimators for the targeted low-dimensional objectives and estimate their uncertainty. In addition to covering the motivations for and intuitions behind these statistical methods, we discuss their optimality and adaptivity in the context of high-dimensional inference. We also review recent developments in statistical inference based on multiple regression models and advances in large-scale multiple testing for high-dimensional regression. The R package SIHR implements some of the high-dimensional inference methods discussed in this paper.

[3]  arXiv:2301.10468 [pdf, other]
Title: Model selection-based estimation for generalized additive models using mixtures of g-priors: Towards systematization
Subjects: Methodology (stat.ME)

We consider estimation of generalized additive models using basis expansions with Bayesian model selection. Although Bayesian model selection is an intuitively appealing tool for regression splines because of its flexible knot placement and model-averaged function estimates, its use has traditionally been limited to Gaussian additive regression, as posterior search of the model space requires a tractable form of the marginal model likelihood. We introduce an extension of the method to distributions belonging to the exponential family using the Laplace approximation to the likelihood. Although the Laplace approximation is successful with all Gaussian-type prior distributions in providing a closed-form expression of the marginal likelihood, there is no broad consensus on the best prior distribution to be used for nonparametric regression via model selection. We observe that the classical unit information prior distribution for variable selection may not be suitable for nonparametric regression using basis expansions. Instead, our study reveals that mixtures of g-priors are more suitable. A large family of mixtures of g-priors is considered for a detailed examination of how various mixture priors perform in estimating generalized additive models. Furthermore, we compare several priors on knots for model selection-based spline approaches to determine the most practically effective scheme. The model selection-based estimation methods are also compared with other Bayesian approaches to function estimation. Extensive simulation studies demonstrate the validity of the model selection-based approaches. We provide an R package for the proposed method.

[4]  arXiv:2301.10640 [pdf, other]
Title: Adaptive enrichment trial designs using joint modeling of longitudinal and time-to-event data
Comments: Main paper consists of 14 pages (inclusive of 3 figures and 2 tables). Supplementary materials consist of 14 pages
Subjects: Methodology (stat.ME)

Adaptive enrichment allows pre-defined patient subgroups of interest to be investigated throughout the course of a clinical trial. Many trials that measure a long-term time-to-event endpoint also routinely collect repeated measures on biomarkers which may be predictive of the primary endpoint. Although these data may not be leveraged directly to support subgroup selection decisions and early stopping decisions, we aim to make greater use of these data to increase efficiency and improve interim decision making. In this work, we present a joint model for longitudinal and time-to-event data and two methods for creating standardised statistics based on this joint model. We can use the estimates to define enrichment rules and efficacy and futility early stopping rules for a flexible, efficient clinical trial with possible enrichment. Under this framework, we show asymptotically that the familywise error rate is protected in the strong sense. To assess the results, we consider a trial for the treatment of metastatic breast cancer where repeated ctDNA measurements are available and the subgroup criteria are defined by patients' ER and HER2 status. Using simulation, we show that incorporating biomarker information leads to accurate subgroup identification and increases in power.

[5]  arXiv:2301.10715 [pdf, ps, other]
Title: Regression Models for Directional Data Based on Nonnegative Trigonometric Sums
Comments: 39 pages, 5 figures
Subjects: Methodology (stat.ME)

The parameter space of nonnegative trigonometric sums (NNTS) models for circular data is the surface of a hypersphere; thus, constructing regression models for a circular dependent variable using NNTS models can involve fitting great (small) circles on the parameter hypersphere that can identify different regions (rotations) along the great (small) circle. We propose regression models for circular (angular) dependent random variables in which the original circular random variable, which is assumed to be distributed (marginally) as an NNTS model, is transformed into a linear random variable such that common methods for linear regression can be applied. The usefulness of NNTS models with skewness and multimodality is shown in examples with simulated and real data.

Cross-lists for Thu, 26 Jan 23

[6]  arXiv:2301.10592 (cross-list from econ.EM) [pdf, other]
Title: Hierarchical Regularizers for Reverse Unrestricted Mixed Data Sampling Regressions
Subjects: Econometrics (econ.EM); Methodology (stat.ME)

Reverse Unrestricted MIxed DAta Sampling (RU-MIDAS) regressions are used to model high-frequency responses by means of low-frequency variables. However, due to the periodic structure of RU-MIDAS regressions, the dimensionality grows quickly if the frequency mismatch between the high- and low-frequency variables is large. Additionally, the number of high-frequency observations available for estimation decreases. We propose to counteract this reduction in sample size by pooling the high-frequency coefficients and to further reduce the dimensionality through a sparsity-inducing convex regularizer that accounts for the temporal ordering among the different lags. To this end, the regularizer prioritizes the inclusion of lagged coefficients according to the recency of the information they contain. We demonstrate the proposed method in an empirical application to daily realized volatility forecasting, where we explore whether modeling high-frequency volatility data in terms of low-frequency macroeconomic data pays off.

Replacements for Thu, 26 Jan 23

[7]  arXiv:2008.12807 (replaced) [pdf, other]
Title: Causal mediation analysis decomposition of between-hospital variance
Journal-ref: Health Services and Outcomes Research Methodology volume 22, pages 118-144 (2022)
Subjects: Methodology (stat.ME)
[8]  arXiv:2106.13694 (replaced) [pdf, other]
Title: Posterior Covariance Information Criterion for Weighted Inference
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
[9]  arXiv:2110.12316 (replaced) [pdf, other]
Title: Semiparametric discrete data regression with Monte Carlo inference and prediction
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Computation (stat.CO); Machine Learning (stat.ML)
[10]  arXiv:2207.03597 (replaced) [pdf, other]
Title: Nonparametric Estimation of the Potential Impact Fraction and Population Attributable Fraction with Individual-Level and Aggregated Data
Subjects: Methodology (stat.ME)
[11]  arXiv:2208.06552 (replaced) [pdf, other]
Title: Sensitivity to Unobserved Confounding in Studies with Factor-structured Outcomes
Subjects: Methodology (stat.ME)
[12]  arXiv:2301.01616 (replaced) [pdf, other]
Title: Locally Private Causal Inference
Comments: 24 pages
Subjects: Methodology (stat.ME)
[13]  arXiv:2301.02853 (replaced) [pdf, other]
Title: Using a Penalized Likelihood to Detect Mortality Deceleration
Subjects: Methodology (stat.ME); Applications (stat.AP)
[14]  arXiv:1903.04059 (replaced) [pdf, other]
Title: Hidden tail chains and recurrence equations for dependence parameters associated with extremes of higher-order Markov chains
Comments: 33 pages
Subjects: Statistics Theory (math.ST); Probability (math.PR); Methodology (stat.ME)
[15]  arXiv:2011.00373 (replaced) [pdf, other]
Title: Causal Inference for Spatial Treatments
Authors: Michael Pollmann
Comments: complete rewrite with additional results; includes online appendix
Subjects: Econometrics (econ.EM); Methodology (stat.ME)
[16]  arXiv:2206.14674 (replaced) [pdf, other]
Title: Signature Methods in Machine Learning
Comments: Updated figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Classical Analysis and ODEs (math.CA); Numerical Analysis (math.NA); Statistics Theory (math.ST); Methodology (stat.ME)
[17]  arXiv:2210.10530 (replaced) [pdf, other]
Title: Adversarial De-confounding in Individualised Treatment Effects Estimation
Comments: accepted to AISTATS 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)
[18]  arXiv:2301.05761 (replaced) [pdf, other]
Title: Local Model Explanations and Uncertainty Without Model Access
Comments: 13 pages, 6 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)