New submissions for Tue, 7 Feb 23

[1]  arXiv:2302.02110 [pdf, other]
Title: A Scalar-on-Quantile-Function Approach for Estimating Short-term Health Effects of Environmental Exposures
Subjects: Applications (stat.AP)

Environmental epidemiologic studies routinely utilize aggregate health outcomes to estimate effects of short-term (e.g., daily) exposures that are available at increasingly fine spatial resolutions. However, areal averages are typically used to derive population-level exposure, which cannot capture the spatial variation and individual heterogeneity in exposures that may occur within the spatial and temporal unit of interest (e.g., within day or ZIP code). We propose a general modeling approach to incorporate within-unit exposure heterogeneity in health analyses via exposure quantile functions. Furthermore, by viewing the exposure quantile function as a functional covariate, our approach provides additional flexibility in characterizing associations at different quantile levels. We apply the proposed approach to an analysis of air pollution and emergency department (ED) visits in Atlanta over four years. The analysis utilizes daily ZIP code-level distributions of personal exposures to four traffic-related ambient air pollutants simulated from the Stochastic Human Exposure and Dose Simulator. Our analyses find that effects of carbon monoxide on respiratory and cardiovascular disease ED visits are more pronounced with changes in lower quantiles of the population-level exposure. Software for implementation is provided in the R package nbRegQF.
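
The idea of treating an exposure quantile function as a functional covariate can be illustrated with a minimal sketch. It assumes the functional term takes the usual scalar-on-function form, the integral of beta(tau) * Q(tau) over tau in [0, 1], approximated on a grid. The function names, the coefficient function beta, and the toy exposures are hypothetical and are not taken from the paper or from nbRegQF.

```python
def empirical_quantile(sorted_x, tau):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = tau * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    frac = pos - lo
    return sorted_x[lo] * (1 - frac) + sorted_x[hi] * frac


def functional_term(exposures, beta, n_grid=200):
    """Midpoint-rule approximation of the integral of beta(tau) * Q(tau).

    exposures: individual-level exposures within one unit (e.g. one ZIP-day).
    beta: hypothetical coefficient function of the quantile level tau.
    """
    xs = sorted(exposures)
    step = 1.0 / n_grid
    total = 0.0
    for k in range(n_grid):
        tau = (k + 0.5) * step  # midpoint of the k-th grid cell
        total += beta(tau) * empirical_quantile(xs, tau) * step
    return total


# Toy exposures for one unit (illustrative values only).
exposures = [1.0, 2.0, 3.0, 4.0, 5.0]

# A constant beta weights all quantile levels equally, so the term
# reduces to the mean exposure, i.e. the usual areal-average covariate.
mean_like = functional_term(exposures, lambda tau: 1.0)

# A beta concentrated on tau < 0.5 responds only to the lower quantiles.
lower_only = functional_term(exposures, lambda tau: 1.0 if tau < 0.5 else 0.0)
```

With a constant coefficient function the term equals the within-unit mean, which is why the average-exposure analysis can be seen as a special case; a non-constant beta lets the association differ across quantile levels.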

[2]  arXiv:2302.02488 [pdf, other]
Title: A three-state coupled Markov switching model for COVID-19 outbreaks across Quebec based on hospital admissions
Subjects: Applications (stat.AP); Populations and Evolution (q-bio.PE)

Recurrent COVID-19 outbreaks have placed immense strain on the hospital system in Quebec. We develop a Bayesian three-state coupled Markov switching model to analyze COVID-19 outbreaks across Quebec based on admissions in the 30 largest hospitals. Within each catchment area we assume the existence of three states for the disease: absence (a new state meant to account for the many zeroes in some of the smaller areas), endemic, and outbreak. We then assume the disease switches between the three states in each area through a series of coupled nonhomogeneous hidden Markov chains. Unlike previous approaches, the transition probabilities may depend on covariates and the occurrence of outbreaks in neighboring areas, to account for geographical outbreak spread. Additionally, to prevent rapid switching between endemic and outbreak periods, we introduce clone states into the model, which enforce minimum endemic and outbreak durations. We find, for example, that mobility in retail and recreation venues had a strong positive association with the development and persistence of new COVID-19 outbreaks in Quebec. Based on model comparison, our contributions show promise in improving state estimation retrospectively and in real time, especially when there are smaller areas and highly spatially synchronized outbreaks, and they offer new epidemiological interpretations.
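
The way clone states enforce minimum durations can be sketched with a simplified simulation. This sketch assumes only two regimes (endemic and outbreak), constant switching probabilities, and a single area; the absence state, covariates, and neighbor coupling described in the abstract are omitted. All names and parameter values are hypothetical.

```python
import random


def simulate_with_clone_states(n_steps, p_start_outbreak, p_end_outbreak,
                               min_endemic=3, min_outbreak=2, seed=0):
    """Simulate a regime path where each regime is split into clone states.

    While the chain sits in clone 1..(min - 1) of a regime it must move to the
    next clone of the same regime; only the last clone may switch regime.
    """
    rng = random.Random(seed)
    regime, clone = "endemic", 1
    path = []
    for _ in range(n_steps):
        path.append(regime)
        min_len = min_endemic if regime == "endemic" else min_outbreak
        if clone < min_len:
            clone += 1  # forced transition to the next clone, no switching yet
            continue
        # In the last clone the chain either stays or switches regime.
        p_switch = p_start_outbreak if regime == "endemic" else p_end_outbreak
        if rng.random() < p_switch:
            regime = "outbreak" if regime == "endemic" else "endemic"
            clone = 1
    return path


path = simulate_with_clone_states(2000, p_start_outbreak=0.3, p_end_outbreak=0.4)
```

Every completed endemic run in `path` lasts at least `min_endemic` steps and every completed outbreak run at least `min_outbreak` steps, even though the switching probabilities alone would allow one-step runs.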

Cross-lists for Tue, 7 Feb 23

[3]  arXiv:2302.01982 (cross-list from stat.ME) [pdf, other]
Title: multi-GPA-Tree: Statistical Approach for Pleiotropy Informed and Functional Annotation Tree Guided Prioritization of GWAS Results
Comments: 25 pages, 6 figures, 1 table
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)

Genome-wide association studies (GWAS) have successfully identified over two hundred thousand genotype-trait associations. Yet some challenges remain. First, complex traits are often associated with many single nucleotide polymorphisms (SNPs), most with small or moderate effect sizes, making them difficult to detect. Second, many complex traits share a common genetic basis due to `pleiotropy', and though few methods consider it, leveraging pleiotropy can improve statistical power to detect genotype-trait associations with weaker effect sizes. Third, currently available statistical methods are limited in explaining the functional mechanisms through which genetic variants are associated with specific or multiple traits. We propose multi-GPA-Tree to address these challenges. The multi-GPA-Tree approach can identify risk SNPs associated with single as well as multiple traits while also identifying the combinations of functional annotations that can explain the mechanisms through which risk-associated SNPs are linked with the traits.
First, we implemented simulation studies to evaluate the proposed multi-GPA-Tree method and compared its performance with an existing statistical approach. The results indicate that multi-GPA-Tree outperforms the existing statistical approach in detecting risk-associated SNPs for multiple traits. Second, we applied multi-GPA-Tree to GWAS of systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA), and of Crohn's disease (CD) and ulcerative colitis (UC), together with functional annotation data including GenoSkyline and GenoSkylinePlus. Our results demonstrate that multi-GPA-Tree can be a powerful tool that improves association mapping while facilitating understanding of the underlying genetic architecture of complex traits and potential mechanisms linking risk-associated SNPs with complex traits.

[4]  arXiv:2302.02288 (cross-list from stat.ME) [pdf, other]
Title: Efficient Adaptive Sobel and Joint Significance Tests for Mediation Effects
Authors: Haixiang Zhang
Subjects: Methodology (stat.ME); Applications (stat.AP)

Mediation analysis is an important statistical tool in many research fields. In particular, the Sobel test and the joint significance test are two popular methods for testing mediation effects in practice. However, both tests suffer from a conservative type I error, which reduces their power and limits their popularity and usefulness. This limitation is long-standing for both methods in the mediation analysis literature. To deal with this issue, we propose the adaptive Sobel test and the adaptive joint significance test for mediation effects, which improve substantially on the traditional Sobel and joint significance tests. Our method is also user-friendly and intelligible, and it does not involve more complicated procedures. Explicit expressions for sizes and powers are derived, which ensure the theoretical rationality of our method. Furthermore, we extend the proposed adaptive Sobel and joint significance tests to multiple mediators with family-wise error rate control. Extensive simulations are conducted to evaluate the performance of our mediation testing procedure. Finally, we illustrate the usefulness of our method by analysing three real-world datasets with continuous, binary and time-to-event outcomes, respectively.
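
For orientation, the two traditional tests mentioned above can be written in a few lines. This is a minimal sketch of the classical (non-adaptive) Sobel and joint significance tests under a normal approximation; it does not reproduce the adaptive versions proposed in the paper. Here a is the estimated effect of the exposure on the mediator and b the estimated effect of the mediator on the outcome, with standard errors se_a and se_b. The function names and the numeric inputs are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def sobel_test(a, se_a, b, se_b):
    """Classical Sobel test for the mediation effect a * b.

    Uses the first-order delta-method standard error of the product.
    Returns (z statistic, two-sided p-value).
    """
    se_ab = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    z = (a * b) / se_ab
    return z, two_sided_p(z)


def joint_significance_test(a, se_a, b, se_b):
    """Classical joint significance test: both paths must be significant,
    so the p-value is the larger of the two path p-values."""
    return max(two_sided_p(a / se_a), two_sided_p(b / se_b))


# Illustrative estimates only, not from any dataset in the paper.
z_sobel, p_sobel = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
p_joint = joint_significance_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
```

In this toy case the Sobel p-value is larger than the joint significance p-value, which is consistent with the Sobel test's known conservativeness.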

[5]  arXiv:2302.02310 (cross-list from stat.ME) [pdf, other]
Title: $\ell_1$-penalized Multinomial Regression: Estimation, inference, and prediction, with an application to risk factor identification for different dementia subtypes
Comments: 23 pages, 3 figures, 20 tables
Subjects: Methodology (stat.ME); Applications (stat.AP)

High-dimensional multinomial regression models are very useful in practice but receive less research attention than logistic regression models, especially from the perspective of statistical inference. In this work, we analyze the estimation and prediction error of the contrast-based $\ell_1$-penalized multinomial regression model and extend the debiasing method to the multinomial case, which provides a valid confidence interval for each coefficient and a $p$-value for each individual hypothesis test. We apply the debiasing method to identify some important predictors in the progression into dementia of different subtypes. Results of intensive simulations show the superiority of the debiasing method compared to some other inference methods.

[6]  arXiv:2302.02859 (cross-list from stat.ME) [pdf, other]
Title: A Fast Bootstrap Algorithm for Causal Inference with Large Data
Comments: 46 pages
Subjects: Methodology (stat.ME); Applications (stat.AP); Machine Learning (stat.ML)

Estimating causal effects from large experimental and observational data has become increasingly prevalent in both industry and research. The bootstrap is an intuitive and powerful technique used to construct standard errors and confidence intervals of estimators. Its application, however, can be prohibitively demanding in settings involving large data. In addition, modern causal inference estimators based on machine learning and optimization techniques exacerbate the computational burden of the bootstrap. The bag of little bootstraps has been proposed in non-causal settings for large data but has not yet been applied to evaluate the properties of estimators of causal effects. In this paper, we introduce a new bootstrap algorithm called causal bag of little bootstraps for causal inference with large data. The new algorithm significantly improves the computational efficiency of the traditional bootstrap while providing consistent estimates and desirable confidence interval coverage. We describe its properties, provide practical considerations, and evaluate the performance of the proposed algorithm in terms of bias, coverage of the true 95% confidence intervals, and computational time in a simulation study. We apply it in the evaluation of the effect of hormone therapy on the average time to coronary heart disease using a large observational data set from the Women's Health Initiative.
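
The generic (non-causal) bag of little bootstraps referred to above can be sketched as follows. This is an illustration of the standard idea only, applied to a weighted mean, and not the causal variant introduced in the paper. The function names, the subset-size exponent, and the other tuning values are assumptions chosen for the example.

```python
import random
import statistics
from collections import Counter


def weighted_mean(values, weights):
    """Estimator evaluated on a subset with multinomial resampling weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def blb_standard_error(data, estimator, n_subsets=5, subset_exp=0.7,
                       n_resamples=50, seed=0):
    """Bag of little bootstraps estimate of an estimator's standard error.

    Each small subset of size b = n ** subset_exp is resampled up to the full
    sample size n, but only b distinct points (plus counts) are ever touched.
    """
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** subset_exp))
    per_subset_se = []
    for _ in range(n_subsets):
        subset = rng.sample(data, b)  # little subset, drawn without replacement
        estimates = []
        for _ in range(n_resamples):
            # Multinomial resample of size n over the b subset points,
            # stored as a count per point rather than as n copies.
            counts = Counter(rng.choices(range(b), k=n))
            weights = [counts.get(i, 0) for i in range(b)]
            estimates.append(estimator(subset, weights))
        per_subset_se.append(statistics.stdev(estimates))
    # Average the per-subset quality assessments.
    return statistics.fmean(per_subset_se)


# Synthetic standard-normal data; the true standard error of the mean
# is 1 / sqrt(n) for this setup.
data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]
se_hat = blb_standard_error(data, weighted_mean)
```

The estimator only ever sees b distinct observations with integer weights, which is where the computational saving over the traditional bootstrap comes from when the estimator accepts weights.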

[7]  arXiv:2302.02895 (cross-list from cs.CG) [pdf, other]
Title: Flexible and Probabilistic Topology Tracking with Partial Optimal Transport
Subjects: Computational Geometry (cs.CG); Applications (stat.AP)

In this paper, we present a flexible and probabilistic framework for tracking topological features in time-varying scalar fields using merge trees and partial optimal transport. Merge trees are topological descriptors that record the evolution of connected components in the sublevel sets of scalar fields. We present a new technique for modeling and comparing merge trees using tools from partial optimal transport. In particular, we model a merge tree as a measure network, that is, a network equipped with a probability distribution, and define a notion of distance on the space of merge trees inspired by partial optimal transport. Such a distance offers a new and flexible perspective for encoding intrinsic and extrinsic information in the comparative measures of merge trees. More importantly, it gives rise to a partial matching between topological features in time-varying data, thus enabling flexible topology tracking for scientific simulations. Furthermore, such partial matching may be interpreted as probabilistic coupling between features at adjacent time steps, which gives rise to probabilistic tracking graphs. We derive a stability result for our distance and provide numerous experiments indicating the efficacy of our distance in extracting meaningful feature tracks.

Replacements for Tue, 7 Feb 23

[8]  arXiv:2105.03529 (replaced) [pdf, other]
Title: Precise Unbiased Estimation in Randomized Experiments using Auxiliary Observational Data
Subjects: Applications (stat.AP)
[9]  arXiv:2203.10563 (replaced) [pdf, other]
Title: Application of de-shape synchrosqueezing to estimate gait cadence from a single-sensor accelerometer placed in different body locations
Subjects: Applications (stat.AP)
[10]  arXiv:2204.00180 (replaced) [pdf, other]
Title: Measuring Diagnostic Test Performance Using Imperfect Reference Tests: A Partial Identification Approach
Authors: Filip Obradović
Comments: For associated code, see: this https URL
Subjects: Applications (stat.AP); Econometrics (econ.EM)
[11]  arXiv:2208.00959 (replaced) [pdf, other]
Title: HUG model: an interaction point process for Bayesian detection of multiple sources in groundwaters from hydrochemical data
Authors: Christophe Reype (IECL, PASTA), Radu S. Stoica (IECL, PASTA), Antonin Richard, Madalina Deaconu (IECL, PASTA)
Subjects: Applications (stat.AP); Statistics Theory (math.ST); Methodology (stat.ME)
[12]  arXiv:2209.15414 (replaced) [pdf, other]
Title: Predicting the power grid frequency of European islands
Comments: 17 pages
Subjects: Applications (stat.AP); Machine Learning (cs.LG); Systems and Control (eess.SY); Data Analysis, Statistics and Probability (physics.data-an)
[13]  arXiv:2101.02094 (replaced) [pdf, ps, other]
Title: Bernstein-Type Bounds for Beta Distribution
Authors: Maciej Skorski
Comments: major revision - fixed a mistake in the proof
Subjects: Probability (math.PR); Statistics Theory (math.ST); Applications (stat.AP)
[14]  arXiv:2106.14045 (replaced) [pdf, other]
Title: The mbsts package: Multivariate Bayesian Structural Time Series Models in R
Authors: Ning Ning, Jinwen Qiu
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Mathematical Software (cs.MS); Applications (stat.AP); Computation (stat.CO)
[15]  arXiv:2210.14860 (replaced) [pdf, other]
Title: Dependence matters: Statistical models to identify the drivers of tie formation in economic networks
Subjects: Methodology (stat.ME); Applications (stat.AP)
[16]  arXiv:2211.13793 (replaced) [pdf, other]
Title: Tensor Decomposition of Large-scale Clinical EEGs Reveals Interpretable Patterns of Brain Physiology
Comments: 4 pages, 3 Figures, 2 Tables; Accepted at IEEE NER 2023
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Applications (stat.AP)
[17]  arXiv:2301.09861 (replaced) [pdf]
Title: A convolutional neural network of low complexity for tumor anomaly detection
Comments: This article has been accepted for publication in the 8th International Congress on Information and Communication Technology (ICICT 2023, Springer)
Subjects: Image and Video Processing (eess.IV); Applications (stat.AP)
