Methodology

New submissions

Submissions received from Mon 6 May 24 to Tue 7 May 24, announced Wed, 8 May 24


New submissions for Wed, 8 May 24

[1] arXiv:2405.03778 [pdf, other]: Title: An Autoregressive Model for Time Series of Random Objects

Authors: Matthieu Bulté, Helle Sørensen

Subjects: Methodology (stat.ME)

Random variables in metric spaces indexed by time and observed at equally spaced time points are receiving increased attention due to their broad applicability. However, the absence of inherent structure in metric spaces has resulted in a literature that is predominantly non-parametric and model-free. To address this gap in models for time series of random objects, we introduce an adaptation of the classical linear autoregressive model tailored for data lying in a Hadamard space. The parameters of interest in this model are the Fréchet mean and a concentration parameter, both of which we prove can be consistently estimated from data. Additionally, we propose a test statistic and establish its asymptotic normality, thereby enabling hypothesis testing for the absence of serial dependence. Finally, we introduce a bootstrap procedure to obtain critical values for the test statistic under the null hypothesis. Theoretical results of our method, including the convergence of the estimators as well as the size and power of the test, are illustrated through simulations, and the utility of the model is demonstrated by an analysis of a time series of consumer inflation expectations.
[2] arXiv:2405.03815 [pdf, other]: Title: Statistical inference for a stochastic generalized logistic differential equation

Authors: Fernando Baltazar-Larios, Francisco Delgado-Vences, Saul Diaz-Infante, Eduardo Lince Gomez

Comments: 22 pages, 3 figures

Subjects: Methodology (stat.ME); Probability (math.PR); Statistics Theory (math.ST)

This research aims to estimate three parameters in a stochastic generalized logistic differential equation. We assume the intrinsic growth rate and shape parameters are constant but unknown. To estimate these two parameters, we use the maximum likelihood method and establish that the estimators for these two parameters are strongly consistent. We estimate the diffusion parameter by using the quadratic variation processes. To test our results, we evaluate two data scenarios, complete and incomplete, with fixed values assigned to the three parameters. In the incomplete data scenario, we apply an Expectation Maximization algorithm.
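As an illustration of the quadratic-variation idea mentioned in this abstract, here is a minimal sketch, not the authors' implementation. It assumes the equation has the form dX = r X (1 - (X/K)^m) dt + sigma X dW; that form, the parameter names `r`, `m`, `K`, `sigma`, and both helper functions are hypothetical choices made for the example. Under that assumption the sum of squared increments approximates sigma^2 times the integral of X^2 dt, so the ratio of the two gives an estimate of sigma^2.

```python
import math
import random


def simulate_path(r, m, K, sigma, x0, T, n, seed=0):
    """Euler-Maruyama path of the ASSUMED SDE
    dX = r*X*(1 - (X/K)**m) dt + sigma*X dW on [0, T] with n steps."""
    rng = random.Random(seed)
    dt = T / n
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
        x_next = x + r * x * (1.0 - (x / K) ** m) * dt + sigma * x * dw
        xs.append(max(x_next, 1e-12))  # guard: keep the discretized path positive
    return xs, dt


def sigma_from_quadratic_variation(xs, dt):
    """Estimate sigma as sqrt(sum (dX)^2 / sum X^2 dt),
    which relies on the assumed diffusion term sigma*X."""
    qv = sum((b - a) ** 2 for a, b in zip(xs, xs[1:]))
    integrated = sum(a * a for a in xs[:-1]) * dt
    return math.sqrt(qv / integrated)


# Illustrative values only; none of them come from the paper.
xs, dt = simulate_path(r=1.0, m=2.0, K=10.0, sigma=0.2, x0=1.0, T=10.0, n=20000)
sigma_hat = sigma_from_quadratic_variation(xs, dt)
print(f"sigma_hat = {sigma_hat:.3f}")
```

With a fine time grid the estimate should land close to the simulated value of `sigma`; the drift parameters would still need a separate step such as the maximum likelihood approach the abstract describes.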
[3] arXiv:2405.03834 [pdf, other]: Title: Covariance-free Multifidelity Control Variates Importance Sampling for Reliability Analysis of Rare Events

Authors: Promit Chakroborty (Dept. of Civil and Systems Engg, Johns Hopkins University), Somayajulu L. N. Dhulipala (Idaho National Laboratory), Michael D. Shields (Dept. of Civil and Systems Engg, Johns Hopkins University)

Comments: 32 pages, 6 figures

Subjects: Methodology (stat.ME)

Multifidelity modeling has been steadily gaining attention as a tool to address the problem of exorbitant model evaluation costs that makes the estimation of failure probabilities a significant computational challenge for complex real-world problems, particularly when failure is a rare event. To implement multifidelity modeling, estimators that efficiently combine information from multiple models/sources are necessary. In past works, the variance reduction techniques of Control Variates (CV) and Importance Sampling (IS) have been leveraged for this task. In this paper, we present the CVIS framework, a creative take on a coupled Control Variates and Importance Sampling estimator for bifidelity reliability analysis. The framework addresses some of the practical challenges of the CV method by using an estimator for the control variate mean and side-stepping the need to estimate the covariance between the original estimator and the control variate through a clever choice for the tuning constant. The task of selecting an efficient IS distribution is also considered, with a view towards maximally leveraging the bifidelity structure and maintaining expressivity. Additionally, a diagnostic is provided that indicates both the efficiency of the algorithm and the relative predictive quality of the models utilized. Finally, the behavior and performance of the framework are explored through analytical and numerical examples.
[4] arXiv:2405.03985 [pdf, other]: Title: Bayesian Multilevel Compositional Data Analysis: Introduction, Evaluation, and Application

Authors: Flora Le, Tyman E. Stanford, Dorothea Dumuid, Joshua F. Wiley

Comments: 36 pages and 7 figures (6 pdf files)

Subjects: Methodology (stat.ME); Computation (stat.CO)

Multilevel compositional data commonly occur in various fields, particularly in intensive, longitudinal studies using ecological momentary assessments. Examples include data repeatedly measured over time that are non-negative and sum to a constant value, such as sleep-wake movement behaviours in a 24-hour day. This article presents a novel methodology for analysing multilevel compositional data using a Bayesian inference approach. This method can be used to investigate how reallocation of time between sleep-wake movement behaviours may be associated with other phenomena (e.g., emotions, cognitions) at a daily level. We explain the theoretical details of the data and the models, and outline the steps necessary to implement this method. We introduce the R package multilevelcoda to facilitate the application of this method and illustrate using a real data example. An extensive parameter recovery simulation study verified the robust performance of the method. Across all simulation conditions investigated in the simulation study, the model had minimal convergence issues (convergence rate > 99%) and achieved excellent quality of parameter estimates and inference, with an average bias of 0.00 (range -0.09, 0.05) and coverage of 0.95 (range 0.93, 0.97). We conclude the article with recommendations on the use of the Bayesian compositional multilevel modelling approach, and hope to promote wider application of this method to answer robust questions using the increasingly available data from intensive, longitudinal studies.
[5] arXiv:2405.04193 [pdf, ps, other]: Title: A generalized ordinal quasi-symmetry model and its separability for analyzing multi-way tables

Authors: Hisaya Okahara, Kouji Tahata

Subjects: Methodology (stat.ME)

This paper addresses the challenge of modeling multi-way contingency tables for matched set data with ordinal categories. Although the complete symmetry and marginal homogeneity models are well established, they may not always provide a satisfactory fit to the data. To address this issue, we propose a generalized ordinal quasi-symmetry model that offers increased flexibility when the complete symmetry model fails to capture the underlying structure. We investigate the properties of this new model and provide an information-theoretic interpretation, elucidating its relationship to the ordinal quasi-symmetry model. Moreover, we revisit Agresti's findings and present a new necessary and sufficient condition for the complete symmetry model, proving that the proposed model and the marginal moment equality model are separable hypotheses. The separability of the proposed model and marginal moment equality model is a significant development in the analysis of multi-way contingency tables. It enables researchers to examine the symmetry structure in the data with greater precision, providing a more thorough understanding of the underlying patterns. This powerful framework equips researchers with the necessary tools to explore the complexities of ordinal variable relationships in matched set data, paving the way for new discoveries and insights.
[6] arXiv:2405.04226 [pdf, other]: Title: NEST: Neural Estimation by Sequential Testing

Authors: Sjoerd Bruin, Jiří Kosinka, Cara Tursun

Subjects: Methodology (stat.ME)

Adaptive psychophysical procedures aim to increase the efficiency and reliability of measurements. With increasing stimulus and experiment complexity in the last decade, estimating multi-dimensional psychometric functions has become a challenging task for adaptive procedures. If the experimenter has limited information about the underlying psychometric function, it is not possible to use parametric techniques developed for the multi-dimensional stimulus space. Although there are non-parametric approaches that use Gaussian process methods and specific hand-crafted acquisition functions, their performance is sensitive to proper selection of the kernel function, which is not always straightforward. In this work, we use a neural network as the psychometric function estimator and introduce a novel acquisition function for stimulus selection. We thoroughly benchmark our technique both using simulations and by conducting psychovisual experiments under realistic conditions. We show that our method outperforms the state of the art without the need to select a kernel function and significantly reduces the experiment duration.
[7] arXiv:2405.04238 [pdf, other]: Title: Homogeneity of multinomial populations when data are classified into a large number of groups

Authors: M.V. Alba-Fernández, M.D. Jiménez-Gamero, F.J. Ariza-López

Comments: 37 pages, 1 figure

Subjects: Methodology (stat.ME)

Suppose that we are interested in the comparison of two independent categorical variables. Suppose also that the population is divided into subpopulations or groups. Notice that the distribution of the target variable may vary across subpopulations; moreover, the two independent variables may have the same distribution in the whole population while their distributions differ in some groups. So, instead of testing the homogeneity of the two categorical variables, one may be interested in simultaneously testing the homogeneity in all groups. A novel procedure is proposed for carrying out such a testing problem. The test statistic is shown to be asymptotically normal, avoiding the use of complicated resampling methods to get $p$-values. Here, asymptotic means that the number of groups increases; the sample sizes of the data from each group can either stay bounded or grow with the number of groups. The finite sample performance of the proposal is empirically evaluated through an extensive simulation study. The usefulness of the proposal is illustrated by three data sets coming from diverse experimental fields such as education, the COVID-19 pandemic and digital elevation models.
[8] arXiv:2405.04254 [pdf, ps, other]: Title: Distributed variable screening for generalized linear models

Authors: Tianbo Diao, Lianqiang Qu, Bo Li, Liuquan Sun

Subjects: Methodology (stat.ME)

In this article, we develop a distributed variable screening method for generalized linear models. This method is designed to handle situations where both the sample size and the number of covariates are large. Specifically, the proposed method selects relevant covariates by using a sparsity-restricted surrogate likelihood estimator. It takes into account the joint effects of the covariates rather than just the marginal effect, and this characteristic enhances the reliability of the screening results. We establish the sure screening property of the proposed method, which ensures that with a high probability, the true model is included in the selected model. Simulation studies are conducted to evaluate the finite sample performance of the proposed method, and an application to a real dataset showcases its practical utility.
[9] arXiv:2405.04419 [pdf, other]: Title: Transportability of Principal Causal Effects

Authors: Justin M. Clark, Kollin W. Rott, James S. Hodges, Jared D. Huling

Subjects: Methodology (stat.ME)

Recent research in causal inference has made important progress in addressing challenges to the external validity of trial findings. Such methods weight trial participant data to more closely resemble the distribution of effect-modifying covariates in a well-defined target population. In the presence of participant non-adherence to study medication, these methods effectively transport an intention-to-treat effect that averages over heterogeneous compliance behaviors. In this paper, we develop a principal stratification framework to identify causal effects conditioning on both compliance behavior and membership in the target population. We also develop non-parametric efficiency theory for, and construct efficient estimators of, such "transported" principal causal effects and characterize their finite-sample performance in simulation experiments. While this work focuses on treatment non-adherence, the framework is applicable to a broad class of estimands that target effects in clinically-relevant, possibly latent subsets of a target population.
[10] arXiv:2405.04446 [pdf, other]: Title: Causal Inference in the Multiverse of Hazard

Authors: En-Yu Lai, Yen-Tsung Huang

Subjects: Methodology (stat.ME)

Hazard serves as a pivotal estimand in both practical applications and methodological frameworks. However, its causal interpretation poses notable challenges, including inherent selection biases and ill-defined populations to be compared between different treatment groups. In response, we propose a novel definition of counterfactual hazard within the framework of possible worlds. Instead of conditioning on prior survival status as a conditional probability, our new definition involves intervening in the prior status, treating it as a marginal probability. Using single-world intervention graphs, we demonstrate that the proposed counterfactual hazard is a type of controlled direct effect. Conceptually, intervening in survival status at each time point generates a new possible world, where the proposed hazards across time points represent risks in these hypothetical scenarios, forming a "multiverse of hazard." The cumulative and average counterfactual hazards correspond to the sum and average of risks across this multiverse, respectively, with the actual world's risk lying between the two. This conceptual shift reframes hazards in the actual world as a collection of risks across possible worlds, marking a significant advancement in the causal interpretation of hazards.
[11] arXiv:2405.04475 [pdf, other]: Title: Bayesian Copula Density Estimation Using Bernstein Yett-Uniform Priors

Authors: Nicolás Kuschinski, Richard Warr, Alejandro Jara

Comments: arXiv admin note: text overlap with arXiv:2109.03768

Subjects: Methodology (stat.ME)

Probability density estimation is a central task in statistics. Copula-based models provide a great deal of flexibility in modelling multivariate distributions, allowing for the specification of models for the marginal distributions separately from the dependence structure (copula) that links them to form a joint distribution. Choosing a class of copula models is not a trivial task and its misspecification can lead to wrong conclusions. We introduce a novel class of random Bernstein copula functions and study its support and the behavior of its posterior distribution. The proposal is based on a particular class of random grid-uniform copulas, referred to as yett-uniform copulas. Alternative Markov chain Monte Carlo algorithms for exploring the posterior distribution under the proposed model are also studied. The methodology is illustrated by means of simulated and real data.
[12] arXiv:2405.04531 [pdf, other]: Title: Stochastic Gradient MCMC for Massive Geostatistical Data

Authors: Mohamed A. Abba, Brian J. Reich, Reetam Majumder, Brandon Feng

Subjects: Methodology (stat.ME); Computation (stat.CO)

Gaussian processes (GPs) are commonly used for prediction and inference for spatial data analyses. However, since estimation and prediction tasks have cubic time and quadratic memory complexity in the number of locations, GPs are difficult to scale to large spatial datasets. The Vecchia approximation induces sparsity in the dependence structure and is one of several methods proposed to scale GP inference. Our work adds to the substantial research in this area by developing a stochastic gradient Markov chain Monte Carlo (SGMCMC) framework for efficient computation in GPs. At each step, the algorithm subsamples a minibatch of locations and subsequently updates process parameters through a Vecchia-approximated GP likelihood. Since the Vecchia-approximated GP has a time complexity that is linear in the number of locations, this results in scalable estimation in GPs. Through simulation studies, we demonstrate that SGMCMC is competitive with state-of-the-art scalable GP algorithms in terms of computational time and parameter estimation. An application of our method is also provided using the Argo dataset of ocean temperature measurements.

Cross-lists for Wed, 8 May 24

[13] arXiv:2405.03720 (cross-list from cs.LG) [pdf, other]: Title: Spatial Transfer Learning with Simple MLP

Authors: Hongjian Yang

Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

A first step toward investigating the potential of transfer learning applied to the field of spatial statistics.
[14] arXiv:2405.03723 (cross-list from cs.LG) [pdf, other]: Title: Generative adversarial learning with optimal input dimension and its adaptive generator architecture

Authors: Zhiyao Tan, Ling Zhou, Huazhen Lin

Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

We investigate the impact of the input dimension on the generalization error in generative adversarial networks (GANs). In particular, we first provide both theoretical and practical evidence to validate the existence of an optimal input dimension (OID) that minimizes the generalization error. Then, to identify the OID, we introduce a novel framework called generalized GANs (G-GANs), which includes existing GANs as a special case. By incorporating the group penalty and the architecture penalty developed in the paper, G-GANs have several intriguing features. First, our framework offers adaptive dimensionality reduction from the initial dimension to a dimension necessary for generating the target distribution. Second, this reduction in dimensionality also shrinks the required size of the generator network architecture, which is automatically identified by the proposed architecture penalty. Both reductions in dimensionality and the generator network significantly improve the stability and the accuracy of the estimation and prediction. Theoretical support for the consistent selection of the input dimension and the generator network is provided. Third, the proposed algorithm involves an end-to-end training process, and the algorithm allows for dynamic adjustments between the input dimension and the generator network during training, further enhancing the overall performance of G-GANs. Extensive experiments conducted with simulated and benchmark data demonstrate the superior performance of G-GANs. In particular, compared to that of off-the-shelf methods, G-GANs achieve an average improvement of 45.68% in the CT slice dataset, 43.22% in the MNIST dataset and 46.94% in the FashionMNIST dataset in terms of the maximum mean discrepancy or Fréchet inception distance. Moreover, the features generated based on the input dimensions identified by G-GANs align with visually significant features.
[15] arXiv:2405.03910 (cross-list from econ.EM) [pdf, other]: Title: A Primer on the Analysis of Randomized Experiments and a Survey of some Recent Advances

Authors: Yuehao Bai, Azeem M. Shaikh, Max Tabord-Meehan

Subjects: Econometrics (econ.EM); Methodology (stat.ME)

The past two decades have witnessed a surge of new research in the analysis of randomized experiments. The emergence of this literature may seem surprising given the widespread use and long history of experiments as the "gold standard" in program evaluation, but this body of work has revealed many subtle aspects of randomized experiments that may have been previously unappreciated. This article provides an overview of some of these topics, primarily focused on stratification, regression adjustment, and cluster randomization.
[16] arXiv:2405.04043 (cross-list from stat.CO) [pdf, other]: Title: Scalable Vertical Federated Learning via Data Augmentation and Amortized Inference

Authors: Conor Hassan, Matthew Sutton, Antonietta Mira, Kerrie Mengersen

Comments: 30 pages, 5 figures, 3 tables

Subjects: Computation (stat.CO); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

Vertical federated learning (VFL) has emerged as a paradigm for collaborative model estimation across multiple clients, each holding a distinct set of covariates. This paper introduces the first comprehensive framework for fitting Bayesian models in the VFL setting. We propose a novel approach that leverages data augmentation techniques to transform VFL problems into a form compatible with existing Bayesian federated learning algorithms. We present an innovative model formulation for specific VFL scenarios where the joint likelihood factorizes into a product of client-specific likelihoods. To mitigate the dimensionality challenge posed by data augmentation, which scales with the number of observations and clients, we develop a factorized amortized variational approximation that achieves scalability independent of the number of observations. We showcase the efficacy of our framework through extensive numerical experiments on logistic regression, multilevel regression, and a novel hierarchical Bayesian split neural net model. Our work paves the way for privacy-preserving, decentralized Bayesian inference in vertically partitioned data scenarios, opening up new avenues for research and applications in various domains.

Replacements for Wed, 8 May 24

[17] arXiv:2012.00180 (replaced) [pdf, other]: Title: Anisotropic local constant smoothing for change-point regression function estimation

Authors: John R.J. Thompson, W. John Braun

Comments: 30 pages, 12 figures. Parts of this Original Manuscript are in an article published by Taylor & Francis in the Journal of Applied Statistics on April 20th 2024, available at this https URL

Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP)
[18] arXiv:2207.09098 (replaced) [pdf, other]: Title: ReBoot: Distributed statistical learning via refitting bootstrap samples

Authors: Yumeng Wang, Ziwei Zhu, Xuming He

Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
[19] arXiv:2304.02563 (replaced) [pdf, other]: Title: The transcoding sampler for stick-breaking inferences on Dirichlet process mixtures

Authors: Carlo Vicentini

Comments: Some sections have been moved to the Appendix, and Appendix A has been updated with a more relevant example

Subjects: Methodology (stat.ME)
[20] arXiv:2311.02043 (replaced) [pdf, other]: Title: Bayesian Quantile Regression with Subset Selection: A Posterior Summarization Perspective

Authors: Joseph Feldman, Daniel Kowal

Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)
[21] arXiv:2401.00097 (replaced) [pdf, other]: Title: Recursive identification with regularization and on-line hyperparameters estimation

Authors: Bernard Vau, Tudor-Bogdan Airimitoaie

Comments: this https URL

Subjects: Methodology (stat.ME); Systems and Control (eess.SY); Optimization and Control (math.OC)
[22] arXiv:2401.02048 (replaced) [pdf, other]: Title: Random Effect Restricted Mean Survival Time Model

Authors: Keisuke Hanada, Masahiro Kojima

Subjects: Methodology (stat.ME); Applications (stat.AP)
[23] arXiv:2202.08081 (replaced) [pdf, other]: Title: Reasoning with fuzzy and uncertain evidence using epistemic random fuzzy sets: general framework and practical models

Authors: Thierry Denoeux

Journal-ref: Fuzzy Sets and Systems, Vol. 453, Pages 1-36, 2023

Subjects: Artificial Intelligence (cs.AI); Methodology (stat.ME)
[24] arXiv:2209.01328 (replaced) [pdf, other]: Title: Optimal empirical Bayes estimation for the Poisson model via minimum-distance methods

Authors: Soham Jana, Yury Polyanskiy, Yihong Wu

Comments: 28 pages, 7 figures, 3 tables. Added an extension to a multivariate setup and a comparison with the worst case prior

Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
[25] arXiv:2211.02039 (replaced) [pdf, other]: Title: The Projected Covariance Measure for assumption-lean variable significance testing

Authors: Anton Rask Lundborg, Ilmun Kim, Rajen D. Shah, Richard J. Samworth

Comments: 97 pages, 5 figures

Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
[26] arXiv:2404.01469 (replaced) [pdf, other]: Title: A group testing based exploration of age-varying factors in chlamydia infections among Iowa residents

Authors: Yizeng Li, Dewei Wang, Joshua M. Tebbs

Subjects: Applications (stat.AP); Methodology (stat.ME)
