Methodology
New submissions
New submissions for Thu, 28 Mar 24
- [1] arXiv:2403.17948 [pdf, ps, other]
-
Title: The Rule of link functions on Binomial Regression Model: A Cross Sectional Study on Child Malnutrition, Bangladesh
Authors: Md Mehedi Hasan Bhuiyan
Subjects: Methodology (stat.ME)
The link function is a key tool in the binomial regression model, which is a nonlinear model under the generalized linear model (GLM) approach. It turns the nonlinear regression into a linear model by mapping the probability scale $(0, 1)$ onto the interval $(-\infty, \infty)$ of the linear predictor. Binomial models with four link functions (logit, probit, cloglog and Cauchy) are applied to the proportion of malnourished children aged 0-5 years at the household level. The Multiple Indicator Cluster Survey (MICS) 2019, Bangladesh, was conducted jointly by UNICEF and the Bangladesh Bureau of Statistics (BBS). The survey covered 64000 households using a two-stage stratified sampling technique, of which around 21000 households have children aged 0-5 years. We use bivariate analysis to assess the statistical association between the response and sociodemographic features. Among the binomial regression models, the probit model provides the best result, based on the lowest standard errors of the covariates and goodness-of-fit criteria (deviance, AIC).
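To make the role of the four links concrete, here is a minimal sketch of each link $g$ (probability to real line) and its inverse (linear predictor back to probability), plus a binomial log-likelihood that could feed a deviance or AIC comparison. The names `LINKS` and `binomial_loglik` are hypothetical, "cauchit" is the common software name for the Cauchy link, and nothing here reproduces the paper's actual fitting code.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used by the probit link

# Each entry is (link g: (0, 1) -> real line, inverse link g^{-1}: real line -> (0, 1)).
LINKS = {
    "logit": (
        lambda p: math.log(p / (1.0 - p)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "probit": (
        lambda p: _STD_NORMAL.inv_cdf(p),
        lambda eta: _STD_NORMAL.cdf(eta),
    ),
    "cloglog": (
        lambda p: math.log(-math.log(1.0 - p)),
        lambda eta: 1.0 - math.exp(-math.exp(eta)),
    ),
    "cauchit": (
        lambda p: math.tan(math.pi * (p - 0.5)),
        lambda eta: 0.5 + math.atan(eta) / math.pi,
    ),
}


def binomial_loglik(y, n, eta, link):
    """Binomial log-likelihood, omitting the constant log C(n, y).

    y: malnourished children per household; n: children aged 0-5 per household;
    eta: linear predictors; link: one of the keys of LINKS (hypothetical setup).
    """
    inv = LINKS[link][1]
    total = 0.0
    for yi, ni, ei in zip(y, n, eta):
        p = inv(ei)  # fitted probability on the (0, 1) scale
        total += yi * math.log(p) + (ni - yi) * math.log(1.0 - p)
    return total
```

With $k$ estimated coefficients, AIC would be `-2 * loglik + 2 * k` (using the full likelihood, including the constant, when comparing across data sets), which is one of the criteria the abstract uses to compare links.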
- [2] arXiv:2403.17982 [pdf, ps, other]
-
Title: Markov chain models for inspecting response dynamics in psychological testing
Authors: Andrea Bosco
Comments: 20 pages, 1 figure, 3 tables, 25 equations/matrices. Part of this paper was presented at the XXIX AIP Congress, Experimental Psychology Section, September 18th-20th 2023, Lucca, Italy. Title of the talk: "Differentiating students with signs of ADHD or OCD based on hysteresis in responses to a mind-wandering test. A Study of Markov Chain Test Response Sequences"
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Probability (math.PR)
This study underscores the importance of contextual probabilities in shaping response patterns within psychological testing, despite the ubiquity of order effects discussed extensively in the methodological literature. Drawing on concepts such as path dependency, first-order autocorrelation, state dependency, and hysteresis, the present study addresses how earlier responses serve as an anchor for subsequent answers in tests, surveys, and questionnaires. Introducing the notion of non-commuting observables from quantum physics, I highlight their role in characterizing psychological processes and the impact of measurement instruments on participants' responses. I advocate the use of first-order Markov chain modeling to capture and forecast sequential dependencies in survey and test responses. The rationale for the first-order Markov chain model lies in individuals' propensity to attend partially to preceding responses, with the most recent items likely exerting a substantial influence on the selection of the next response. This study contributes to understanding the dynamics of sequential data in psychological research and provides a methodological framework for longitudinal analyses of response patterns in tests and questionnaires.
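As a concrete illustration of first-order Markov chain modeling of answer sequences, the sketch below estimates transition probabilities by counting consecutive answer pairs (the maximum-likelihood estimate) and forecasts the most likely next answer. The function names and the toy response categories are hypothetical; the paper's own estimation and testing procedures may differ.

```python
from collections import Counter


def transition_matrix(sequences, states):
    """MLE of a first-order Markov transition matrix from response sequences.

    sequences: per-participant answer sequences, in item order
    states: ordered list of possible response categories
    Returns P with P[a][b] = estimated Pr(next answer = b | current answer = a).
    """
    counts = Counter()
    for seq in sequences:
        # Count each consecutive (previous answer, next answer) pair.
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1
    P = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        # A state never followed by another answer gets an all-zero row.
        P[a] = {b: (counts[(a, b)] / row_total if row_total else 0.0) for b in states}
    return P


def forecast_next(P, current):
    """Most likely next response given only the current one (first-order assumption)."""
    return max(P[current], key=P[current].get)
```

For example, the sequences `["A", "A", "A", "B"]` and `["B", "B", "A"]` give `P["A"]["A"] = 2/3`, so the chain predicts that an "A" answer is most likely followed by another "A", which is the kind of anchoring effect the abstract describes.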
- [3] arXiv:2403.17986 [pdf, other]
-
Title: Comment on "Safe Testing" by Grünwald, de Heide, and Koolen
Authors: Joris Mulder
Comments: 2 pages, 1 figure; comment on arXiv:1906.07801
Subjects: Methodology (stat.ME)
This comment briefly reflects on "Safe Testing" by Grünwald et al. (2024). The safety of fractional Bayes factors (O'Hagan, 1995) is illustrated and compared with (safe) Bayes factors based on the right Haar prior.
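For reference, the fractional Bayes factor of O'Hagan (1995) compares models $M_0$ and $M_1$ using a fraction $b \in (0, 1)$ of the likelihood to "train" each prior. The block below states the usual definition; the symbols $\pi_i$, $f_i$ and $q_i$ are notation introduced here, not taken from the comment.

```latex
q_i(b, x) = \frac{\int \pi_i(\theta_i)\, f_i(x \mid \theta_i)\, d\theta_i}
                 {\int \pi_i(\theta_i)\, f_i(x \mid \theta_i)^{b}\, d\theta_i},
\qquad
B^{F}_{01}(b) = \frac{q_0(b, x)}{q_1(b, x)}
```

If a prior is improper, $\pi_i = c_i h_i$ with an arbitrary constant $c_i$, and $c_i$ cancels between the numerator and denominator of $q_i$, which is what makes the fractional Bayes factor well defined in that case.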
- [4] arXiv:2403.18039 [pdf, other]
-
Title: Doubly robust causal inference through penalized bias-reduced estimation: combining non-probability samples with designed surveys
Subjects: Methodology (stat.ME)
Causal inference on the average treatment effect (ATE) using non-probability samples, such as electronic health records (EHR), faces challenges from sample selection bias and high-dimensional covariates. This requires a selection model alongside the treatment and outcome models that are typical ingredients of causal inference. This paper considers integrating large non-probability samples with external probability samples from a designed survey, addressing moderately high-dimensional confounders and variables that influence selection. In contrast to the two-step approach that separates variable selection from debiased estimation, we propose a one-step plug-in doubly robust (DR) estimator of the ATE. We construct a novel penalized estimating equation by minimizing the squared asymptotic bias of the DR estimator. Our approach facilitates ATE inference in high-dimensional settings because the variability from estimating the nuisance parameters can be ignored, which is not guaranteed in conventional likelihood approaches with non-differentiable L1-type penalties. We provide a consistent variance estimator for the DR estimator. Simulation studies demonstrate the double robustness of our estimator under misspecification of either the outcome model or the selection and treatment models, as well as the validity of statistical inference under penalized estimation. We apply our method to integrate EHR data from the Michigan Genomics Initiative with an external probability sample.
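The double robustness mentioned above can be seen in the standard augmented inverse probability weighting (AIPW) form of a DR estimator, sketched below with plug-in nuisance estimates. This is the textbook AIPW estimator, not the paper's penalized bias-reduced estimating equations, and the optional `w` argument is a hypothetical stand-in for selection or design weights.

```python
def aipw_ate(y, a, e, m1, m0, w=None):
    """Augmented IPW (doubly robust) estimate of the ATE.

    y: outcomes; a: treatment indicators (0/1);
    e: estimated propensity scores P(A = 1 | X);
    m1, m0: outcome-model predictions of E[Y | A = 1, X] and E[Y | A = 0, X];
    w: optional unit weights (hypothetical selection/design weights), default 1.
    """
    n = len(y)
    w = w if w is not None else [1.0] * n
    num = 0.0
    for yi, ai, ei, m1i, m0i, wi in zip(y, a, e, m1, m0, w):
        # Outcome-model contrast plus inverse-propensity-weighted residual corrections.
        psi = (m1i - m0i
               + ai * (yi - m1i) / ei
               - (1 - ai) * (yi - m0i) / (1 - ei))
        num += wi * psi
    return num / sum(w)
```

The correction terms vanish on average when the outcome model is right, and they undo the outcome model's bias when the propensity scores are right, so only one of the two models needs to be correctly specified.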
- [5] arXiv:2403.18069 [pdf, other]
-
Title: Personalized Imputation in metric spaces via conformal prediction: Applications in Predicting Diabetes Development with Continuous Glucose Monitoring Information
Subjects: Methodology (stat.ME); Applications (stat.AP)
Missing data are a widespread challenge in modern data analysis, particularly during preprocessing and in various inferential modeling tasks. Although numerous algorithms exist for imputing missing data, the assessment of imputation quality at the patient level often lacks personalized statistical approaches. Moreover, imputation methods for statistical objects taking values in metric spaces are scarce. The aim of this paper is to introduce a novel two-step framework comprising (i) an imputation method for statistical objects taking values in metric spaces, and (ii) a criterion for personalizing imputation using conformal inference techniques. This work is motivated by the need to impute distributional functional representations of continuous glucose monitoring (CGM) data in a longitudinal study on diabetes, where a significant fraction of patients do not have CGM profiles available. The importance of these methods is illustrated by evaluating the effectiveness of CGM data as new digital biomarkers for predicting the time to diabetes onset in healthy populations. To address these scientific challenges, we propose: (i) a new regression algorithm for missing responses; (ii) novel conformal prediction algorithms tailored to metric spaces, with a focus on density responses in the 2-Wasserstein geometry; and (iii) a broadly applicable criterion for personalized imputation, designed to enhance both of the aforementioned strategies yet valid for any statistical model and data structure. Our findings reveal that incorporating CGM data into diabetes time-to-event analysis, augmented with a novel personalization phase of imputation, significantly enhances predictive accuracy, by over ten percent, compared with traditional predictive models for time to diabetes.
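Two ingredients named above can be sketched in a few lines: the 2-Wasserstein distance between one-dimensional distributions (here represented by equal-size samples) and a split-conformal threshold computed from calibration nonconformity scores $d(Y_i, \hat{Y}_i)$. This is generic split conformal prediction under the stated assumptions, not the paper's metric-space algorithms or its personalization criterion; the function names are hypothetical.

```python
import math


def wasserstein2_1d(x, y):
    """2-Wasserstein distance between two equal-size 1D samples.

    In one dimension the optimal coupling matches order statistics, so W2 is the
    root-mean-square difference between the sorted samples (empirical quantiles).
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(x), sorted(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


def conformal_radius(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score.

    scores: nonconformity scores d(Y_i, prediction_i) on a held-out calibration set.
    A candidate response y enters the prediction set if d(y, prediction) <= radius.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]
```

With density responses, `scores` would hold distances such as `wasserstein2_1d(observed_glucose, predicted_glucose)` for each calibration patient (hypothetical variable names), and the prediction set is a Wasserstein ball of the returned radius around the new prediction.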
- [6] arXiv:2403.18115 [pdf, other]
-
Title: Assessing COVID-19 Vaccine Effectiveness in Observational Studies via Nested Trial Emulation
Comments: 27 pages, 2 figures
Subjects: Methodology (stat.ME); Applications (stat.AP)
Observational data are often used to estimate the real-world effectiveness and durability of coronavirus disease 2019 (COVID-19) vaccines. A sequence of nested trials can be emulated to draw inference from such data while minimizing selection bias, immortal time bias, and confounding. Typically, when nested trial emulation (NTE) is employed, effect estimates are pooled across trials to increase statistical efficiency. However, such pooled estimates may lack a clear interpretation when the treatment effect is heterogeneous across trials. In the context of COVID-19, vaccine effectiveness is quite plausibly expected to vary over calendar time due to newly emerging variants of the virus. This manuscript considers an NTE inverse probability weighted estimator of vaccine effectiveness that may vary over calendar time, time since vaccination, or both. Statistical testing of the trial effect homogeneity assumption is considered. Simulation studies examine the finite-sample performance of these methods under a variety of scenarios. The methods are used to estimate vaccine effectiveness against COVID-19 outcomes using observational data on over 120,000 residents of Abruzzo, Italy, during 2021.
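A minimal sketch of an inverse probability weighted vaccine effectiveness estimate, $\mathrm{VE} = 1 - r_1 / r_0$ with weighted risks $r_1$ (vaccinated) and $r_0$ (unvaccinated), computed separately for each emulated trial rather than pooled. It assumes the weights have already been estimated and ignores censoring and time-to-event structure, so it is a simplification of the estimator described above; all names are hypothetical.

```python
def ipw_vaccine_effectiveness(outcome, vaccinated, weight):
    """VE = 1 - (weighted risk among vaccinated) / (weighted risk among unvaccinated).

    outcome: 1 if the COVID-19 outcome occurred during follow-up, else 0
    vaccinated: 1 if vaccinated at the emulated trial's baseline, else 0
    weight: inverse probability weights for each person's observed arm
    """
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for y, a, w in zip(outcome, vaccinated, weight):
        num[a] += w * y
        den[a] += w
    risk1, risk0 = num[1] / den[1], num[0] / den[0]
    return 1.0 - risk1 / risk0


def ve_by_trial(trials):
    """Trial-specific VE estimates, one per emulated trial (e.g. per calendar week).

    trials: dict mapping a trial label to (outcome, vaccinated, weight) lists.
    Keeping the estimates separate, instead of pooling, lets VE vary across trials.
    """
    return {k: ipw_vaccine_effectiveness(*v) for k, v in trials.items()}
```

Comparing the entries returned by `ve_by_trial` is the informal counterpart of the homogeneity test mentioned in the abstract: a pooled estimate is easiest to interpret when these trial-specific values agree.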
- [7] arXiv:2403.18464 [pdf, other]
-
Title: Cumulative Incidence Function Estimation Based on Population-Based Biobank Data
Subjects: Methodology (stat.ME)
Many countries have established population-based biobanks, which are being used increasingly in epidemiological and clinical research. These biobanks offer opportunities for large-scale studies addressing questions beyond the scope of traditional clinical trials or cohort studies. However, using biobank data poses new challenges. Typically, biobank data are collected from a study cohort recruited over a defined calendar period, with subjects entering the study at various ages between $c_L$ and $c_U$. This work focuses on biobank data in which some individuals report their disease-onset age upon recruitment (termed prevalent data), while others are recruited healthy and their disease onset is observed during the follow-up period. In contrast to existing methods, we propose a novel cumulative incidence function (CIF) estimator that efficiently incorporates prevalent cases, providing two advantages: (1) increased efficiency, and (2) CIF estimation for ages below the lower limit $c_L$.
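To see why ages below $c_L$ are out of reach without prevalent cases, consider the standard product-limit estimator with delayed entry (left truncation) applied only to incident cases, sketched below for a single event type. This is a conventional baseline, not the paper's estimator: a subject counts as at risk at age $t$ only if recruited before $t$ and still under observation at $t$, so the risk set is empty before the earliest recruitment age. Names and data layout are hypothetical.

```python
def cif_left_truncated(entry, exit_age, event, ages):
    """Product-limit CIF estimate, 1 - S(t), from left-truncated incident data.

    entry: age at recruitment (disease-free at entry, entry < exit_age assumed)
    exit_age: age at disease onset or censoring
    event: 1 if disease onset was observed during follow-up, else 0
    ages: ages at which to report the CIF
    Returns {age: CIF}, with None where no one has yet entered the study.
    The estimate is conditional on being disease-free at the earliest entry age.
    """
    first_entry = min(entry)
    event_ages = sorted({x for x, d in zip(exit_age, event) if d})
    out = {}
    for t in ages:
        if t < first_entry:
            out[t] = None  # empty risk set: not estimable from incident data alone
            continue
        surv = 1.0
        for s in event_ages:
            if s > t:
                break
            # Delayed entry: at risk at age s only if entry < s <= exit age.
            at_risk = sum(1 for e, x in zip(entry, exit_age) if e < s <= x)
            onsets = sum(1 for x, d in zip(exit_age, event) if d and x == s)
            if at_risk:
                surv *= 1.0 - onsets / at_risk
        out[t] = 1.0 - surv
    return out
```

For instance, with recruitment ages `[50, 50, 55, 60]`, exit ages `[60, 70, 65, 80]` and onsets for the first and third subjects, the CIF is 1/3 at age 62, 5/9 at age 70, and undefined at age 45.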
- [8] arXiv:2403.18549 [pdf, other]
-
Title: A communication-efficient, online changepoint detection method for monitoring distributed sensor networks
Comments: 36 pages, 8 figures, 5 tables, accepted by Statistics and Computing
Subjects: Methodology (stat.ME)
We consider the challenge of efficiently detecting changes within a network of sensors, where communication between the sensors and the cloud must also be minimised. We propose an online, communication-efficient method to detect such changes. The procedure performs likelihood ratio tests at each time point and uses two thresholds: the first filters out unimportant test statistics, and the second is applied to the aggregated test statistics to make decisions. We provide asymptotic theory concerning consistency and the asymptotic distribution when there are no changes. Simulation results suggest that our method can achieve performance similar to the idealised setting, in which communication between sensors and the cloud is unconstrained, while substantially reducing transmission costs.
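The two-threshold idea can be sketched as follows, under simplifying assumptions that are ours rather than the paper's: each sensor's data are $N(0, 1)$ before the change, the post-change mean is unknown, and each sensor computes twice the log likelihood ratio for a mean change in its recent window. A sensor transmits only if its statistic exceeds the local threshold, and the cloud declares a change if the sum of what it receives exceeds the global threshold. Function names and threshold values are hypothetical.

```python
def local_lr_stat(x):
    """Twice the log likelihood ratio for a mean change in N(0, 1) data.

    Assumes a known pre-change mean of 0 and unit variance; the post-change mean
    and change location are unknown, so we maximise S_m^2 / m over the last m
    observations, where S_m is their sum.
    """
    best, s = 0.0, 0.0
    for m, value in enumerate(reversed(x), start=1):
        s += value
        best = max(best, s * s / m)
    return best


def monitor(windows, local_threshold, global_threshold):
    """One monitoring step across sensors.

    windows: dict sensor_id -> recent observations at that sensor.
    Sensors send their statistic only if it exceeds local_threshold (this is
    where communication is saved); the cloud sums what it receives.
    Returns (change_declared, sorted ids of sensors that transmitted).
    """
    stats = {k: local_lr_stat(x) for k, x in windows.items()}
    sent = {k: v for k, v in stats.items() if v > local_threshold}
    return sum(sent.values()) > global_threshold, sorted(sent)
```

For example, `monitor({"a": [0, 0, 0], "b": [0, 3, 3]}, 5, 10)` transmits only sensor "b" (statistic 18) and declares a change; raising the global threshold to 20 keeps the same transmission but no longer declares one.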
- [9] arXiv:2403.18602 [pdf, other]
-
Title: Collaborative graphical lasso
Subjects: Methodology (stat.ME); Molecular Networks (q-bio.MN)
In recent years, the availability of multi-omics data has increased substantially. Multi-omics data integration methods mainly aim to leverage different molecular data sets to gain a more complete molecular description of biological processes. An attractive integration approach is the reconstruction of multi-omics networks. However, the development of effective multi-omics network reconstruction strategies lags behind, which hinders realizing the full potential of multi-omics data sets. With this study, we advance the frontier of multi-omics network reconstruction by introducing "collaborative graphical lasso" as a novel strategy. Our proposed algorithm combines "graphical lasso" with the concept of "collaboration", harmonizing the integration of multi-omics data sets and thereby enhancing the accuracy of network inference. In addition, to tackle model selection in this framework, we designed an ad hoc procedure based on network stability. We assess the performance of collaborative graphical lasso and the corresponding model selection procedure through simulations, and we apply them to publicly available multi-omics data. This application demonstrates that collaborative graphical lasso is able to reconstruct known biological connections and suggest previously unknown, biologically coherent interactions, enabling the generation of novel hypotheses. We implemented collaborative graphical lasso as an R package, available on CRAN as coglasso.
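For orientation, the standard graphical lasso that the collaborative method builds on estimates a sparse precision matrix $\Theta$ from a sample covariance matrix $S$ by penalized Gaussian likelihood, as below. Here the diagonal is left unpenalized, which is one common convention; how collaborative graphical lasso modifies this objective to couple multiple omics layers is not described in the abstract and is not reproduced here.

```latex
\hat{\Theta} = \arg\min_{\Theta \succ 0}
\Big\{ -\log\det\Theta + \operatorname{tr}(S\Theta)
       + \lambda \sum_{j \neq k} \lvert \theta_{jk} \rvert \Big\}
```

A zero entry $\hat{\theta}_{jk} = 0$ corresponds to a missing edge between variables $j$ and $k$ in the reconstructed network, and larger $\lambda$ gives sparser networks, which is why a model selection procedure such as the stability-based one above is needed to choose $\lambda$.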
Cross-lists for Thu, 28 Mar 24
- [10] arXiv:2403.18072 (cross-list from stat.CO) [pdf, other]
-
Title: Goal-Oriented Bayesian Optimal Experimental Design for Nonlinear Models using Markov Chain Monte Carlo
Subjects: Computation (stat.CO); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Optimal experimental design (OED) provides a systematic approach to quantify and maximize the value of experimental data. Under a Bayesian approach, conventional OED maximizes the expected information gain (EIG) on model parameters. However, we are often interested not in the parameters themselves, but in predictive quantities of interest (QoIs) that depend on the parameters in a nonlinear manner. We present a computational framework for predictive goal-oriented OED (GO-OED) suitable for nonlinear observation and prediction models, which seeks the experimental design providing the greatest EIG on the QoIs. In particular, we propose a nested Monte Carlo estimator for the QoI EIG, featuring Markov chain Monte Carlo for posterior sampling and kernel density estimation for evaluating the posterior-predictive density and its Kullback-Leibler divergence from the prior-predictive. The GO-OED design is then found by maximizing the EIG over the design space using Bayesian optimization. We demonstrate the effectiveness of the overall nonlinear GO-OED method, and illustrate how it differs from conventional non-GO-OED, through various test problems and an application of sensor placement for source inversion in a convection-diffusion field.
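The nested Monte Carlo idea is easiest to see for conventional parameter EIG in a toy model where the answer is known in closed form. The sketch below assumes $\theta \sim N(0, \sigma_\theta^2)$ and $y \mid \theta, d \sim N(\theta d, \sigma^2)$, for which $\mathrm{EIG}(d) = \tfrac12 \log(1 + d^2 \sigma_\theta^2 / \sigma^2)$. It is not the paper's GO-OED estimator, which targets QoIs and uses MCMC and kernel density estimation; the model and function names are illustrative assumptions.

```python
import math
import random


def gauss_logpdf(y, mean, sd):
    """Log density of N(mean, sd^2) at y."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (y - mean) ** 2 / (2 * sd * sd)


def nested_mc_eig(d, n_outer=500, n_inner=500, prior_sd=1.0, noise_sd=1.0, seed=0):
    """Nested Monte Carlo estimate of the expected information gain for design d.

    Averages log p(y | theta) - log[(1/M) sum_m p(y | theta_m)] over draws
    theta ~ prior, y ~ p(y | theta, d), with fresh inner prior draws theta_m.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        theta = rng.gauss(0.0, prior_sd)
        y = rng.gauss(theta * d, noise_sd)
        log_lik = gauss_logpdf(y, theta * d, noise_sd)
        # Inner prior samples approximate the evidence p(y | d).
        inner = [gauss_logpdf(y, rng.gauss(0.0, prior_sd) * d, noise_sd)
                 for _ in range(n_inner)]
        peak = max(inner)  # log-sum-exp for numerical stability
        log_evidence = peak + math.log(sum(math.exp(v - peak) for v in inner) / n_inner)
        total += log_lik - log_evidence
    return total / n_outer
```

For $d = 2$ with unit variances the closed form is $\tfrac12 \log 5 \approx 0.80$, and the estimate should land close to it; at $d = 0$ the data carry no information and the estimate is exactly zero. The inner average makes the estimator slightly biased upward for finite `n_inner`, which is one reason the nested structure is costly.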
- [11] arXiv:2403.18245 (cross-list from stat.CO) [pdf, other]
-
Title: LocalCop: An R package for local likelihood inference for conditional copulas
Comments: 6 pages, 2 figures; submitted to the Journal of Open Source Software (JOSS)
Subjects: Computation (stat.CO); Methodology (stat.ME)
Conditional copula models allow the dependence structure between multiple response variables to be modelled as a function of covariates. LocalCop (Acar & Lysy, 2024) is an R/C++ package for computationally efficient semiparametric conditional copula modelling using the local likelihood inference framework developed in Acar, Craiu, & Yao (2011), Acar, Craiu, & Yao (2013) and Acar, Czado, & Lysy (2019).
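As a rough guide to what "local likelihood" means here, the block below shows the general form of a local linear fit at a covariate value $x_0$: the copula parameter is linked to a calibration function $\eta(x)$ through an inverse link $g^{-1}$, $\eta$ is approximated linearly near $x_0$, and each observation's copula log-density is weighted by a kernel $K_h$ with bandwidth $h$. The notation ($c$, $g$, $K_h$, $U_{1i}, U_{2i}$ for uniform-scale responses) is introduced here for illustration; the exact estimators implemented in LocalCop may differ.

```latex
(\hat\beta_0, \hat\beta_1) = \arg\max_{\beta_0, \beta_1}
\sum_{i=1}^{n} K_h(X_i - x_0)\,
\log c\big(U_{1i}, U_{2i} \mid g^{-1}(\beta_0 + \beta_1 (X_i - x_0))\big),
\qquad \hat\eta(x_0) = \hat\beta_0
```

Repeating this maximization over a grid of $x_0$ values traces out the estimated calibration function, and the bandwidth $h$ controls how local each fit is.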
- [12] arXiv:2403.18782 (cross-list from math.ST) [pdf, ps, other]
-
Title: Beyond boundaries: Gary Lorden's groundbreaking contributions to sequential analysis
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
Gary Lorden provided a number of fundamental and novel insights into sequential hypothesis testing and changepoint detection. In this article we provide an overview of Lorden's contributions in the context of existing results in those areas, as well as some extensions made possible by his work, and we mention areas of application including threat detection in physical-computer systems, near-Earth space informatics, epidemiology, clinical trials, and finance.
Replacements for Thu, 28 Mar 24
- [13] arXiv:2109.05755 (replaced) [src]
-
Title: IQ: Intrinsic measure for quantifying the heterogeneity in meta-analysis
Comments: A more comprehensive version has been prepared under the new title "An alternative measure for quantifying the heterogeneity in meta-analysis", so this old version is no longer suitable for posting on arXiv. We will therefore submit the new version under the new title as arXiv:2403.16706 and withdraw this outdated version. Thank you very much for your kind consideration.
Subjects: Methodology (stat.ME)
- [14] arXiv:2207.07020 (replaced) [pdf, other]
-
Title: Estimating sparse direct effects in multivariate regression with the spike-and-slab LASSO
Subjects: Methodology (stat.ME)
- [15] arXiv:2306.13829 (replaced) [pdf, other]
-
Title: Selective inference using randomized group lasso estimators for general models
Comments: 64 pages, 4 figures, 3 tables
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
- [16] arXiv:2306.15173 (replaced) [pdf, other]
-
Title: Robust propensity score weighting estimation under missing at random
Subjects: Methodology (stat.ME)
- [17] arXiv:2307.00567 (replaced) [pdf, other]
-
Title: A Note on Ising Network Analysis with Missing Data
Subjects: Methodology (stat.ME)
- [18] arXiv:2307.09713 (replaced) [pdf, other]
-
Title: Non-parametric inference on calibration of predicted risks
Comments: 15 pages (including 2 appendices), 5 figures, 0 tables
Subjects: Methodology (stat.ME); Applications (stat.AP)
- [19] arXiv:2308.11138 (replaced) [pdf, ps, other]
-
Title: NLP-based detection of systematic anomalies among the narratives of consumer complaints
Subjects: Methodology (stat.ME); Computation and Language (cs.CL); Risk Management (q-fin.RM); Machine Learning (stat.ML)
- [20] arXiv:2309.10978 (replaced) [pdf, ps, other]
-
Title: Negative Spillover: A Potential Source of Bias in Pragmatic Clinical Trials
Authors: Sean Mann
Comments: 6.5 pages of main text, 2 figures, 1 table; new version with title change and minor edits to main text
Subjects: Methodology (stat.ME); Quantitative Methods (q-bio.QM)
- [21] arXiv:2310.11471 (replaced) [pdf, other]
-
Title: Modeling lower-truncated and right-censored insurance claims with an extension of the MBBEFD class
Comments: 36 pages
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP)
- [22] arXiv:2310.13580 (replaced) [pdf, other]
-
Title: Bayesian Hierarchical Modeling for Bivariate Multiscale Spatial Data with Application to Blood Test Monitoring
Subjects: Methodology (stat.ME); Applications (stat.AP)
- [23] arXiv:2310.16502 (replaced) [pdf, other]
-
Title: Assessing the overall and partial causal well-specification of nonlinear additive noise models
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
- [24] arXiv:2401.00624 (replaced) [pdf, other]
-
Title: Semi-Confirmatory Factor Analysis for High-Dimensional Data with Interconnected Community Structures
Subjects: Methodology (stat.ME)
- [25] arXiv:2401.16749 (replaced) [pdf, other]
-
Title: Bayesian scalar-on-network regression with applications to brain functional connectivity
Subjects: Methodology (stat.ME)
- [26] arXiv:2403.17481 (replaced) [pdf, ps, other]
- [27] arXiv:2306.10594 (replaced) [pdf, other]
- [28] arXiv:2402.07868 (replaced) [pdf, ps, other]
-
Title: Nesting Particle Filters for Experimental Design in Dynamical Systems
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
- [29] arXiv:2403.07236 (replaced) [pdf, other]
-
Title: Partial Identification of Individual-Level Parameters Using Aggregate Data in a Nonparametric Binary Outcome Model
Authors: Sarah Moon
Subjects: Econometrics (econ.EM); Methodology (stat.ME)
- [30] arXiv:2403.15198 (replaced) [pdf, ps, other]
-
Title: On the Weighted Top-Difference Distance: Axioms, Aggregation, and Approximation
Comments: 64 pages
Subjects: Computer Science and Game Theory (cs.GT); Discrete Mathematics (cs.DM); Theoretical Economics (econ.TH); Methodology (stat.ME)
- [31] arXiv:2403.16828 (replaced) [pdf, other]
-
Title: Asymptotics of predictive distributions driven by sample means and variances
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)