New submissions

New submissions for Thu, 23 Mar 23

[1]  arXiv:2303.12378 [pdf, other]
Title: A functional spatial autoregressive model using signatures
Authors: Camille Frévent
Subjects: Methodology (stat.ME)

We propose a new approach to the autoregressive spatial functional model, based on the notion of signature, which represents a function as an infinite series of its iterated integrals. It has the advantage of being applicable to a wide range of processes. After providing theoretical guarantees for the proposed model, we show in a simulation study that this new approach performs competitively with the traditional model.

[2]  arXiv:2303.12462 [pdf, other]
Title: Scalable Bayesian bi-level variable selection in generalized linear models
Subjects: Methodology (stat.ME); Computation (stat.CO)

Motivated by a real-world application in cardiology, we develop an algorithm to perform Bayesian bi-level variable selection in a generalized linear model, for datasets that may be large both in terms of the number of individuals and the number of predictors. Our algorithm relies on the waste-free Sequential Monte Carlo (SMC) methodology of Dau and Chopin (2022), a new proposal mechanism to deal with the constraints specific to bi-level selection (which forbid selecting an individual predictor if its group is not selected), and the ALA (approximate Laplace approximation) approach of Rossell et al. (2021). We show in our numerical study that the algorithm may offer reliable performance on large datasets within a few minutes, on both simulated data and real data related to the aforementioned cardiology application.

[3]  arXiv:2303.12502 [pdf, other]
Title: Measuring agreement among several raters classifying subjects into one-or-more (hierarchical) nominal categories. A generalisation of Fleiss' kappa
Comments: 19 pages, 2 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

Cohen's and Fleiss' kappa are well-known measures of inter-rater reliability. However, they only allow a rater to select exactly one category for each subject. This is a severe limitation in some research contexts: for example, measuring the inter-rater reliability of a group of psychiatrists diagnosing patients with multiple disorders is impossible with these measures. This paper proposes a generalisation of the Fleiss' kappa coefficient that lifts this limitation. Specifically, the proposed $\kappa$ statistic measures inter-rater reliability between multiple raters classifying subjects into one-or-more nominal categories. These categories can be weighted according to their importance, and the measure can take into account the category hierarchy (e.g., categories consisting of subcategories that are only available when choosing the main category, like a primary psychiatric disorder and sub-disorders; but much more complex dependencies between categories are possible as well). The proposed $\kappa$ statistic can handle missing data and a varying number of raters for subjects or categories. The paper briefly overviews existing methods allowing raters to classify subjects into multiple categories. Next, we derive our proposed measure step-by-step and prove that the proposed measure equals Fleiss' kappa when a fixed number of raters choose one category for each subject. The measure was developed to investigate the reliability of a new mathematics assessment method, of which an example is elaborated. The paper concludes with the worked-out example of psychiatrists diagnosing patients with multiple disorders.
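
For orientation, the classical Fleiss' kappa that this abstract takes as its baseline can be sketched as follows. This is the standard single-category coefficient only, not the generalised statistic proposed in the paper; the function name `fleiss_kappa` and the count-matrix input layout are assumptions made for illustration.

```python
def fleiss_kappa(counts):
    """Classical Fleiss' kappa (illustrative sketch, not the paper's generalisation).

    counts[i][j] is the number of raters who put subject i into category j.
    Every subject must be rated by the same number of raters (at least 2),
    and each rater picks exactly one category per subject.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per subject")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of raters")
    n_categories = len(counts[0])

    # Per-subject agreement: share of rater pairs that agree on subject i.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects  # mean observed agreement

    # Overall share of all assignments that went to each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)  # agreement expected by chance

    # Undefined (division by zero) when all ratings fall in one category.
    return (p_bar - p_e) / (1 - p_e)
```

With three raters and two categories, `fleiss_kappa([[3, 0], [0, 3]])` corresponds to perfect agreement, while a matrix such as `[[2, 1], [1, 2]]` gives agreement below chance level.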

[4]  arXiv:2303.12652 [pdf, other]
Title: Estimation of Complier Expected Shortfall Treatment Effects with a Binary Instrumental Variable
Subjects: Methodology (stat.ME)

Estimating the causal effect of a treatment or exposure for a subpopulation is of great interest in many biomedical and economic studies. Expected shortfall, also referred to as the super-quantile, is an attractive effect-size measure that can accommodate data heterogeneity and aggregate local information of effect over a certain region of interest of the outcome distribution. In this article, we propose the ComplieR Expected Shortfall Treatment Effect (CRESTE) model under an instrumental variable framework to quantify the CRESTE for a binary endogenous treatment variable. By utilizing the special characteristics of a binary instrumental variable and a specific formulation of Neyman-orthogonalization, we propose a two-step estimation procedure, which can be implemented by simply solving weighted least-squares regression and weighted quantile regression with estimated weights. We establish the asymptotic properties of the proposed estimator and use numerical simulations to confirm its validity and robust finite-sample performance. An illustrative analysis of a National Job Training Partnership Act study is presented to show the practical utility of the proposed method.
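
As a rough illustration of the expected shortfall (super-quantile) mentioned above, the sketch below averages the part of a sample lying at or beyond an empirical quantile level. It is a plain empirical summary under assumed conventions (upper tail, order-statistic cutoff), not the CRESTE estimator or its two-step procedure; the function name `upper_expected_shortfall` is hypothetical.

```python
def upper_expected_shortfall(sample, tau):
    """Empirical upper-tail expected shortfall at level tau (illustrative sketch).

    Averages the observations from the floor(tau * n)-th order statistic
    upward. Conventions differ across the literature (lower vs. upper tail,
    interpolated quantiles); this choice is an assumption for the example.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    if not 0 <= tau < 1:
        raise ValueError("tau must lie in [0, 1)")
    ordered = sorted(sample)
    # Index of the first observation counted as part of the upper tail;
    # clamped so the tail always holds at least one observation.
    start = min(int(tau * len(ordered)), len(ordered) - 1)
    tail = ordered[start:]
    return sum(tail) / len(tail)
```

Setting `tau = 0` recovers the ordinary sample mean, and raising `tau` restricts the average to an increasingly extreme region of the outcome distribution.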

[5]  arXiv:2303.12677 [pdf, other]
Title: Learning Brain Connectivity in Social Cognition with Dynamic Network Regression
Subjects: Methodology (stat.ME); Applications (stat.AP)

Dynamic networks have been increasingly used to characterize brain connectivity that varies during resting and task states. In such characterizations, a connectivity network is typically measured at each time point for a subject over a common set of nodes representing brain regions, together with rich subject-level information. A common approach to analyzing such data is an edge-based method that models the connectivity between each pair of nodes separately. However, such an approach may have limited performance when the noise level is high and the number of subjects is limited, as it does not take advantage of the inherent network structure. To better understand if and how the subject-level covariates affect the dynamic brain connectivity, we introduce a semi-parametric dynamic network response regression that relates a dynamic brain connectivity network to a vector of subject-level covariates. A key advantage of our method is that it exploits the structure of dynamic imaging coefficients in the form of high-order tensors. We develop an efficient estimation algorithm and evaluate the efficacy of our approach through simulation studies. Finally, we present our results on the analysis of a task-related study on social cognition in the Human Connectome Project, where we identify known sex-specific effects on brain connectivity that cannot be inferred using alternative methods.

[6]  arXiv:2303.12687 [pdf, other]
Title: On a General Class of Orthogonal Learners for the Estimation of Heterogeneous Treatment Effects
Comments: 61 pages, 8 figures
Subjects: Methodology (stat.ME)

Motivated by applications in personalized medicine and individualized policy making, there is a growing interest in techniques for quantifying treatment effect heterogeneity in terms of the conditional average treatment effect (CATE). Some of the most prominent methods for CATE estimation developed in recent years are T-Learner, DR-Learner and R-Learner. The latter two were designed to improve on the former by being Neyman-orthogonal. However, the relations between them remain unclear, and the literature likewise remains vague on whether these learners converge to a useful quantity or (functional) estimand when the underlying optimization procedure is restricted to a class of functions that does not include the CATE. In this article, we provide insight into these questions by discussing DR-learner and R-learner as special cases of a general class of Neyman-orthogonal learners for the CATE, for which we moreover derive oracle bounds. Our results shed light on how one may construct Neyman-orthogonal learners with desirable properties, on when DR-learner may be preferred over R-learner (and vice versa), and on novel learners that may sometimes be preferable to either of these. Theoretical findings are confirmed using results from simulation studies on synthetic data, as well as an application in critical care medicine.

Cross-lists for Thu, 23 Mar 23

[7]  arXiv:2303.12385 (cross-list from stat.AP) [pdf, ps, other]
Title: Optimal selection of the starting lineup for a football team
Subjects: Applications (stat.AP); Methodology (stat.ME)

The success of a football team depends on various individual skills and performances of the selected players as well as how cohesively they perform. This work proposes a two-stage process for selecting the optimal playing eleven of a football team from its pool of available players. In the first stage, for the reference team, a LASSO-induced modified trinomial logistic regression model is derived to analyze the probabilities of the three possible outcomes. The model takes into account strengths of the players in the team as well as those of the opponent, home advantage, and also the effects of individual players and player combinations beyond the recorded performances of these players. Careful use of the LASSO technique acts as an appropriate enabler of the player selection exercise while keeping the number of variables at a reasonable level. Then, in the second stage, a GRASP-type meta-heuristic is implemented for the team selection which maximizes the probability of a win for the team. The work is illustrated with English Premier League data from 2008/09 to 2015/16. The application demonstrates that the model in the first stage furnishes valuable insights about the deciding factors for different teams whereas the optimization steps can be effectively used to determine the best possible starting lineup under various circumstances. Based on the adopted model and methodology, we propose a measure of efficiency in team selection by the team management and analyze the performance of EPL teams on this front.

[8]  arXiv:2303.12401 (cross-list from stat.AP) [pdf, other]
Title: Real-time forecasting within soccer matches through a Bayesian lens
Subjects: Applications (stat.AP); Methodology (stat.ME)

This paper employs a Bayesian methodology to predict the results of soccer matches in real-time. Using sequential data of various events throughout the match, we utilize a multinomial probit regression in a novel framework to estimate the time-varying impact of covariates and to forecast the outcome. English Premier League data from eight seasons are used to evaluate the efficacy of our method. Different evaluation metrics establish that the proposed model outperforms potential competitors, which are inspired by existing statistical or machine learning algorithms. Additionally, we apply robustness checks to demonstrate the model's accuracy across various scenarios.

[9]  arXiv:2303.12657 (cross-list from stat.CO) [pdf, other]
Title: Generalised Linear Mixed Model Specification, Analysis, Fitting, and Optimal Design in R with the glmmr Packages
Authors: Samuel I. Watson
Subjects: Computation (stat.CO); Methodology (stat.ME)

We describe the R package glmmrBase and an extension glmmrOptim. glmmrBase provides a flexible approach to specifying and analysing generalised linear mixed models. We use an object-orientated class system within R to provide methods for a wide range of covariance and mean functions relevant to multiple applications including cluster randomised trials, cohort studies, spatial and spatio-temporal modelling, and split-plot designs. The class generates relevant matrices and statistics and provides a wide range of methods including full likelihood estimation of generalised linear mixed models using Markov Chain Monte Carlo Maximum Likelihood, Laplace approximation, power calculation, and access to relevant calculations. The class also includes Hamiltonian Monte Carlo simulation of random effects, sparse matrix methods, and other functionality to support efficient estimation. The glmmrOptim package implements a set of algorithms to identify c-optimal experimental designs where observations are correlated and can be specified using a generalised linear mixed model. Several examples and comparisons to existing packages are provided to illustrate use of the packages.

[10]  arXiv:2303.12683 (cross-list from stat.AP) [pdf, other]
Title: Knowing what to know: Implications of the choice of prior distribution on the behavior of adaptive design optimization
Comments: Submitted for journal publication
Subjects: Applications (stat.AP); Methodology (stat.ME)

Adaptive design optimization (ADO) is a state-of-the-art technique for experimental design (Cavagnaro, Myung, Pitt, & Kujala, 2010). ADO dynamically identifies stimuli that, in expectation, yield the most information about a hypothetical construct of interest (e.g., parameters of a cognitive model). To calculate this expectation, ADO leverages the modeler's existing knowledge, specified in the form of a prior distribution. Informative priors align with the distribution of the focal construct in the participant population. This alignment is assumed by ADO's internal assessment of expected information gain. If the prior is instead misinformative, i.e., does not align with the participant population, ADO's estimates of expected information gain could be inaccurate. In many cases, the true distribution that characterizes the participant population is unknown, and experimenters rely on heuristics in their choice of prior, without an understanding of how this choice affects ADO's behavior.
Our work introduces a mathematical framework that facilitates investigation of the consequences of the choice of prior distribution on the efficiency of experiments designed using ADO. Through theoretical and empirical results, we show that, in the context of prior misinformation, measures of expected information gain are distinct from the correctness of the corresponding inference. Through a series of simulation experiments, we show that, in the case of parameter estimation, ADO nevertheless outperforms other design methods. Conversely, in the case of model selection, misinformative priors can lead inference to favor the wrong model, and rather than mitigating this pitfall, ADO exacerbates it.

[11]  arXiv:2303.12703 (cross-list from cs.LG) [pdf, other]
Title: Causal Reasoning in the Presence of Latent Confounders via Neural ADMG Learning
Comments: Camera ready version for ICLR 2023
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)

Latent confounding has been a long-standing obstacle for causal reasoning from observational data. One popular approach is to model the data using acyclic directed mixed graphs (ADMGs), which describe ancestral relations between variables using directed and bidirected edges. However, existing methods using ADMGs are based on either linear functional assumptions or a discrete search that is complicated to use and lacks computational tractability for large datasets. In this work, we further extend the existing body of work and develop a novel gradient-based approach to learning an ADMG with non-linear functional relations from observational data. We first show that the presence of latent confounding is identifiable under the assumptions of bow-free ADMGs with non-linear additive noise models. With this insight, we propose a novel neural causal model based on autoregressive flows for ADMG learning. This not only enables us to determine complex causal structural relationships behind the data in the presence of latent confounding, but also to estimate their functional relationships (hence treatment effects) simultaneously. We further validate our approach via experiments on both synthetic and real-world datasets, and demonstrate competitive performance against relevant baselines.

Replacements for Thu, 23 Mar 23

[12]  arXiv:2011.05493 (replaced) [pdf, ps, other]
Title: Robust and flexible learning of a high-dimensional classification rule using auxiliary outcomes
Comments: 19 pages, 2 figures
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
[13]  arXiv:2204.10969 (replaced) [pdf, other]
Title: When Doubly Robust Methods Meet Machine Learning for Estimating Treatment Effects from Real-World Data: A Comparative Study
Subjects: Methodology (stat.ME); Applications (stat.AP); Machine Learning (stat.ML)
[14]  arXiv:2208.10436 (replaced) [pdf, other]
Title: Towards standard imsets for maximal ancestral graphs
Subjects: Methodology (stat.ME)
[15]  arXiv:2209.03482 (replaced) [pdf, other]
Title: High-Dimensional Inference for Generalized Linear Models with Hidden Confounding
Subjects: Methodology (stat.ME)
[16]  arXiv:2211.15018 (replaced) [src]
Title: Causal Inference with Confounders MNAR under Treatment-independent Missingness Assumption
Authors: Jian Sun, Bo Fu
Comments: This paper is updated and the new version is on arXiv:2303.05878
Subjects: Methodology (stat.ME)
[17]  arXiv:2212.05562 (replaced) [pdf, ps, other]
Title: Retire: Robust Expectile Regression in High Dimensions
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
[18]  arXiv:2301.11472 (replaced) [pdf, ps, other]
Title: A Spatial Zero-Inflated Conway--Maxwell--Poisson Regression Model for US Vaccine Refusal
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)
[19]  arXiv:2302.02468 (replaced) [pdf, other]
Title: Circular and spherical projected Cauchy distributions
Comments: Preprint
Subjects: Methodology (stat.ME)
[20]  arXiv:2302.02590 (replaced) [pdf, ps, other]
Title: Consensus dynamics and coherence in hierarchical small-world networks
Subjects: Methodology (stat.ME); Discrete Mathematics (cs.DM)
[21]  arXiv:2009.02776 (replaced) [pdf, other]
Title: Matching Bounds: How Choice of Matching Algorithm Impacts Treatment Effects Estimates and What to Do about It
Subjects: Applications (stat.AP); Methodology (stat.ME)