New submissions

New submissions for Fri, 31 Mar 23

[1]  arXiv:2303.17091 [pdf, ps, other]
Title: Exact sequential single-arm trial design with curtailment for binary endpoint
Subjects: Methodology (stat.ME)

For ethical and economic reasons, sequential single-arm trial designs are used to assess the therapeutic efficacy of new treatments in phase II trials. Simon's 2-stage design and the Lan-DeMets $\alpha$-spending function method of O'Brien-Fleming type are widely recognized as the traditional methods for futility stopping and efficacy stopping, respectively. These methods have two practical problems: stopping decisions are difficult to interpret under staggered entry, and the error rate is inflated in small-sample trials. In this research, we propose an exact sequential design that fixes the threshold value for efficacy, and compare its sample size with that of traditional designs. Since the maximum sample size and average sample number of the proposed design are generally smaller than those of traditional designs, including the fixed design, the proposed design is expected to enroll fewer subjects. In addition, we evaluate several kinds of point estimators and confidence intervals at the end of trials in the proposed design. If one is concerned with bias, the bias-adjusted estimator may be better. As for a confidence interval, the mid-p approach will be a good choice.
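
For orientation, the operating characteristics of a Simon 2-stage design (the traditional futility-stopping baseline named in this abstract, not the proposed exact design) can be computed from exact binomial probabilities. The sketch below is a minimal illustration; the function names and the example design (r1/n1 = 1/10, r/n = 5/29, with null and alternative response rates of 0.10 and 0.30) are assumptions chosen for the example and do not come from the paper.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k responses among n subjects."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k responses among n subjects (0 if k < 0)."""
    return sum(binom_pmf(i, n, p) for i in range(0, min(k, n) + 1))


def simon_two_stage(r1: int, n1: int, r: int, n: int, p: float) -> dict:
    """Operating characteristics at a true response rate p.

    Stop for futility after stage 1 if responses <= r1; otherwise enroll
    n - n1 more subjects and declare efficacy if total responses > r.
    """
    n2 = n - n1
    pet = binom_cdf(r1, n1, p)  # probability of early termination
    # Probability of continuing to stage 2 and still ending with <= r responses.
    late_fail = sum(
        binom_pmf(x, n1, p) * binom_cdf(r - x, n2, p)
        for x in range(r1 + 1, min(n1, r) + 1)
    )
    return {
        "pet": pet,
        "reject_h0": 1.0 - pet - late_fail,  # type I error or power, depending on p
        "expected_n": n1 + (1.0 - pet) * n2,
    }


# Illustrative design (assumed values, not taken from the abstract).
under_null = simon_two_stage(r1=1, n1=10, r=5, n=29, p=0.10)
under_alt = simon_two_stage(r1=1, n1=10, r=5, n=29, p=0.30)
print(under_null, under_alt)
```

Here `reject_h0` evaluated at the null rate is the exact type I error, and evaluated at the alternative rate it is the power; `expected_n` is the average sample number that the abstract compares across designs.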

[2]  arXiv:2303.17102 [pdf, other]
Title: When is the estimated propensity score better? High-dimensional analysis and bias correction
Comments: Fangzhou Su and Wenlong Mou contributed equally to this work
Subjects: Methodology (stat.ME)

Anecdotally, using an estimated propensity score is superior to the true propensity score in estimating the average treatment effect based on observational data. However, this claim comes with several qualifications: it holds only if the propensity score model is correctly specified and the number of covariates $d$ is small relative to the sample size $n$. We revisit this phenomenon by studying the inverse propensity score weighting (IPW) estimator based on a logistic model with a diverging number of covariates. We first show that the IPW estimator based on the estimated propensity score is consistent and asymptotically normal with smaller variance than the oracle IPW estimator (using the true propensity score) if and only if $n \gtrsim d^2$. We then propose a debiased IPW estimator that achieves the same guarantees in the regime $n \gtrsim d^{3/2}$. Our proofs rely on a novel non-asymptotic decomposition of the IPW error along with careful control of the higher order terms.
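
To make the two estimators in this abstract concrete, the following minimal sketch computes an IPW estimate of the average treatment effect twice on simulated data: once with the true (oracle) propensity score and once with a score estimated by logistic regression. Everything about the setup is an assumption for illustration (a single covariate, a true effect of 1.0, the function names, the Newton-Raphson fit); it does not reproduce the paper's high-dimensional regime or its debiased estimator.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n: int, seed: int = 0):
    """Toy observational data with one covariate and true ATE = 1.0 (assumed)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = sigmoid(0.5 * x)              # true propensity score
        t = 1 if rng.random() < e else 0  # treatment indicator
        y = 1.0 * t + x + rng.gauss(0.0, 1.0)
        rows.append((x, t, y, e))
    return rows


def fit_logistic(rows, iterations: int = 25):
    """Newton-Raphson fit of P(T=1 | x) = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t, _, _ in rows:
            p = sigmoid(b0 + b1 * x)
            g0 += t - p                   # score (gradient of log-likelihood)
            g1 += (t - p) * x
            w = p * (1.0 - p)             # weights of the information matrix
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def ipw_ate(rows, scores):
    """Horvitz-Thompson style IPW estimate of the average treatment effect."""
    total = 0.0
    for (_, t, y, _), e in zip(rows, scores):
        total += t * y / e - (1 - t) * y / (1.0 - e)
    return total / len(rows)


rows = simulate(5000)
b0, b1 = fit_logistic(rows)
ate_oracle = ipw_ate(rows, [e for _, _, _, e in rows])
ate_estimated = ipw_ate(rows, [sigmoid(b0 + b1 * x) for x, _, _, _ in rows])
print(ate_oracle, ate_estimated)
```

Repeating the simulation over many seeds and comparing the spread of `ate_oracle` and `ate_estimated` is one way to explore the variance comparison the abstract describes in this low-dimensional toy case.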

[3]  arXiv:2303.17182 [pdf, other]
Title: A review on Bayesian model-based clustering
Authors: Clara Grazian
Subjects: Methodology (stat.ME)

Clustering is an important task in many areas of knowledge: medicine and epidemiology, genomics, environmental science, economics, visual sciences, among others. Methodologies to perform inference on the number of clusters have often been proved to be inconsistent, and introducing a dependence structure among the clusters implies additional difficulties in the estimation process. In a Bayesian setting, clustering is performed by considering the unknown partition as a random object and defining a prior distribution on it. This prior distribution may be induced by models on the observations, or directly defined for the partition. Several recent results, however, have shown the difficulties in consistently estimating the number of clusters, and, therefore, the partition. The problem of summarising the posterior distribution on the partition itself remains open, given the large dimension of the partition space. This work aims at reviewing the Bayesian approaches available in the literature to perform clustering, presenting the advantages and disadvantages of each in order to suggest future lines of research.

[4]  arXiv:2303.17217 [pdf, other]
Title: Bayesian inference of grid cell firing patterns using Poisson point process models with latent oscillatory Gaussian random fields
Comments: 44 pages, 16 figures
Subjects: Methodology (stat.ME); Applications (stat.AP)

Questions about information encoded by the brain demand statistical frameworks for inferring relationships between neural firing and features of the world. The landmark discovery of grid cells demonstrates that neurons can represent spatial information through regularly repeating firing fields. However, the influence of covariates may be masked in current statistical models of grid cell activity, which, by employing approaches such as discretizing, aggregating and smoothing, are computationally inefficient and do not account for the continuous nature of the physical world. These limitations motivated us to develop likelihood-based procedures for modelling and estimating the firing activity of grid cells conditionally on biologically relevant covariates. Our approach models firing activity using Poisson point processes with latent Gaussian effects, which accommodate persistent inhomogeneous spatial-directional patterns and overdispersion. Inference is performed in a fully Bayesian manner, which allows us to quantify uncertainty. Applying these methods to experimental data, we provide evidence for temporal and local head direction effects on grid firing. Our approaches offer a novel and principled framework for analysis of neural representations of space.

[5]  arXiv:2303.17277 [pdf, other]
Title: Cross-temporal Probabilistic Forecast Reconciliation
Comments: 34 pages, 9 figures, 11 tables
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)

Forecast reconciliation is a post-forecasting process that involves transforming a set of incoherent forecasts into coherent forecasts which satisfy a given set of linear constraints for a multivariate time series. In this paper we extend the current state-of-the-art cross-sectional probabilistic forecast reconciliation approach to encompass a cross-temporal framework, where temporal constraints are also applied. Our proposed methodology employs both parametric Gaussian and non-parametric bootstrap approaches to draw samples from an incoherent cross-temporal distribution. To improve the estimation of the forecast error covariance matrix, we propose using multi-step residuals, especially in the time dimension where the usual one-step residuals fail. To address high-dimensionality issues, we present four alternatives for the covariance matrix, where we exploit the two-fold nature (cross-sectional and temporal) of the cross-temporal structure, and introduce the idea of overlapping residuals. We evaluate the proposed methods through a simulation study that investigates their theoretical and empirical properties. We further assess the effectiveness of the proposed cross-temporal reconciliation approach by applying it to two empirical forecasting experiments, using the Australian GDP and the Australian Tourism Demand datasets. For both applications, we show that the optimal cross-temporal reconciliation approaches significantly outperform the incoherent base forecasts in terms of the Continuous Ranked Probability Score and the Energy Score. Overall, our study expands and unifies the notation for cross-sectional, temporal and cross-temporal reconciliation, thus extending and deepening the probabilistic cross-temporal framework. The results highlight the potential of the proposed cross-temporal forecast reconciliation methods in improving the accuracy of probabilistic forecasting models.
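
As a small illustration of what "transforming incoherent forecasts into coherent forecasts" means, the sketch below reconciles point forecasts for the simplest possible hierarchy, one total and its bottom-level series, by orthogonal (OLS) projection onto the set of forecasts that satisfy the constraint total = sum of bottom series. This is only a cross-sectional, point-forecast building block under assumed names and numbers; it is not the probabilistic cross-temporal method the paper proposes.

```python
from typing import List, Tuple


def reconcile_ols(total_hat: float, bottom_hat: List[float]) -> Tuple[float, List[float]]:
    """OLS reconciliation for a two-level hierarchy (illustrative sketch).

    The coherent forecasts are those orthogonal to c = (1, -1, ..., -1).
    Projecting the base forecasts onto that subspace spreads the
    incoherence `gap` equally over all m + 1 series.
    """
    m = len(bottom_hat)
    gap = total_hat - sum(bottom_hat)  # how far the base forecasts are from coherent
    shift = gap / (m + 1)              # (c . y) / (c . c)
    total_tilde = total_hat - shift
    bottom_tilde = [b + shift for b in bottom_hat]
    return total_tilde, bottom_tilde


# Assumed base forecasts: the total (100) disagrees with the sum of parts (90).
total, bottom = reconcile_ols(100.0, [40.0, 50.0])
print(total, bottom, sum(bottom))
```

With these assumed inputs, the gap of 10 is split three ways, so the reconciled total moves down and each bottom-level forecast moves up by the same amount. Weighted variants replace the plain projection with one that uses an estimate of the forecast error covariance matrix, which is where the covariance alternatives discussed in the abstract come in.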

[6]  arXiv:2303.17331 [pdf, other]
Title: Multiple Imputation Approaches for Epoch-level Accelerometer data in Trials
Comments: 32 pages, 16 figures, 2 tables
Subjects: Methodology (stat.ME); Applications (stat.AP)

Clinical trials that investigate interventions on physical activity often use accelerometers to measure step count at a very granular level, often in 5-second epochs. Participants typically wear the accelerometer for a week-long period at baseline, and for one or more week-long follow-up periods after the intervention. The data are usually aggregated to provide daily or weekly step counts for the primary analysis. Missing data are common as participants may not wear the device as per protocol. Approaches to handling missing data in the literature have largely defined missingness on the day level using a threshold on daily wear time, which leads to loss of information on the time of day when data are missing. We propose an approach to identifying and classifying missingness at the finer epoch-level, and then present two approaches to handling missingness. Firstly, we present a parametric approach which takes into account the number of missing epochs per day. Secondly, we describe a non-parametric approach to Multiple Imputation (MI) where missing periods during the day are replaced by donor data from the same person where possible, or data from a different person who is matched on demographic and physical activity-related variables. Our simulation studies comparing these approaches in a number of settings show that the non-parametric approach leads to estimates of the effect of treatment that are least biased while maintaining small standard errors. We illustrate the application of these different MI strategies to the analysis of the 2017 PACE-UP Trial. The proposed framework of classifying missingness and applying MI at the epoch-level is likely to be applicable to a number of different outcomes and data from other wearable devices.

[7]  arXiv:2303.17478 [pdf, other]
Title: A Bayesian Dirichlet Auto-Regressive Moving Average Model for Forecasting Lead Times
Subjects: Methodology (stat.ME); Applications (stat.AP)

Lead time data is compositional data found frequently in the hospitality industry. Hospitality businesses earn fees each day; however, these fees cannot be recognized until later. For business purposes, it is important to understand and forecast the distribution of future fees for the allocation of resources, for business planning, and for staffing. Motivated by 5 years of daily fees data, we propose a new class of Bayesian time series models, a Bayesian Dirichlet Auto-Regressive Moving Average (B-DARMA) model for compositional time series, modeling the proportion of future fees that will be recognized in 11 consecutive 30-day windows and 1 last consecutive 35-day window. Each day's compositional datum is modeled as Dirichlet distributed given the mean and a scale parameter. The mean is modeled with a Vector Autoregressive Moving Average process after transforming with an additive log ratio link function and depends on previous compositional data, previous compositional parameters and daily covariates. The B-DARMA model offers solutions to data analyses of large compositional vectors and short or long time series, offers efficiency gains through choice of priors, provides interpretable parameters for inference, and makes reasonable forecasts.
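
The additive log ratio (alr) link mentioned in this abstract maps a composition of K positive proportions summing to one onto K - 1 unconstrained real numbers, which is what lets an unconstrained time series model drive a Dirichlet mean. A minimal sketch of the transform and its inverse follows; using the last component as the reference is an assumption made here for illustration, and the function names are hypothetical.

```python
import math
from typing import List


def alr(composition: List[float]) -> List[float]:
    """Additive log ratio: log of each part relative to the last part."""
    reference = composition[-1]
    return [math.log(part / reference) for part in composition[:-1]]


def alr_inverse(z: List[float]) -> List[float]:
    """Map K - 1 real numbers back to a composition of K parts summing to 1."""
    exps = [math.exp(v) for v in z]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps] + [1.0 / denom]


# Assumed three-part composition, e.g. shares of fees across three windows.
shares = [0.2, 0.3, 0.5]
z = alr(shares)
recovered = alr_inverse(z)
print(z, recovered)
```

Applying `alr_inverse` to any real vector always yields a valid composition, so a model can work on the unconstrained scale and be mapped back to proportions afterwards.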

Cross-lists for Fri, 31 Mar 23

[8]  arXiv:2303.17230 (cross-list from math.ST) [pdf, other]
Title: KOO approach for scalable variable selection problem in large-dimensional regression
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

An important issue in many multivariate regression problems is eliminating candidate predictors with null predictor vectors. In the large-dimensional (LD) setting, where the numbers of responses and predictors are large, model selection encounters a scalability challenge. Knock-one-out (KOO) statistics have the potential to meet this challenge. In this paper, the strong consistency and the central limit theorem of the KOO statistics are derived under the LD setting and mild distributional assumptions (finite fourth moments) on the errors. These theoretical results lead us to propose a subset selection rule based on the KOO statistics with a bootstrap threshold. Simulation results support our conclusions and demonstrate that the selection probabilities of the KOO approach with the bootstrap threshold outperform those of methods using the Akaike information threshold, the Bayesian information threshold and the Mallows' $C_p$ threshold. We apply the proposed KOO approach and those based on information thresholds to a chemometrics dataset and a yeast cell-cycle dataset; the comparison suggests that our proposed method identifies useful models.

[9]  arXiv:2303.17425 (cross-list from math.ST) [pdf, other]
Title: A possibility-theoretic solution to Basu's Bayesian--frequentist via media
Authors: Ryan Martin
Comments: Comments welcome at this https URL
Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Other Statistics (stat.OT)

Basu's via media is what he referred to as the middle road between the Bayesian and frequentist poles. He seemed skeptical that a suitable via media could be found, but I disagree. My basic claim is that the likelihood alone can't reliably support probabilistic inference, and I justify this by considering a technical trap that Basu stepped in concerning interpretation of the likelihood. While reliable probabilistic inference is out of reach, it turns out that reliable possibilistic inference is not. I lay out my proposed possibility-theoretic solution to Basu's via media and I investigate how the flexibility afforded by my imprecise-probabilistic solution can be leveraged to achieve the likelihood principle (or something close to it).

[10]  arXiv:2303.17482 (cross-list from cs.AI) [pdf]
Title: Three-way causal attribute partial order structure analysis
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)

As an emerging concept-cognitive learning model, partial order formal structure analysis (POFSA) has been widely used in the field of knowledge processing. In this paper, we propose a method named three-way causal attribute partial order structure (3WCAPOS), which evolves POFSA from set coverage to causal coverage in order to increase the interpretability and classification performance of the model. First, the concept of causal factor (CF) is proposed to evaluate the causal correlation between attributes and decision attributes in the formal decision context. Then, by combining CF with attribute partial order structure, the concept of causal attribute partial order structure is defined, which makes set coverage evolve into causal coverage. Finally, combined with the idea of three-way decision, 3WCAPOS is formed, which makes the purity of nodes in the structure clearer and the changes between levels more obvious. In addition, experiments on six datasets evaluate the classification ability and the interpretability of the structure. These experiments show that the accuracy of 3WCAPOS is 1% - 9% higher than that of the classification and regression tree, and that 3WCAPOS is more interpretable and processes knowledge more reasonably than the attribute partial order structure.

Replacements for Fri, 31 Mar 23

[11]  arXiv:2302.07340 (replaced) [pdf, other]
Title: Functional proportional hazards mixture cure model and its application to modelling the association between cancer mortality and physical activity in NHANES 2003-2006
Subjects: Methodology (stat.ME); Applications (stat.AP)
[12]  arXiv:1911.03764 (replaced) [pdf, other]
Title: Optimal Experimental Design for Staggered Rollouts
Subjects: Econometrics (econ.EM); Methodology (stat.ME); Machine Learning (stat.ML)
[13]  arXiv:2205.03486 (replaced) [pdf, other]
Title: Clustered Graph Matching for Label Recovery and Graph Classification
Comments: 22 pages, 8 figures, 5 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[14]  arXiv:2206.04781 (replaced) [pdf, other]
Title: Land-Use Filtering for Nonstationary Prediction of Collective Efficacy in an Urban Environment
Subjects: Applications (stat.AP); Methodology (stat.ME)
[15]  arXiv:2303.00515 (replaced) [pdf, other]
Title: Interpretable Water Level Forecaster with Spatiotemporal Causal Attention Mechanisms
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)