New submissions for Wed, 23 Sep 20

[1]  arXiv:2009.10117 [pdf, other]
Title: Sample Size Calculation for Cluster Randomized Trials with Zero-inflated Count Outcomes
Subjects: Applications (stat.AP); Methodology (stat.ME)

Cluster randomized trials (CRTs) have been widely employed in medical and public health research. Many clinical count outcomes, such as the number of falls in nursing homes, exhibit excessive zero values. In the presence of zero inflation, traditional power analysis methods for count data based on the Poisson or negative binomial distribution may be inadequate. In this study, we present a sample size method for CRTs with zero-inflated count outcomes. It is developed based on generalized estimating equation (GEE) regression directly modeling the marginal mean of a zero-inflated Poisson (ZIP) outcome, which avoids the challenge of testing two intervention effects under traditional modeling approaches. A closed-form sample size formula is derived which properly accounts for zero inflation, intracluster correlation coefficients (ICCs) due to clustering, unbalanced randomization, and variability in cluster size. Robust approaches, including a t-distribution-based approximation and a jackknife re-sampling variance estimator, are employed to enhance trial properties under small sample sizes. Extensive simulations are conducted to evaluate the performance of the proposed method. An application example is presented in a real clinical trial setting.
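
As a rough illustration of why zero inflation matters for power calculations, the sketch below evaluates the standard zero-inflated Poisson (ZIP) probability mass function, its marginal mean, and its variance. It is not the sample size formula from the abstract, which is not reproduced in this listing. The function names and the example values of `lam` (Poisson rate) and `pi` (structural-zero probability) are assumptions chosen for illustration.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a ZIP outcome: a structural zero with probability pi,
    otherwise a Poisson(lam) draw."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    structural = pi if k == 0 else 0.0
    return structural + (1.0 - pi) * poisson


def zip_mean(lam, pi):
    """Marginal mean of a ZIP outcome."""
    return (1.0 - pi) * lam


def zip_variance(lam, pi):
    """Marginal variance of a ZIP outcome; it exceeds the mean whenever pi > 0."""
    return (1.0 - pi) * lam * (1.0 + pi * lam)


# Hypothetical parameter values, used only for illustration.
lam, pi = 2.0, 0.4
zero_prob_zip = zip_pmf(0, lam, pi)
# A plain Poisson with the same marginal mean predicts far fewer zeros.
zero_prob_poisson = math.exp(-zip_mean(lam, pi))
```

With these values the ZIP variance is larger than the ZIP mean, so a Poisson-based calculation matched on the mean would understate the variability of the outcome.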

[2]  arXiv:2009.10265 [pdf, other]
Title: A Bias Correction Method in Meta-analysis of Randomized Clinical Trials with no Adjustments for Zero-inflated Outcomes
Subjects: Applications (stat.AP); Methodology (stat.ME)

Many clinical endpoint measures, such as the number of standard drinks consumed per week or the number of days that patients stayed in the hospital, are count data with excessive zeros. However, the zero-inflated nature of such outcomes is often ignored in analyses, which leads to biased estimates and, consequently, a biased estimate of the overall intervention effect in a meta-analysis. The current study proposes a novel statistical approach, the Zero-inflation Bias Correction (ZIBC) method, that can account for the bias introduced when using the Poisson regression model despite a high rate of zeros in the outcome distribution for randomized clinical trials. This correction method utilizes summary information from individual studies to correct intervention effect estimates as if they were appropriately estimated in zero-inflated Poisson regression models. Simulation studies and real data analyses show that the ZIBC method has good performance in correcting zero-inflation bias in many situations. This method provides a methodological solution in improving the accuracy of meta-analysis results, which is important to evidence-based medicine.
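
One way to see how ignoring zero inflation can distort an intervention effect: a Poisson regression with only a treatment indicator estimates the ratio of the arm means, whereas the count part of a ZIP model targets the ratio of the Poisson rates. The sketch below compares the two quantities at the population level. It does not implement the ZIBC correction itself; the function names and all parameter values are hypothetical.

```python
def count_rate_ratio(lam_control, lam_treated):
    """Rate ratio for the Poisson (count) part of a ZIP model."""
    return lam_treated / lam_control


def marginal_mean_ratio(lam_control, pi_control, lam_treated, pi_treated):
    """Ratio of marginal ZIP means, (1 - pi) * lam, between the two arms.
    This is the population quantity that a treatment-only Poisson
    regression targets."""
    mean_control = (1.0 - pi_control) * lam_control
    mean_treated = (1.0 - pi_treated) * lam_treated
    return mean_treated / mean_control


# Hypothetical arms: the intervention changes both the rate and the
# proportion of structural zeros.
rr_count = count_rate_ratio(4.0, 3.0)
rr_marginal = marginal_mean_ratio(4.0, 0.3, 3.0, 0.5)
```

When the structural-zero probability is the same in both arms the two ratios coincide; they diverge only when the intervention also shifts the zero proportion, as in the values above.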

[3]  arXiv:2009.10426 [pdf, other]
Title: Effects of winter climate on high speed passenger trains in Botnia-Atlantica region
Subjects: Applications (stat.AP)

Harsh winter climate can cause various problems for both the public and private sectors in Sweden, especially for the railway industry in the northern part of the country. To gain a better understanding of winter climate impacts, this study investigates the effects of the winter climate, including atmospheric icing, on the performance of high speed passenger trains in the Botnia-Atlantica region. The investigation uses train operational data together with simulated weather data from the Weather Research and Forecasting model for January-February 2017.
Two different measurements of train performance are analysed. One is cumulative delay, which measures the increment in delay in terms of running time between two consecutive measuring spots; the other is current delay, which is the delay in terms of arrival time at each measuring spot compared to the schedule. Cumulative delay is investigated through a Cox model and current delay is studied using a Markov chain model.
The results show that the weather factors have impacts on train performance. In particular, temperature and humidity have significant impacts on both the occurrence of cumulative delay and the transition probabilities between (current) delayed and non-delayed states.
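
To make the idea of transition probabilities between delayed and non-delayed states concrete, the following minimal sketch estimates an empirical two-state transition matrix from a sequence of states observed at consecutive measuring spots. It omits the weather covariates that the study relates to these probabilities, and the state coding (0 = on time, 1 = delayed) and the example sequence are invented for illustration.

```python
from collections import Counter


def transition_matrix(states):
    """Empirical first-order Markov transition probabilities.

    Returns a nested dict: matrix[a][b] = estimated P(next = b | current = a).
    """
    counts = Counter(zip(states, states[1:]))
    labels = sorted(set(states))
    matrix = {}
    for a in labels:
        row_total = sum(counts[(a, b)] for b in labels)
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in labels
        }
    return matrix


# Hypothetical states of one train at ten consecutive measuring spots.
observed_states = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
probabilities = transition_matrix(observed_states)
```

Here `probabilities[0][1]` is the estimated chance of moving from on time to delayed at the next spot, and `probabilities[1][1]` is the estimated chance of remaining delayed.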

[4]  arXiv:2009.10476 [pdf, other]
Title: Spatio-temporal modelling of $\text{PM}_{10}$ daily concentrations in Italy using the SPDE approach
Subjects: Applications (stat.AP)

This paper illustrates the main results of a spatio-temporal interpolation process of $\text{PM}_{10}$ concentrations at daily resolution using a set of 410 monitoring sites, distributed throughout the Italian territory, for the year 2015. The interpolation process is based on a Bayesian hierarchical model where the spatial component is represented through the Stochastic Partial Differential Equation (SPDE) approach with a lag-1 temporal autoregressive component (AR1). Inference is performed through the Integrated Nested Laplace Approximation (INLA). Our model includes 11 spatial and spatio-temporal predictors, including meteorological variables and Aerosol Optical Depth. As the predictors' impact varies across months, the regression is based on 12 monthly models with the same set of covariates. The predictive performance of the model has been analyzed using a cross-validation study. Our results show that the predicted and observed values agree well (correlation range: 0.79 - 0.91; bias: 0.22 - 1.07 $\mu \text{g}/\text{m}^3$; RMSE: 4.9 - 13.9 $\mu \text{g}/\text{m}^3$). The final model output is a set of 365 gridded (1 km $\times$ 1 km) daily $\text{PM}_{10}$ maps over Italy equipped with an uncertainty measure. The spatial prediction performance shows that the interpolation procedure is able to reproduce the large scale data features without unrealistic artifacts in the generated $\text{PM}_{10}$ surfaces. The paper also presents two illustrative examples of practical applications of our model: exceedance probability and population exposure maps.

[5]  arXiv:2009.10498 [pdf, ps, other]
Title: ABM: an automatic supervised feature engineering method for loss based models based on group and fused lasso
Subjects: Applications (stat.AP); Machine Learning (stat.ML)

A vital step in solving classification or regression problems is to apply feature engineering and variable selection to the data before it is fed into models. One of the most popular feature engineering methods is to discretize a continuous variable with some cutting points, which is referred to as binning. Good cutting points are important for improving a model's ability, because good binning may ignore some noisy variance in the continuous variable's range and keep useful leveled information with well-ordered encodings. However, to the best of our knowledge, the majority of cutting point selection is done via researchers' domain knowledge or some naive methods like equal-width cutting or equal-frequency cutting. In this paper we propose an end-to-end supervised cutting point selection method based on group and fused lasso, along with an automatic variable selection effect. We name our method \textbf{ABM} (automatic binning machine). We first cut each variable's range into fine grid bins and train the model with our group and group fused lasso regularization on successive bins. It is a method that integrates feature engineering, variable selection and model training simultaneously. A further advantage is that the method is flexible, such that it can be applied to a range of loss function based models, including deep neural networks. We have also implemented the method in R and opened the source code to other researchers. A Python version will also be released within days.
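
For reference, the two naive baselines named in the abstract, equal-width and equal-frequency cutting, can be sketched in a few lines. This is not the ABM method; the function names and the small data set are assumptions made for illustration.

```python
import statistics
from bisect import bisect_right


def equal_width_cuts(values, n_bins):
    """Cut points that split [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(1, n_bins)]


def equal_frequency_cuts(values, n_bins):
    """Cut points at sample quantiles, so bins hold roughly equal counts."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")


def assign_bins(values, cuts):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(cuts, v) for v in values]


# Hypothetical skewed variable with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
width_bins = assign_bins(data, equal_width_cuts(data, 3))
frequency_bins = assign_bins(data, equal_frequency_cuts(data, 3))
```

On skewed data like this, equal-width cutting places almost every observation in the first bin, while equal-frequency cutting balances the counts. Neither uses the response variable, which is the gap a supervised cutting point method aims to close.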

[6]  arXiv:2009.10518 [pdf, other]
Title: Subgroup identification in individual patient data meta-analysis using model-based recursive partitioning
Subjects: Applications (stat.AP)

Model-based recursive partitioning (MOB) can be used to identify subgroups with differing treatment effects. The detection rate of treatment-by-covariate interactions and the accuracy of identified subgroups using MOB depend strongly on the sample size. Using data from multiple randomized controlled clinical trials can overcome the problem of samples that are too small. However, naively pooling data from multiple trials may result in the identification of spurious subgroups, as differences in study design, subject selection and other sources of between-trial heterogeneity are ignored. In order to account for between-trial heterogeneity in individual participant data (IPD) meta-analysis, random-effect models are frequently used. Commonly, heterogeneity in the treatment effect is modelled using random effects, whereas heterogeneity in the baseline risks is modelled by either fixed effects or random effects. In this article, we propose metaMOB, a procedure using the generalized mixed-effects model tree (GLMM tree) algorithm for subgroup identification in IPD meta-analysis. Although the application of metaMOB is potentially wider, e.g. randomized experiments with participants in social sciences or preclinical experiments in life sciences, we focus on randomized controlled clinical trials. In a simulation study, metaMOB outperformed GLMM trees assuming a random intercept only and model-based recursive partitioning (MOB), whose algorithm is the basis for GLMM trees, with respect to the false discovery rates, accuracy of identified subgroups and accuracy of estimated treatment effect. The most robust and therefore most promising method is metaMOB with fixed effects for modelling the between-trial heterogeneity in the baseline risks.

[7]  arXiv:2009.10532 [pdf, other]
Title: A rapidly updating stratified mix-adjusted median property price index model
Comments: 7 pages, 2 figures, 1 table
Subjects: Applications (stat.AP); Computation (stat.CO)

Homeowners, first-time buyers, banks, governments and construction companies are highly interested in following the state of the property market. Currently, property price indexes are published several months out of date and hence do not offer the up-to-date information which housing market stakeholders need in order to make informed decisions. In this article, we present an updated version of a central-price tendency based property price index which uses geospatial property data and stratification in order to compare similar houses. The expansion of the algorithm to include additional parameters owing to a new data structure implementation and a richer dataset allows for the construction of a far smoother and more robust index than the original algorithm produced.

Cross-lists for Wed, 23 Sep 20

[8]  arXiv:2009.10126 (cross-list from eess.SP) [pdf, other]
Title: Evaluating phase synchronization methods in fMRI: a comparison study and new approaches
Authors: Hamed Honari (1), Ann S. Choe (2 and 3 and 4), Martin A. Lindquist (5) ((1) Department of Electrical and Computer Engineering, Johns Hopkins University, USA (2) F. M. Kirby Research Center for Functional Brain Imaging, Kennedy Krieger Institute, USA (3) International Center for Spinal Cord Injury, Kennedy Krieger Institute, USA (4) Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins School of Medicine, USA (5) Department of Biostatistics, Johns Hopkins University, USA)
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Applications (stat.AP); Methodology (stat.ME)

In recent years there has been growing interest in measuring time-varying functional connectivity between different brain regions using resting-state functional magnetic resonance imaging (rs-fMRI) data. One way to assess the relationship between signals from different brain regions is to measure their phase synchronization (PS) across time. There are several ways to perform such analyses, and here we compare methods that utilize a PS metric together with a sliding window, referred to here as windowed phase synchronization (WPS), with those that directly measure the instantaneous phase synchronization (IPS). In particular, IPS has recently gained popularity as it offers single time-point resolution of time-resolved fMRI connectivity. In this paper, we discuss the underlying assumptions required for performing PS analyses and emphasize the necessity of band-pass filtering the data to obtain valid results. We review various methods for evaluating PS and introduce a new approach within the IPS framework denoted the cosine of the relative phase (CRP). We contrast methods through a series of simulations and application to rs-fMRI data. Our results indicate that CRP outperforms other tested methods and overcomes issues related to undetected temporal transitions from positive to negative associations common in IPS analysis. Further, in contrast to phase coherence, CRP unfolds the distribution of PS measures, which benefits subsequent clustering of PS matrices into recurring brain states.
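
Taking the name at face value, the cosine of the relative phase at each time point is cos(phi_1(t) - phi_2(t)), where phi_1 and phi_2 are the instantaneous phases of the two signals. The sketch below is one possible reading of that definition, not the authors' implementation. It obtains the phases from an analytic signal built with a plain discrete Fourier transform, and it uses clean single-frequency sinusoids as a stand-in for band-pass filtered data; all names and signal parameters are assumptions.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal via a naive O(n^2) DFT: keep the DC (and Nyquist)
    bin, double the positive frequencies, zero the negative ones."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def cosine_relative_phase(x1, x2):
    """cos(phi_1(t) - phi_2(t)) at every time point."""
    z1, z2 = analytic_signal(x1), analytic_signal(x2)
    return [math.cos(cmath.phase(a) - cmath.phase(b)) for a, b in zip(z1, z2)]


# Hypothetical narrowband signals: same frequency, fixed phase offsets.
n, cycles = 64, 4
base = [math.cos(2 * math.pi * cycles * t / n) for t in range(n)]
lagged = [math.cos(2 * math.pi * cycles * t / n - math.pi / 3) for t in range(n)]
inverted = [-v for v in base]
crp_lagged = cosine_relative_phase(base, lagged)
crp_inverted = cosine_relative_phase(base, inverted)
```

Because the measure is signed, the anti-phase pair yields values near -1 rather than being folded onto the same value as an in-phase pair, which is consistent with the abstract's remark about transitions from positive to negative associations.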

[9]  arXiv:2009.10645 (cross-list from stat.ML) [pdf, other]
Title: Partially Observable Online Change Detection via Smooth-Sparse Decomposition
Comments: 48 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)

We consider online change detection of high dimensional data streams with sparse changes, where only a subset of data streams can be observed at each sensing time point due to limited sensing capacities. On the one hand, the detection scheme should be able to deal with partially observable data while retaining efficient detection power for sparse changes. On the other hand, the scheme should be able to adaptively and actively select the most important variables to observe in order to maximize the detection power. To address these two points, in this paper, we propose a novel detection scheme called CDSSD. In particular, it describes the structure of high dimensional data with sparse changes by smooth-sparse decomposition, whose parameters can be learned via spike-slab variational Bayesian inference. Then the posterior Bayes factor, which incorporates the learned parameters and sparse change information, is formulated as a detection statistic. Finally, by formulating the statistic as the reward of a combinatorial multi-armed bandit problem, an adaptive sampling strategy based on Thompson sampling is proposed. The efficacy and applicability of our method in practice are demonstrated with numerical studies and a real case study.
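
The general idea of choosing which streams to observe by Thompson sampling can be sketched with a toy bandit. The example below is not CDSSD: it replaces the Bayes-factor reward with simple Bernoulli rewards and Beta posteriors, and selects the `m` streams with the largest posterior draws at each round. All names, reward probabilities, and the seed are assumptions.

```python
import random


def thompson_top_m(true_probs, m, rounds, seed=0):
    """Toy combinatorial Thompson sampling with Bernoulli rewards.

    Each round: draw one sample per arm from its Beta posterior, observe
    the m arms with the largest draws, and update their posteriors.
    Returns how many times each arm was observed.
    """
    rng = random.Random(seed)
    n = len(true_probs)
    successes = [1] * n  # Beta(1, 1) prior on every arm
    failures = [1] * n
    observed = [0] * n
    for _ in range(rounds):
        draws = [rng.betavariate(successes[i], failures[i]) for i in range(n)]
        chosen = sorted(range(n), key=lambda i: draws[i], reverse=True)[:m]
        for i in chosen:
            reward = 1 if rng.random() < true_probs[i] else 0
            successes[i] += reward
            failures[i] += 1 - reward
            observed[i] += 1
    return observed


# Hypothetical reward probabilities for five streams; two can be observed
# per time point.
observation_counts = thompson_top_m([0.1, 0.2, 0.3, 0.7, 0.9], m=2, rounds=2000)
```

Sampling from the posterior rather than always picking the current best estimates keeps some exploration of rarely observed streams while concentrating most observations on the streams that appear most rewarding.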

Replacements for Wed, 23 Sep 20

[10]  arXiv:2006.09329 (replaced) [pdf, ps, other]
Title: Improving Interpretable Piecewise Linear Models through Hierarchical Spatial and Functional Smoothing
Subjects: Methodology (stat.ME); Applications (stat.AP)
