New submissions for Fri, 4 Dec 20

[1]  arXiv:2012.01618 [pdf, other]
Title: Matrix Completion Methods for the Total Electron Content Video Reconstruction
Subjects: Applications (stat.AP); Signal Processing (eess.SP)

Total electron content (TEC) maps can be used to estimate the GPS signal delay caused by ionospheric electrons between a receiver and a satellite. This delay can cause GPS positioning errors, so it is important to monitor TEC maps. Observed TEC maps have large patches of missing data over the oceans and small, scattered gaps over land. In this paper, we propose several extensions of existing matrix completion algorithms for TEC map reconstruction that account for spatial smoothness and temporal consistency while preserving important structures of the TEC maps. We call the proposed method Video Imputation with SoftImpute, Temporal smoothing and Auxiliary data (VISTA). We present numerical simulations that mimic patterns in real data and show that the proposed method yields better reconstructed TEC maps than existing methods in the literature. Our computational algorithm is general and can readily be applied to problems other than TEC map reconstruction.
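
The core SoftImpute iteration that VISTA builds on can be sketched as below. This is a minimal sketch of plain SoftImpute (fill the missing entries with the current estimate, then soft-threshold the singular values), assuming a single 2-D map with a boolean observation mask. It does not include VISTA's temporal smoothing or auxiliary-data terms, and the function name `soft_impute`, the value of `lam` and the synthetic rank-2 "map" are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def soft_impute(X, mask, lam, n_iters=200, tol=1e-6):
    """Plain SoftImpute sketch (not VISTA).

    X    : 2-D array; entries where mask is False are ignored (may be NaN).
    mask : boolean array, True where X is observed.
    lam  : soft-threshold applied to the singular values.
    """
    Z = np.where(mask, X, 0.0)  # start with missing entries set to zero
    for _ in range(n_iters):
        # Keep observed entries, fill missing ones with the current estimate.
        filled = np.where(mask, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)  # soft-threshold singular values
        Z_new = (U * s_shrunk) @ Vt
        # Stop when the relative change between iterates is small.
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z

# Hypothetical test case: a rank-2 "map" with about 30% of entries missing.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
mask = rng.random(truth.shape) > 0.3
observed = np.where(mask, truth, np.nan)  # NaN marks missing pixels
estimate = soft_impute(observed, mask, lam=0.5)
missing = ~mask
rel_err = np.linalg.norm((estimate - truth)[missing]) / np.linalg.norm(truth[missing])
print(f"relative error on missing entries: {rel_err:.3f}")
```

In practice `lam` would be tuned, for example on held-out observed entries; larger values give lower-rank reconstructions.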

[2]  arXiv:2012.01619 [pdf, other]
Title: brolgar: An R package to BRowse Over Longitudinal Data Graphically and Analytically in R
Comments: 19 pages, 14 figures
Subjects: Applications (stat.AP); Methodology (stat.ME)

Longitudinal (panel) data provide the opportunity to examine temporal patterns of individuals, because measurements are collected on the same person at different, and often irregular, time points. The data are typically visualised using a "spaghetti plot", in which a line is drawn for each individual. When these lines are overlaid in one plot, the result can resemble a bowl of spaghetti. Even with a small number of subjects, such plots are too overloaded to be read easily, and the interesting aspects of individual differences are lost in the noise. Longitudinal data are often modelled with a hierarchical linear model to capture the overall trends and the variation among individuals, while accounting for various levels of dependence. However, these models can be difficult to fit and can miss unusual individual patterns. Better visual tools can help to diagnose longitudinal models and to better capture individual experiences. This paper introduces the R package brolgar (BRowse over Longitudinal data Graphically and Analytically in R), which provides tools to identify and summarise interesting individual patterns in longitudinal data.

[3]  arXiv:2012.01912 [pdf, other]
Title: Assaying Large-scale Testing Models to Interpret Covid-19 Case Numbers. A Cross-country Study
Comments: 41 pages, 7 figures
Subjects: Applications (stat.AP)

Large-scale testing is considered key to assessing the state of the current COVID-19 pandemic, yet interpreting such data remains elusive. We modeled competing hypotheses regarding the underlying testing mechanisms, thereby providing different prevalence estimates based on case numbers, and used them to predict SARS-CoV-2-attributed death rate trajectories. Assuming that individuals were tested solely on the basis of a predefined risk of being infectious implied that absolute case numbers reflected prevalence, but this model turned out to be a poor predictor. In contrast, models accounting for testing capacity, which limits the pool of tested individuals, performed better. This puts forward the percentage of positive tests as a robust indicator of epidemic dynamics in the absence of country-specific information. We next demonstrated that the choice of testing model strongly affects data interpretation. Notably, trajectories of absolute case numbers consistently overestimated growth rates at the beginning of two COVID-19 epidemic waves. Overall, this supports the view that non-trivial testing mechanisms can be inferred from data and should be scrutinized.

[4]  arXiv:2012.02099 [pdf]
Title: Performance Indicators Contributing To Success At The Group And Play-Off Stages Of The 2019 Rugby World Cup
Subjects: Applications (stat.AP); Machine Learning (cs.LG)

Performance indicators that contributed to success at the group and play-off stages of the 2019 Rugby World Cup were analysed using publicly available data from the official tournament website. Two techniques were used: a non-parametric statistical test, Wilcoxon's signed-rank test, and a machine learning decision-rules technique called RIPPER. Our statistical results show that ball carry effectiveness (the percentage of ball carries that penetrated the opposition gain-line) and total metres gained (kick metres plus carry metres) contributed to success at both stages of the tournament, and that the indicators that contributed to success during the group stage (dominating possession, making more ball carries, making more passes, winning more rucks, and making fewer tackles) did not contribute to success at the play-off stage. Our RIPPER results show that a low number of ball carries and a low lineout success percentage jointly contributed to losing at the group stage, while winning a low number of rucks and carrying over the gain-line a sufficient number of times contributed to winning at the play-off stage. The results emphasise the need for teams to adapt their playing strategies from the group stage to the play-off stage in order to be successful at the tournament.
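
Wilcoxon's signed-rank test compares paired observations. As a hedged illustration, the sketch below assumes each pair is the winning and the losing team's value of one indicator in the same match (the abstract does not spell out the pairing), and the per-match numbers are hypothetical. It computes only the rank sums W+ and W-; a p-value would come from the exact null distribution or a normal approximation, for example via `scipy.stats.wilcoxon`.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are dropped; tied absolute differences get average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical metres gained per match: winner vs loser of the same match.
winner = [520, 480, 610, 455, 500]
loser = [430, 495, 540, 455, 410]
print(wilcoxon_signed_rank(winner, loser))  # (9.0, 1.0)
```

Here the fourth match is a tie on this indicator and is dropped, so the rank sums add up to 4 * 5 / 2 = 10.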

[5]  arXiv:2012.02100 [pdf, other]
Title: Statistical techniques to estimate the SARS-CoV-2 infection fatality rate
Comments: 50 pages, 13 figures
Subjects: Applications (stat.AP); Methodology (stat.ME)

The determination of the infection fatality rate (IFR) for the novel SARS-CoV-2 coronavirus is a key aim for many of the field studies currently being undertaken in response to the pandemic. The IFR and the basic reproduction number $R_0$ are the main epidemic parameters describing the severity and transmissibility of the virus, respectively. The IFR can also be used as a basis for estimating and monitoring the number of infected individuals in a population, which may subsequently be used to inform policy decisions on public health interventions and lockdown strategies. Interpreting IFR measurements requires the calculation of confidence intervals. We present a number of statistical methods that are relevant in this context and develop an inverse problem formulation to determine correction factors that mitigate time-dependent effects which can lead to biased IFR estimates. We also review a number of methods for combining IFR estimates from multiple independent studies, provide example calculations throughout this note, and conclude with a summary and "best practice" recommendations. The developed code is available online.
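
As one example of the kind of interval the abstract refers to (not necessarily the method the authors recommend), the Wilson score interval for a binomial proportion can be sketched as follows. It assumes the number of infections n is known exactly, which real IFR studies rarely achieve because seroprevalence uncertainty must also be propagated; the counts are hypothetical.

```python
import math

def wilson_interval(k, n, z=1.959963984540054):
    """Wilson score interval for the proportion k/n (default z gives ~95%)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical numbers: 10 deaths among 1,000 confirmed infections.
lo, hi = wilson_interval(10, 1000)
print(f"IFR estimate 0.0100, 95% CI ({lo:.4f}, {hi:.4f})")
```

For small proportions such as an IFR, the Wilson interval is usually preferred over the simple Wald interval $\hat p \pm z\sqrt{\hat p(1-\hat p)/n}$, which can extend below zero and collapses to zero width when no deaths are observed.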

[6]  arXiv:2012.02101 [pdf, other]
Title: The Statistics of Noisy One-Stage Group Testing in Outbreaks
Comments: 30 pages, 20 figures
Subjects: Applications (stat.AP); Discrete Mathematics (cs.DM); Quantitative Methods (q-bio.QM)

In one-stage or non-adaptive group testing, instead of testing every sample individually, samples are split, bundled into pools and tested simultaneously. The results are then decoded to infer the states of the individual items. This combines the advantages of adaptive pooled testing, i.e. saving resources and higher throughput, with those of individual testing, e.g. short detection time and lean laboratory organisation, and might be suitable for screening during outbreaks. We study the COMP and NCOMP decoding algorithms for non-adaptive pooling strategies based on maximally disjunct pooling matrices with constant row and column sums, in the linear prevalence regime and in the presence of noisy measurements motivated by PCR tests. We calculate sensitivity, specificity, the probabilities of Type I and II errors, the expected number of items with a positive result, and the expected numbers of false positives and false negatives. We further provide estimates of the variance of the numbers of positive and false positive results, and we discuss the calculations and derived bounds in detail. Altogether, the article provides blueprints for screening strategies and tools to help decision makers tune them appropriately during an outbreak.
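
COMP decoding itself is short enough to sketch. The pooling design below is a hypothetical toy (6 items, 5 pools), not a maximally disjunct matrix with constant row and column sums as studied in the paper, and NCOMP, which tolerates some noisy tests, is not shown. The second call illustrates why measurement noise matters: a single false-negative pool clears the truly infected item.

```python
def comp_decode(pools, results, n_items):
    """COMP decoding: every item in a negative pool is declared negative;
    all remaining items are declared positive.

    pools   : list of sets of item indices, one set per pooled test
    results : list of bools, True if the pooled test was positive
    """
    cleared = set()
    for pool, positive in zip(pools, results):
        if not positive:
            cleared |= pool
    return set(range(n_items)) - cleared

# Hypothetical toy design: 6 items, 5 pools.
pools = [{0, 1, 2}, {3, 4, 5}, {0, 3}, {1, 4}, {2, 5}]
infected = {4}
# Noise-free results: a pool is positive iff it contains an infected item.
results = [bool(pool & infected) for pool in pools]
print(comp_decode(pools, results, 6))  # {4}

# Simulate one false-negative pooled test (pool {1, 4} reads negative).
noisy = list(results)
noisy[3] = False
print(comp_decode(pools, noisy, 6))  # set()
```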

[7]  arXiv:2012.02102 [pdf, other]
Title: A modified risk detection approach of biomarkers by frailty effect on multiple time to event data
Comments: 16 pages, 2 figures, 7 tables
Subjects: Applications (stat.AP)

Disease progression in a cancer patient can be indicated by multiple events: loco-regional relapse, distant metastasis and death. Early identification of these indications is necessary to change the treatment strategy, and biomarkers play an essential role in this respect. Biomarkers can influence how a particular cancer behaves and how it may respond to a specific treatment. A patient's chance of survival depends on the biomarker, and the treatment strategy differs accordingly; e.g., the survival prediction for breast cancer patients diagnosed with HER2-positive status differs from that for patients with HER2-negative status, which results in a different treatment strategy. The heterogeneity of biomarker statuses or levels should therefore be taken into account when modelling the survival outcome. This heterogeneity factor, which is often unobserved, is called frailty. When multiple indications are possible simultaneously, the scenario becomes more complex, because only one of them can occur, censoring the occurrence of the other events. Since the events indicating cancer progression are likely to be inter-related, the correlation should be incorporated through the frailties of the different events. In our study, we considered a multiple events or risks model with a heterogeneity component. Based on the estimated variance of the frailty, threshold levels of a biomarker are used as an early detection tool for disease progression or death. An additive-gamma frailty model is used to account for the correlation between the different frailty components, and the parameters are estimated using the Expectation-Maximization (EM) algorithm. Using an implementation of this algorithm in R, we obtained threshold levels of biomarker activity in a multiple events scenario.
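
The additive construction behind correlated gamma frailties can be illustrated by simulation. This sketch assumes the common-scale form Z_j = Y_0 + Y_j with independent gamma components, which keeps each Z_j gamma-distributed and gives correlation k_0 / sqrt((k_0 + k_1)(k_0 + k_2)); the paper's exact parameterisation and its EM fitting procedure are not reproduced, and all parameter values are illustrative.

```python
import random

def additive_gamma_frailties(n, k0, k1, k2, scale, seed=0):
    """Draw n pairs (Z1, Z2) of correlated gamma frailties.

    Z_j = Y0 + Yj with independent Y0 ~ Gamma(k0, scale) and
    Yj ~ Gamma(kj, scale); the shared scale makes Z_j ~ Gamma(k0 + kj, scale).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        y0 = rng.gammavariate(k0, scale)  # shared component induces correlation
        z1 = y0 + rng.gammavariate(k1, scale)
        z2 = y0 + rng.gammavariate(k2, scale)
        pairs.append((z1, z2))
    return pairs

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Illustrative values: each frailty has mean (1 + 1) * 0.5 = 1.
k0, k1, k2, scale = 1.0, 1.0, 1.0, 0.5
pairs = additive_gamma_frailties(100_000, k0, k1, k2, scale)
z1s, z2s = zip(*pairs)
empirical = pearson(z1s, z2s)
theoretical = k0 / ((k0 + k1) * (k0 + k2)) ** 0.5  # 0.5 for these values
print(f"empirical corr {empirical:.3f}, theoretical {theoretical:.3f}")
```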

[8]  arXiv:2012.02103 [pdf, ps, other]
Title: Design aspects of COVID-19 treatment trials: Improving probability and time of favourable events
Subjects: Applications (stat.AP); Methodology (stat.ME)

In response to the pandemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a multitude of clinical trials for the treatment of SARS-CoV-2 infection or the resulting coronavirus disease (COVID-19) are at various stages, from planning to completion, worldwide. Although some attempts were made to standardize study designs, this was hindered by the ferocity of the pandemic and the need to set up trials quickly. We take the view that a successful treatment of COVID-19 patients (i) increases the probability of recovery or improvement within a certain time interval, say 28 days; (ii) aims to expedite favourable events within this time frame; and (iii) does not increase mortality over this time period. Against this background, we discuss the choice of endpoint and its analysis. Furthermore, we consider the consequences of this choice for other design aspects, including sample size and power, and provide some guidance on the application of adaptive designs in this particular context.

[9]  arXiv:2012.02105 [pdf, other]
Title: Systematic errors in estimates of $R_t$ from symptomatic cases in the presence of observation bias
Comments: 7 pages, 2 figures
Subjects: Applications (stat.AP)

We consider the problem of estimating the reproduction number $R_t$ of an epidemic in populations where the probability of detecting cases depends on a known covariate. We argue that in such cases the usual empirical estimator can fail when the prevalence of cases among groups changes over time. We propose a Bayesian strategy to resolve the problem, as well as a simple solution for the case of large case numbers. We illustrate the issue and its solution in a simple yet realistic simulation study, and discuss the general relevance of the issue to the current COVID-19 pandemic.

Cross-lists for Fri, 4 Dec 20

[10]  arXiv:2012.01937 (cross-list from stat.ME) [pdf, other]
Title: On spike-and-slab priors for Bayesian equation discovery of nonlinear dynamical systems via sparse linear regression
Subjects: Methodology (stat.ME); Systems and Control (eess.SY); Applications (stat.AP)

This paper presents the use of spike-and-slab (SS) priors for discovering the governing differential equations of motion of nonlinear structural dynamic systems. The problem of discovering governing equations is cast as one of selecting relevant variables from a predetermined dictionary of basis variables, and is solved via sparse Bayesian linear regression. The SS priors, which belong to a class of discrete-mixture priors and are known for their strong sparsifying (or shrinkage) properties, are employed to induce sparse solutions and select relevant variables. Three different variants of SS priors are explored for performing Bayesian equation discovery. As the posteriors with SS priors are analytically intractable, a Markov chain Monte Carlo (MCMC)-based Gibbs sampler is employed to draw posterior samples of the model parameters; these samples are used for variable selection and parameter estimation in equation discovery. The proposed algorithm has been applied to four systems of engineering interest: a baseline linear system, and systems with cubic stiffness, quadratic viscous damping, and Coulomb damping. The results demonstrate the effectiveness of the SS priors in identifying the presence and type of nonlinearity in the system. Additionally, comparisons with the Relevance Vector Machine (RVM), which uses a Student's-t prior, indicate that the SS priors can achieve better model selection consistency, reduce false discoveries, and yield models with superior predictive accuracy. Finally, the Silverbox experimental benchmark is used to validate the proposed methodology.

[11]  arXiv:2012.02021 (cross-list from stat.ME) [pdf, ps, other]
Title: Modeling Count Data via Copulas
Comments: 33 pages
Subjects: Methodology (stat.ME); Applications (stat.AP)

Copula models have been widely used to model the dependence between continuous random variables, and modeling count data via copulas has recently become popular in the statistics literature. Spearman's rho is an appropriate and effective tool for measuring the degree of dependence between two random variables. In this paper, we derive the population version of Spearman's rho via copulas when both random variables are discrete. Closed-form expressions of Spearman's rho are obtained for some copulas of simple structure, such as Archimedean copulas, with different marginal distributions. We derive upper and lower bounds of Spearman's rho for Bernoulli random variables. The proposed Spearman's rho correlations are then compared with the corresponding Kendall's tau values, and we characterize the functional relationship between these two measures of dependence in some special cases. An extensive simulation study is conducted to demonstrate the validity of our theoretical results. Finally, we propose a bivariate copula regression model to analyze count data from a cervical cancer dataset.
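
The abstract concerns the population version of Spearman's rho derived through copulas. For comparison, the sample version for discrete data is usually computed as the Pearson correlation of mid-ranks, where tied values share their average rank. The sketch below shows that standard sample computation only, not the paper's copula-based derivation; the helper names are illustrative.

```python
def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def spearman_rho(x, y):
    """Sample Spearman's rho with tie correction via mid-ranks."""
    return pearson(midranks(x), midranks(y))

print(spearman_rho([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
print(spearman_rho([1, 2, 3], [9, 4, 2]))        # -1.0
```

For 0/1 data the mid-ranks are an affine function of the original values, so the sample rho coincides with the phi coefficient; this hints at why bounds on Spearman's rho for Bernoulli variables can be narrower than [-1, 1].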

[12]  arXiv:2012.02074 (cross-list from stat.ME) [pdf, other]
Title: A Note on Bayesian Modeling Specification of Censored Data in JAGS
Subjects: Methodology (stat.ME); Applications (stat.AP)

Just Another Gibbs Sampler (JAGS) is a convenient tool for drawing posterior samples using Markov chain Monte Carlo for Bayesian modeling. However, the built-in function dinterval() for modeling censored data misspecifies the computation of the deviance function, which may limit its use for likelihood-based model comparison. To establish an automatic approach to specifying the correct deviance function in JAGS, we propose a simple alternative modeling strategy for Bayesian model selection in the analysis of censored outcomes. The proposed approach is applicable to a broad spectrum of data types, including survival data and many other right-, left- and interval-censored Bayesian model structures.

[13]  arXiv:2012.02168 (cross-list from stat.ME) [pdf, other]
Title: Filtering and improved Uncertainty Quantification in the dynamic estimation of effective reproduction numbers
Subjects: Methodology (stat.ME); Applications (stat.AP)

The effective reproduction number $R_t$ measures an infectious disease's transmissibility as the number of secondary infections in one reproduction time in a population having both susceptible and non-susceptible hosts. Current approaches do not correctly quantify the uncertainty in estimating $R_t$, as would be expected given the observed variability in contagion patterns. We elaborate on the Bayesian estimation of $R_t$ by improving on the Poisson sampling model of Cori et al. (2013). By adding an autoregressive latent process, we build a Dynamic Linear Model on the log of the observed $R_t$ values, resulting in a filtering-type Bayesian inference. We use a conjugate analysis, and all calculations are explicit. The results show improved uncertainty quantification in the estimation of $R_t$, with a reliable method that could safely be used by non-experts and within other forecasting systems. We illustrate our approach with recent data from the current COVID-19 epidemic in Mexico.
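
The Poisson sampling model of Cori et al. (2013), which this paper improves on, has a closed-form Gamma posterior. Under that model, $I_s \sim \mathrm{Poisson}(R\,\Lambda_s)$ with total infectiousness $\Lambda_s = \sum_{k \ge 1} I_{s-k} w_k$, so a Gamma prior with shape a and scale b gives, over a window of tau days ending at t, a Gamma posterior with shape $a + \sum I_s$ and scale $1/(1/b + \sum \Lambda_s)$. The sketch below implements only that baseline, not the paper's Dynamic Linear Model filter; the prior values, serial-interval weights and incidence series are hypothetical.

```python
def cori_posterior(incidence, w, t, tau, a=1.0, b=5.0):
    """Gamma posterior (shape, scale) for R over the window [t - tau + 1, t].

    incidence : daily case counts I_0, I_1, ..., I_T
    w         : serial-interval weights, w[k - 1] = w_k for lag k >= 1
    a, b      : shape and scale of the Gamma prior on R (illustrative values)
    """
    def total_infectiousness(s):
        # Lambda_s = sum_{k >= 1} I_{s-k} * w_k, truncated at the series start
        return sum(incidence[s - k] * w[k - 1]
                   for k in range(1, len(w) + 1) if s - k >= 0)

    window = range(t - tau + 1, t + 1)
    shape = a + sum(incidence[s] for s in window)
    rate = 1.0 / b + sum(total_infectiousness(s) for s in window)
    return shape, 1.0 / rate

# Hypothetical data: flat incidence of 100 cases/day, 3-day serial interval.
incidence = [100] * 11
w = [0.25, 0.5, 0.25]
shape, scale = cori_posterior(incidence, w, t=10, tau=7)
posterior_mean = shape * scale  # close to 1 for flat incidence
print(f"posterior mean R = {posterior_mean:.4f}")
```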

Replacements for Fri, 4 Dec 20

[14]  arXiv:1807.07536 (replaced) [pdf, other]
Title: A Skellam Regression Model for Quantifying Positional Value in Soccer
Subjects: Applications (stat.AP)
[15]  arXiv:2009.12649 (replaced) [pdf, other]
Title: Estimation of the incubation time distribution for COVID-19
Authors: Piet Groeneboom
Comments: 26 pages, 9 figures, 3 tables
Subjects: Applications (stat.AP)
[16]  arXiv:2004.11169 (replaced) [pdf, other]
Title: On the modelling of multivariate counts with Cox processes and dependent shot noise intensities
Subjects: Risk Management (q-fin.RM); Applications (stat.AP)
[17]  arXiv:2005.14057 (replaced) [pdf, other]
Title: Machine learning time series regressions with an application to nowcasting
Comments: Portions of this work previously appeared as arXiv:1912.06307v1, which has been split into two articles
Subjects: Econometrics (econ.EM); Applications (stat.AP); Methodology (stat.ME); Machine Learning (stat.ML)
[18]  arXiv:2006.00717 (replaced) [pdf, other]
Title: On the optimality of joint periodic and extraordinary dividend strategies
Subjects: Risk Management (q-fin.RM); Optimization and Control (math.OC); Applications (stat.AP)
[19]  arXiv:2008.05951 (replaced) [pdf, other]
Title: Predictive-Adjusted Indirect Comparison: A Novel Method for Population Adjustment with Limited Access to Patient-Level Data
Comments: 31 pages, 7 figures. Submitted to Statistics in Medicine. Updated after Ph.D. proposal defense/transfer viva comments. arXiv admin note: text overlap with arXiv:2004.14800
Subjects: Methodology (stat.ME); Applications (stat.AP)
