New submissions

New submissions for Fri, 24 Jan 20

[1]  arXiv:2001.08327 [pdf, other]
Title: The Reciprocal Bayesian LASSO
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)

A reciprocal LASSO (rLASSO) regularization employs a decreasing penalty function as opposed to conventional penalization methods that use increasing penalties on the coefficients, leading to stronger parsimony and superior model selection relative to traditional shrinkage methods. Here we consider a fully Bayesian formulation of the rLASSO problem, which is based on the observation that the rLASSO estimate for linear regression parameters can be interpreted as a Bayesian posterior mode estimate when the regression parameters are assigned independent inverse Laplace priors. Bayesian inference from this posterior is possible using an expanded hierarchy motivated by a scale mixture of double Pareto or truncated normal distributions. On simulated and real datasets, we show that the Bayesian formulation outperforms its classical cousin in estimation, prediction, and variable selection across a wide range of scenarios while offering the advantage of posterior inference. Finally, we discuss other variants of this new approach and provide a unified framework for variable selection using flexible reciprocal penalties. All methods described in this paper are publicly available as an R package at: https://github.com/himelmallick/BayesRecipe.

[2]  arXiv:2001.08431 [pdf, other]
Title: On the Hauck-Donner Effect in Wald Tests: Detection, Tipping Points, and Parameter Space Characterization
Comments: 6 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Computation (stat.CO)

The Wald test remains ubiquitous in statistical practice despite shortcomings such as its inaccuracy in small samples and lack of invariance under reparameterization. This paper expands on another, lesser-known shortcoming called the Hauck--Donner effect (HDE), whereby a Wald test statistic is not monotonically increasing as a function of increasing distance between the parameter estimate and the null value. Because it results in an upward-biased $p$-value and loss of power, the aberration can have very damaging consequences, for example in variable selection. The HDE afflicts many types of regression models and corresponds to estimates near the boundary of the parameter space. This article presents several new results, and its main contributions are to (i) propose a very general test for detecting the HDE, regardless of its underlying cause; (ii) fundamentally characterize the HDE by pairwise ratios of Wald, Rao score, and likelihood ratio test statistics for 1-parameter distributions; (iii) show that the parameter space may be partitioned into an interior encased by 5 HDE severity measures (faint, weak, moderate, strong, extreme); (iv) prove that a necessary condition for the HDE in a 2 by 2 table is a log odds ratio of at least 2; and (v) give some practical guidelines about HDE-free hypothesis testing. Overall, practical post-fit tests can now potentially be conducted on any model estimated by iteratively reweighted least squares, such as the generalized linear model (GLM) and Vector GLM (VGLM) classes, the latter of which encompasses many popular regression models.
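
The non-monotonicity described in this abstract can be seen in a textbook setting. The sketch below is not the paper's detection test; it assumes a one-sample binomial model with a logit parameterization and null value logit(p) = 0, and the function name `wald_statistic` and the sample size are illustrative choices only.

```python
import math


def wald_statistic(p_hat: float, n: int) -> float:
    """Wald z-statistic for H0: logit(p) = 0 in a one-sample binomial model.

    Assumed setup (not taken from the paper): n Bernoulli trials with
    observed success proportion p_hat strictly between 0 and 1.
    """
    beta_hat = math.log(p_hat / (1.0 - p_hat))             # MLE of the log-odds
    std_err = 1.0 / math.sqrt(n * p_hat * (1.0 - p_hat))   # inverse root Fisher information
    return beta_hat / std_err


if __name__ == "__main__":
    n = 100  # illustrative sample size
    for p_hat in (0.6, 0.8, 0.9, 0.95, 0.99, 0.999):
        print(f"p_hat={p_hat:<6} W={wald_statistic(p_hat, n):.3f}")
```

As p_hat approaches 1 the estimate moves further from the null, yet the standard error grows faster than the estimate, so the statistic eventually shrinks. This is the boundary behavior the abstract associates with the HDE.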

[3]  arXiv:2001.08465 [pdf, other]
Title: Shrinkage with Robustness: Log-Adjusted Priors for Sparse Signals
Comments: 40 pages
Subjects: Methodology (stat.ME)

We introduce a new class of distributions named log-adjusted shrinkage priors for the analysis of sparse signals, which extends the three parameter beta priors by multiplying their densities by an additional log-term. The key feature of the proposed prior is that its density tails are extremely heavy, heavier than even those of the Cauchy distribution, which yields strong tail-robustness of the Bayes estimator while keeping the strong shrinkage effect on noise. We verify this property via the improved posterior mean squared errors in the tail. An integral representation with latent variables for the new density is available and enables fast and simple Gibbs samplers for the full posterior analysis. Our log-adjusted prior differs significantly from existing shrinkage priors with logarithms in that it allows further generalization by multiple log-terms in the density. The performance of the proposed priors is investigated through simulation studies and data analysis.

Cross-lists for Fri, 24 Jan 20

[4]  arXiv:2001.08336 (cross-list from math.ST) [pdf, other]
Title: Geometric Conditions for the Discrepant Posterior Phenomenon and Connections to Simpson's Paradox
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

The discrepant posterior phenomenon (DPP) is a counterintuitive phenomenon that occurs in the Bayesian analysis of multivariate parameters. It arises when an estimate of a marginal parameter obtained from the posterior is more extreme than both the estimate obtained from the prior alone and the estimate obtained from the likelihood alone. Inferential claims that exhibit DPP defy intuition, and the phenomenon can be surprisingly ubiquitous in well-behaved Bayesian models. Using point estimation as an example, we derive conditions under which the DPP occurs in Bayesian models with exponential quadratic likelihoods, including Gaussian models and those with the local asymptotic normality property, with conjugate multivariate Gaussian priors. We also examine the DPP for the Binomial model, in which the posterior mean is not a linear combination of that of the prior and the likelihood. We provide an intuitive geometric interpretation of the phenomenon and show that there exists a non-trivial space of marginal directions such that the DPP occurs. We further relate the phenomenon to Simpson's paradox and uncover a deep-rooted connection between the two that is associated with marginalization. We also draw connections with Bayesian computational algorithms when difficult geometry exists. Theoretical results are complemented by numerical illustrations. Scenarios covered in this study have implications for parameterization, sensitivity analysis, and prior choice for Bayesian modeling.
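
A small numerical case may make the definition concrete. The sketch below assumes a bivariate Gaussian likelihood with known identity covariance and a conjugate Gaussian prior with correlated components; all numbers and helper names (`inv2`, `matvec`, `posterior_mean`) are hypothetical and chosen for illustration, not taken from the paper.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def posterior_mean(prior_mean, prior_cov, mle, lik_cov):
    """Conjugate Gaussian update: precision-weighted average of prior mean and MLE."""
    prior_prec = inv2(prior_cov)
    lik_prec = inv2(lik_cov)
    post_prec = [[prior_prec[i][j] + lik_prec[i][j] for j in range(2)]
                 for i in range(2)]
    a = matvec(prior_prec, prior_mean)
    b = matvec(lik_prec, mle)
    return matvec(inv2(post_prec), [a[0] + b[0], a[1] + b[1]])


# Illustrative inputs (assumptions): strongly correlated prior, independent likelihood.
prior_mean = [0.0, 0.0]
prior_cov = [[1.0, 0.9], [0.9, 1.0]]
mle = [1.0, 3.0]
lik_cov = [[1.0, 0.0], [0.0, 1.0]]

post = posterior_mean(prior_mean, prior_cov, mle, lik_cov)

if __name__ == "__main__":
    print("posterior mean:", [round(x, 4) for x in post])
```

With these inputs the first coordinate of the posterior mean is about 1.22, which exceeds both the prior mean (0) and the likelihood-based estimate (1), while the second coordinate stays between its prior and likelihood values. The prior correlation lets the large second component of the data pull the first component past both of its marginal estimates.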

[5]  arXiv:2001.08363 (cross-list from stat.AP) [pdf, other]
Title: A covariance-enhanced approach to multi-tissue joint eQTL mapping with application to transcriptome-wide association studies
Subjects: Applications (stat.AP); Methodology (stat.ME)

Transcriptome-wide association studies based on genetically predicted gene expression have the potential to identify novel regions associated with various complex traits. It has been shown that incorporating expression quantitative trait loci (eQTLs) corresponding to multiple tissue types can improve power for association studies involving complex etiology. In this article, we propose a new multivariate response linear regression model and method for predicting gene expression in multiple tissues simultaneously. Unlike existing methods for multi-tissue joint eQTL mapping, our approach incorporates tissue-tissue expression correlation, which allows us to more efficiently handle missing expression measurements and more accurately predict gene expression using a weighted summation of eQTL genotypes. We show through simulation studies that our approach performs better than the existing methods in many scenarios. We use our method to estimate eQTL weights for 29 tissues collected by GTEx, and show that our approach significantly improves expression prediction accuracy compared to competitors. Using our eQTL weights, we perform a multi-tissue-based S-MultiXcan transcriptome-wide association study and show that our method leads to more discoveries in novel regions and more discoveries overall than the existing methods. Estimated eQTL weights are available for download online at github.com/ajmolstad/MTeQTLResults.

Replacements for Fri, 24 Jan 20

[6]  arXiv:1908.02891 (replaced) [pdf, other]
Title: Que será será? The uncertainty estimation of feature-based time series forecasts
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)
[7]  arXiv:2001.08089 (replaced) [pdf, other]
Title: Maximum Likelihood Estimation of Spatially Varying Coefficient Models for Large Data with an Application to Real Estate Price Prediction
Authors: Jakob A. Dambon (1 and 2), Fabio Sigrist (2), Reinhard Furrer (1) ((1) University of Zurich, (2) Lucerne University of Applied Sciences and Arts)
Comments: 37 pages, 16 figures; corrected encoding issue in Bibliography
Subjects: Methodology (stat.ME); Applications (stat.AP)
[8]  arXiv:1803.07859 (replaced) [pdf, other]
Title: Efficient Sampling and Structure Learning of Bayesian Networks
Comments: Revised version. 40 pages including 16 pages of supplement, 5 figures and 15 supplemental figures; R package BiDAG is available at this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[9]  arXiv:1912.12150 (replaced) [pdf, other]
Title: The Chi-Square Test of Distance Correlation
Comments: 12 pages + 8 pages appendix, 3 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)
[10]  arXiv:2001.08090 (replaced) [pdf, other]
Title: Stratified cross-validation for unbiased and privacy-preserving federated learning
Comments: 13 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)