[1]  arXiv:2007.06735 [pdf, other]
Title: On the Current and Longest Head Run
Authors: Dennis Koh
Comments: 17 pages, 8 figures
Subjects: Statistics Theory (math.ST); Probability (math.PR)

In this paper, the open problem of finding a closed analytical expression for the distribution function of the length of the longest pure head run in tosses of a possibly biased coin is solved by studying the closely related Markov chain of current head runs. Based on this, inequalities and an asymptotic expression for the centered length of the longest head run are derived. Moreover, formulae for parameters such as the expected value and variance are given solely in terms of the distribution function. The corresponding results for the length of the longest run of either kind (heads or tails) in tosses of a fair coin are also included, and heuristics are discussed as well.
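As a rough numerical illustration of the Markov-chain viewpoint (not the closed-form expression the abstract refers to), the distribution function of the longest head run can be computed by tracking the length of the current head run and discarding any path whose run exceeds a bound k. The function name `longest_run_cdf` and its parameters are hypothetical, chosen for this sketch only.

```python
def longest_run_cdf(n, k, p=0.5):
    """Return P(longest head run in n tosses <= k) for head probability p.

    state[j] holds the probability that the current head run has length j
    and that no run longer than k has occurred so far.
    """
    q = 1.0 - p
    state = [0.0] * (k + 1)
    state[0] = 1.0  # before any toss the current run has length 0
    for _ in range(n):
        new = [0.0] * (k + 1)
        new[0] = q * sum(state)        # a tail resets the current run
        for j in range(k):
            new[j + 1] = p * state[j]  # a head extends the current run
        # Mass in state[k] followed by a head would create a run of k + 1,
        # so it is dropped (absorbed) rather than carried forward.
        state = new
    return sum(state)


# Three fair tosses: 5 of the 8 sequences contain no two consecutive heads.
print(longest_run_cdf(3, 1))  # 0.625
```

Differences such as `longest_run_cdf(n, k, p) - longest_run_cdf(n, k - 1, p)` then give the probability that the longest head run has length exactly k.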

Cross-lists for Wed, 15 Jul 20

[2]  arXiv:2007.06697 (cross-list from math.PR) [pdf, other]
Title: Species tree estimation under joint modeling of coalescence and duplication: sample complexity of quartet methods
Comments: 35 pages, 1 figure
Subjects: Probability (math.PR); Statistics Theory (math.ST); Populations and Evolution (q-bio.PE)

We consider species tree estimation under a standard stochastic model of gene tree evolution that incorporates incomplete lineage sorting (as modeled by a coalescent process) and gene duplication and loss (as modeled by a branching process). Through a probabilistic analysis of the model, we derive sample complexity bounds for widely used quartet-based inference methods that highlight the effect of the duplication and loss rates in both subcritical and supercritical regimes.

[3]  arXiv:2007.06715 (cross-list from math.DS) [pdf, other]
Title: Dynamics of coordinate ascent variational inference: A case study in 2D Ising models
Subjects: Dynamical Systems (math.DS); Statistics Theory (math.ST)

Variational algorithms have gained prominence over the past two decades as a scalable computational environment for Bayesian inference. In this article, we explore tools from the dynamical systems literature to study convergence of coordinate ascent algorithms for mean field variational inference. Focusing on the Ising model defined on two nodes, we fully characterize the dynamics of the sequential coordinate ascent algorithm and its parallel version. We observe that in the regime where the objective function is convex, both algorithms are stable and converge to the unique fixed point. Our analyses reveal interesting discordances between these two versions of the algorithm in the region where the objective function is non-convex. In fact, the parallel version exhibits a periodic oscillatory behavior which is absent in the sequential version. Drawing intuition from the Markov chain Monte Carlo literature, we empirically show that a parameter expansion of the Ising model, popularly known as the Edwards-Sokal coupling, leads to an enlargement of the regime of convergence to the global optima.
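The contrast between sequential and parallel updates can be sketched in a few lines. The parametrization here is an assumption, not taken from the paper: a two-node Ising model p(x1, x2) proportional to exp(beta * x1 * x2) with spins in {-1, +1} and no external field, for which the mean-field coordinate updates are m1 = tanh(beta * m2) and m2 = tanh(beta * m1). All function names are hypothetical.

```python
import math


def sequential_step(m1, m2, beta):
    # Update m1 first, then update m2 using the fresh value of m1.
    m1 = math.tanh(beta * m2)
    m2 = math.tanh(beta * m1)
    return m1, m2


def parallel_step(m1, m2, beta):
    # Update both coordinates simultaneously from the previous iterate.
    return math.tanh(beta * m2), math.tanh(beta * m1)


def iterate(step, m1, m2, beta, n_iter=200):
    """Return the trajectory of (m1, m2) under repeated application of step."""
    trajectory = [(m1, m2)]
    for _ in range(n_iter):
        m1, m2 = step(m1, m2, beta)
        trajectory.append((m1, m2))
    return trajectory


# Strong coupling, initial means of opposite sign.
seq = iterate(sequential_step, 0.5, -0.5, beta=2.0)
par = iterate(parallel_step, 0.5, -0.5, beta=2.0)
print("sequential, last two iterates:", seq[-2], seq[-1])
print("parallel,   last two iterates:", par[-2], par[-1])
```

In this sketch the sequential iterates settle at a single point, while the parallel iterates keep swapping signs between two points, a period-two oscillation. Under the assumed parametrization, (0, 0) is the only fixed point when |beta| < 1, and both schemes converge to it.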

[4]  arXiv:2007.06799 (cross-list from stat.ML) [pdf, ps, other]
Title: A Decentralized Approach to Bayesian Learning
Comments: 52 pages, 29 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

Motivated by decentralized approaches to machine learning, we propose a collaborative Bayesian learning algorithm taking the form of decentralized Langevin dynamics in a non-convex setting. Our analysis shows that the initial KL-divergence between the Markov chain and the target posterior distribution decreases exponentially, while the error contribution to the overall KL-divergence from the additive noise decreases polynomially in time. We further show that the polynomial term experiences a speed-up with the number of agents, and we provide sufficient conditions on the time-varying step-sizes to guarantee convergence to the desired distribution. The performance of the proposed algorithm is evaluated on a wide variety of machine learning tasks. The empirical results show that the performance of individual agents with locally available data is on par with the centralized setting, with considerable improvement in the convergence rate.

[5]  arXiv:2007.06827 (cross-list from stat.ML) [pdf, other]
Title: Early stopping and polynomial smoothing in regression with reproducing kernels
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

In this paper we study the problem of early stopping for iterative learning algorithms in reproducing kernel Hilbert space (RKHS) in the nonparametric regression framework. In particular, we work with gradient descent and (iterative) kernel ridge regression algorithms. We present a data-driven rule to perform early stopping without a validation set that is based on the so-called minimum discrepancy principle. This method requires only one assumption on the regression function: that it belongs to an RKHS. The proposed rule is proved to be minimax optimal over different types of kernel spaces, including finite rank and Sobolev smoothness classes. The proof is derived from the fixed-point analysis of the localized Rademacher complexities, which is a standard technique for obtaining optimal rates in the nonparametric regression literature. In addition, we present simulation results on artificial datasets that show comparable performance of the designed rule with respect to other stopping rules such as the one determined by V-fold cross-validation.
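A discrepancy-type stopping rule for kernel gradient descent can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: the noise variance `sigma2` is taken as known, the threshold is `sigma2` itself, the kernel is Gaussian, and the polynomial smoothing step named in the title is not reproduced. All function names, the step size, and the synthetic data are hypothetical.

```python
import math
import random


def gaussian_gram(xs, bandwidth):
    """Gram matrix of a Gaussian kernel on scalar inputs."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in xs]
            for a in xs]


def gd_with_discrepancy_stop(gram, ys, sigma2, step=1.0, max_iter=2000):
    """Run gradient descent on the fitted values and stop at the first
    iteration whose mean squared residual is at most sigma2.

    Returns (stopping iteration, fitted values, residual history).
    """
    n = len(ys)
    fitted = [0.0] * n
    history = []
    for t in range(max_iter + 1):
        resid = [y - f for y, f in zip(ys, fitted)]
        mse = sum(r * r for r in resid) / n
        history.append(mse)
        if mse <= sigma2 or t == max_iter:
            return t, fitted, history
        # Gradient step on the fitted values: f <- f + (step / n) * K (y - f).
        update = [sum(k * r for k, r in zip(row, resid)) for row in gram]
        fitted = [f + step / n * u for f, u in zip(fitted, update)]


# Synthetic data (assumed for illustration): noisy sine curve on [0, 1].
rng = random.Random(0)
sigma = 0.3
xs = [i / 39 for i in range(40)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, sigma) for x in xs]

gram = gaussian_gram(xs, bandwidth=0.2)
t_stop, fitted, history = gd_with_discrepancy_stop(gram, ys, sigma ** 2)
print("stopped at iteration", t_stop, "with mean squared residual", history[-1])
```

No validation set is used: the rule looks only at the training residual and compares it with the noise level, which is the general idea behind discrepancy-based stopping. If the residual never reaches the threshold, the sketch simply returns at `max_iter`.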

[6]  arXiv:2007.07065 (cross-list from econ.EM) [pdf, other]
Title: A More Robust t-Test
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST)

Standard inference about a scalar parameter estimated via GMM amounts to applying a t-test to a particular set of observations. If the number of observations is not very large, then moderately heavy tails can lead to poor behavior of the t-test. This is a particular problem under clustering, since the number of observations then corresponds to the number of clusters, and heterogeneity in cluster sizes induces a form of heavy tails. This paper combines extreme value theory for the smallest and largest observations with a normal approximation for the average of the remaining observations to construct a more robust alternative to the t-test. The new test is found to control size much more successfully in small samples compared to existing methods. Analytical results in the canonical inference-for-the-mean problem demonstrate that the new test provides a refinement over the full-sample t-test under more than two but fewer than three moments, while the bootstrapped t-test does not.

Replacements for Wed, 15 Jul 20

[7]  arXiv:1909.03540 (replaced) [pdf, other]
Title: Inference In High-dimensional Single-Index Models Under Symmetric Designs
Subjects: Statistics Theory (math.ST); Other Statistics (stat.OT)
[8]  arXiv:1911.10604 (replaced) [pdf, other]
Title: Optimal Permutation Recovery in Permuted Monotone Matrix Model
Journal-ref: Journal of the American Statistical Association, 2020
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
[9]  arXiv:2001.11201 (replaced) [pdf, other]
Title: Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards
Authors: Vrettos Moulos
Comments: 31 pages, simulation results added
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
[10]  arXiv:2002.08422 (replaced) [pdf, other]
Title: On conditional versus marginal bias in multi-armed bandits
Comments: 18 pages
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
[11]  arXiv:2007.01958 (replaced) [pdf, other]
Title: Two-sample Testing for Large, Sparse High-Dimensional Multinomials under Rare/Weak Perturbations
Subjects: Statistics Theory (math.ST); Computation (stat.CO)
[12]  arXiv:1910.09485 (replaced) [pdf, other]
Title: Counterexamples for optimal scaling of Metropolis-Hastings chains with rough target densities
Comments: 44 pages, 3 figures
Subjects: Probability (math.PR); Statistics Theory (math.ST)
