[1]  arXiv:2004.00744 [pdf, other]
Title: Pattern graphs: a graphical approach to nonmonotone missing data
Authors: Yen-Chi Chen
Comments: 46 pages; 7 figures; 2 tables
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

We introduce the concept of pattern graphs: directed acyclic graphs representing how response patterns are associated. A pattern graph represents an identifying restriction that is nonparametrically identified/saturated and is often a missing not at random restriction. We introduce selection model and pattern mixture model formulations using the pattern graphs and show that they are equivalent. A pattern graph leads to an inverse probability weighting estimator as well as an imputation-based estimator. Asymptotic theories of the estimators are studied and we provide a graph-based recursive procedure for computing both estimators. We propose three graph-based sensitivity analyses and study the equivalence class of pattern graphs.

[2]  arXiv:2004.00775 [pdf, other]
Title: Strong Converse for Testing Against Independence over a Noisy channel
Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)

A distributed binary hypothesis testing (HT) problem over a noisy channel studied previously by the authors is investigated from the perspective of the strong converse property. It was shown by Ahlswede and Csiszár that a strong converse holds in the above setting when the channel is rate-limited and noiseless. Motivated by this observation, we show that the strong converse continues to hold in the noisy channel setting for a special case of HT known as testing against independence (TAI). The proof utilizes the blowing-up lemma and the recent change of measure technique of Tyagi and Watanabe as the key tools.

[3]  arXiv:2004.00792 [pdf, other]
Title: Sequential online subsampling for thinning experimental designs
Comments: 33 pages, 12 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Computation (stat.CO)

We consider a design problem where experimental conditions (design points $X_i$) are presented in the form of a sequence of i.i.d. random variables, generated with an unknown probability measure $\mu$, and only a given proportion $\alpha\in(0,1)$ can be selected. The objective is to select good candidates $X_i$ on the fly and maximize a concave function $\Phi$ of the corresponding information matrix. The optimal solution corresponds to the construction of an optimal bounded design measure $\xi_\alpha^*\leq \mu/\alpha$, with the difficulty that $\mu$ is unknown and $\xi_\alpha^*$ must be constructed online. The construction proposed relies on the definition of a threshold $\tau$ on the directional derivative of $\Phi$ at the current information matrix, the value of $\tau$ being fixed by a certain quantile of the distribution of this directional derivative. Combination with recursive quantile estimation yields a nonlinear two-time-scale stochastic approximation method. It can be applied to very long design sequences since only the current information matrix and estimated quantile need to be stored. Convergence to an optimum design is proved. Various illustrative examples are presented.
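
The thresholding idea in this abstract can be illustrated with a rough sketch. The code below is not the authors' algorithm; it is a simplified illustration under several assumptions that do not come from the abstract: $\Phi$ is taken to be the log-determinant (D-optimality) for a linear model with two-dimensional design points, so the directional derivative at $M$ towards $xx^\top$ is $x^\top M^{-1}x - 2$; the quantile is tracked with a plain Robbins-Monro recursion; and the names `select_online`, `ridge`, and `exponent`, as well as the step-size schedule, are hypothetical choices.

```python
import random


def select_online(points, alpha, q0=0.0, ridge=1e-3, exponent=0.75):
    """Keep roughly a proportion alpha of 2-D points, favouring informative ones.

    Assumptions (illustrative only): Phi = log det, information matrix
    M = average of x x^T over kept points, Robbins-Monro quantile tracking.
    Only the running matrix sum and the quantile estimate are stored.
    """
    # Running sum S of x x^T over kept points; a small ridge keeps S invertible.
    s11, s12, s22 = ridge, 0.0, ridge
    n_kept = 0
    q = q0  # running estimate of the (1 - alpha)-quantile of the derivative
    kept = []
    for k, (x1, x2) in enumerate(points, start=1):
        scale = max(n_kept, 1)  # M = S / scale
        det = s11 * s22 - s12 * s12
        # x^T S^{-1} x for the symmetric 2x2 matrix S.
        quad = (s22 * x1 * x1 - 2.0 * s12 * x1 * x2 + s11 * x2 * x2) / det
        # Directional derivative of log det at M in the direction x x^T.
        z = scale * quad - 2.0
        selected = z > q
        if selected:
            kept.append((x1, x2))
            s11 += x1 * x1
            s12 += x1 * x2
            s22 += x2 * x2
            n_kept += 1
        # Recursive quantile update: q drifts until P(z > q) is about alpha.
        q += k ** (-exponent) * ((1.0 if selected else 0.0) - alpha)
    return kept, q


def det_information(points):
    """Determinant of the average of x x^T over the given 2-D points."""
    n = len(points)
    m11 = sum(x1 * x1 for x1, _ in points) / n
    m12 = sum(x1 * x2 for x1, x2 in points) / n
    m22 = sum(x2 * x2 for _, x2 in points) / n
    return m11 * m22 - m12 * m12


# Usage: thin a stream of standard normal points to about 30 percent.
rng = random.Random(0)
stream = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(5000)]
kept_points, threshold = select_online(stream, alpha=0.3)
print(len(kept_points) / len(stream), det_information(kept_points))
```

In this sketch the selection rule and the quantile recursion share a single step-size sequence, whereas the abstract describes a two-time-scale scheme; treat it as a starting point for experimentation only.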

[4]  arXiv:2004.01089 [pdf, ps, other]
Title: Markov Chain-based Sampling for Exploring RNA Secondary Structure under the Nearest Neighbor Thermodynamic Model
Comments: 15 pages, 2 figures
Subjects: Combinatorics (math.CO); Statistics Theory (math.ST)

We study plane trees as a model for RNA secondary structure, assigning energy to each tree based on the Nearest Neighbor Thermodynamic Model, and defining a corresponding Gibbs distribution on the trees. Through a bijection between plane trees and 2-Motzkin paths, we design a Markov chain converging to the Gibbs distribution, and establish fast mixing time results by estimating the spectral gap of the chain. The spectral gap estimate is established through a series of decompositions of the chain and also by building on known mixing time results for other chains on Dyck paths. In addition to the mathematical aspects of the result, the resulting algorithm can be used as a tool for exploring the branching structure of RNA and its dependence on energy model parameters. The pseudocode implementing the Markov chain is provided in an appendix.

Replacements for Fri, 3 Apr 20

[5]  arXiv:1909.02546 (replaced) [pdf, other]
Title: The distribution of Yule's "nonsense correlation"
Comments: 23 pages, 1 figure
Subjects: Statistics Theory (math.ST)
[6]  arXiv:1910.06914 (replaced) [pdf, other]
Title: Bayesian Inverse Problems with Heterogeneous Variance
Subjects: Statistics Theory (math.ST); Analysis of PDEs (math.AP)
[7]  arXiv:2002.08409 (replaced) [pdf, other]
Title: On the expected number of components in a finite mixture model
Subjects: Statistics Theory (math.ST)
