Title: Binomial and Multinomial Proportions: Accurate Estimation and Reliable Assessment of Accuracy

Abstract: Misestimates of $\sigma_{P_o}$, the \emph{uncertainty} in $P_o$ from a 2-state Bayes equation used for binary classification, apparently arose from $\hat{\sigma}_{p_i}$, the uncertainty in underlying pdfs estimated from experimental $b$-bin histograms. To address this, several Bayesian estimator pairs $(\hat{p}_i, \hat{\sigma}_{p_i})$ were compared for agreement between the nominal confidence level ($\xi$) and calculated coverage values ($C$). A large $\xi$-to-$C$ inconsistency for large $b$ and $p_i \gg \frac{1}{b}$ arises for all multinomial estimators, since their priors downweight low-likelihood, high-$p_i$ values. To improve $\xi$-to-$C$ matching, $(\xi-C)^2$ was minimized over $\alpha_0$ in a more general prior pdf ($\mathcal{B}[\alpha_0,(b-1)\alpha_0;x]$) to obtain $(\hat{p}_i)_{\xi\leftrightarrow C}$. This improved matching for $b=2$, but for $b>2$, $\xi$-to-$C$ matching by $(\hat{p}_i)_{\xi\leftrightarrow C}$ required an effective value "$b=2$" and renormalization, which reduced $\hat{p}_i$-to-$p_i$ matching. Better $\hat{p}_i$-to-$p_i$ matching came from the original multinomial estimators, a new discrete-domain estimator $\hat{p}(n_i,N)$, or an earlier \emph{joint} estimator, $(\hat{p}_i)_{\bowtie}$, that co-adjusted all estimates $\hat{p}_i$ for James-Stein shrinkage to a mean vector. The best simultaneous $\xi$-to-$C$ and $\hat{p}_i$-to-$p_i$ matching came from \emph{de-noising} initial estimates of the underlying pdfs. For $b=100$ and $N<12800$, de-noised $\hat{p}$ needed $\approx 10\times$ fewer observations to achieve $\hat{p}_i$-to-$p_i$ matching equivalent to that of $\hat{p}(n_i,N)$, $(\hat{p}_i)_{\bowtie}$, or the original multinomial $\hat{p}_i$. De-noising each type of initial estimate yielded similarly high accuracy in Monte Carlo tests.
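The $\xi$-to-$C$ comparison and the role of $\alpha_0$ can be illustrated with a short sketch. This is not the authors' code. It assumes the Beta prior $\mathcal{B}[\alpha_0,(b-1)\alpha_0;x]$ from the abstract (the marginal of a symmetric Dirichlet prior), uses a normal-approximation interval $\hat{p}_i \pm z\hat{\sigma}_{p_i}$ in place of an exact credible interval, and replaces the paper's minimization of $(\xi-C)^2$ with a simple grid search. The function names `beta_prior_estimate`, `coverage`, and `tune_alpha0` are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def beta_prior_estimate(n_i, N, b, alpha0=1.0):
    """Return (p_hat, sigma_hat) for bin i under an assumed Beta(alpha0, (b-1)*alpha0) prior.

    This prior is the marginal of a symmetric Dirichlet(alpha0, ..., alpha0) prior
    over b bins, so after n_i of N observations fall in bin i the posterior is
    Beta(n_i + alpha0, N - n_i + (b-1)*alpha0). alpha0 = 1 gives the uniform
    (Laplace) prior; smaller alpha0 weakens the pull of p_hat toward 1/b.
    """
    total = N + b * alpha0
    p_hat = (n_i + alpha0) / total                                # posterior mean
    sigma_hat = math.sqrt(p_hat * (1.0 - p_hat) / (total + 1.0))  # posterior s.d.
    return p_hat, sigma_hat


def coverage(p, i, N, alpha0=1.0, xi=0.95, trials=1000, seed=0):
    """Monte Carlo coverage C: fraction of simulated b-bin histograms whose
    interval p_hat +/- z*sigma_hat contains the true p[i].

    Assumption: a normal-approximation interval stands in for an exact Beta
    credible interval (which would need a Beta quantile function).
    """
    rng = random.Random(seed)
    b = len(p)
    z = NormalDist().inv_cdf(0.5 + xi / 2.0)  # two-sided z for confidence level xi
    hits = 0
    for _ in range(trials):
        # One simulated experimental histogram: N draws from multinomial(p).
        counts = Counter(rng.choices(range(b), weights=p, k=N))
        p_hat, sigma_hat = beta_prior_estimate(counts[i], N, b, alpha0)
        if abs(p_hat - p[i]) <= z * sigma_hat:
            hits += 1
    return hits / trials


def tune_alpha0(p, i, N, xi=0.95, grid=(0.01, 0.1, 0.5, 1.0, 2.0), **kwargs):
    """Hypothetical grid search for the alpha0 that minimizes (xi - C)^2 for bin i."""
    return min(grid, key=lambda a: (xi - coverage(p, i, N, alpha0=a, xi=xi, **kwargs)) ** 2)


# Usage: b = 100 bins with one p_i far above 1/b.
b = 100
p = [0.5] + [0.5 / (b - 1)] * (b - 1)
print(coverage(p, 0, N=100, alpha0=1.0))  # well below xi = 0.95: p_hat is pulled toward 1/b
print(tune_alpha0(p, 0, N=100))           # a smaller alpha0 restores coverage for this bin
```

Tuning $\alpha_0$ against a single high-$p_i$ bin, as here, ignores the other $b-1$ bins; the sketch makes no attempt at the effective "$b=2$" renormalization, the James-Stein joint estimator, or the de-noising step described above.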
Comments: 61 pages, 24 figures; small changes were made to Figs 13-18, A1 & A2, and Tables 1 and S1 after fixing a slight bug in the source code. For comparison, version (N-1), prior to fixing the bug, is at: this http URL
Subjects: Computation (stat.CO); Methodology (stat.ME)
Cite as: arXiv:1602.00207 [stat.CO]
  (or arXiv:1602.00207v1 [stat.CO] for this version)

Submission history

From: Jonathan Friedman
[v1] Sun, 31 Jan 2016 06:39:07 GMT (2825kb,D)