Statistics

New submissions

[ total of 37 entries: 1-37 ]

New submissions for Mon, 11 Dec 17

[1]  arXiv:1712.02831 [pdf, other]
Title: RelNN: A Deep Neural Model for Relational Learning
Comments: 9 pages, 8 figures, accepted at AAAI-2018
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Statistical relational AI (StarAI) aims at reasoning and learning in noisy domains described in terms of objects and relationships by combining probability with first-order logic. With huge advances in deep learning in recent years, combining deep networks with first-order logic has been the focus of several recent studies. Many of the existing attempts, however, only focus on relations and ignore object properties. The attempts that do consider object properties are limited in terms of modelling power or scalability. In this paper, we develop relational neural networks (RelNNs) by adding hidden layers to relational logistic regression (the relational counterpart of logistic regression). We learn latent properties for objects both directly and through general rules. Back-propagation is used for training these models. A modular, layer-wise architecture facilitates applying the techniques developed within the deep learning community to our architecture. Initial experiments on eight tasks over three real-world datasets show that RelNNs are promising models for relational learning.

[2]  arXiv:1712.02845 [pdf, ps, other]
Title: The Shrinkage Variance Hotelling $T^2$ Test for Genomic Profiling Studies
Authors: Grant Izmirlian
Subjects: Methodology (stat.ME)

Designed gene expression micro-array experiments, consisting of several treatment levels with a number of replicates per level, are analyzed by applying simple tests for group differences at the per gene level. The gene level statistics are sorted and a criterion for selecting important genes which takes into account multiplicity is applied. A caveat arises in that true signals (genes truly over- or under-expressed) are "competing" with fairly large type I error signals. False positives near the top of a sorted list can occur when genes having very small fold-change are compensated by small enough variance to yield a large test statistic. One of the first attempts around this caveat was the development of "significance analysis of micro-arrays (SAM)", which used a modified t-type statistic thresholded against its permutation distribution. The key innovation of the modified t-statistic was the addition of a constant to the per gene standard errors in order to stabilize the coefficient of variation of the resulting test statistic. Since then, several authors have proposed the use of shrinkage variance estimators in conjunction with t-type, and more generally, ANOVA type tests at the gene level. Our new approach proposes the use of a shrinkage variance Hotelling T-squared statistic in which the per gene sample covariance matrix is replaced by a shrinkage estimate borrowing strength from across all genes. It is demonstrated that the new statistic retains the F-distribution under the null, with added degrees of freedom in the denominator. Advantages of this class of tests are (i) flexibility, in that a whole family of hypothesis tests is possible, and (ii) the gains of the above-mentioned earlier innovations are enjoyed more fully. This paper summarizes our results and presents a simulation study benchmarking the new statistic against another recently proposed statistic.

[3]  arXiv:1712.02860 [pdf]
Title: Remarks on Bayesian Control Charts
Subjects: Applications (stat.AP); Computational Engineering, Finance, and Science (cs.CE); Optimization and Control (math.OC); Probability (math.PR); Economics (q-fin.EC)

There is a considerable amount of ongoing research on the use of Bayesian control charts for detecting a shift from a good quality distribution to a bad quality distribution in univariate and multivariate processes. Monitoring continuous-time multivariate processes by using Bayesian control charts is studied in Makis (2008) [Makis, V. (2008). Multivariate Bayesian control chart. Operations Research, 56(2), 487-496, DOI: 10.1287/opre.1070.0495]. Makis (2008) and some other authors widely claimed that Bayesian control charts were economically optimal, compared to non-Bayesian control charts. This paper first shows that the Bayesian control charts considered by Makis (2008) are not always better than non-Bayesian control charts. Secondly, it demonstrates that the algorithm presented in Makis (2008) to determine the optimal control limits of Bayesian control charts fails to find the true optimal values.

[4]  arXiv:1712.02902 [pdf, other]
Title: Multiple Adaptive Bayesian Linear Regression for Scalable Bayesian Optimization with Warm Start
Subjects: Machine Learning (stat.ML)

Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization. Typically, BO is powered by a Gaussian process (GP), whose algorithmic complexity is cubic in the number of evaluations. Hence, GP-based BO cannot leverage large amounts of past or related function evaluations, for example, to warm start the BO procedure. We develop a multiple adaptive Bayesian linear regression model as a scalable alternative whose complexity is linear in the number of observations. The multiple Bayesian linear regression models are coupled through a shared feedforward neural network, which learns a joint representation and transfers knowledge across machine learning problems.

[5]  arXiv:1712.02903 [pdf, other]
Title: Blind Multi-class Ensemble Learning with Unequally Reliable Classifiers
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Signal Processing (eess.SP)

The rising interest in pattern recognition and data analytics has spurred the development of innovative machine learning algorithms and tools. However, as each algorithm has its strengths and limitations, one is motivated to judiciously fuse multiple algorithms in order to find the "best" performing one, for a given dataset. Ensemble learning aims at such a high-performance meta-algorithm, by combining the outputs from multiple algorithms. The present work introduces a blind scheme for learning from ensembles of classifiers, using a moment matching method that leverages joint tensor and matrix factorization. "Blind" refers to the combiner, which has no knowledge of the ground-truth labels that each classifier has been trained on. A rigorous performance analysis is derived and the proposed scheme is evaluated on synthetic and real datasets.

[6]  arXiv:1712.02964 [pdf, other]
Title: Bayesian Variable Selection in High Dimensional Survival Time Cancer Genomic Datasets using Nonlocal Priors
Comments: 18 pages, 3 figures
Subjects: Applications (stat.AP); Computation (stat.CO)

Variable selection in high dimensional cancer genomic studies has become very popular in the past decade, due to the interest in discovering significant genes pertinent to a specific cancer type. Censored survival data is the main data structure in such studies, and performing variable selection for this data type requires dedicated methodology. With recent developments in computational power, Bayesian methods have become more attractive in the context of variable selection. In this article we introduce a new Bayesian variable selection approach that exploits a mixture of a point mass at zero and an inverse moment prior, which is a non-local prior density on the Cox proportional hazard model coefficients. Our method utilizes a parallel computing structure and takes advantage of a stochastic search based method to explore the model space and to circumvent the computationally expensive MCMC procedure. It then reports the highest posterior probability model, the median probability model and the posterior inclusion probability for each covariate in the design matrix. Bayesian model averaging is also exploited for predictive power measurements. The proposed algorithm provides improved performance in identifying true models by reducing estimation and prediction error in simulation studies as well as real genomic datasets. This algorithm is implemented in an R package named BVSNLP.

[7]  arXiv:1712.02990 [pdf, other]
Title: Censored pairwise likelihood-based tests for mixing coefficient of spatial max-mixture models
Subjects: Statistics Theory (math.ST); Probability (math.PR)

Max-mixture processes are defined as Z = max(aX, (1 - a)Y) with X an asymptotically dependent (AD) process, Y an asymptotically independent (AI) process and $a \in [0, 1]$. Thus, the mixing coefficient a may reveal the strength of the AD part present in the max-mixture process. In this paper we focus on two tests based on censored pairwise likelihood estimates. We compare their performance through an extensive simulation study. Monte Carlo simulation is a fundamental tool for asymptotic variance calculations. We apply our tests to daily precipitation data from the east of Australia. Drawbacks and possible developments are discussed.
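
The max-mixture definition can be illustrated at a single site with a minimal sketch. It assumes, beyond what the abstract states, that X and Y are independent with unit Fréchet margins; in that case Z = max(aX, (1 - a)Y) is again unit Fréchet for any a in [0, 1], because exp(-a/z) exp(-(1 - a)/z) = exp(-1/z). The helper names are hypothetical, and scalar draws cannot show the spatial AD/AI dependence that the tests in the abstract target.

```python
import math
import random


def unit_frechet(rng):
    """Draw one unit Frechet variate by inverting its CDF exp(-1/z)."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, and log(0) is undefined
        u = rng.random()
    return -1.0 / math.log(u)


def max_mixture(a, x, y):
    """Pointwise max-mixture Z = max(a*X, (1 - a)*Y) for a in [0, 1]."""
    return max(a * x, (1.0 - a) * y)


def empirical_cdf_at(z, a, n_draws=20000, seed=0):
    """Estimate P(Z <= z) by simulation.

    Assumption (not from the abstract): X and Y are independent scalar
    unit Frechet draws, standing in for one site of the two processes.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x = unit_frechet(rng)
        y = unit_frechet(rng)
        if max_mixture(a, x, y) <= z:
            hits += 1
    return hits / n_draws
```

Under these assumptions, `empirical_cdf_at(1.0, a)` should be close to `math.exp(-1.0)` whatever the mixing coefficient, which is why a cannot be read off a single margin and must be inferred from the dependence structure.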

[8]  arXiv:1712.03032 [pdf, other]
Title: p-Values for Credibility
Authors: Leonhard Held
Comments: 21 pages, 6 figures
Subjects: Methodology (stat.ME)

Analysis of credibility is a reverse-Bayes technique that has been proposed by Matthews (2001) to overcome some of the shortcomings of significance tests. A significant result is deemed credible if current knowledge about the effect size is in conflict with any sceptical prior that would make the effect non-significant. In this paper I formalize the approach and propose to use Bayesian predictive tail probabilities to quantify the evidence for credibility. This gives rise to a p-value for extrinsic credibility, taking into account both the internal and the external evidence for an effect. The assessment of intrinsic credibility leads to a new threshold for ordinary significance that is remarkably close to the recently proposed 0.005 level. Finally, a p-value for intrinsic credibility is proposed that is a simple function of the ordinary p-value for significance and has a direct frequentist interpretation in terms of the replication probability that a future study under identical conditions will give an estimated effect in the same direction as the first study.

[9]  arXiv:1712.03040 [pdf, other]
Title: Approximation intensity for pairwise interaction Gibbs point processes using determinantal point processes
Subjects: Methodology (stat.ME)

The intensity of a Gibbs point process is usually an intractable function of the model parameters. For repulsive pairwise interaction point processes, this intensity can be expressed as the Laplace transform of some particular function. Baddeley and Nair (2002) developed the Poisson-saddlepoint approximation which consists, for basic models, in calculating this Laplace transform with respect to a homogeneous Poisson point process. In this paper, we develop an approximation which consists in calculating the same Laplace transform with respect to a specific determinantal point process. This new approximation is efficiently implemented and turns out to be more accurate than the Poisson-saddlepoint approximation, as demonstrated by some numerical examples.

[10]  arXiv:1712.03058 [pdf, other]
Title: Iterated filtering methods for Markov process epidemic models
Authors: Theresa Stocks
Comments: This manuscript is a preprint of a chapter to appear in the Handbook of Infectious Disease Data Analysis, Held, L., Hens, N., O'Neill, P.D. and Wallinga, J. (Eds.). Chapman & Hall/CRC, 2018. Please use the book for possible citations
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP)

Dynamic epidemic models have proven valuable for public health decision makers as they provide useful insights into the understanding and prevention of infectious diseases. However, inference for these types of models can be difficult because the disease spread is typically only partially observed, e.g., in the form of reported incidences in given time periods. This chapter discusses how to perform likelihood-based inference for partially observed Markov epidemic models when it is relatively easy to generate samples from the Markov transmission model while the likelihood function is intractable. The first part of the chapter reviews the theoretical background of inference for partially observed Markov processes (POMP) via iterated filtering. In the second part of the chapter the performance of the method and associated practical difficulties are illustrated on two examples. In the first example a simulated data set consisting of the number of newly reported cases aggregated by week is fitted to a POMP where the underlying disease transmission model is assumed to be a simple Markovian SIR model. The second example illustrates possible model extensions by analyzing German rotavirus surveillance data from 2001 to 2008. Both examples are implemented using the R-package pomp (King et al., 2017) and the code is made available online.

[11]  arXiv:1712.03120 [pdf, other]
Title: Detecting confounding due to subject identification in clinical machine learning diagnostic applications: a permutation test approach
Subjects: Applications (stat.AP)

Recently, Saeb et al (2017) showed that, in diagnostic machine learning applications, having data of each subject randomly assigned to both training and test sets (record-wise data split) can lead to massive underestimation of the cross-validation prediction error, due to the presence of "subject identity confounding" caused by the classifier's ability to identify subjects, instead of recognizing disease. To solve this problem, the authors recommended the random assignment of the data of each subject to either the training or the test set (subject-wise data split). The adoption of subject-wise split has been criticized in Little et al (2017), on the basis that it can violate assumptions required by cross-validation to consistently estimate generalization error. In particular, adopting subject-wise splitting in heterogeneous data-sets might lead to model under-fitting and larger classification errors. Hence, Little et al argue that perhaps the overestimation of prediction errors with subject-wise cross-validation, rather than underestimation with record-wise cross-validation, is the reason for the discrepancies between prediction error estimates generated by the two splitting strategies. In order to shed light on this controversy, we focus on simpler classification performance metrics and develop permutation tests that can detect identity confounding. By focusing on permutation tests, we are able to evaluate the merits of record-wise and subject-wise data splits under more general statistical dependencies and distributional structures of the data, including situations where cross-validation breaks down. We illustrate the application of our tests using synthetic and real data from a Parkinson's disease study.
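
The two splitting strategies discussed above can be sketched as follows. This is only an illustration of record-wise versus subject-wise splitting, not the permutation tests proposed in the paper; the function names and the `(subject_id, features)` record layout are hypothetical.

```python
import random


def record_wise_split(records, test_fraction=0.3, seed=0):
    """Shuffle individual records, so one subject can land in both sets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


def subject_wise_split(records, test_fraction=0.3, seed=0):
    """Assign whole subjects to either train or test, never both.

    Each record is assumed to be a (subject_id, features) tuple.
    """
    rng = random.Random(seed)
    subjects = sorted({subject for subject, _ in records})
    rng.shuffle(subjects)
    n_test = int(round(test_fraction * len(subjects)))
    test_subjects = set(subjects[:n_test])
    train = [r for r in records if r[0] not in test_subjects]
    test = [r for r in records if r[0] in test_subjects]
    return train, test


def shared_subjects(train, test):
    """Subjects that appear in both sets (the source of identity confounding)."""
    return {s for s, _ in train} & {s for s, _ in test}
```

With several records per subject, `shared_subjects` is empty for the subject-wise split and typically non-empty for the record-wise split, which is the situation in which a classifier can exploit subject identity.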

[12]  arXiv:1712.03134 [pdf, other]
Title: On Adaptive Estimation for Dynamic Bernoulli Bandits
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm is associated with a reward distribution. In static MABs, the reward distributions do not change over time, while in dynamic MABs, each arm's reward distribution can change, and the optimal arm can switch over time. Motivated by many real applications where rewards are binary counts, we focus on dynamic Bernoulli bandits. Standard methods like $\epsilon$-Greedy and Upper Confidence Bound (UCB), which rely on the sample mean estimator, often fail to track the changes in underlying reward for dynamic problems. In this paper, we overcome the shortcoming of slow response to change by deploying adaptive estimation in the standard methods and propose a new family of algorithms, which are adaptive versions of $\epsilon$-Greedy, UCB, and Thompson sampling. These new methods are simple and easy to implement. Moreover, they do not require any prior knowledge about the data, which is important for real applications. We examine the new algorithms numerically in different scenarios and find that they show solid improvements in dynamic environments.
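
Why the sample mean responds slowly to change can be seen in a small sketch. It is not the paper's algorithm: it swaps in a fixed exponential forgetting factor `lam` as a simple stand-in for adaptive estimation, and the two-arm environment, the switch at mid-horizon and all names are assumptions made for illustration. Setting `lam=1.0` recovers the ordinary sample mean.

```python
import random


class ForgettingMean:
    """Exponentially weighted mean; lam=1.0 is the plain sample mean.

    A fixed forgetting factor is an illustrative stand-in here,
    not the adaptive estimator described in the abstract.
    """

    def __init__(self, lam):
        self.lam = lam
        self.weighted_sum = 0.0
        self.weight = 0.0

    def update(self, reward):
        # Down-weight old observations before adding the new one.
        self.weighted_sum = self.lam * self.weighted_sum + reward
        self.weight = self.lam * self.weight + 1.0

    @property
    def value(self):
        return self.weighted_sum / self.weight if self.weight > 0 else 0.0


def run_epsilon_greedy(lam, epsilon=0.1, horizon=4000, seed=0):
    """Total reward of epsilon-greedy on a two-armed Bernoulli bandit
    whose best arm switches halfway through the horizon (assumed setup)."""
    rng = random.Random(seed)
    estimates = [ForgettingMean(lam), ForgettingMean(lam)]
    total = 0
    for t in range(horizon):
        probs = (0.8, 0.2) if t < horizon // 2 else (0.2, 0.8)
        if rng.random() < epsilon:
            arm = rng.randrange(2)  # explore
        else:
            arm = max(range(2), key=lambda k: estimates[k].value)  # exploit
        reward = 1 if rng.random() < probs[arm] else 0
        estimates[arm].update(reward)
        total += reward
    return total
```

Comparing `run_epsilon_greedy(0.95)` with `run_epsilon_greedy(1.0)` shows the effect: after the switch, the sample mean of the formerly best arm decays only slowly, whereas the forgetting estimate tracks the new reward probabilities within a few dozen pulls.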

[13]  arXiv:1712.03168 [pdf, ps, other]
Title: On optimal policy in the group testing with incomplete identification
Comments: Submitted for publication
Subjects: Other Statistics (stat.OT)

Consider a very large (infinite) population of items, where each item, independently of the others, is defective with probability p or good with probability q=1-p. The goal is to identify N good items as quickly as possible. The following group testing policy (policy A) is considered: test items together in groups; if the test outcome of group i of size n_i is negative, then accept all items in this group as good, otherwise discard the group. Then, move to the next group and continue until exactly N good items are found. The goal is to find an optimal testing configuration, i.e., group sizes, under policy A, such that the expected waiting time to obtain N good items is minimal. Recently, Gusev (2012) found an optimal group testing configuration under the assumptions of constant group size and $N=\infty$. In this note, an optimal solution under policy A for finite N is provided.
Keywords: Dynamic programming; Optimal design; Partition problem; Schur-convexity
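
Policy A lends itself to a short dynamic-programming sketch. It rests on two assumptions that the abstract does not confirm: the waiting time is measured as the expected number of group tests, and a group never contains more items than are still needed. A group of size n is then accepted with probability q^n, so it takes 1/q^n tests on average to clear it, and the best expected cost for m remaining items satisfies T(m) = min over n of (q^(-n) + T(m - n)) with T(0) = 0. The function name is hypothetical.

```python
def optimal_group_sizes(N, p):
    """Minimise the expected number of tests to accept exactly N good items.

    Assumptions (ours, for illustration): cost = number of group tests,
    0 <= p < 1, and group sizes cannot exceed the remaining demand.
    Returns (expected number of tests, list of group sizes).
    """
    q = 1.0 - p
    best = [0.0] + [float("inf")] * N  # best[m]: optimal cost for m items
    choice = [0] * (N + 1)             # choice[m]: first group size used
    for m in range(1, N + 1):
        for n in range(1, m + 1):
            # Expected tests to clear a group of size n, then solve the rest.
            cost = q ** (-n) + best[m - n]
            if cost < best[m]:
                best[m] = cost
                choice[m] = n
    sizes = []
    m = N
    while m > 0:  # walk back through the stored choices
        sizes.append(choice[m])
        m -= choice[m]
    return best[N], sizes
```

For a small defect probability the recursion favours one large group, while for larger p it splits the demand into smaller groups; under these assumptions the problem is a partition of N into group sizes, in line with the listed keywords.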

[14]  arXiv:1712.03198 [pdf, other]
Title: Using simulation studies to evaluate statistical methods
Subjects: Methodology (stat.ME)

Simulation studies are computer experiments which involve creating data by pseudorandom sampling. The key strength of simulation studies is the ability to understand the behaviour of statistical methods because some 'truth' is known from the process of generating the data. This allows us to consider properties of methods, such as bias. While widely used, simulation studies are often poorly designed, analysed and reported. This article outlines the rationale for using simulation studies and offers guidance for design, execution, analysis, reporting and presentation. In particular, we provide: a structured approach for planning and reporting simulation studies; coherent terminology for simulation studies; guidance on coding simulation studies; a critical discussion of key performance measures and their computation; ideas on structuring tabular and graphical presentation of results; and new graphical presentations. With a view to describing current practice and identifying areas for improvement, we review 100 articles taken from Volume 34 of Statistics in Medicine which included at least one simulation study.
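
A minimal sketch of such a simulation study, assuming a toy setting that is not taken from the article: normal data with a known variance, and two variance estimators (dividing by n and by n - 1) compared on bias. The Monte Carlo standard error reported next to each bias is the standard deviation of the estimates divided by the square root of the number of repetitions; the function name and default settings are hypothetical.

```python
import math
import random
import statistics


def simulate_bias(n_obs=10, n_sim=5000, true_var=4.0, seed=1):
    """Estimate the bias of two variance estimators by simulation.

    The known 'truth' is true_var; data are drawn from N(0, true_var).
    Returns {estimator name: (bias estimate, Monte Carlo standard error)}.
    """
    rng = random.Random(seed)
    sd = math.sqrt(true_var)
    est_n, est_n_minus_1 = [], []
    for _ in range(n_sim):
        sample = [rng.gauss(0.0, sd) for _ in range(n_obs)]
        est_n.append(statistics.pvariance(sample))         # divides by n
        est_n_minus_1.append(statistics.variance(sample))  # divides by n - 1

    def summarise(estimates):
        bias = statistics.mean(estimates) - true_var
        mc_se = statistics.stdev(estimates) / math.sqrt(n_sim)
        return bias, mc_se

    return {
        "divide_by_n": summarise(est_n),
        "divide_by_n_minus_1": summarise(est_n_minus_1),
    }
```

Because the data-generating variance is known, the result can be checked against theory: the divide-by-n estimator has bias -true_var / n_obs, while the divide-by-(n - 1) estimator is unbiased, so its estimated bias should lie within a few Monte Carlo standard errors of zero.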

Cross-lists for Mon, 11 Dec 17

[15]  arXiv:1712.02854 (cross-list from cs.CV) [pdf, other]
Title: Stochastic reconstruction of an oolitic limestone by generative adversarial networks
Comments: 22 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Geophysics (physics.geo-ph); Machine Learning (stat.ML)

Stochastic image reconstruction is a key part of modern digital rock physics and materials analysis that aims to create numerous representative samples of material micro-structures for upscaling, numerical computation of effective properties and uncertainty quantification. We present a method of three-dimensional stochastic image reconstruction based on generative adversarial neural networks (GANs). GANs represent a framework of unsupervised learning methods that require no a priori inference of the probability distribution associated with the training data. Using a fully convolutional neural network allows fast sampling of large volumetric images. We apply a GAN based workflow of network training and image generation to an oolitic Ketton limestone micro-CT dataset. Minkowski functionals, effective permeability as well as velocity distributions of simulated flow within the acquired images are compared with the synthetic reconstructions generated by the deep neural network. While our results show that GANs allow a fast and accurate reconstruction of the evaluated image dataset, we address a number of open questions and challenges involved in the evaluation of generative network-based methods.

[16]  arXiv:1712.02950 (cross-list from cs.CV) [pdf, other]
Title: CycleGAN: a Master of Steganography
Comments: NIPS 2017, workshop on Machine Deception
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)

CycleGAN is one of the latest successful approaches to learn a correspondence between two image distributions. In a series of experiments, we demonstrate an intriguing property of the model: CycleGAN learns to "hide" information about a source image inside the generated image in nearly imperceptible, high-frequency noise. This trick ensures that the complementary generator can recover the original sample and thus satisfy the cyclic consistency requirement, but the generated image remains realistic. We connect this phenomenon with adversarial attacks by viewing CycleGAN's training procedure as training a generator of adversarial examples, thereby showing that adversarial attacks are not limited to classifiers but also may target generative models.

[17]  arXiv:1712.03010 (cross-list from cs.LG) [pdf, other]
Title: Stochastic Dual Coordinate Descent with Bandit Sampling
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

Coordinate descent methods minimize a cost function by updating a single decision variable (corresponding to one coordinate) at a time. Ideally, one would update the decision variable that yields the largest marginal decrease in the cost function. However, finding this coordinate would require checking all of them, which is not computationally practical. We instead propose a new adaptive method for coordinate descent. First, we define a lower bound on the decrease of the cost function when a coordinate is updated and, instead of calculating this lower bound for all coordinates, we use a multi-armed bandit algorithm to learn which coordinates result in the largest marginal decrease while simultaneously performing coordinate descent. We show that our approach improves the convergence of the coordinate methods (including parallel versions) both theoretically and experimentally.

[18]  arXiv:1712.03133 (cross-list from cs.CL) [pdf, other]
Title: Building competitive direct acoustics-to-word models for English conversational speech recognition
Comments: Submitted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Direct acoustics-to-word (A2W) models in the end-to-end paradigm have received increasing attention compared to conventional sub-word based automatic speech recognition models using phones, characters, or context-dependent hidden Markov model states. This is because A2W models recognize words from speech without any decoder, pronunciation lexicon, or externally-trained language model, making training and decoding with such models simple. Prior work has shown that A2W models require orders of magnitude more training data in order to perform comparably to conventional models. Our work also showed this accuracy gap when using the English Switchboard-Fisher data set. This paper describes a recipe to train an A2W model that closes this gap and is on par with state-of-the-art sub-word based models. We achieve a word error rate of 8.8%/13.9% on the Hub5-2000 Switchboard/CallHome test sets without any decoder or language model. We find that model initialization, training data order, and regularization have the most impact on the A2W model performance. Next, we present a joint word-character A2W model that learns to first spell the word and then recognize it. This model provides a rich output to the user instead of simple word hypotheses, making it especially useful in the case of words unseen or rarely seen during training.

Replacements for Mon, 11 Dec 17

[19]  arXiv:1401.6044 (replaced) [pdf, ps, other]
Title: Sequential Detection of an Abrupt Change in a Random Sequence with Unknown Initial State
Comments: 3 figures, preliminary condensed versions appeared in IEEE ISIT 2014 and Proc. 50th Annual Conference on Information Sciences and Systems (CISS) 2016, and James Falt, Bayesian Detection of a Change in a Random Sequence with Unknown Initial and Final Distributions, Queen's University, M.A.Sc. Thesis, 2017
Subjects: Statistics Theory (math.ST)
[20]  arXiv:1411.4911 (replaced) [pdf, other]
Title: Multivariate Analysis of Mixed Data: The R Package PCAmixdata
Subjects: Computation (stat.CO)
[21]  arXiv:1508.05314 (replaced) [pdf, ps, other]
Title: Asymptotic Efficiency of Goodness-of-fit Tests Based on Too-Lin Characterization
Subjects: Methodology (stat.ME)
[22]  arXiv:1605.03795 (replaced) [pdf, other]
Title: Exponential Machines
Comments: ICLR-2017 workshop track paper
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[23]  arXiv:1606.03295 (replaced) [pdf, other]
Title: Simultaneous inference for misaligned multivariate functional data
Comments: 44 pages in total including tables and figures. Additional 9 pages of supplementary material and references
Subjects: Applications (stat.AP)
[24]  arXiv:1606.04819 (replaced) [pdf, other]
Title: Nonparametric Analysis of Random Utility Models
Comments: 54 pages, 2 figures
Subjects: Statistics Theory (math.ST); Econometrics (econ.EM)
[25]  arXiv:1609.05820 (replaced) [pdf, other]
Title: The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences
Comments: Accepted to Communications on Pure and Applied Mathematics
Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[26]  arXiv:1701.00299 (replaced) [pdf, other]
Title: Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution
Authors: Lanlan Liu, Jia Deng
Comments: To appear in AAAI 2018
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
[27]  arXiv:1703.06633 (replaced) [pdf, other]
Title: Variational inference for probabilistic Poisson PCA
Comments: 26 pages
Subjects: Methodology (stat.ME)
[28]  arXiv:1704.06537 (replaced) [pdf, other]
Title: Estimation of the discontinuous leverage effect: Evidence from the NASDAQ order book
Subjects: Statistics Theory (math.ST)
[29]  arXiv:1705.05543 (replaced) [pdf, other]
Title: In Defense of the Indefensible: A Very Naive Approach to High-Dimensional Inference
Comments: 61 pages, 3 figures, 8 tables
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
[30]  arXiv:1706.03922 (replaced) [pdf, other]
Title: Analyzing the Robustness of Nearest Neighbors to Adversarial Examples
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG)
[31]  arXiv:1709.00747 (replaced) [pdf, ps, other]
Title: Komlós-Major-Tusnády approximations to increments of uniform empirical processes
Authors: Abdelhakim Necir
Subjects: Statistics Theory (math.ST)
[32]  arXiv:1710.09805 (replaced) [pdf, other]
Title: Improving Negative Sampling for Word Representation using Self-embedded Features
Comments: Accepted in WSDM 2018
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
[33]  arXiv:1711.03946 (replaced) [pdf, other]
Title: Bayesian Paragraph Vectors
Comments: Presented at the NIPS 2017 workshop "Advances in Approximate Bayesian Inference"
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
[34]  arXiv:1711.05828 (replaced) [pdf, other]
Title: BoostJet: Towards Combining Statistical Aggregates with Neural Embeddings for Recommendations
Comments: 9 pages, 9 figures
Subjects: Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
[35]  arXiv:1711.06100 (replaced) [pdf, other]
Title: Sequences, Items And Latent Links: Recommendation With Consumed Item Packs
Comments: 12 pages
Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
[36]  arXiv:1711.08682 (replaced) [pdf, other]
Title: Deep Video Generation, Prediction and Completion of Human Action Sequences
Comments: Under review for CVPR 2018. Haoye and Chunyan have equal contribution
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[37]  arXiv:1712.00484 (replaced) [pdf, other]
Title: A Pliable Lasso
Subjects: Methodology (stat.ME)