Statistics
New submissions
New submissions for Mon, 11 Dec 17
 [1] arXiv:1712.02831 [pdf, other]

Title: RelNN: A Deep Neural Model for Relational Learning
Comments: 9 pages, 8 figures, accepted at AAAI2018
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Statistical relational AI (StarAI) aims at reasoning and learning in noisy domains described in terms of objects and relationships by combining probability with first-order logic. With huge advances in deep learning in recent years, combining deep networks with first-order logic has been the focus of several recent studies. Many of the existing attempts, however, only focus on relations and ignore object properties. The attempts that do consider object properties are limited in terms of modelling power or scalability. In this paper, we develop relational neural networks (RelNNs) by adding hidden layers to relational logistic regression (the relational counterpart of logistic regression). We learn latent properties for objects both directly and through general rules. Backpropagation is used for training these models. A modular, layer-wise architecture facilitates applying techniques developed within the deep learning community to our architecture. Initial experiments on eight tasks over three real-world datasets show that RelNNs are promising models for relational learning.
 [2] arXiv:1712.02845 [pdf, ps, other]

Title: The Shrinkage Variance Hotelling $T^2$ Test for Genomic Profiling Studies
Authors: Grant Izmirlian
Subjects: Methodology (stat.ME)
Designed gene expression microarray experiments, consisting of several treatment levels with a number of replicates per level, are analyzed by applying simple tests for group differences at the per gene level. The gene level statistics are sorted and a criterion for selecting important genes which takes into account multiplicity is applied. A caveat arises in that true signals (genes truly over- or under-expressed) are "competing" with fairly large type I error signals. False positives near the top of a sorted list can occur when genes having very small fold-change are compensated by small enough variance to yield a large test statistic. One of the first attempts around this caveat was the development of "significance analysis of microarrays (SAM)", which used a modified t-type statistic thresholded against its permutation distribution. The key innovation of the modified t-statistic was the addition of a constant to the per gene standard errors in order to stabilize the coefficient of variation of the resulting test statistic. Since then, several authors have proposed the use of shrinkage variance estimators in conjunction with t-type, and more generally, ANOVA type tests at the gene level. Our new approach proposes the use of a shrinkage variance Hotelling T-squared statistic in which the per gene sample covariance matrix is replaced by a shrinkage estimate borrowing strength from across all genes. It is demonstrated that the new statistic retains the F-distribution under the null, with added degrees of freedom in the denominator. Advantages of this class of tests are (i) flexibility in that a whole family of hypothesis tests is possible, and (ii) the gains of the above-mentioned earlier innovations are enjoyed more fully. This paper summarizes our results and presents a simulation study benchmarking the new statistic against another recently proposed statistic.
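The effect of adding a constant to the per gene standard error can be illustrated with a small sketch. This is not the SAM procedure itself (which also selects the constant from the data and uses a permutation distribution); it only assumes a pooled-variance two-sample statistic, and the function name `modified_t` and the parameter `s0` are hypothetical names chosen for illustration.

```python
import math
import statistics


def modified_t(group_a, group_b, s0=0.0):
    """Two-sample t-type statistic with a constant s0 added to the standard error.

    With s0 = 0 this is the ordinary pooled-variance t-statistic.
    """
    na, nb = len(group_a), len(group_b)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Pooled sample variance across the two groups.
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    # The added constant keeps a tiny standard error from inflating the statistic.
    return diff / (se + s0)


# A gene with a very small fold-change but an even smaller variance.
a = [1.00, 1.01, 1.02]
b = [1.05, 1.06, 1.07]
t_plain = modified_t(a, b)           # large in magnitude
t_shrunk = modified_t(a, b, s0=0.5)  # damped by the added constant
```

In this toy case the unmodified statistic is large only because the variance is tiny, which is the kind of false positive the abstract describes.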
 [3] arXiv:1712.02860 [pdf]

Title: Remarks on Bayesian Control Charts
Subjects: Applications (stat.AP); Computational Engineering, Finance, and Science (cs.CE); Optimization and Control (math.OC); Probability (math.PR); Economics (q-fin.EC)
There is a considerable amount of ongoing research on the use of Bayesian control charts for detecting a shift from a good quality distribution to a bad quality distribution in univariate and multivariate processes. Monitoring continuous-time multivariate processes by using Bayesian control charts is studied in Makis (2008) [Makis, V. (2008). Multivariate Bayesian control chart. Operations Research, 56(2), 487-496, DOI: 10.1287/opre.1070.0495]. Makis (2008) and some other authors widely claimed that Bayesian control charts were economically optimal, compared to non-Bayesian control charts. This paper first shows that the Bayesian control charts considered by Makis (2008) are not always better than non-Bayesian control charts. Secondly, it demonstrates that the algorithm presented in Makis (2008) to determine the optimal control limits of Bayesian control charts fails to find the true optimal values.
 [4] arXiv:1712.02902 [pdf, other]

Title: Multiple Adaptive Bayesian Linear Regression for Scalable Bayesian Optimization with Warm Start
Subjects: Machine Learning (stat.ML)
Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization. Typically, BO is powered by a Gaussian process (GP), whose algorithmic complexity is cubic in the number of evaluations. Hence, GP-based BO cannot leverage large amounts of past or related function evaluations, for example, to warm start the BO procedure. We develop a multiple adaptive Bayesian linear regression model as a scalable alternative whose complexity is linear in the number of observations. The multiple Bayesian linear regression models are coupled through a shared feedforward neural network, which learns a joint representation and transfers knowledge across machine learning problems.
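Why Bayesian linear regression scales linearly in the number of observations can be seen in a minimal sketch. It is not the model of the abstract (there is no shared neural network and only one regression); it assumes fixed features (1, x), a zero-mean Gaussian prior with precision `alpha`, and Gaussian noise with precision `beta`, all of which are illustrative assumptions.

```python
def blr_posterior_mean(xs, ys, alpha=1.0, beta=25.0):
    """Posterior mean of the weights (intercept, slope) for features (1, x).

    The data are visited once to accumulate sufficient statistics, so the
    cost is linear in the number of observations; the remaining work is a
    fixed-size 2x2 solve.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for x, y in zip(xs, ys):  # single pass over the observations
        s11 += 1.0
        s12 += x
        s22 += x * x
        r1 += y
        r2 += x * y
    # Posterior precision: alpha * I + beta * Phi^T Phi.
    p11 = alpha + beta * s11
    p12 = beta * s12
    p22 = alpha + beta * s22
    det = p11 * p22 - p12 * p12
    # Posterior mean: precision^{-1} * (beta * Phi^T y), via the 2x2 inverse.
    b1, b2 = beta * r1, beta * r2
    intercept = (p22 * b1 - p12 * b2) / det
    slope = (p11 * b2 - p12 * b1) / det
    return intercept, slope


# Noise-free data from y = 1 + 2x with a nearly flat prior.
intercept, slope = blr_posterior_mean([0, 1, 2, 3], [1, 3, 5, 7],
                                      alpha=1e-6, beta=1.0)
```

A Gaussian process would instead need to factor a matrix whose size grows with the number of evaluations, which is the cubic cost mentioned above.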
 [5] arXiv:1712.02903 [pdf, other]

Title: Blind Multiclass Ensemble Learning with Unequally Reliable Classifiers
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Signal Processing (eess.SP)
The rising interest in pattern recognition and data analytics has spurred the development of innovative machine learning algorithms and tools. However, as each algorithm has its strengths and limitations, one is motivated to judiciously fuse multiple algorithms in order to find the "best" performing one, for a given dataset. Ensemble learning aims at such a high-performance meta-algorithm, by combining the outputs from multiple algorithms. The present work introduces a blind scheme for learning from ensembles of classifiers, using a moment matching method that leverages joint tensor and matrix factorization. Blind refers to the combiner who has no knowledge of the ground-truth labels that each classifier has been trained on. A rigorous performance analysis is derived and the proposed scheme is evaluated on synthetic and real datasets.
 [6] arXiv:1712.02964 [pdf, other]

Title: Bayesian Variable Selection in High Dimensional Survival Time Cancer Genomic Datasets using Nonlocal Priors
Comments: 18 pages, 3 figures
Subjects: Applications (stat.AP); Computation (stat.CO)
Variable selection in high dimensional cancer genomic studies has become very popular in the past decade, due to the interest in discovering significant genes pertinent to a specific cancer type. Censored survival data is the main data structure in such studies, and performing variable selection for such a data type requires specific methodology. With recent developments in computational power, Bayesian methods have become more attractive in the context of variable selection. In this article we introduce a new Bayesian variable selection approach that exploits a mixture of a point mass at zero and an inverse moment prior, which is a nonlocal prior density on the Cox proportional hazard model coefficients. Our method utilizes a parallel computing structure and takes advantage of a stochastic search based method to explore the model space and to circumvent the computationally expensive MCMC procedure. It then reports the highest posterior probability model, the median probability model and the posterior inclusion probability for each covariate in the design matrix. Bayesian model averaging is also exploited for predictive power measurements. The proposed algorithm provides improved performance in identifying true models by reducing estimation and prediction error in simulation studies as well as real genomic datasets. This algorithm is implemented in an R package named BVSNLP.
 [7] arXiv:1712.02990 [pdf, other]

Title: Censored pairwise likelihood-based tests for mixing coefficient of spatial max-mixture models
Subjects: Statistics Theory (math.ST); Probability (math.PR)
Max-mixture processes are defined as Z = max(aX, (1 - a)Y) with X an asymptotic dependent (AD) process, Y an asymptotic independent (AI) process and $a \in [0, 1]$. Thus, the mixing coefficient a may reveal the strength of the AD part present in the max-mixture process. In this paper we focus on two tests based on censored pairwise likelihood estimates. We compare their performance through an extensive simulation study. Monte Carlo simulation serves as a fundamental tool for asymptotic variance calculations. We apply our tests to daily precipitations from the East of Australia. Drawbacks and possible developments are discussed.
 [8] arXiv:1712.03032 [pdf, other]

Title: p-Values for Credibility
Authors: Leonhard Held
Comments: 21 pages, 6 figures
Subjects: Methodology (stat.ME)
Analysis of credibility is a reverse-Bayes technique that has been proposed by Matthews (2001) to overcome some of the shortcomings of significance tests. A significant result is deemed credible if current knowledge about the effect size is in conflict with any sceptical prior that would make the effect non-significant. In this paper I formalize the approach and propose to use Bayesian predictive tail probabilities to quantify the evidence for credibility. This gives rise to a p-value for extrinsic credibility, taking into account both the internal and the external evidence for an effect. The assessment of intrinsic credibility leads to a new threshold for ordinary significance that is remarkably close to the recently proposed 0.005 level. Finally, a p-value for intrinsic credibility is proposed that is a simple function of the ordinary p-value for significance and has a direct frequentist interpretation in terms of the replication probability that a future study under identical conditions will give an estimated effect in the same direction as the first study.
 [9] arXiv:1712.03040 [pdf, other]

Title: Approximation intensity for pairwise interaction Gibbs point processes using determinantal point processes
Subjects: Methodology (stat.ME)
The intensity of a Gibbs point process is usually an intractable function of the model parameters. For repulsive pairwise interaction point processes, this intensity can be expressed as the Laplace transform of some particular function. Baddeley and Nair (2002) developed the Poisson-saddlepoint approximation which consists, for basic models, in calculating this Laplace transform with respect to a homogeneous Poisson point process. In this paper, we develop an approximation which consists in calculating the same Laplace transform with respect to a specific determinantal point process. This new approximation is efficiently implemented and turns out to be more accurate than the Poisson-saddlepoint approximation, as demonstrated by some numerical examples.
 [10] arXiv:1712.03058 [pdf, other]

Title: Iterated filtering methods for Markov process epidemic models
Authors: Theresa Stocks
Comments: This manuscript is a preprint of a chapter to appear in the Handbook of Infectious Disease Data Analysis, Held, L., Hens, N., O'Neill, P.D. and Wallinga, J. (Eds.). Chapman & Hall/CRC, 2018. Please use the book for possible citations
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Applications (stat.AP)
Dynamic epidemic models have proven valuable for public health decision makers as they provide useful insights into the understanding and prevention of infectious diseases. However, inference for these types of models can be difficult because the disease spread is typically only partially observed, e.g., in the form of reported incidences in given time periods. This chapter discusses how to perform likelihood-based inference for partially observed Markov epidemic models when it is relatively easy to generate samples from the Markov transmission model while the likelihood function is intractable. The first part of the chapter reviews the theoretical background of inference for partially observed Markov processes (POMP) via iterated filtering. In the second part of the chapter the performance of the method and associated practical difficulties are illustrated on two examples. In the first example a simulated data set consisting of the number of newly reported cases aggregated by week is fitted to a POMP where the underlying disease transmission model is assumed to be a simple Markovian SIR model. The second example illustrates possible model extensions by analyzing German rotavirus surveillance data from 2001 to 2008. Both examples are implemented using the R package pomp (King et al., 2017) and the code is made available online.
 [11] arXiv:1712.03120 [pdf, other]

Title: Detecting confounding due to subject identification in clinical machine learning diagnostic applications: a permutation test approach
Authors: Elias Chaibub Neto, Abhishek Pratap, Thanneer M Perumal, Meghasyam Tummalacherla, Brian M Bot, Lara Mangravite, Larsson Omberg
Subjects: Applications (stat.AP)
Recently, Saeb et al. (2017) showed that, in diagnostic machine learning applications, having data of each subject randomly assigned to both training and test sets (record-wise data split) can lead to massive underestimation of the cross-validation prediction error, due to the presence of "subject identity confounding" caused by the classifier's ability to identify subjects, instead of recognizing disease. To solve this problem, the authors recommended the random assignment of the data of each subject to either the training or the test set (subject-wise data split). The adoption of subject-wise split has been criticized in Little et al. (2017), on the basis that it can violate assumptions required by cross-validation to consistently estimate generalization error. In particular, adopting subject-wise splitting in heterogeneous datasets might lead to model underfitting and larger classification errors. Hence, Little et al. argue that perhaps the overestimation of prediction errors with subject-wise cross-validation, rather than underestimation with record-wise cross-validation, is the reason for the discrepancies between prediction error estimates generated by the two splitting strategies. In order to shed light on this controversy, we focus on simpler classification performance metrics and develop permutation tests that can detect identity confounding. By focusing on permutation tests, we are able to evaluate the merits of record-wise and subject-wise data splits under more general statistical dependencies and distributional structures of the data, including situations where cross-validation breaks down. We illustrate the application of our tests using synthetic and real data from a Parkinson's disease study.
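The two splitting strategies discussed in this abstract can be sketched as follows. The record layout `(subject_id, features, label)` and the function names are assumptions made for illustration; the sketch shows only the splits, not the permutation tests developed in the paper.

```python
import random


def record_wise_split(records, test_fraction=0.25, seed=0):
    """Shuffle individual records, so a subject can land in both sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


def subject_wise_split(records, test_fraction=0.25, seed=0):
    """Assign every record of a subject to exactly one of the two sets."""
    rng = random.Random(seed)
    subjects = sorted({rec[0] for rec in records})
    rng.shuffle(subjects)
    n_test = max(1, int(round(test_fraction * len(subjects))))
    test_subjects = set(subjects[:n_test])
    train = [rec for rec in records if rec[0] not in test_subjects]
    test = [rec for rec in records if rec[0] in test_subjects]
    return train, test


# Eight hypothetical subjects with five records each.
records = [(subject, i, subject % 2) for subject in range(8) for i in range(5)]
train, test = subject_wise_split(records)
shared = {rec[0] for rec in train} & {rec[0] for rec in test}
```

With the subject-wise split, `shared` is empty by construction, so a classifier cannot score well on the test set merely by recognizing subjects it saw during training.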
 [12] arXiv:1712.03134 [pdf, other]

Title: On Adaptive Estimation for Dynamic Bernoulli Bandits
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm is associated with a reward distribution. In static MABs, the reward distributions do not change over time, while in dynamic MABs, each arm's reward distribution can change, and the optimal arm can switch over time. Motivated by many real applications where rewards are binary counts, we focus on dynamic Bernoulli bandits. Standard methods like $\epsilon$-Greedy and Upper Confidence Bound (UCB), which rely on the sample mean estimator, often fail to track the changes in underlying reward for dynamic problems. In this paper, we overcome the shortcoming of slow response to change by deploying adaptive estimation in the standard methods and propose a new family of algorithms, which are adaptive versions of $\epsilon$-Greedy, UCB, and Thompson sampling. These new methods are simple and easy to implement. Moreover, they do not require any prior knowledge about the data, which is important for real applications. We examine the new algorithms numerically in different scenarios and find that they show solid improvements in dynamic environments.
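The slow response of the sample mean to a change in an arm's reward probability can be illustrated with a small sketch. It substitutes a fixed forgetting factor `lam` for the adaptive estimation proposed in the paper, so it is a simplified stand-in rather than the authors' method; the function names are hypothetical.

```python
def sample_mean(rewards):
    """Ordinary sample mean: every past reward has equal weight."""
    return sum(rewards) / len(rewards)


def forgetting_mean(rewards, lam=0.9):
    """Exponentially weighted mean: a reward k steps old has weight lam**k."""
    weighted_sum = 0.0
    weight_total = 0.0
    for r in rewards:
        weighted_sum = lam * weighted_sum + r
        weight_total = lam * weight_total + 1.0
    return weighted_sum / weight_total


# An arm that pays out on every pull for 100 pulls and then never again.
rewards = [1] * 100 + [0] * 100
slow = sample_mean(rewards)      # still reports 0.5
fast = forgetting_mean(rewards)  # close to the current reward probability, 0
```

A bandit algorithm that ranks arms by `sample_mean` would keep favouring this arm long after it stopped paying, whereas the down-weighted estimate tracks the change.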
 [13] arXiv:1712.03168 [pdf, ps, other]

Title: On optimal policy in the group testing with incomplete identification
Authors: Yaakov Malinovsky
Comments: Submitted for publication
Subjects: Other Statistics (stat.OT)
Consider a very large (infinite) population of items, where each item, independently of the others, is defective with probability p, or good with probability q=1-p. The goal is to identify N good items as quickly as possible. The following group testing policy (policy A) is considered: test items together in groups; if the test outcome of group i of size n_i is negative, then accept all items in this group as good, otherwise discard the group. Then, move to the next group and continue until exactly N good items are found. The goal is to find an optimal testing configuration, i.e., group sizes, under policy A, such that the expected waiting time to obtain N good items is minimal. Recently, Gusev (2012) found an optimal group testing configuration under the assumptions of constant group size and $N=\infty$. In this note, an optimal solution under policy A for finite N is provided.
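For the constant group size case, the trade-off under policy A can be sketched numerically. The sketch assumes that waiting time is counted in tests and considers only the long-run regime: a group of size n tests negative with probability q^n and then yields n good items, so the expected number of good items per test is n * q^n. It does not reproduce the finite-N solution of the note, and the function names are hypothetical.

```python
def good_items_per_test(n, p):
    """Expected number of accepted good items from one test of group size n."""
    q = 1.0 - p
    return n * q ** n


def best_constant_group_size(p, max_n=1000):
    """Group size in 1..max_n that maximizes expected good items per test."""
    return max(range(1, max_n + 1), key=lambda n: good_items_per_test(n, p))


# With p = 0.3 the yields for n = 1, 2, 3, 4 are 0.7, 0.98, 1.029, 0.9604.
best = best_constant_group_size(0.3)
```

Larger groups deliver more items per negative test but are more likely to contain a defective item and be discarded, which is why an interior optimum exists.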
Keywords: Dynamic programming; Optimal design; Partition problem; Schur-convexity
 [14] arXiv:1712.03198 [pdf, other]

Title: Using simulation studies to evaluate statistical methods
Subjects: Methodology (stat.ME)
Simulation studies are computer experiments which involve creating data by pseudorandom sampling. The key strength of simulation studies is the ability to understand the behaviour of statistical methods because some 'truth' is known from the process of generating the data. This allows us to consider properties of methods, such as bias. While widely used, simulation studies are often poorly designed, analysed and reported. This article outlines the rationale for using simulation studies and offers guidance for design, execution, analysis, reporting and presentation. In particular, we provide: a structured approach for planning and reporting simulation studies; coherent terminology for simulation studies; guidance on coding simulation studies; a critical discussion of key performance measures and their computation; ideas on structuring tabular and graphical presentation of results; and new graphical presentations. With a view to describing current practice and identifying areas for improvement, we review 100 articles taken from Volume 34 of Statistics in Medicine which included at least one simulation study.
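A minimal simulation study of the kind described here might estimate the bias of two variance estimators when the true variance is known. The design (normal data, five observations per data set, 20000 repetitions, a fixed seed) and the function name are illustrative assumptions, not taken from the article.

```python
import random


def simulate_variance_bias(n_obs=5, n_sim=20000, true_var=1.0, seed=1):
    """Estimate the bias of the divide-by-n and divide-by-(n-1) variance estimators."""
    rng = random.Random(seed)  # fixed seed so the study is reproducible
    sd = true_var ** 0.5
    total_n = 0.0
    total_n1 = 0.0
    for _ in range(n_sim):
        sample = [rng.gauss(0.0, sd) for _ in range(n_obs)]
        mean = sum(sample) / n_obs
        ss = sum((x - mean) ** 2 for x in sample)
        total_n += ss / n_obs         # divides by n
        total_n1 += ss / (n_obs - 1)  # divides by n - 1
    # Bias = average estimate minus the known truth.
    return total_n / n_sim - true_var, total_n1 / n_sim - true_var


bias_n, bias_n1 = simulate_variance_bias()
```

Because the data-generating variance is known, the estimated biases can be compared with theory: the divide-by-n estimator has bias -true_var / n_obs (here -0.2), and the divide-by-(n-1) estimator is unbiased.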
Cross-lists for Mon, 11 Dec 17
 [15] arXiv:1712.02854 (cross-list from cs.CV) [pdf, other]

Title: Stochastic reconstruction of an oolitic limestone by generative adversarial networks
Comments: 22 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Geophysics (physics.geo-ph); Machine Learning (stat.ML)
Stochastic image reconstruction is a key part of modern digital rock physics and materials analysis that aims to create numerous representative samples of material microstructures for upscaling, numerical computation of effective properties and uncertainty quantification. We present a method of three-dimensional stochastic image reconstruction based on generative adversarial neural networks (GANs). GANs represent a framework of unsupervised learning methods that require no a priori inference of the probability distribution associated with the training data. Using a fully convolutional neural network allows fast sampling of large volumetric images. We apply a GAN-based workflow of network training and image generation to an oolitic Ketton limestone micro-CT dataset. Minkowski functionals, effective permeability as well as velocity distributions of simulated flow within the acquired images are compared with the synthetic reconstructions generated by the deep neural network. While our results show that GANs allow a fast and accurate reconstruction of the evaluated image dataset, we address a number of open questions and challenges involved in the evaluation of generative network-based methods.
 [16] arXiv:1712.02950 (cross-list from cs.CV) [pdf, other]

Title: CycleGAN: a Master of Steganography
Comments: NIPS 2017, workshop on Machine Deception
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
CycleGAN is one of the latest successful approaches to learn a correspondence between two image distributions. In a series of experiments, we demonstrate an intriguing property of the model: CycleGAN learns to "hide" information about a source image inside the generated image in nearly imperceptible, high-frequency noise. This trick ensures that the complementary generator can recover the original sample and thus satisfy the cyclic consistency requirement, but the generated image remains realistic. We connect this phenomenon with adversarial attacks by viewing CycleGAN's training procedure as training a generator of adversarial examples, thereby showing that adversarial attacks are not limited to classifiers but also may target generative models.
 [17] arXiv:1712.03010 (cross-list from cs.LG) [pdf, other]

Title: Stochastic Dual Coordinate Descent with Bandit Sampling
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Coordinate descent methods minimize a cost function by updating a single decision variable (corresponding to one coordinate) at a time. Ideally, one would update the decision variable that yields the largest marginal decrease in the cost function. However, finding this coordinate would require checking all of them, which is not computationally practical. We instead propose a new adaptive method for coordinate descent. First, we define a lower bound on the decrease of the cost function when a coordinate is updated and, instead of calculating this lower bound for all coordinates, we use a multi-armed bandit algorithm to learn which coordinates result in the largest marginal decrease while simultaneously performing coordinate descent. We show that our approach improves the convergence of the coordinate methods (including parallel versions) both theoretically and experimentally.
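The "ideal" rule mentioned in this abstract, updating the coordinate with the largest marginal decrease, can be sketched for a quadratic cost f(x) = 0.5 x^T A x - b^T x with A symmetric positive definite. This is the exhaustive baseline that checks every coordinate at each step, not the bandit-based method the paper proposes; the quadratic cost and the function name are assumptions made for illustration.

```python
def greedy_coordinate_descent(A, b, n_steps=100):
    """Minimize 0.5 x^T A x - b^T x by always updating the best coordinate."""
    n = len(b)
    x = [0.0] * n
    for _ in range(n_steps):
        # Gradient of the quadratic: A x - b.
        grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        # Exact minimization along coordinate i lowers the cost by
        # grad[i]^2 / (2 * A[i][i]); computing this for every i is the
        # "checking all of them" step that becomes expensive in high dimension.
        decrease = [grad[i] ** 2 / (2.0 * A[i][i]) for i in range(n)]
        best = max(range(n), key=lambda i: decrease[i])
        x[best] -= grad[best] / A[best][best]
    return x


# The minimizer solves A x = b, here x = (0.8, 1.4).
x = greedy_coordinate_descent([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

Each step costs a full gradient evaluation, which motivates replacing the exhaustive search with a cheaper rule for choosing the coordinate.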
 [18] arXiv:1712.03133 (cross-list from cs.CL) [pdf, other]

Title: Building competitive direct acoustics-to-word models for English conversational speech recognition
Comments: Submitted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Direct acoustics-to-word (A2W) models in the end-to-end paradigm have received increasing attention compared to conventional subword based automatic speech recognition models using phones, characters, or context-dependent hidden Markov model states. This is because A2W models recognize words from speech without any decoder, pronunciation lexicon, or externally-trained language model, making training and decoding with such models simple. Prior work has shown that A2W models require orders of magnitude more training data in order to perform comparably to conventional models. Our work also showed this accuracy gap when using the English Switchboard-Fisher data set. This paper describes a recipe to train an A2W model that closes this gap and is at-par with state-of-the-art subword based models. We achieve a word error rate of 8.8%/13.9% on the Hub5-2000 Switchboard/CallHome test sets without any decoder or language model. We find that model initialization, training data order, and regularization have the most impact on the A2W model performance. Next, we present a joint word-character A2W model that learns to first spell the word and then recognize it. This model provides a rich output to the user instead of simple word hypotheses, making it especially useful in the case of words unseen or rarely-seen during training.
Replacements for Mon, 11 Dec 17
 [19] arXiv:1401.6044 (replaced) [pdf, ps, other]

Title: Sequential Detection of an Abrupt Change in a Random Sequence with Unknown Initial State
Comments: 3 figures, preliminary condensed versions appeared in IEEE ISIT 2014 and Proc. 50th Annual Conference on Information Sciences and Systems (CISS) 2016, and James Falt, Bayesian Detection of a Change in a Random Sequence with Unknown Initial and Final Distributions, Queen's University, M.A.Sc. Thesis, 2017
Subjects: Statistics Theory (math.ST)
 [20] arXiv:1411.4911 (replaced) [pdf, other]

Title: Multivariate Analysis of Mixed Data: The R Package PCAmixdata
Subjects: Computation (stat.CO)
 [21] arXiv:1508.05314 (replaced) [pdf, ps, other]

Title: Asymptotic Efficiency of Goodness-of-fit Tests Based on Too-Lin Characterization
Authors: Bojana Milošević
Subjects: Methodology (stat.ME)
 [22] arXiv:1605.03795 (replaced) [pdf, other]

Title: Exponential Machines
Comments: ICLR2017 workshop track paper
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [23] arXiv:1606.03295 (replaced) [pdf, other]

Title: Simultaneous inference for misaligned multivariate functional data
Comments: 44 pages in total including tables and figures. Additional 9 pages of supplementary material and references
Subjects: Applications (stat.AP)
 [24] arXiv:1606.04819 (replaced) [pdf, other]

Title: Nonparametric Analysis of Random Utility Models
Comments: 54 pages, 2 figures
Subjects: Statistics Theory (math.ST); Econometrics (econ.EM)
 [25] arXiv:1609.05820 (replaced) [pdf, other]

Title: The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences
Comments: Accepted to Communications on Pure and Applied Mathematics
Subjects: Information Theory (cs.IT); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [26] arXiv:1701.00299 (replaced) [pdf, other]

Title: Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution
Comments: To appear in AAAI 2018
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [27] arXiv:1703.06633 (replaced) [pdf, other]

Title: Variational inference for probabilistic Poisson PCA
Comments: 26 pages
Subjects: Methodology (stat.ME)
 [28] arXiv:1704.06537 (replaced) [pdf, other]

Title: Estimation of the discontinuous leverage effect: Evidence from the NASDAQ order book
Subjects: Statistics Theory (math.ST)
 [29] arXiv:1705.05543 (replaced) [pdf, other]

Title: In Defense of the Indefensible: A Very Naive Approach to High-Dimensional Inference
Comments: 61 pages, 3 figures, 8 tables
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
 [30] arXiv:1706.03922 (replaced) [pdf, other]

Title: Analyzing the Robustness of Nearest Neighbors to Adversarial Examples
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG)
 [31] arXiv:1709.00747 (replaced) [pdf, ps, other]

Title: Komlós-Major-Tusnády approximations to increments of uniform empirical processes
Authors: Abdelhakim Necir
Subjects: Statistics Theory (math.ST)
 [32] arXiv:1710.09805 (replaced) [pdf, other]

Title: Improving Negative Sampling for Word Representation using Self-embedded Features
Comments: Accepted in WSDM 2018
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
 [33] arXiv:1711.03946 (replaced) [pdf, other]

Title: Bayesian Paragraph Vectors
Comments: Presented at the NIPS 2017 workshop "Advances in Approximate Bayesian Inference"
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
 [34] arXiv:1711.05828 (replaced) [pdf, other]

Title: BoostJet: Towards Combining Statistical Aggregates with Neural Embeddings for Recommendations
Comments: 9 pages, 9 figures
Subjects: Information Retrieval (cs.IR); Learning (cs.LG); Machine Learning (stat.ML)
 [35] arXiv:1711.06100 (replaced) [pdf, other]

Title: Sequences, Items And Latent Links: Recommendation With Consumed Item Packs
Comments: 12 pages
Subjects: Information Retrieval (cs.IR); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
 [36] arXiv:1711.08682 (replaced) [pdf, other]

Title: Deep Video Generation, Prediction and Completion of Human Action Sequences
Comments: Under review for CVPR 2018. Haoye and Chunyan have equal contribution
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [37] arXiv:1712.00484 (replaced) [pdf, other]