Statistics
New submissions
New submissions for Fri, 22 Jun 18
 [1] arXiv:1806.07921 [pdf, ps, other]

Title: Beta seasonal autoregressive moving average models
Comments: 26 pages, 5 figures, 4 tables
Journal-ref: Journal of Statistical Computation and Simulation, 2018
Subjects: Methodology (stat.ME)
In this paper we introduce the class of beta seasonal autoregressive moving average ($\beta$SARMA) models for modeling and forecasting time series data that assume values in the standard unit interval. It generalizes the class of beta autoregressive moving average models [Rocha and Cribari-Neto, Test, 2009] by incorporating seasonal dynamics into the model's dynamic structure. Besides introducing the new class of models, we develop parameter estimation, hypothesis testing inference, and diagnostic analysis tools. We also discuss out-of-sample forecasting. In particular, we provide closed-form expressions for the conditional score vector and for the conditional Fisher information matrix. We also evaluate the finite sample performances of conditional maximum likelihood estimators and white noise tests using Monte Carlo simulations. An empirical application is presented and discussed.
 [2] arXiv:1806.07934 [pdf, other]

Title: A Function Emulation Approach for Intractable Distributions
Comments: 32 pages, 1 figure
Subjects: Computation (stat.CO)
Doubly intractable distributions arise in many settings, for example in Markov models for point processes and exponential random graph models for networks. Bayesian inference for these models is challenging because they involve intractable normalising "constants" that are actually functions of the parameters of interest. Although several clever computational methods have been developed for these models, each method suffers from computational issues that make it computationally burdensome or even infeasible for many problems. We propose a novel algorithm that provides computational gains over existing methods by replacing Monte Carlo approximations to the normalising function with a Gaussian process-based approximation. We provide theoretical justification for this method. We also develop a closely related algorithm that is applicable more broadly to any likelihood function that is expensive to evaluate. We illustrate the application of our methods to a variety of challenging simulated and real data examples, including an exponential random graph model, a Markov point process, and a model for infectious disease dynamics. The algorithm shows significant gains in computational efficiency over existing methods, and has the potential for greater gains for more challenging problems. For a random graph model example, we show how this gain in efficiency allows us to carry out accurate Bayesian inference when other algorithms are computationally impractical.
 [3] arXiv:1806.08010 [pdf, other]

Title: Fairness Without Demographics in Repeated Loss Minimization
Comments: To appear ICML 2018
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Machine learning models (e.g., speech recognizers) are usually trained to minimize average loss, which results in representation disparity: minority groups (e.g., non-native speakers) contribute less to the training objective and thus tend to suffer higher loss. Worse, as model accuracy affects user retention, a minority group can shrink over time. In this paper, we first show that the status quo of empirical risk minimization (ERM) amplifies representation disparity over time, which can even make initially fair models unfair. To mitigate this, we develop an approach based on distributionally robust optimization (DRO), which minimizes the worst-case risk over all distributions close to the empirical distribution. We prove that this approach controls the risk of the minority group at each time step, in the spirit of Rawlsian distributive justice, while remaining oblivious to the identity of the groups. We demonstrate that DRO prevents disparity amplification on examples where ERM fails, and show improvements in minority group user satisfaction in a real-world text autocomplete task.
 [4] arXiv:1806.08031 [pdf, ps, other]

Title: A Constructive Algebraic Proof of Student's Theorem
Authors: Yiping Cheng
Comments: 4 pages, no figure
Subjects: Other Statistics (stat.OT)
Student's theorem is an important result in statistics which states that, for a normal population, the sample variance is independent of the sample mean and has a chi-square distribution. The existing proofs of this theorem either overly rely on advanced tools such as moment generating functions, or fail to explicitly construct an orthogonal matrix used in the proof. This paper provides an elegant explicit construction of that matrix, making the algebraic proof complete. The constructive algebraic proof proposed here is thus very suitable for being included in textbooks.
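For reference, the statement the abstract refers to can be written out as below. This is the standard textbook form of the theorem, with the usual change-of-variables step; the symbols $X_i$, $\bar{X}$, $S^2$, $Q$ and $Y$ are notation chosen here, and the paper's own explicit construction of the orthogonal matrix is not reproduced.

```latex
% Student's theorem (standard statement; notation assumed, not taken from the paper)
Let $X_1,\dots,X_n$ be i.i.d.\ $N(\mu,\sigma^2)$ with $n \ge 2$, and define
\[
  \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,
  \qquad
  S^2 = \frac{1}{n-1}\sum_{i=1}^{n} \bigl(X_i - \bar{X}\bigr)^2 .
\]
Then
\[
  \bar{X} \sim N\!\Bigl(\mu, \frac{\sigma^2}{n}\Bigr),
  \qquad
  \frac{(n-1)\,S^2}{\sigma^2} \sim \chi^2_{n-1},
  \qquad
  \bar{X} \text{ and } S^2 \text{ are independent.}
\]
% Key algebraic step: take any orthogonal matrix Q whose first row is
% (1/\sqrt{n}, \dots, 1/\sqrt{n}) and set Y = QX. Then
\[
  Y_1 = \sqrt{n}\,\bar{X},
  \qquad
  \sum_{i=2}^{n} Y_i^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = (n-1)\,S^2 ,
\]
% and Y_1, \dots, Y_n are independent normals with variance \sigma^2,
% with E[Y_i] = 0 for i \ge 2.
```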
 [5] arXiv:1806.08059 [pdf, other]

Title: Avoiding Bias Due to Nonrandom Scheduling When Modeling Trends in Home-Field Advantage
Authors: Andrew T. Karl
Subjects: Applications (stat.AP)
Existing approaches for estimating home-field advantage (HFA) include modeling the difference between home and away scores as a function of the difference between home and away team ratings that are treated either as fixed or random effects. We uncover an upward bias in the mixed model HFA estimates that is due to the nonrandom structure of the schedule (and thus of the random effect design matrix) and explore why the fixed effects model is not subject to the same bias. Intra-conference HFAs and standard errors are calculated for each of 3 college sports and 3 professional sports over 18 seasons and then fitted with conference-specific slopes and intercepts to measure the potential linear population trend in HFA.
 [6] arXiv:1806.08069 [pdf, other]

Title: Deep Gaussian Process-Based Bayesian Inference for Contaminant Source Localization
Comments: 28 pages, 14 figures, submitted to IEEE Access
Subjects: Applications (stat.AP)
This paper proposes a Bayesian framework for localization of multiple sources in the event of accidental hazardous contaminant release. The framework assimilates sensor measurements of the contaminant concentration with an integrated multizone computational fluid dynamics (multizone-CFD) based contaminant fate and transport model. To ensure online tractability, the framework uses a deep Gaussian process (DGP) based emulator of the multizone-CFD model. To effectively represent the transient response of the multizone-CFD model, the DGP emulator is reformulated using a matrix-variate Gaussian process prior. The resultant deep matrix-variate Gaussian process emulator (DMGPE) is used to define the likelihood of the Bayesian framework, while a Markov chain Monte Carlo approach is used to sample from the posterior distribution. The proposed method is evaluated on single and multiple contaminant source localization tasks modeled by the CONTAM simulator in a single-story building of 30 zones, demonstrating that the proposed approach accurately performs inference on the locations of contaminant sources. Moreover, the DMGP emulator outperforms both the GP and DGP emulators with fewer hyperparameters.
 [7] arXiv:1806.08117 [pdf, other]

Title: A data-driven model order reduction approach for Stokes flow through random porous media
Comments: 2 pages, 2 figures
Subjects: Machine Learning (stat.ML); Computational Engineering, Finance, and Science (cs.CE); Learning (cs.LG)
Direct numerical simulation of Stokes flow through an impermeable, rigid body matrix by finite elements requires meshes fine enough to resolve the pore-size scale and is thus a computationally expensive task. The cost is significantly amplified when randomness in the pore microstructure is present and therefore multiple simulations need to be carried out. It is well known that in the limit of scale-separation, Stokes flow can be accurately approximated by Darcy's law with an effective diffusivity field depending on viscosity and the pore-matrix topology. We propose a fully probabilistic, Darcy-type, reduced-order model which, based on only a few tens of full-order Stokes model runs, is capable of learning a map from the fine-scale topology to the effective diffusivity and is maximally predictive of the fine-scale response. The reduced-order model learned can significantly accelerate uncertainty quantification tasks as well as provide quantitative confidence metrics of the predictive estimates produced.
 [8] arXiv:1806.08141 [pdf, other]

Title: Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions
Comments: 27 pages
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Building on recent theory that established the connection between implicit generative modeling and optimal transport, we propose a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them. The proposed algorithm is based on a functional optimization problem, which aims at finding a measure that is as close as possible to the data distribution and also expressive enough for generative modeling purposes. We formulate the problem as a gradient flow in the space of probability measures. The connections between gradient flows and stochastic differential equations let us develop a computationally efficient algorithm for solving the optimization problem, where the resulting algorithm resembles the recent dynamics-based Markov Chain Monte Carlo algorithms. We provide a formal theoretical analysis in which we prove finite-time error guarantees for the proposed algorithm. Our experimental results support our theory and show that our algorithm is able to capture the structure of challenging distributions.
 [9] arXiv:1806.08144 [pdf, ps, other]

Title: Maximal skewness projections for scale mixtures of skew-normal vectors
Subjects: Methodology (stat.ME)
Multivariate scale mixtures of skew-normal (SMSN) variables are flexible models that account for non-normality in multivariate data scenarios by tail weight assessment and a shape vector representing the asymmetry of the model in a directional fashion. Their stochastic representation involves a skew-normal (SN) vector and a non-negative mixing scalar variable, independent of the SN vector, that injects kurtosis into the SMSN model. We address the problem of finding the maximal skewness projection for vectors that follow an SMSN distribution; when simple conditions on the moments of the mixing variable are fulfilled, it can be shown that the direction yielding the maximal skewness is proportional to the shape vector. This finding stresses the directional nature of the asymmetry in this class of distributions; it also provides the theoretical foundations for solving the skewness model-based projection pursuit for SMSN vectors. Some examples that show the validity of our theoretical findings for the most famous distributions within the SMSN family are also given. For the sake of completeness we carry out a simulation experiment with artificial data, which sheds light on the usefulness and implications of our result in statistical practice.
 [10] arXiv:1806.08151 [pdf, ps, other]

Title: Robust and Efficient Boosting Method using the Conditional Risk
Comments: 14 Pages, 2 figures and 5 tables
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Well-known for its simplicity and effectiveness in classification, AdaBoost, however, suffers from overfitting when class-conditional distributions have significant overlap. Moreover, it is very sensitive to noise that appears in the labels. This article tackles the above limitations simultaneously via optimizing a modified loss function (i.e., the conditional risk). The proposed approach has the following two advantages. (1) It is able to directly take into account label uncertainty with an associated label confidence. (2) It introduces a "trustworthiness" measure on training samples via the Bayesian risk rule, and hence the resulting classifier tends to have finite sample performance that is superior to that of the original AdaBoost when there is a large overlap between class-conditional distributions. Theoretical properties of the proposed method are investigated. Extensive experimental results using synthetic data and real-world data sets from the UCI machine learning repository are provided. The empirical study shows the high competitiveness of the proposed method in prediction accuracy and robustness when compared with the original AdaBoost and several existing robust AdaBoost algorithms.
 [11] arXiv:1806.08156 [pdf, ps, other]

Title: Identifiability of Gaussian Structural Equation Models with Dependent Errors Having Equal Variances
Authors: Jose M. Peña
Journal-ref: 7th Causal Inference Workshop at UAI 2018
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)
In this paper, we prove that some Gaussian structural equation models with dependent errors having equal variances are identifiable from their corresponding Gaussian distributions. Specifically, we prove identifiability for the Gaussian structural equation models that can be represented as Andersson-Madigan-Perlman chain graphs (Andersson et al., 2001). These chain graphs were originally developed to represent independence models. However, they are also suitable for representing causal models with additive noise (Peña, 2016). Our result implies then that these causal models can be identified from observational data alone. Our result generalizes the result by Peters and Bühlmann (2014), who considered independent errors having equal variances. The suitability of the equal error variances assumption should be assessed on a per domain basis.
 [12] arXiv:1806.08195 [pdf, other]

Title: Probabilistic PARAFAC2
Authors: Philip J. H. Jørgensen, Søren F. V. Nielsen, Jesper L. Hinrich, Mikkel N. Schmidt, Kristoffer H. Madsen, Morten Mørup
Comments: 16 pages (incl. 4 pages of supplemental material), 5 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
The PARAFAC2 is a multimodal factor analysis model suitable for analyzing multi-way data when one of the modes has incomparable observation units, for example because of differences in signal sampling or batch sizes. A fully probabilistic treatment of the PARAFAC2 is desirable in order to improve robustness to noise and provide a well-founded principle for determining the number of factors, but challenging because the factor loadings are constrained to be orthogonal. We develop two probabilistic formulations of the PARAFAC2 along with variational procedures for inference: in one approach, the mean values of the factor loadings are orthogonal, leading to closed-form variational updates, and in the other, the factor loadings themselves are orthogonal using a matrix von Mises-Fisher distribution. We contrast our probabilistic formulation with the conventional direct fitting algorithm based on maximum likelihood. On simulated data and real fluorescence spectroscopy and gas chromatography-mass spectrometry data, we compare our approach to the conventional PARAFAC2 model estimation and find that the probabilistic formulation is more robust to noise and model order misspecification. The probabilistic PARAFAC2 thus forms a promising framework for modeling multi-way data accounting for uncertainty.
 [13] arXiv:1806.08200 [pdf, other]

Title: Mixtures of Experts Models
Comments: A chapter prepared for the forthcoming Handbook of Mixture Analysis
Subjects: Methodology (stat.ME)
Mixtures of experts models provide a framework in which covariates may be included in mixture models. This is achieved by modelling the parameters of the mixture model as functions of the concomitant covariates. Given their mixture model foundation, mixtures of experts models possess a diverse range of analytic uses, from clustering observations to capturing parameter heterogeneity in cross-sectional data. This chapter focuses on delineating the mixture of experts modelling framework and demonstrates the utility and flexibility of mixtures of experts models as an analytic tool.
 [14] arXiv:1806.08212 [pdf, other]

Title: A Review of Network Inference Techniques for Neural Activation Time Series
Authors: George Panagopoulos
Comments: 8 pages, 2 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Studying neural connectivity is considered one of the most promising and challenging areas of modern neuroscience. The underpinnings of cognition are hidden in the way neurons interact with each other. However, our experimental methods of studying real neural connections at a microscopic level are still arduous and costly. An efficient alternative is to infer connectivity based on the neuronal activations using computational methods. A reliable method for network inference would not only facilitate research on neural circuits without the need for laborious experiments but also reveal insights into the underlying mechanisms of the brain. In this work, we perform a review of methods for neural circuit inference given the activation time series of the neural population. Approaching it from a machine learning perspective, we divide the methodologies into unsupervised and supervised learning. The methods are based on correlation metrics, probabilistic point processes, and neural networks. Furthermore, we add a data mining methodology inspired by influence estimation in social networks as a new supervised learning approach. For comparison, we use the small version of the Chalearn Connectomics competition, which is accompanied by ground truth connections between neurons. The experiments indicate that unsupervised learning methods perform better; however, supervised methods could surpass them given enough data and resources.
 [15] arXiv:1806.08258 [pdf, other]

Title: Subgroup Identification using Covariate Adjusted Interaction Trees
Subjects: Methodology (stat.ME)
We consider the problem of identifying subgroups of participants in a clinical trial that have enhanced treatment effect. Recursive partitioning methods that recursively partition the covariate space based on some measure of between-group treatment effect difference are popular for such subgroup identification. The most commonly used recursive partitioning method, the classification and regression tree algorithm, first creates a large tree by recursively partitioning the covariate space using some splitting criteria and then selects the final tree from all subtrees of the large tree. In the context of subgroup identification, calculation of the splitting criteria and the evaluation measure used for final tree selection rely on comparing differences in means between the treatment and control arm. When covariates are prognostic for the outcome, covariate adjusted estimators have the ability to improve efficiency compared to using differences in means between the treatment and control group. This manuscript develops two covariate adjusted estimators that can be used both to make splitting decisions and for final tree selection. The performance of the resulting covariate adjusted recursive partitioning algorithm is evaluated using simulations and by analyzing a clinical trial that evaluates whether motivational interviews improve treatment engagement for substance abusers.
 [16] arXiv:1806.08301 [pdf, ps, other]

Title: Online Saddle Point Problem with Applications to Constrained Online Convex Optimization
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Optimization and Control (math.OC)
We study an online saddle point problem where at each iteration a pair of actions needs to be chosen without knowledge of the future (convex-concave) payoff functions. The objective is to minimize the gap between the cumulative payoffs and the saddle point value of the aggregate payoff function, which we measure using a metric called "SP-regret". The problem generalizes the online convex optimization framework and can be interpreted as finding the Nash equilibrium for the aggregate of a sequence of two-player zero-sum games. We propose an algorithm that achieves $\tilde{O}(\sqrt{T})$ SP-regret in the general case, and $O(\log T)$ SP-regret for the strongly convex-concave case. We then consider a constrained online convex optimization problem motivated by a variety of applications in dynamic pricing, auctions, and crowdsourcing. We relate this problem to an online saddle point problem and establish $O(\sqrt{T})$ regret using a primal-dual algorithm.
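The verbal description of the regret notion above can be transcribed into a formula as follows. This is one natural reading of the abstract, not a quotation from the paper: the payoff functions $\mathcal{L}_t$, the action sets $X$ and $Y$, and the played actions $(x_t, y_t)$ are notation assumed here.

```latex
% A transcription of "the gap between the cumulative payoffs and the saddle
% point value of the aggregate payoff function" (notation assumed).
\[
  \text{SP-Regret}(T)
  \;=\;
  \left|\,
    \sum_{t=1}^{T} \mathcal{L}_t(x_t, y_t)
    \;-\;
    \min_{x \in X}\,\max_{y \in Y}\, \sum_{t=1}^{T} \mathcal{L}_t(x, y)
  \,\right| ,
\]
% where each \mathcal{L}_t is convex in x and concave in y, and (x_t, y_t)
% is chosen before \mathcal{L}_t is revealed.
```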
 [17] arXiv:1806.08307 [pdf, other]

Title: WIKS: A general Bayesian nonparametric index for quantifying differences between two populations
Subjects: Statistics Theory (math.ST)
The problem of deciding whether two samples arise from the same distribution is often the question of interest in many research investigations. Numerous statistical methods have been devoted to this issue, but only a few of them have considered a Bayesian nonparametric approach. We propose a nonparametric Bayesian index (WIKS) which has the goal of quantifying the difference between two populations $P_1$ and $P_2$ based on samples from them. The WIKS index is defined by a weighted posterior expectation of the Kolmogorov-Smirnov distance between $P_1$ and $P_2$ and, differently from most existing approaches, can be easily computed using any prior distribution over $(P_1,P_2)$. Moreover, WIKS is fast to compute and can be justified under a Bayesian decision-theoretic framework. We present a simulation study that indicates that the WIKS method is more powerful than competing approaches in several settings, even in multivariate settings. We also prove that WIKS is a consistent procedure and controls the level of significance uniformly over the null hypothesis. Finally, we apply WIKS to a data set of scale measurements of three different groups of patients submitted to a questionnaire for Alzheimer's diagnosis.
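As background for the distance the index is built on, the sketch below computes the plain empirical two-sample Kolmogorov-Smirnov distance, i.e. the largest gap between two empirical distribution functions. It is not the WIKS index itself, which the abstract describes as a weighted posterior expectation of this distance; the function name `ks_distance` is a hypothetical helper introduced here.

```python
from bisect import bisect_right


def ks_distance(x, y):
    """Empirical two-sample Kolmogorov-Smirnov distance sup_v |F_x(v) - F_y(v)|.

    Illustrative helper only (hypothetical name); this is the classical
    empirical distance, not the Bayesian WIKS index.
    """
    xs, ys = sorted(x), sorted(y)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The supremum is attained at one of the observed points, so it is
    # enough to evaluate both empirical CDFs on the pooled sample.
    for v in xs + ys:
        f_x = bisect_right(xs, v) / len(xs)  # fraction of x values <= v
        f_y = bisect_right(ys, v) / len(ys)  # fraction of y values <= v
        largest_gap = max(largest_gap, abs(f_x - f_y))
    return largest_gap
```

Identical samples give a distance of 0 and samples with disjoint ranges give 1, so the value can be read as a scale-free measure of how far apart the two empirical distributions are.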
 [18] arXiv:1806.08317 [pdf, other]

Title: Fashion-Gen: The Generative Fashion Dataset and Challenge
Authors: Negar Rostamzadeh, Seyedarian Hosseini, Thomas Boquet, Wojciech Stokowiec, Ying Zhang, Christian Jauvin, Chris Pal
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We introduce a new dataset of 293,008 high definition (1360 x 1360 pixels) fashion images paired with item descriptions provided by professional stylists. Each item is photographed from a variety of angles. We provide baseline results on 1) high-resolution image generation, and 2) image generation conditioned on the given text descriptions. We invite the community to improve upon these baselines. In this paper, we also outline the details of a challenge that we are launching based upon this dataset.
 [19] arXiv:1806.08320 [pdf, other]

Title: A Guide to General-Purpose Approximate Bayesian Computation Software
Subjects: Computation (stat.CO)
This Chapter, "A Guide to General-Purpose ABC Software", is to appear in the forthcoming Handbook of Approximate Bayesian Computation (2018). We present general-purpose software to perform Approximate Bayesian Computation (ABC) as implemented in the R packages abc and EasyABC and the C++ program ABCtoolbox. With simple toy models we demonstrate how to perform parameter inference, model selection, validation and optimal choice of summary statistics. We demonstrate how to combine ABC with Markov Chain Monte Carlo and describe a realistic population genetics application.
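To make the idea of ABC parameter inference on a toy model concrete, here is a minimal sketch of the basic rejection variant in plain Python. It does not use or mirror the abc, EasyABC or ABCtoolbox interfaces; the toy model (normal data with unit variance), the Uniform(0, 10) prior, the sample mean as summary statistic, and the function name `abc_rejection` are all assumptions made for illustration.

```python
import random
import statistics


def abc_rejection(observed_mean, n_obs, n_draws, tolerance, seed=0):
    """Rejection ABC for the mean mu of a Normal(mu, 1) toy model.

    Hypothetical illustration: prior, model and summary statistic are
    assumptions, not taken from the chapter or from any ABC package.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(0.0, 10.0)  # 1. draw a candidate from the assumed prior
        simulated = [rng.gauss(mu, 1.0) for _ in range(n_obs)]  # 2. simulate data
        summary = statistics.fmean(simulated)  # 3. reduce to a summary statistic
        if abs(summary - observed_mean) < tolerance:  # 4. keep if close to observed
            accepted.append(mu)
    return accepted


# Accepted draws approximate the posterior of mu given the observed summary.
posterior_sample = abc_rejection(
    observed_mean=3.0, n_obs=20, n_draws=20000, tolerance=0.2
)
posterior_mean = statistics.fmean(posterior_sample)
```

Shrinking `tolerance` makes the approximation to the posterior tighter at the cost of a lower acceptance rate, which is the trade-off that more elaborate schemes such as ABC combined with Markov Chain Monte Carlo aim to soften.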
Cross-lists for Fri, 22 Jun 18
 [20] arXiv:1806.07908 (cross-list from cs.LG) [pdf, other]

Title: Como funciona o Deep Learning
Comments: Book chapter, in Portuguese, 31 pages
Journal-ref: In: Tópicos em Gerenciamento de Dados e Informações, SBC, Cap.3, ISBN 9788576694007, pp. 63-93, 2017
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Deep Learning methods are currently the state of the art in many problems which can be tackled via machine learning, in particular classification problems. However, there is still a lack of understanding of how those methods work, why they work and what limitations are involved in using them. In this chapter we will describe in detail the transition from shallow to deep networks, include examples of code on how to implement them, as well as the main issues one faces when training a deep network. Afterwards, we introduce some theoretical background behind the use of deep models, and discuss their limitations.
 [21] arXiv:1806.07937 (cross-list from cs.LG) [pdf, other]

Title: A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning
Comments: 18 pages, 14 figures
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
The risks and perils of overfitting in machine learning are well known. However, most of the treatment of this, including diagnostic tools and remedies, was developed for the supervised learning case. In this work, we aim to offer new perspectives on the characterization and prevention of overfitting in deep Reinforcement Learning (RL) methods, with a particular focus on continuous domains. We examine several aspects, such as how to define and diagnose overfitting in MDPs, and how to reduce risks by injecting sufficient training diversity. This work complements recent findings on the brittleness of deep RL methods and offers practical observations for RL researchers and practitioners.
 [22] arXiv:1806.07944 (cross-list from cs.SI) [pdf, ps, other]

Title: Searching for a Single Community in a Graph
Comments: ACM Journal on Modeling and Performance Evaluation of Computing Systems (TOMPECS) [to appear]
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)
In standard graph clustering/community detection, one is interested in partitioning the graph into more densely connected subsets of nodes. In contrast, the "search" problem of this paper aims to only find the nodes in a "single" such community, the target, out of the many communities that may exist. To do so, we are given suitable side information about the target; for example, a very small number of nodes from the target are labeled as such.
We consider a general yet simple notion of side information: all nodes are assumed to have random weights, with nodes in the target having higher weights on average. Given these weights and the graph, we develop a variant of the method of moments that identifies nodes in the target more reliably, and with lower computation, than generic community detection methods that do not use side information and partition the entire graph. Our empirical results show significant gains in runtime, and also gains in accuracy over other graph clustering algorithms.
 [23] arXiv:1806.07956 (cross-list from cs.SI) [pdf, other]

Title: Reconstructing networks with unknown and heterogeneous errors
Authors: Tiago P. Peixoto
Comments: 27 pages, 17 figures
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
The vast majority of network datasets contain errors and omissions, although this is rarely incorporated in traditional network analysis. Recently, an increasing effort has been made to fill this methodological gap by developing network reconstruction approaches based on Bayesian inference. These approaches, however, rely on assumptions of uniform error rates and on direct estimations of the existence of each edge via repeated measurements, something that is currently unavailable for the majority of network data. Here we develop a Bayesian reconstruction approach that lifts these limitations by not only allowing for heterogeneous errors, but also for individual edge measurements without direct error estimates. Our approach works by coupling the inference approach with structured generative network models, which enable the correlations between edges to be used as reliable error estimates. Although our approach is general, we focus on the stochastic block model as the basic generative process, from which efficient nonparametric inference can be performed, and which yields a principled method to infer hierarchical community structure from noisy data. We demonstrate the efficacy of our approach with a variety of empirical and artificial networks.
 [24] arXiv:1806.07963 (cross-list from cs.SI) [pdf, other]

Title: Latent heterogeneous multilayer community detection
Subjects: Social and Information Networks (cs.SI); Learning (cs.LG); Machine Learning (stat.ML)
We propose a method for simultaneously detecting shared and unshared communities in heterogeneous multilayer weighted and undirected networks. The multilayer network is assumed to follow a generative probabilistic model that takes into account the similarities and dissimilarities between the communities. We make use of a variational Bayes approach for jointly inferring the shared and unshared hidden communities from multilayer network observations. We show the robustness of our approach compared to state-of-the-art algorithms in detecting disparate (shared and private) communities on synthetic data as well as on a real genome-wide fibroblast proliferation dataset.
 [25] arXiv:1806.07978 (cross-list from cs.LG) [pdf, other]

Title: The Corpus Replication Task
Authors: Tobias Eichinger
Comments: the references might not render appropriately. contact the author for details
Subjects: Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
In the field of Natural Language Processing (NLP), we revisit the well-known word embedding algorithm word2vec. Word embeddings identify words by vectors such that the words' distributional similarity is captured. Unexpectedly, besides semantic similarity even relational similarity has been shown to be captured in word embeddings generated by word2vec, whence two questions arise: first, which kinds of relations are representable in continuous space, and second, how are relations built? In order to tackle these questions we propose a bottom-up point of view. We call generating input text for which word2vec outputs target relations solving the Corpus Replication Task. Deeming generalizations of this approach to any set of relations possible, we expect solving the Corpus Replication Task to provide partial answers to the questions.
 [26] arXiv:1806.08028 (cross-list from cs.LG) [pdf, other]

Title: Gradient Adversarial Training of Neural Networks
Comments: 13 pages, 4 figures
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We propose gradient adversarial training, an auxiliary deep learning framework applicable to different machine learning problems. In gradient adversarial training, we leverage a prior belief that in many contexts, simultaneous gradient updates should be statistically indistinguishable from each other. We enforce this consistency using an auxiliary network that classifies the origin of the gradient tensor, and the main network serves as an adversary to the auxiliary network in addition to performing standard task-based training. We demonstrate gradient adversarial training for three different scenarios: (1) as a defense to adversarial examples, we classify gradient tensors and tune them to be agnostic to the class of their corresponding example; (2) for knowledge distillation, we do binary classification of gradient tensors derived from the student or teacher network and tune the student gradient tensor to mimic the teacher's gradient tensor; and (3) for multitask learning, we classify the gradient tensors derived from different task loss functions and tune them to be statistically indistinguishable. For each of the three scenarios we show the potential of the gradient adversarial training procedure. Specifically, gradient adversarial training increases the robustness of a network to adversarial attacks, is able to better distill the knowledge from a teacher network to a student network compared to soft targets, and boosts multitask learning by aligning the gradient tensors derived from the task-specific loss functions. Overall, our experiments demonstrate that gradient tensors contain latent information about whatever tasks are being trained, and can support diverse machine learning problems when intelligently guided through adversarialization using an auxiliary network.
 [27] arXiv:1806.08049 (cross-list from cs.LG) [pdf, other]

Title: On the Robustness of Interpretability Methods
Comments: presented at 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
We argue that robustness of explanations (i.e., that similar inputs should give rise to similar explanations) is a key desideratum for interpretability. We introduce metrics to quantify robustness and demonstrate that current methods do not perform well according to these metrics. Finally, we propose ways that robustness can be enforced on existing interpretability approaches.
 [28] arXiv:1806.08065 (crosslist from cs.LG) [pdf, other]

Title: Learning Cognitive Models using Neural Networks
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
A cognitive model of human learning provides information about skills a learner must acquire to perform accurately in a task domain. Cognitive models of learning are not only of scientific interest, but are also valuable in adaptive online tutoring systems. A more accurate model yields more effective tutoring through better instructional decisions. Prior methods of automated cognitive model discovery have typically focused on well-structured domains, relied on student performance data or involved substantial human knowledge engineering. In this paper, we propose Cognitive Representation Learner (CogRL), a novel framework to learn accurate cognitive models in ill-structured domains with no data and little to no human knowledge engineering. Our contribution is twofold: firstly, we show that representations learnt using CogRL can be used for accurate automatic cognitive model discovery without using any student performance data in several ill-structured domains: Rumble Blocks, Chinese Character, and Article Selection. This is especially effective and useful in domains where an accurate human-authored cognitive model is unavailable or authoring a cognitive model is difficult. Secondly, for domains where a cognitive model is available, we show that representations learned through CogRL can be used to get accurate estimates of skill difficulty and learning rate parameters without using any student performance data. These estimates are shown to highly correlate with estimates using student performance data on an Article Selection dataset.
 [29] arXiv:1806.08079 (crosslist from cs.LG) [pdf, other]

Title: GrCAN: Gradient Boost Convolutional Autoencoder with Neural Decision Forest
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Random forests and deep neural networks are two schools of effective classification methods in machine learning. While the random forest is robust irrespective of the data domain, the deep neural network has advantages in handling high-dimensional data. Given that a differentiable neural decision forest can be added to a neural network to exploit the benefits of both models, we further combine a convolutional autoencoder with a neural decision forest, where the autoencoder is well suited to finding hidden representations of the input data. We develop a gradient boost module and embed it into the proposed convolutional autoencoder with neural decision forest to improve performance. The idea of gradient boost is to learn and use the residual in the prediction. In addition, we design a structure to learn the parameters of the neural decision forest and gradient boost module at contiguous steps. Extensive experiments on several public datasets demonstrate that our proposed model achieves good efficiency and prediction performance compared with a series of baseline methods.
 [30] arXiv:1806.08160 (crosslist from math.PR) [pdf, ps, other]

Title: Sharp large deviations for the drift parameter of the explosive Cox-Ingersoll-Ross process
Authors: Marie du Roy de Chaumaray
Subjects: Probability (math.PR); Statistics Theory (math.ST)
We consider a nonstationary Cox-Ingersoll-Ross process. We establish a sharp large deviation principle for the maximum likelihood estimator of its drift parameter.
 [31] arXiv:1806.08235 (crosslist from cs.CV) [pdf, other]

Title: Semi-supervised Seizure Prediction with Generative Adversarial Networks
Comments: 6 pages, 5 figures, 3 tables. arXiv admin note: text overlap with arXiv:1707.01976
Subjects: Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)
In this article, we propose an approach that can make use of not only labeled EEG signals but also unlabeled ones, which are more accessible. We also suggest the use of data fusion to further improve the seizure prediction accuracy. Data fusion in our vision includes EEG signals, cardiogram signals, body temperature and time. We use the short-time Fourier transform on 28-s EEG windows as a preprocessing step. A generative adversarial network (GAN) is trained in an unsupervised manner where information of seizure onset is disregarded. The trained Discriminator of the GAN is then used as a feature extractor. Features generated by the feature extractor are classified by two fully-connected layers (which can be replaced by any classifier) for the labeled EEG signals. This semi-supervised seizure prediction method achieves an area under the operating characteristic curve (AUC) of 77.68% and 75.47% for the CHB-MIT scalp EEG dataset and the Freiburg Hospital intracranial EEG dataset, respectively. Unsupervised training without the need for labeling is important because not only can it be performed in real-time during EEG signal recording, but it also does not require feature engineering effort for each patient.
 [32] arXiv:1806.08240 (crosslist from cs.LG) [pdf, other]

Title: InfoCatVAE: Representation Learning with Categorical Variational Autoencoders
Comments: 9 pages, 3 appendix, 5 figures. arXiv admin note: text overlap with arXiv:1606.03657 by other authors
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
This paper describes InfoCatVAE, an extension of the variational autoencoder that enables unsupervised disentangled representation learning. InfoCatVAE uses multimodal distributions for the prior and the inference network and then maximizes the evidence lower bound objective (ELBO). We connect the new ELBO derived for our model with a natural soft clustering objective which explains the robustness of our approach. We then adapt the InfoGANs method to our setting in order to maximize the mutual information between the categorical code and the generated inputs and obtain an improved model.
 [33] arXiv:1806.08267 (crosslist from cs.LG) [pdf, other]

Title: Gated Complex Recurrent Neural Networks
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Complex numbers have long been favoured for digital signal processing, yet complex representations rarely appear in deep learning architectures. RNNs, widely used to process time series and sequence information, could greatly benefit from complex representations. We present a novel complex gated recurrent cell. When used together with norm-preserving state transition matrices, our complex gated RNN exhibits excellent stability and convergence properties. We demonstrate competitive performance of our complex gated RNN on the synthetic memory and adding task, as well as on the real-world task of human motion prediction.
 [34] arXiv:1806.08295 (crosslist from cs.LG) [pdf, other]

Title: How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Consistently checking the statistical significance of experimental results is one of the mandatory methodological steps to address the so-called "reproducibility crisis" in deep reinforcement learning. In this tutorial paper, we explain how to determine the number of random seeds one should use to provide a statistically significant comparison of the performance of two algorithms. We also discuss the influence of deviations from the assumptions usually made by statistical tests, and we provide guidelines to counter their negative effects along with some code to perform the tests.
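As a rough illustration of the kind of calculation this abstract describes, the sketch below computes Welch's t statistic for two sets of per-seed scores and a normal-approximation estimate of how many seeds per algorithm a two-sided test would need. The choice of Welch's test, the function names `welch_t` and `seeds_needed`, and the default `alpha` and `power` values are assumptions made here for illustration; they are not taken from the paper or its code.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: per-seed performance scores of two algorithms (hypothetical inputs).
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof


def seeds_needed(effect, sigma, alpha=0.05, power=0.8):
    """Normal-approximation number of seeds per algorithm.

    effect: smallest difference in mean performance worth detecting.
    sigma:  assumed common standard deviation across seeds.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / effect) ** 2)


if __name__ == "__main__":
    print(welch_t([10.2, 11.5, 9.8, 10.9], [12.1, 12.8, 11.7, 13.0]))
    print(seeds_needed(effect=1.0, sigma=1.0))
```

The normal approximation slightly underestimates the sample size a t-test needs at small n, so the result is best read as a lower bound.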
 [35] arXiv:1806.08297 (crosslist from cs.FL) [pdf, other]

Title: Learning Graph Weighted Models on Pictures
Subjects: Formal Languages and Automata Theory (cs.FL); Learning (cs.LG); Machine Learning (stat.ML)
Graph Weighted Models (GWMs) have recently been proposed as a natural generalization of weighted automata over strings and trees to arbitrary families of labeled graphs (and hypergraphs). A GWM generically associates a labeled graph with a tensor network and computes a value by successive contractions directed by its edges. In this paper, we consider the problem of learning GWMs defined over the graph family of pictures (or 2-dimensional words). As a proof of concept, we consider regression and classification tasks over the simple Bars & Stripes and Shifting Bits picture languages and provide an experimental study investigating whether these languages can be learned in the form of a GWM from positive and negative examples using gradient-based methods. Our results suggest that this is indeed possible and that investigating the use of gradient-based methods to learn picture series and functions computed by GWMs over other families of graphs could be a fruitful direction.
 [36] arXiv:1806.08324 (crosslist from cs.LG) [pdf, other]

Title: Countdown Regression: Sharp and Calibrated Survival Predictions
Subjects: Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
Personalized probabilistic forecasts of time to event (such as mortality) can be crucial in decision making, especially in the clinical setting. Inspired by ideas from the meteorology literature, we approach this problem through the paradigm of maximizing sharpness of prediction distributions, subject to calibration. In regression problems, it has been shown that optimizing the continuous ranked probability score (CRPS) instead of maximum likelihood leads to sharper prediction distributions while maintaining calibration. We introduce the Survival-CRPS, a generalization of the CRPS to the time to event setting, and present right-censored and interval-censored variants. To holistically evaluate the quality of predicted distributions over time to event, we present the Survival-AUPRC evaluation metric, an analog to area under the precision-recall curve. We apply these ideas by building a recurrent neural network for mortality prediction, using an Electronic Health Record dataset covering millions of patients. We demonstrate significant benefits in models trained by the Survival-CRPS objective instead of maximum likelihood.
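For readers unfamiliar with the score mentioned in this abstract, the sketch below computes the standard (uncensored) CRPS of a sample-based forecast using the identity CRPS = E|X - y| - 0.5 E|X - X'|. It does not implement the censored variants the paper introduces, and the function name `crps_ensemble` is a hypothetical choice for illustration.

```python
def crps_ensemble(samples, y):
    """Standard CRPS of an empirical forecast against an observed value y.

    samples: draws from the predictive distribution (hypothetical input).
    Lower is better; a single-point forecast reduces to absolute error.
    """
    n = len(samples)
    # Expected absolute error between forecast draws and the observation.
    accuracy = sum(abs(x - y) for x in samples) / n
    # Half the expected absolute difference between two independent draws.
    spread = sum(abs(a - b) for a in samples for b in samples) / (2 * n * n)
    return accuracy - spread


if __name__ == "__main__":
    print(crps_ensemble([4.0, 5.0, 6.0], 5.0))     # narrow forecast around the outcome
    print(crps_ensemble([0.0, 5.0, 10.0], 5.0))    # wide forecast, same centre
```

Both example forecasts are centred on the outcome, but the narrower one gets the lower score, which is the sense in which the score rewards sharpness.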
 [37] arXiv:1806.08340 (crosslist from cs.LG) [pdf, other]

Title: Interpretable Discovery in Large Image Data Sets
Comments: Presented at the 2018 ICML Workshop on Human Interpretability in Machine Learning (WHI 2018), Stockholm, Sweden
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Automated detection of new, interesting, unusual, or anomalous images within large data sets has great value for applications from surveillance (e.g., airport security) to science (observations that don't fit a given theory can lead to new discoveries). Many image data analysis systems are turning to convolutional neural networks (CNNs) to represent image content due to their success in achieving high classification accuracy rates. However, CNN representations are notoriously difficult for humans to interpret. We describe a new strategy that combines novelty detection with CNN image features to achieve rapid discovery with interpretable explanations of novel image content. We applied this technique to familiar images from ImageNet as well as to a scientific image collection from planetary science.
 [38] arXiv:1806.08342 (crosslist from cs.LG) [pdf, other]

Title: Quantizing deep convolutional networks for efficient inference: A whitepaper
Authors: Raghuraman Krishnamoorthi
Comments: 37 pages
Subjects: Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8 bits of precision post-training produces classification accuracies within 2% of floating point networks for a wide variety of CNN architectures. Model sizes can be reduced by a factor of 4 by quantizing weights to 8 bits, even when 8-bit arithmetic is not supported. This can be achieved with simple, post-training quantization of weights. We benchmark latencies of quantized networks on CPUs and DSPs and observe a speedup of 2x-3x for quantized implementations compared to floating point on CPUs. Speedups of up to 10x are observed on specialized processors with fixed point SIMD capabilities, like the Qualcomm QDSPs with HVX.
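To make the idea of per-channel weight quantization concrete, the sketch below quantizes each output channel of a weight tensor to signed 8-bit integers with its own scale. It assumes a symmetric scheme (zero point fixed at 0) purely for brevity; the function names and the list-of-lists layout are hypothetical and do not reflect the TensorFlow tooling the whitepaper describes.

```python
def quantize_per_channel(weights, num_bits=8):
    """Symmetric per-channel quantization (an illustrative assumption).

    weights: list of channels, each a list of float weights.
    Returns (quantized integer channels, one float scale per channel).
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    quantized, scales = [], []
    for channel in weights:
        max_abs = max(abs(w) for w in channel)
        # Each channel gets its own scale, so small-magnitude channels
        # are not crushed by a large-magnitude neighbour.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        quantized.append(
            [max(-qmax, min(qmax, round(w / scale))) for w in channel]
        )
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Map integer weights back to approximate float values."""
    return [[q * s for q in channel] for channel, s in zip(quantized, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [10.0, -20.0, 5.0]]
    q, s = quantize_per_channel(w)
    print(q, s)
    print(dequantize(q, s))
```

Storing one 8-bit integer per weight instead of a 32-bit float is where the factor-of-4 size reduction comes from; the per-channel scales add only one float per channel.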
Quantization-aware training can provide further improvements, reducing the gap to floating point to 1% at 8-bit precision. Quantization-aware training also allows for reducing the precision of weights to four bits with accuracy losses ranging from 2% to 10%, with higher accuracy drop for smaller networks. We introduce tools in TensorFlow and TensorFlowLite for quantizing convolutional networks and review best practices for quantization-aware training to obtain high accuracy with quantized weights and activations. We recommend that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization. We also propose that future processors and hardware accelerators for optimized inference support precisions of 4, 8 and 16 bits.
 [39] arXiv:1806.08354 (crosslist from cs.CV) [pdf, other]

Title: Learning Instance Segmentation by Interaction
Authors: Deepak Pathak, Yide Shentu, Dian Chen, Pulkit Agrawal, Trevor Darrell, Sergey Levine, Jitendra Malik
Comments: Website at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)
We present an approach for building an active agent that learns to segment its visual observations into individual objects by interacting with its environment in a completely self-supervised manner. The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels. The model learned from over 50K interactions generalizes to novel objects and backgrounds. To deal with the noisy training signal for segmenting objects obtained by self-supervised interactions, we propose a robust set loss. A dataset of the robot's interactions along with a few human labeled examples is provided as a benchmark for future research. We test the utility of the learned segmentation model by providing results on a downstream vision-based control task of rearranging multiple objects into target configurations from visual inputs alone. Videos, code, and robotic interaction dataset are available at https://pathak22.github.io/segbyinteraction/
Replacements for Fri, 22 Jun 18
 [40] arXiv:1105.2454 (replaced) [pdf, other]

Title: High-dimensional instrumental variables regression and confidence sets
Subjects: Statistics Theory (math.ST)
 [41] arXiv:1606.03275 (replaced) [pdf, other]

Title: Analysis of the maximal posterior partition in the Dirichlet Process Gaussian Mixture Model
Authors: Łukasz Rajkowski
Comments: 50 pages, 7 figures
Subjects: Statistics Theory (math.ST)
 [42] arXiv:1611.08618 (replaced) [pdf, other]

Title: A Benchmark and Comparison of Active Learning for Logistic Regression
Comments: accepted by Pattern Recognition
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [43] arXiv:1703.02111 (replaced) [pdf, other]

Title: Classification and clustering for observations of event time data using nonhomogeneous Poisson process models
Comments: cleaned up figures and text
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [44] arXiv:1706.04546 (replaced) [pdf, other]

Title: Reinforcement Learning with Budget-Constrained Nonparametric Function Approximation for Opportunistic Spectrum Access
Comments: 6 pages, submitted
Subjects: Information Theory (cs.IT); Learning (cs.LG); Machine Learning (stat.ML)
 [45] arXiv:1707.05745 (replaced) [pdf, other]

Title: Modeling temporal treatment effects with zero inflated semiparametric regression models: the case of local development policies in France
Subjects: Applications (stat.AP)
 [46] arXiv:1708.02883 (replaced) [pdf, other]

Title: Maximum Volume Inscribed Ellipsoid: A New Simplex-Structured Matrix Factorization Framework via Facet Enumeration and Convex Optimization
Subjects: Machine Learning (stat.ML)
 [47] arXiv:1710.08269 (replaced) [pdf, other]

Title: A Potts-Mixture Spatiotemporal Joint Model for Combined MEG and EEG Data
Subjects: Applications (stat.AP)
 [48] arXiv:1712.06695 (replaced) [pdf, other]

Title: Accurate Inference for Adaptive Linear Models
Comments: 20 pages; Updated after acceptance to ICML 2018
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [49] arXiv:1801.01973 (replaced) [pdf, other]

Title: A Note on the Inception Score
Comments: Proc. ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [50] arXiv:1802.06054 (replaced) [pdf, other]

Title: Learning Patterns for Detection with Multiscale Scan Statistics
Authors: James Sharpnack
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Methodology (stat.ME)
 [51] arXiv:1803.05112 (replaced) [pdf, other]

Title: Uplift Modeling from Separate Labels
Comments: 21 pages, 7 figures
Subjects: Machine Learning (stat.ML)
 [52] arXiv:1805.01532 (replaced) [pdf, other]

Title: Lifted Neural Networks
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [53] arXiv:1805.01907 (replaced) [pdf, other]

Title: Exploration by Distributional Reinforcement Learning
Comments: IJCAI 2018
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [54] arXiv:1805.03963 (replaced) [pdf, ps, other]

Title: Monotone Learning with Rectified Wire Networks
Comments: 37 pages, 19 figures, improved section 3
Subjects: Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [55] arXiv:1806.00730 (replaced) [pdf, other]

Title: Minnorm training: an algorithm for training over-parameterized deep neural networks
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
 [56] arXiv:1806.01811 (replaced) [pdf, other]

Title: AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization
Comments: 17 pages, 3 figures
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [57] arXiv:1806.01845 (replaced) [pdf, other]

Title: Deep Neural Networks with Multi-Branch Architectures Are Less Non-Convex
Comments: 26 pages, 6 figures, 3 tables; v2 fixes some typos
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [58] arXiv:1806.02199 (replaced) [pdf, other]

Title: Deep Self-Organization: Interpretable Discrete Representation Learning on Time Series
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
 [59] arXiv:1806.05769 (replaced) [pdf, other]

Title: Bayesian Uncertainty Quantification and Information Fusion in CALPHAD-based Thermodynamic Modeling
Authors: Pejman Honarmandi, Thien Chi Duong, Seyede Fatemeh Ghoreishi, Dou Allaire, Raymundo Arroyave
Comments: 22 pages, 8 Figures
Subjects: Materials Science (cond-mat.mtrl-sci); Applications (stat.AP)
 [60] arXiv:1806.06784 (replaced) [pdf, other]

Title: Flexible Collaborative Estimation of the Average Causal Effect of a Treatment using the Outcome-Highly-Adaptive Lasso
Comments: The first two authors contributed equally to this work
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)
 [61] arXiv:1806.07172 (replaced) [pdf, ps, other]

Title: Surrogate Outcomes and Transportability
Comments: Submitted to International Journal of Approximate Reasoning
Subjects: Artificial Intelligence (cs.AI); Methodology (stat.ME)