Machine Learning
New submissions
New submissions for Thu, 9 Jul 20
 [1] arXiv:2007.03814 [pdf, ps, other]

Title: A Variational Formula for Rényi Divergences
Comments: 11 pages, 2 figures
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Probability (math.PR)
We derive a new variational formula for the Rényi family of divergences, $R_\alpha(Q\|P)$, generalizing the classical Donsker-Varadhan variational formula for the Kullback-Leibler divergence. The objective functional in this new variational representation is expressed in terms of expectations under $Q$ and $P$, and hence can be estimated using samples from the two distributions. We illustrate the utility of such a variational formula by constructing neural-network estimators for the Rényi divergences.
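As a small illustration of the classical Donsker-Varadhan formula mentioned in this abstract (not the new Rényi formula, which the abstract does not spell out), the sketch below checks numerically that the objective E_Q[g] - log E_P[exp(g)] reaches the Kullback-Leibler divergence at g = log(dQ/dP) for two discrete distributions. The distributions `q` and `p` and the helper names are hypothetical and chosen only for this example.

```python
import math

def kl_divergence(q, p):
    """KL(Q || P) for discrete distributions given as lists of probabilities."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def dv_objective(g, q, p):
    """Donsker-Varadhan objective E_Q[g] - log E_P[exp(g)] for a test function g."""
    e_q = sum(qi * gi for qi, gi in zip(q, g))
    e_p_exp = sum(pi * math.exp(gi) for pi, gi in zip(p, g))
    return e_q - math.log(e_p_exp)

# Hypothetical example distributions on three outcomes.
q = [0.5, 0.3, 0.2]
p = [0.2, 0.3, 0.5]

# The maximizer of the objective is the log-likelihood ratio log(q/p).
g_opt = [math.log(qi / pi) for qi, pi in zip(q, p)]
# Any other test function gives a lower bound on the divergence.
g_other = [0.1, -0.4, 0.3]

kl = kl_divergence(q, p)
dv_at_opt = dv_objective(g_opt, q, p)
dv_at_other = dv_objective(g_other, q, p)
```

Because both expectations are plain averages, they could be replaced by sample means, which is what makes objectives of this kind usable with samples from the two distributions.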
 [2] arXiv:2007.03898 [pdf, other]

Title: NVAE: A Deep Hierarchical Variational Autoencoder
Comments: Some images are downsized to meet arXiv requirements. Check this https URL for a high-resolution version (24 MB)
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Normalizing flows, autoregressive models, variational autoencoders (VAEs), and deep energy-based models are among competing likelihood-based frameworks for deep generative learning. Among them, VAEs have the advantage of fast and tractable sampling and easy-to-access encoding networks. However, they are currently outperformed by other models such as normalizing flows and autoregressive models. While the majority of the research in VAEs is focused on the statistical challenges, we explore the orthogonal direction of carefully designing neural architectures for hierarchical VAEs. We propose Nouveau VAE (NVAE), a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization. NVAE is equipped with a residual parameterization of Normal distributions and its training is stabilized by spectral regularization. We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, and CelebA HQ datasets and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ as shown in Fig. 1. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256$\times$256 pixels.
 [3] arXiv:2007.04005 [pdf, other]

Title: Statistical post-processing of wind speed forecasts using convolutional neural networks
Comments: 44 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Applications (stat.AP)
Current statistical post-processing methods for probabilistic weather forecasting are not capable of using full spatial patterns from the numerical weather prediction (NWP) model. In this paper we incorporate spatial wind speed information by using convolutional neural networks (CNNs) and obtain probabilistic wind speed forecasts in the Netherlands for 48 hours ahead, based on KNMI's Harmonie-Arome NWP model. The CNNs are shown to have higher Brier skill scores for medium to higher wind speeds, as well as a better continuous ranked probability score (CRPS), than fully connected neural networks and quantile regression forests.
 [4] arXiv:2007.04006 [pdf, other]

Title: Accelerated Sparse Bayesian Learning via Screening Test and Its Applications
Comments: 15 pages, 23 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
In high-dimensional settings, sparse structures are critical for efficiency in terms of memory and computational complexity. For a linear system, directly finding the sparsest solution given an overcomplete dictionary of features is typically NP-hard, and thus alternative approximate methods should be considered. In this paper, our choice of alternative method is sparse Bayesian learning, which, as an empirical Bayesian approach, uses a parameterized prior to encourage sparsity in the solution, unlike methods with fixed priors such as LASSO. A screening test, in turn, aims at quickly identifying a subset of features whose coefficients are guaranteed to be zero in the optimal solution, and which can then be safely removed from the complete dictionary to obtain a smaller, more easily solved problem. Next, we solve the smaller problem, after which the solution of the original problem can be recovered by padding the smaller solution with zeros. The performance of the proposed method is examined on various data sets and applications.
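The reduce-solve-pad step described in this abstract can be sketched as follows. This is only an illustration of the bookkeeping, under the assumption that some screening test has already produced the indices of the surviving features; the function names and the example values are hypothetical, and neither the screening rule nor the sparse Bayesian solver from the paper is reproduced here.

```python
def reduce_dictionary(columns, kept_indices):
    """Keep only the dictionary columns that survived screening."""
    return [columns[i] for i in kept_indices]

def pad_with_zeros(reduced_solution, kept_indices, full_size):
    """Recover a full-length coefficient vector from the reduced solution.

    Coefficients of screened-out features are set to zero, the rest are
    copied back to their original positions.
    """
    full = [0.0] * full_size
    for idx, value in zip(kept_indices, reduced_solution):
        full[idx] = value
    return full

# Hypothetical example: 5 features, screening keeps features 1 and 3.
dictionary_columns = ["f0", "f1", "f2", "f3", "f4"]
kept = [1, 3]
smaller_dictionary = reduce_dictionary(dictionary_columns, kept)

# Pretend a solver returned these coefficients for the smaller problem.
reduced_solution = [0.7, -1.2]
full_solution = pad_with_zeros(reduced_solution, kept, len(dictionary_columns))
```

Keeping the index list alongside the reduced problem is what allows the original coordinates to be restored after solving.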
 [5] arXiv:2007.04131 [pdf, other]

Title: Pitfalls to Avoid when Interpreting Machine Learning Models
Authors: Christoph Molnar, Gunnar König, Julia Herbinger, Timo Freiesleben, Susanne Dandl, Christian A. Scholbeck, Giuseppe Casalicchio, Moritz Grosse-Wentrup, Bernd Bischl
Comments: This article was accepted at the ICML 2020 workshop XXAI: Extending Explainable AI Beyond Deep Models and Classifiers (see this http URL)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Modern requirements for machine learning (ML) models include both high predictive performance and model interpretability. A growing number of techniques provide model interpretations, but can lead to wrong conclusions if applied incorrectly. We illustrate pitfalls of ML model interpretation such as bad model generalization, dependent features, feature interactions or unjustified causal interpretations. Our paper addresses ML practitioners by raising awareness of pitfalls and pointing out solutions for correct model interpretation, as well as ML researchers by discussing open issues for further research.
 [6] arXiv:2007.04287 [pdf, other]

Title: Learning from DPPs via Sampling: Beyond HKPV and symmetry
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Determinantal point processes (DPPs) have become a significant tool for recommendation systems, feature selection, or summary extraction, harnessing the intrinsic ability of these probabilistic models to facilitate sample diversity. The ability to sample from DPPs is paramount to the empirical investigation of these models. Most exact samplers are variants of a spectral meta-algorithm due to Hough, Krishnapur, Peres and Virág (henceforth HKPV), which is in general time and resource intensive. For DPPs with symmetric kernels, scalable HKPV samplers have been proposed that either first downsample the ground set of items, or force the kernel to be low-rank, using e.g. Nyström-type decompositions.
In the present work, we contribute a radically different approach than HKPV. Exploiting the fact that many statistical and learning objectives can be effectively accomplished by only sampling certain key observables of a DPP (so-called linear statistics), we invoke an expression for the Laplace transform of such an observable as a single determinant, which holds in complete generality. Combining traditional low-rank approximation techniques with Laplace inversion algorithms from numerical analysis, we show how to directly approximate the distribution function of a linear statistic of a DPP. This distribution function can then be used in hypothesis testing or to actually sample the linear statistic, as per requirement. Our approach is scalable and applies to very general DPPs, beyond traditional symmetric kernels.
Cross-lists for Thu, 9 Jul 20
 [7] arXiv:2007.03681 (cross-list from stat.ME) [pdf, other]

Title: Fast Bayesian Estimation of Spatial Count Data Models
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
Spatial count data models are used to explain and predict the frequency of phenomena such as traffic accidents in geographically distinct entities such as census tracts or road segments. These models are typically estimated using Bayesian Markov chain Monte Carlo (MCMC) simulation methods, which, however, are computationally expensive and do not scale well to large datasets. Variational Bayes (VB), a method from machine learning, addresses the shortcomings of MCMC by casting Bayesian estimation as an optimisation problem instead of a simulation problem. In this paper, we derive a VB method for posterior inference in negative binomial models with unobserved parameter heterogeneity and spatial dependence. The proposed method uses Pólya-Gamma augmentation to deal with the non-conjugacy of the negative binomial likelihood and an integrated non-factorised specification of the variational distribution to capture posterior dependencies. We demonstrate the benefits of the approach using simulated data and real data on youth pedestrian injury counts in the census tracts of New York City boroughs Bronx and Manhattan. The empirical analysis suggests that the VB approach is between 7 and 13 times faster than MCMC on a regular eight-core processor, while offering similar estimation and predictive accuracy. Conditional on the availability of computational resources, the embarrassingly parallel architecture of the proposed VB method can be exploited to further accelerate the estimation by up to 100 times.
 [8] arXiv:2007.03714 (cross-list from cs.LG) [pdf, other]

Title: Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH)
Comments: 72 pages, 1 figure
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
Gradient descent yields zero training loss in polynomial time for deep neural networks despite the non-convex nature of the objective function. The behavior of a network in the infinite width limit trained by gradient descent can be described by the Neural Tangent Kernel (NTK) introduced in \cite{Jacot2018Neural}. In this paper, we study the dynamics of the NTK for a finite width Deep Residual Network (ResNet) using the neural tangent hierarchy (NTH) proposed in \cite{Huang2019Dynamics}. For a ResNet with a smooth and Lipschitz activation function, we reduce the requirement on the layer width $m$ with respect to the number of training samples $n$ from quartic to cubic. Our analysis strongly suggests that the particular skip-connection structure of ResNet is the main reason for its triumph over fully-connected networks.
 [9] arXiv:2007.03722 (cross-list from stat.AP) [pdf, other]

Title: Learning excursion sets of vector-valued Gaussian random fields for autonomous ocean sampling
Subjects: Applications (stat.AP); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
Improving and optimizing oceanographic sampling is a crucial task for marine science and maritime resource management. Faced with limited resources in understanding processes in the water column, the combination of statistics and autonomous systems provides new opportunities for experimental design. In this work we develop efficient spatial sampling methods for characterizing regions defined by simultaneous exceedances above prescribed thresholds of several responses, with an application focus on mapping coastal ocean phenomena based on temperature and salinity measurements. Specifically, we define a design criterion based on uncertainty in the excursions of vector-valued Gaussian random fields, and derive tractable expressions for the expected integrated Bernoulli variance reduction in such a framework. We demonstrate how this criterion can be used to prioritize sampling efforts at locations that are ambiguous, making exploration more effective. We use simulations to study and compare properties of the considered approaches, followed by results from field deployments with an autonomous underwater vehicle as part of a study mapping the boundary of a river plume. The results demonstrate the potential of combining statistical methods and robotic platforms to effectively inform and execute data-driven environmental sampling.
 [10] arXiv:2007.03742 (cross-list from cs.LG) [pdf, other]

Title: Meta-active Learning in Probabilistically-Safe Optimization
Authors: Mariah L. Schrum, Mark Connolly, Eric Cole, Mihir Ghetiya, Robert Gross, Matthew C. Gombolay
Comments: 9 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Learning to control a safety-critical system with latent dynamics (e.g. for deep brain stimulation) requires taking calculated risks to gain information as efficiently as possible. To address this problem, we present a probabilistically-safe, meta-active learning approach to efficiently learn system dynamics and optimal configurations. We cast this problem as meta-learning an acquisition function, which is represented by a Long-Short Term Memory Network (LSTM) encoding sampling history. This acquisition function is meta-learned offline to learn high quality sampling strategies. We employ a mixed-integer linear program as our policy with the final, linearized layers of our LSTM acquisition function directly encoded into the objective to trade off expected information gain (e.g., improvement in the accuracy of the model of system dynamics) with the likelihood of safe control. We set a new state-of-the-art in active learning for control of a high-dimensional system with altered dynamics (i.e., a damaged aircraft), achieving a 46% increase in information gain and a 20% speedup in computation time over baselines. Furthermore, we demonstrate our system's ability to learn the optimal parameter settings for deep brain stimulation in a rat's brain while avoiding unwanted side effects (i.e., triggering seizures), outperforming prior state-of-the-art approaches with a 58% increase in information gain. Additionally, our algorithm achieves a 97% likelihood of terminating in a safe state while losing only 15% of information gain.
 [11] arXiv:2007.03744 (cross-list from eess.SP) [pdf, other]

Title: Predictive Analytics for Water Asset Management: Machine Learning and Survival Analysis
Comments: 19 pages, 7 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)
Understanding performance and prioritizing resources for the maintenance of the drinking-water pipe network throughout its life-cycle is a key part of water asset management. Renovation of this vital network is generally hindered by the difficulty or impossibility of gaining physical access to the pipes. We study a statistical and machine learning framework for the prediction of water pipe failures. We employ classical and modern classifiers for a short-term prediction and survival analysis to provide a broader perspective and long-term forecast, usually needed for the economic analysis of the renovation. To enrich these models, we introduce new predictors based on water distribution domain knowledge and employ a modern oversampling technique to remedy the high imbalance coming from the few failures observed each year. For our case study, we use a dataset containing the failure records of all pipes within the water distribution network in Barcelona, Spain. The results shed light on the effect of important risk factors, such as pipe geometry, age, material, and soil cover, among others, and can help utility managers conduct more informed predictive maintenance tasks.
 [12] arXiv:2007.03746 (cross-list from eess.SP) [pdf, ps, other]

Title: Transfer Learning for Brain-Computer Interfaces: A Complete Pipeline
Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Transfer learning (TL) has been widely used in electroencephalogram (EEG) based brain-computer interfaces (BCIs) to reduce the calibration effort for a new subject, and has demonstrated promising performance. After EEG signal acquisition, a closed-loop EEG-based BCI system also includes signal processing, feature engineering, and classification/regression blocks before sending out the control signal, whereas previous approaches only considered TL in one or two such components. This paper proposes that TL could be considered in all three components (signal processing, feature engineering, and classification/regression). Furthermore, it is also very important to specifically add a data alignment component before signal processing to make the data from different subjects more consistent, and hence to facilitate subsequent TL. Offline calibration experiments on two MI datasets verified our proposal. In particular, integrating data alignment and sophisticated TL approaches can significantly improve the classification performance, and hence greatly reduce the calibration effort.
 [13] arXiv:2007.03747 (cross-list from eess.SP) [pdf, ps, other]

Title: On Cokriging, Neural Networks, and Spatial Blind Source Separation for Multivariate Spatial Prediction
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)
Multivariate measurements taken at irregularly sampled locations are a common form of data, for example in geochemical analysis of soil. In practice, predictions of these measurements at unobserved locations are of great interest. For standard multivariate spatial prediction methods it is mandatory to model not only spatial dependencies but also cross-dependencies, which makes it a demanding task. Recently, a blind source separation approach for spatial data was suggested. When using this spatial blind source separation method prior to the actual spatial prediction, modelling of spatial cross-dependencies is avoided, which in turn simplifies the spatial prediction task significantly. In this paper we investigate the use of spatial blind source separation as a preprocessing tool for spatial prediction and compare it with predictions from Cokriging and neural networks in an extensive simulation study as well as on a geochemical dataset.
 [14] arXiv:2007.03749 (cross-list from cs.LG) [pdf, ps, other]

Title: Sharp Analysis of Smoothed Bellman Error Embedding
Comments: Accepted at the ICML 2020 Workshop on Theoretical Foundations of Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The \textit{Smoothed Bellman Error Embedding} algorithm~\citep{dai2018sbeed}, known as SBEED, was proposed as a provably convergent reinforcement learning algorithm with general nonlinear function approximation. It has been successfully implemented with neural networks and achieved strong empirical results. In this work, we study the theoretical behavior of SBEED in batch-mode reinforcement learning. We prove a near-optimal performance guarantee that depends on the representation power of the used function classes and a tight notion of the distribution shift. Our results improve upon prior guarantees for SBEED in~\citet{dai2018sbeed} in terms of the dependence on the planning horizon and on the sample size. Our analysis builds on the recent work of~\citet{Xie2020}, which studies a related algorithm, MSBO, that could be interpreted as a \textit{non-smooth} counterpart of SBEED.
 [15] arXiv:2007.03758 (cross-list from cs.CE) [pdf, other]

Title: Deep learning of thermodynamics-aware reduced-order models from data
Comments: 16 pages, 7 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Machine Learning (stat.ML)
We present an algorithm to learn the relevant latent variables of a large-scale discretized physical system and predict its time evolution using thermodynamically-consistent deep neural networks. Our method relies on sparse autoencoders, which reduce the dimensionality of the full order model to a set of sparse latent variables with no prior knowledge of the coded space dimensionality. Then, a second neural network is trained to learn the metriplectic structure of those reduced physical variables and predict their time evolution with a so-called structure-preserving neural network. This data-based integrator is guaranteed to conserve the total energy of the system and to satisfy the entropy inequality, and can be applied to both conservative and dissipative systems. The integrated paths can then be decoded to the original full-dimensional manifold and be compared to the ground truth solution. This method is tested with two examples applied to fluid and solid mechanics.
 [16] arXiv:2007.03760 (cross-list from cs.LG) [pdf, ps, other]

Title: Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning
Comments: Appendix included
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Off-Policy Evaluation (OPE) aims at estimating the performance of a target policy $\pi$ using offline data rolled in by a logging policy $\mu$. Intensive studies have been conducted, and the recent marginalized importance sampling (MIS) achieves sample efficiency for OPE. However, little is known about whether uniform convergence guarantees in OPE can be obtained efficiently. In this paper, we consider this new question and reveal the comprehensive relationship between OPE and offline learning for the first time. For the global policy class, by using the fully model-based OPE estimator, our best result is able to achieve $\epsilon$-uniform convergence with complexity $\widetilde{O}(H^3\cdot\min(S,H)/d_m\epsilon^2)$, where $d_m$ is an instance-dependent quantity decided by $\mu$. This result is only one factor away from our uniform convergence lower bound up to a logarithmic factor. For the local policy class, $\epsilon$-uniform convergence is achieved with the optimal complexity $\widetilde{O}(H^3/d_m\epsilon^2)$ in the off-policy setting. This result complements the work on sparse model-based planning (Agarwal et al. 2019) with a generative model. Lastly, one interesting corollary of our intermediate result implies a refined analysis of the simulation lemma.
 [17] arXiv:2007.03762 (cross-list from eess.SP) [pdf, other]

Title: Transfer Learning for Electricity Price Forecasting
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)
Electricity price forecasting is an essential task for all the deregulated markets of the world. The accurate prediction of the day-ahead electricity prices is an active research field and available data from various markets can be used as an input for forecasting. A collection of models has been proposed for this task, but the fundamental question of how to use the available big data is often neglected. In this paper, we propose to use transfer learning as a tool for utilizing information from other electricity price markets for forecasting. We pre-train a bidirectional Gated Recurrent Units (BGRU) network on source markets and finally do a fine-tuning for the target market. Moreover, we test different ways to use the input data from various markets in the models. Our experiments on five different day-ahead markets indicate that transfer learning improves the performance of electricity price forecasting in a statistically significant manner.
 [18] arXiv:2007.03767 (cross-list from cs.LG) [pdf, other]

Title: Defending Against Backdoors in Federated Learning with Robust Learning Rate
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Federated Learning (FL) allows a set of agents to collaboratively train a model in a decentralized fashion without sharing their potentially sensitive data. This makes FL suitable for privacy-preserving applications. At the same time, FL is susceptible to adversarial attacks due to decentralized and unvetted data. One important line of attacks against FL is backdoor attacks. In a backdoor attack, an adversary tries to embed a backdoor trigger functionality into the model during training, which can later be activated to cause a desired misclassification. To prevent such backdoor attacks, we propose a lightweight defense that requires no change to the FL structure. At a high level, our defense is based on carefully adjusting the server's learning rate, per dimension, at each round based on the sign information of the agents' updates. We first conjecture the necessary steps to carry out a successful backdoor attack in the FL setting, and then explicitly formulate the defense based on our conjecture. Through experiments, we provide empirical evidence in support of our conjecture. We test our defense against backdoor attacks under different settings, and observe that either the backdoor is completely eliminated, or its accuracy is significantly reduced. Overall, our experiments suggest that our approach significantly outperforms some of the recently proposed defenses in the literature. We achieve this while having minimal influence on the accuracy of the trained models.
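The idea of setting the server's learning rate per dimension from the signs of the agents' updates can be sketched as below. The abstract does not state the exact rule, so the threshold-and-flip rule used here (keep the learning rate positive when enough agents agree on the sign of a coordinate, negate it otherwise) is an assumption made for illustration, as are the function names, the `threshold` parameter, and the example updates.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def robust_learning_rates(agent_updates, threshold, base_lr=1.0):
    """Per-dimension server learning rates from sign agreement (assumed rule).

    For each coordinate, sum the signs of all agents' updates. If the
    absolute value of that sum reaches the threshold, use +base_lr,
    otherwise use -base_lr.
    """
    num_dims = len(agent_updates[0])
    rates = []
    for d in range(num_dims):
        agreement = abs(sum(sign(update[d]) for update in agent_updates))
        rates.append(base_lr if agreement >= threshold else -base_lr)
    return rates

def aggregate(agent_updates, rates):
    """Average the agents' updates and scale each coordinate by its rate."""
    n = len(agent_updates)
    return [rates[d] * sum(u[d] for u in agent_updates) / n
            for d in range(len(rates))]

# Hypothetical round with four agents and two model coordinates.
updates = [[0.5, 1.0], [0.4, -1.0], [0.6, 1.0], [0.5, -1.0]]
rates = robust_learning_rates(updates, threshold=3)
server_step = aggregate(updates, rates)
```

In this toy round all agents agree on the first coordinate, so it is updated normally, while the second coordinate, where the signs split evenly, receives a negated rate.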
 [19] arXiv:2007.03774 (cross-list from cs.CL) [pdf, other]

Title: The curious case of developmental BERTology: On sparsity, transfer learning, generalization and the brain
Authors: Xin Wang
Comments: 9 pages, 5 figures, 1 table
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
In this essay, we explore a point of intersection between deep learning and neuroscience, through the lens of large language models, transfer learning and network compression. Just like perceptual and cognitive neurophysiology has inspired effective deep neural network architectures which in turn make a useful model for understanding the brain, here we explore how biological neural development might inspire efficient and robust optimization procedures which in turn serve as a useful model for the maturation and aging of the brain.
 [20] arXiv:2007.03775 (cross-list from cs.LG) [pdf, other]

Title: README: REpresentation learning by fairness-Aware Disentangling MEthod
Comments: 8 pages, 3 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Fair representation learning aims to encode a representation that is invariant with respect to the protected attribute, such as gender or age. In this paper, we design the Fairness-aware Disentangling Variational AutoEncoder (FD-VAE) for fair representation learning. This network disentangles the latent space into three subspaces with a decorrelation loss that encourages each subspace to contain independent information: 1) target attribute information, 2) protected attribute information, 3) mutual attribute information. After the representation learning, this disentangled representation is leveraged for fairer downstream classification by excluding the subspace with the protected attribute information. We demonstrate the effectiveness of our model through extensive experiments on the CelebA and UTK Face datasets. Our method outperforms the previous state-of-the-art method by large margins in terms of equal opportunity and equalized odds.
 [21] arXiv:2007.03795 (cross-list from cs.LG) [pdf, other]

Title: Conditional gradient methods for stochastically constrained convex minimization
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
We propose two novel conditional gradient-based methods for solving structured stochastic convex optimization problems with a large number of linear constraints. Instances of this template naturally arise from SDP-relaxations of combinatorial problems, which involve a number of constraints that is polynomial in the problem dimension. The most important feature of our framework is that only a subset of the constraints is processed at each iteration, thus gaining a computational advantage over prior works that require full passes. Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees. Preliminary numerical experiments are provided for illustrating the practical performance of the methods.
 [22] arXiv:2007.03797 (cross-list from cs.LG) [pdf, other]

Title: Personalized Federated Learning: An Attentive Collaboration Approach
Comments: Under review
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
For the challenging computational environment of IoT/edge computing, personalized federated learning allows every client to train a strong personalized cloud model by effectively collaborating with the other clients in a privacy-preserving manner. The performance of personalized federated learning is largely determined by the effectiveness of inter-client collaboration. However, when the data is non-IID across all clients, it is challenging to infer the collaboration relationships between clients without knowing their data distributions. In this paper, we propose to tackle this problem with a novel framework named federated attentive message passing (FedAMP) that allows each client to collaboratively train its own personalized cloud model without using a global model. FedAMP implements an attentive collaboration mechanism by iteratively encouraging clients with more similar model parameters to have stronger collaborations. This adaptively discovers the underlying collaboration relationships between clients, which significantly boosts the effectiveness of collaboration and leads to the outstanding performance of FedAMP. We establish the convergence of FedAMP for both convex and non-convex models, and further propose a heuristic method that resembles the FedAMP framework to further improve its performance for federated learning with deep neural networks. Extensive experiments demonstrate the superior performance of our methods in handling non-IID data, dirty data and dropped clients.
 [23] arXiv:2007.03800 (cross-list from cs.LG) [pdf, ps, other]

Title: Efficient and Parallel Separable Dictionary Learning
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Separable, or Kronecker product, dictionaries provide natural decompositions for 2D signals, such as images. In this paper, we describe an algorithm to learn such dictionaries which is highly parallelizable and which reaches sparse representations competitive with the previous state-of-the-art dictionary learning algorithms from the literature. We highlight the performance of the proposed method in sparsely representing image data and in image denoising applications.
 [24] arXiv:2007.03807 (cross-list from cs.LG) [pdf, other]

Title: Towards a practical measure of interference for reinforcement learning
Comments: 18 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. But, before we overcome interference we must understand it better. In this work, we provide a definition of interference for control in reinforcement learning. We systematically evaluate our new measures by assessing correlation with several measures of learning performance, including stability, sample efficiency, and online and offline control performance across a variety of learning architectures. Our new interference measure allows us to ask novel scientific questions about commonly used deep learning architectures. In particular we show that target network frequency is a dominating factor for interference, and that updates on the last layer result in significantly higher interference than updates internal to the network. This new measure can be expensive to compute; we conclude with motivation for an efficient proxy measure and empirically demonstrate it is correlated with our definition of interference.
 [25] arXiv:2007.03812 (cross-list from cs.LG) [pdf, other]

Title: Robust Multi-Agent Multi-Armed Bandits
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
There has been recent interest in collaborative multi-agent bandits, where groups of agents share recommendations to decrease per-agent regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include honest and malicious agents who recommend best-arm estimates and arbitrary arms, respectively. We show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with them, i.e., "blacklist" them. We show that collaboration indeed decreases regret for this algorithm, when the number of malicious agents is small compared to the number of arms, and crucially without assumptions on the malicious agents' behavior. Thus, our algorithm is robust against any malicious recommendation strategy.
 [26] arXiv:2007.03813 (cross-list from cs.LG) [pdf, other]

Title: Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM). Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model. Such dependence can be problematic for over-parameterized models where $p \gg n$, the number of training samples. Existing lower bounds on private ERM show that such dependence on $p$ is inevitable in the worst case. In this paper, we circumvent the dependence on the ambient dimension by leveraging a low-dimensional structure of gradient space in deep networks; that is, the stochastic gradients for deep nets usually stay in a low-dimensional subspace in the training process. We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace, which is given by the top gradient eigenspace on a small public dataset. We provide a general sample complexity analysis on the public dataset for the gradient subspace identification problem and demonstrate that under certain low-dimensional assumptions the public sample complexity only grows logarithmically in $p$. Finally, we provide a theoretical analysis and empirical evaluations to show that our method can substantially improve the accuracy of DP-SGD.
 [27] arXiv:2007.03828 (cross-list from astro-ph.IM) [pdf, other]

Title: Deep Ensemble Analysis for Imaging X-ray Polarimetry
Comments: 14 pages, 9 figures. Submitted to Nuclear Instruments and Methods in Physics Research Section A on 3rd July 2020
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); High Energy Astrophysical Phenomena (astro-ph.HE); Machine Learning (cs.LG); Machine Learning (stat.ML)
We present a method for enhancing the sensitivity of X-ray telescopic observations with imaging polarimeters, with a focus on the gas pixel detectors (GPDs) to be flown on the Imaging X-ray Polarimetry Explorer (IXPE). Our analysis measures photoelectron directions, X-ray absorption points and X-ray energies for 2-8 keV event tracks, with estimates for both the statistical and systematic (reconstruction) uncertainties. We use a weighted maximum likelihood combination of predictions from a deep ensemble of ResNet convolutional neural networks, trained on Monte Carlo event simulations. We define a figure of merit to compare the polarization bias-variance trade-off in track reconstruction algorithms. For power-law source spectra, our method improves on the current state of the art (and previous deep learning approaches), providing a ~45% increase in effective exposure times. For individual energies, our method produces 20-30% absolute improvements in modulation factor for simulated 100% polarized events, while keeping residual systematic modulation within 1 sigma of the finite sample minimum. Absorption point location and photon energy estimates are also significantly improved. We have validated our method with sample data from real GPD detectors.
 [28] arXiv:2007.03832 (cross-list from cs.LG) [pdf, other]

Title: Fast Training of Deep Neural Networks Robust to Adversarial Perturbations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Deep neural networks are capable of training fast and generalizing well within many domains. Despite their promising performance, deep networks have shown sensitivities to perturbations of their inputs (e.g., adversarial examples) and their learned feature representations are often difficult to interpret, raising concerns about their true capability and trustworthiness. Recent work in adversarial training, a form of robust optimization in which the model is optimized against adversarial examples, demonstrates the ability to reduce sensitivity to perturbations and yield feature representations that are more interpretable. Adversarial training, however, comes with an increased computational cost over that of standard (i.e., non-robust) training, rendering it impractical for use in large-scale problems. Recent work suggests that a fast approximation to adversarial training shows promise for reducing training time and maintaining robustness in the presence of perturbations bounded by the infinity norm. In this work, we demonstrate that this approach extends to the Euclidean norm and preserves the human-aligned feature representations that are common for robust models. Additionally, we show that using a distributed training scheme can further reduce the time to train robust deep networks. Fast adversarial training is a promising approach that will provide increased security and explainability in machine learning applications for which robust optimization was previously thought to be impractical.
 [29] arXiv:2007.03844 (cross-list from cs.LG) [pdf, other]

Title: Consistency Regularization with Generative Adversarial Networks for Semi-Supervised Image Classification
Comments: 10 pages, 5 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Generative Adversarial Networks (GANs) based semi-supervised learning (SSL) approaches are shown to improve classification performance by utilizing a large number of unlabeled samples in conjunction with limited labeled samples. However, their performance still lags behind the state-of-the-art non-GAN based SSL approaches. One main reason we identify is the lack of consistency in class probability predictions on the same image under local perturbations. This problem was addressed in the past in a generic setting using the label consistency regularization, which enforces the class probability predictions for an input image to be unchanged under various semantic-preserving perturbations. In this work, we incorporate the consistency regularization in the vanilla semi-GAN to address this critical limitation. In particular, we present a new composite consistency regularization method which, in spirit, combines two well-known consistency-based techniques: Mean Teacher and Interpolation Consistency Training. We demonstrate the efficacy of our approach on two SSL image classification benchmark datasets, SVHN and CIFAR-10. Our experiments show that this new composite consistency regularization based semi-GAN significantly improves its performance and achieves new state-of-the-art performance among GAN-based SSL approaches.
 [30] arXiv:2007.03856 (cross-list from cs.LG) [pdf, other]

Title: BlockFLow: An Accountable and Privacy-Preserving Solution for Federated Learning
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Federated learning enables the development of a machine learning model among collaborating agents without requiring them to share their underlying data. However, malicious agents who train on random data, or worse, on datasets with the result classes inverted, can weaken the combined model. BlockFLow is an accountable federated learning system that is fully decentralized and privacy-preserving. Its primary goal is to reward agents proportional to the quality of their contribution while protecting the privacy of the underlying datasets and being resilient to malicious adversaries. Specifically, BlockFLow incorporates differential privacy, introduces a novel auditing mechanism for model contribution, and uses Ethereum smart contracts to incentivize good behavior. Unlike existing auditing and accountability methods for federated learning systems, our system does not require a centralized test dataset, sharing of datasets between the agents, or one or more trusted auditors; it is fully decentralized and resilient up to a 50% collusion attack in a malicious trust model. When run on the public Ethereum blockchain, BlockFLow uses the results from the audit to reward parties with cryptocurrency based on the quality of their contribution. We evaluated BlockFLow on two datasets that offer classification tasks solvable via logistic regression models. Our results show that the resultant auditing scores reflect the quality of the honest agents' datasets. Moreover, the scores from dishonest agents are statistically lower than those from the honest agents. These results, along with the reasonable blockchain costs, demonstrate the effectiveness of BlockFLow as an accountable federated learning system.
 [31] arXiv:2007.03899 (cross-list from cs.LG) [pdf, other]

Title: Density Fixing: Simple yet Effective Regularization Method based on the Class Prior
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Machine learning models suffer from overfitting, which is caused by a lack of labeled data. To tackle this problem, we propose a framework of regularization methods, called density-fixing, that can be used commonly for supervised and semi-supervised learning. Our proposed regularization method improves the generalization performance by forcing the model to approximate the class's prior distribution or the frequency of occurrence. This regularization term is naturally derived from the formula of maximum likelihood estimation and is theoretically justified. We further investigate the asymptotic behavior of the proposed method and how the regularization terms behave when assuming a prior distribution of several classes in practice. Experimental results on multiple benchmark datasets are sufficient to support our argument, and we suggest that this simple and effective regularization method is useful in real-world machine learning problems.
 [32] arXiv:2007.03912 (cross-list from cs.LG) [pdf, other]

Title: Linear Tensor Projection Revealing Nonlinearity
Comments: 13 pages, 6 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Dimensionality reduction is an effective method for learning high-dimensional data, which can provide better understanding of decision boundaries in a human-readable low-dimensional subspace. Linear methods, such as principal component analysis and linear discriminant analysis, make it possible to capture the correlation between many variables; however, there is no guarantee that the correlations that are important in predicting data can be captured. Moreover, if the decision boundary has strong nonlinearity, the guarantee becomes increasingly difficult. This problem is exacerbated when the data are matrices or tensors that represent relationships between variables. We propose a learning method that searches for a subspace that maximizes the prediction accuracy while retaining as much of the original data information as possible, even if the prediction model in the subspace has strong nonlinearity. This makes it easier to interpret the mechanism of the group of variables behind the prediction problem that the user wants to know. We show the effectiveness of our method by applying it to various types of data including matrices and tensors.
 [33] arXiv:2007.03920 (cross-list from cs.LG) [pdf, other]

Title: Binary Stochastic Filtering: feature selection and beyond
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Feature selection is one of the most decisive tools in understanding data and machine learning models. Among other methods, sparsity induced by an $L^{1}$ penalty is one of the simplest and best studied approaches to this problem. Although such regularization is frequently used in neural networks to achieve sparsity of weights or unit activations, it is unclear how it can be employed in the feature selection problem. This work aims at extending the neural network with the ability to automatically select features by rethinking how the sparsity regularization can be used, namely, by stochastically penalizing feature involvement instead of the layer weights. The proposed method has demonstrated superior efficiency when compared to a few classical methods, achieved with minimal or no computational overhead, and can be directly applied to any existing architecture. Furthermore, the method is easily generalizable for neuron pruning and selection of regions of importance for spectral data.
 [34] arXiv:2007.03937 (cross-list from cs.LG) [pdf, ps, other]

Title: A Nearest Neighbor Characterization of Lebesgue Points in Metric Measure Spaces
Subjects: Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
The property of almost every point being a Lebesgue point has proven to be crucial for the consistency of several classification algorithms based on nearest neighbors. We characterize Lebesgue points in terms of a 1-Nearest Neighbor regression algorithm for pointwise estimation, fleshing out the role played by tie-breaking rules in the corresponding convergence problem. We then give an application of our results, proving the convergence of the risk of a large class of 1-Nearest Neighbor classification algorithms in general metric spaces where almost every point is a Lebesgue point.
 [35] arXiv:2007.03938 (cross-list from cs.LG) [pdf, other]

Title: Operation-Aware Soft Channel Pruning using Differentiable Masks
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We propose a simple but effective data-driven channel pruning algorithm, which compresses deep neural networks in a differentiable way by exploiting the characteristics of operations. The proposed approach makes a joint consideration of batch normalization (BN) and rectified linear unit (ReLU) for channel pruning; it estimates how likely the two successive operations are to deactivate each feature map and prunes the channels with high probabilities. To this end, we learn differentiable masks for individual channels and make soft decisions throughout the optimization procedure, which facilitates exploring a larger search space and training more stable networks. The proposed framework enables us to identify compressed models via a joint learning of model parameters and channel pruning without an extra procedure of fine-tuning. We perform extensive experiments and achieve outstanding performance in terms of the accuracy of output networks given the same amount of resources when compared with the state-of-the-art methods.
 [36] arXiv:2007.03961 (cross-list from cs.LG) [pdf, other]

Title: Double Prioritized State Recycled Experience Replay
Authors: Fanchen Bu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Experience replay enables online reinforcement learning agents to store and reuse the experiences generated in previous interaction with the environment. In the original method, the experiences are sampled and replayed to train the Q-network with equal probability, i.e. uniformly. In prior work, a method called prioritized experience replay was developed where experiences in the memory are prioritized, so as to replay experiences which seem to be more important at higher frequencies for training the Q-network more efficiently. In this paper, we develop a method called double-prioritized state-recycled (DPSR) experience replay, prioritizing the experience both for the training stage and the storing stage, as well as replacing the experiences in the memory with state recycling to make the best of experiences which seem to have low priorities temporarily. We use this method in Deep Q-Networks (DQN), and achieve a state-of-the-art result, outperforming the original method and prioritized experience replay on many Atari games.
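As a rough illustration of the prioritized sampling idea that this abstract builds on (not the DPSR method itself, whose details are not given here), the sketch below draws stored experiences with probability proportional to a priority value. The function name `sample_prioritized`, the `alpha` exponent, and the toy buffer with its priorities are assumptions made purely for illustration.

```python
import random
from collections import Counter


def sample_prioritized(buffer, priorities, k, alpha=1.0, rng=random):
    """Draw k experiences with probability proportional to priority**alpha.

    alpha=0 gives every item the same weight (uniform replay);
    larger alpha favors high-priority items more strongly.
    """
    if len(buffer) != len(priorities):
        raise ValueError("buffer and priorities must have the same length")
    weights = [p ** alpha for p in priorities]
    return rng.choices(buffer, weights=weights, k=k)


# Toy buffer: three transitions with hypothetical priorities
# (for example, the magnitude of a temporal-difference error).
buffer = ["t0", "t1", "t2"]
priorities = [0.1, 0.1, 9.8]
rng = random.Random(0)
counts = Counter(sample_prioritized(buffer, priorities, k=1000, rng=rng))
```

In this toy run the high-priority transition `t2` dominates the replayed batch, which is the behavior that distinguishes prioritized replay from uniform sampling.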
 [37] arXiv:2007.03966 (cross-list from cs.LG) [pdf, other]

Title: Semi-Supervised Learning with Meta-Gradient
Comments: 17 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In this work, we propose a simple yet effective meta-learning algorithm in the semi-supervised settings. We notice that existing consistency-based approaches mostly do not consider the essential role of the label information for consistency regularization. To alleviate this issue, we bridge the relationship between the consistency loss and label information by unfolding and differentiating through one optimization step. Specifically, we exploit the pseudo labels of the unlabeled examples which are guided by the meta-gradients of the labeled data loss so that the model can generalize well on the labeled examples. In addition, we introduce a simple first-order approximation to avoid computing higher-order derivatives and guarantee scalability. Extensive evaluations on the SVHN, CIFAR, and ImageNet datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods.
 [38] arXiv:2007.03995 (cross-list from cs.LG) [pdf, other]

Title: MCUNet: A framework towards uncertainty representations for decision support system patient referrals in healthcare contexts
Authors: Nabeel Seedat
Comments: 4 pages, 4 figures, Accepted to ICML 2020 Workshop on Uncertainty & Robustness in Deep Learning & Machine Learning for Global Health
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Incorporating a human-in-the-loop system when deploying automated decision support is critical in healthcare contexts to create trust, as well as provide reliable performance on a patient-to-patient basis. Deep learning methods, while having high performance, do not allow for this patient-centered approach due to the lack of uncertainty representation. Thus, we present a framework of uncertainty representation evaluated for medical image segmentation, using MCUNet which combines a U-Net with Monte Carlo Dropout, evaluated with four different uncertainty metrics. The framework augments this by adding a human-in-the-loop aspect based on an uncertainty threshold for automated referral of uncertain cases to a medical professional. We demonstrate that MCUNet combined with epistemic uncertainty and an uncertainty threshold tuned for this application maximizes automated performance on an individual patient level, yet refers truly uncertain cases. This is a step towards uncertainty representations when deploying machine learning based decision support in healthcare settings.
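The threshold-based referral described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's framework: the spread of repeated stochastic predictions (as Monte Carlo Dropout would produce) is summarized here by its variance, which stands in for whichever of the four uncertainty metrics is used, and `refer_if_uncertain`, `THRESHOLD`, and the example numbers are all hypothetical.

```python
from statistics import mean, pvariance


def refer_if_uncertain(mc_predictions, threshold):
    """Return (mean prediction, uncertainty, referred?) from stochastic passes.

    mc_predictions: probabilities from repeated forward passes with dropout
    left switched on. The variance across passes is the uncertainty score
    assumed in this sketch.
    """
    avg = mean(mc_predictions)
    uncertainty = pvariance(mc_predictions)
    return avg, uncertainty, uncertainty > threshold


confident_case = [0.91, 0.90, 0.92, 0.89]  # passes agree: keep automated
uncertain_case = [0.15, 0.85, 0.40, 0.95]  # passes disagree: refer to a human

THRESHOLD = 0.01  # hypothetical value; the abstract says it is tuned per application
_, u1, referred_1 = refer_if_uncertain(confident_case, THRESHOLD)
_, u2, referred_2 = refer_if_uncertain(uncertain_case, THRESHOLD)
```

Only the second case exceeds the threshold, so it alone would be routed to a medical professional while the first stays automated.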
 [39] arXiv:2007.04001 (cross-list from cs.LG) [pdf, other]

Title: Supervised machine learning techniques for data matching based on similarity metrics
Authors: Pim Verschuuren, Serena Palazzo, Tom Powell, Steve Sutton, Alfred Pilgrim, Michele Faucci Giannelli
Subjects: Machine Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)
Businesses, governmental bodies and NGOs have an ever-increasing amount of data at their disposal from which they try to extract valuable information. Often, this needs to be done not only accurately but also within a short time frame. Clean and consistent data is therefore crucial. Data matching is the field that tries to identify instances in data that refer to the same real-world entity. In this study, machine learning techniques are combined with string similarity functions and applied to the field of data matching. A dataset of invoices from a variety of businesses and organizations was preprocessed with a grouping scheme to reduce pair dimensionality and a set of similarity functions was used to quantify similarity between invoice pairs. The resulting invoice pair dataset was then used to train and validate a neural network and a boosted decision tree. The performance was compared with a solution from FISCAL Technologies as a benchmark against currently available deduplication solutions. Both the neural network and boosted decision tree showed equal or better performance.
 [40] arXiv:2007.04002 (cross-list from cs.LG) [pdf, other]

Title: Unbiased Lift-based Bidding System
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Conventional bidding strategies for online display ad auctions rely heavily on observed performance indicators such as clicks or conversions. A bidding strategy naively pursuing these easily observable metrics, however, fails to optimize the profitability of the advertisers. Rather, the bidding strategy that leads to the maximum revenue is a strategy pursuing the performance \textit{lift} of showing ads to a specific user. Therefore, it is essential to predict the lift-effect of showing ads to each user on their target variables from observed log data. However, there is a difficulty in predicting the lift-effect, as the training data gathered by a past bidding strategy may have a strong bias towards the winning impressions. In this study, we develop \textit{Unbiased Lift-based Bidding System}, which maximizes the advertisers' profit by accurately predicting the lift-effect from biased log data. Our system is the first to enable a high-performing lift-based bidding strategy by theoretically alleviating the inherent bias in the log. Real-world, large-scale A/B testing successfully demonstrates the superiority and practicability of the proposed system.
 [41] arXiv:2007.04028 (cross-list from cs.LG) [pdf, other]

Title: How benign is benign overfitting?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We investigate two causes for adversarial vulnerability in deep neural networks: bad data and (poorly) trained models. When trained with SGD, deep neural networks essentially achieve zero training error, even in the presence of label noise, while also exhibiting good generalization on natural test data, something referred to as benign overfitting [2, 10]. However, these models are vulnerable to adversarial attacks. We identify label noise as one of the causes for adversarial vulnerability, and provide theoretical and empirical evidence in support of this. Surprisingly, we find several instances of label noise in datasets such as MNIST and CIFAR, and that robustly trained models incur training error on some of these, i.e. they don't fit the noise. However, removing noisy labels alone does not suffice to achieve adversarial robustness. Standard training procedures bias neural networks towards learning "simple" classification boundaries, which may be less robust than more complex ones. We observe that adversarial training does produce more complex decision boundaries. We conjecture that in part the need for complex decision boundaries arises from suboptimal representation learning. By means of simple toy examples, we show theoretically how the choice of representation can drastically affect adversarial robustness.
 [42] arXiv:2007.04030 (cross-list from cs.LG) [pdf, other]

Title: Incorporating prior knowledge about structural constraints in model identification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Model identification is a crucial problem in chemical industries. In recent years, there has been increasing interest in learning data-driven models utilizing partial knowledge about the system of interest. Most techniques for model identification do not provide the freedom to incorporate any partial information such as the structure of the model. In this article, we propose model identification techniques that could leverage such partial information to produce better estimates. Specifically, we propose Structural Principal Component Analysis (SPCA), which improves over existing methods like PCA by utilizing the essential structural information about the model. Most of the existing methods or closely related methods use sparsity constraints which could be computationally expensive. Our proposed method is a wise modification of PCA to utilize structural information. The efficacy of the proposed approach is demonstrated using synthetic and industrial case studies.
 [43] arXiv:2007.04043 (cross-list from cs.LG) [pdf, ps, other]

Title: A One-step Approach to Covariate Shift Adaptation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution. However, such an assumption is often violated in the real world due to non-stationarity of the environment or bias in sample selection. In this work, we consider a prevalent setting called covariate shift, where the input distribution differs between the training and test stages while the conditional distribution of the output given the input remains unchanged. Most of the existing methods for covariate shift adaptation are two-step approaches, which first calculate the importance weights and then conduct importance-weighted empirical risk minimization. In this paper, we propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization by minimizing an upper bound of the test risk. We theoretically analyze the proposed method and provide a generalization error bound. We also empirically demonstrate the effectiveness of the proposed method.
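For context, the two-step approach mentioned in this abstract rests on a standard identity. The notation below ($p_{\mathrm{tr}}$, $p_{\mathrm{te}}$, $w$, $\ell$, $f$) is assumed for illustration and is not taken from the paper. Under covariate shift the conditional $p(y \mid x)$ is shared between training and test, so, provided $p_{\mathrm{tr}}(x) > 0$ wherever $p_{\mathrm{te}}(x) > 0$, the test risk equals an importance-weighted training risk:

```latex
\begin{aligned}
R_{\mathrm{te}}(f)
  &= \mathbb{E}_{(x,y) \sim p_{\mathrm{te}}}\bigl[\ell(f(x), y)\bigr]
   = \mathbb{E}_{(x,y) \sim p_{\mathrm{tr}}}\bigl[w(x)\,\ell(f(x), y)\bigr],
  \qquad w(x) = \frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)}, \\
\widehat{R}(f)
  &= \frac{1}{n} \sum_{i=1}^{n} \widehat{w}(x_i)\,\ell(f(x_i), y_i).
\end{aligned}
```

A two-step method first estimates $\widehat{w}$ and then minimizes $\widehat{R}(f)$ over $f$; the one-step proposal instead learns the model and the weights jointly.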
 [44] arXiv:2007.04068 (cross-list from cs.CY) [pdf, ps, other]

Title: Decolonial AI: Decolonial Theory as Sociotechnical Foresight in Artificial Intelligence
Comments: 28 Pages. Accepted, to appear in: Philosophy and Technology (405), Springer. Submitted 16 January, Accepted 26 May 2020
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper explores the important role of critical science, and in particular of post-colonial and decolonial theories, in understanding and shaping the ongoing advances in artificial intelligence. Artificial Intelligence (AI) is viewed as amongst the technological advances that will reshape modern societies and their relations. Whilst the design and deployment of systems that continually adapt holds the promise of far-reaching positive change, they simultaneously pose significant risks, especially to already vulnerable peoples. Values and power are central to this discussion. Decolonial theories use historical hindsight to explain patterns of power that shape our intellectual, political, economic, and social world. By embedding a decolonial critical approach within its technical practice, AI communities can develop foresight and tactics that can better align research and technology development with established ethical principles, centring vulnerable peoples who continue to bear the brunt of negative impacts of innovation and scientific progress. We highlight problematic applications that are instances of coloniality, and using a decolonial lens, submit three tactics that can form a decolonial field of artificial intelligence: creating a critical technical practice of AI, seeking reverse tutelage and reverse pedagogies, and the renewal of affective and political communities. The years ahead will usher in a wave of new scientific breakthroughs and technologies driven by AI research, making it incumbent upon AI communities to strengthen the social contract through ethical foresight and the multiplicity of intellectual perspectives available to us; ultimately supporting future technologies that enable greater well-being, with the goal of beneficence and justice for all.
 [45] arXiv:2007.04074 (cross-list from cs.LG) [pdf, other]

Title: Auto-Sklearn 2.0: The Next Generation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Automated Machine Learning, which supports practitioners and researchers with the tedious task of manually designing machine learning pipelines, has recently achieved substantial success. In this paper we introduce new Automated Machine Learning (AutoML) techniques motivated by our winning submission to the second ChaLearn AutoML challenge, PoSH Auto-sklearn. For this, we extend Auto-sklearn with a new, simpler meta-learning technique, improve its way of handling iterative algorithms and enhance it with a successful bandit strategy for budget allocation. Furthermore, we go one step further and study the design space of AutoML itself and propose a solution towards truly hands-free AutoML. Together, these changes give rise to the next generation of our AutoML system, Auto-sklearn (2.0). We verify the improvement by these additions in a large experimental study on 39 AutoML benchmark datasets and conclude the paper by comparing to Auto-sklearn (1.0), reducing the regret by up to a factor of five.
 [46] arXiv:2007.04087 (cross-list from cs.LG) [pdf, other]

Title: Hyperparameter Optimization in Neural Networks via Structured Sparse Recovery
Comments: arXiv admin note: text overlap with arXiv:1906.02869
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we study two important problems in the automated design of neural networks, Hyperparameter Optimization (HPO) and Neural Architecture Search (NAS), through the lens of sparse recovery methods. In the first part of this paper, we establish a novel connection between HPO and structured sparse recovery. In particular, we show that a special encoding of the hyperparameter space enables a natural group-sparse recovery formulation, which when coupled with HyperBand (a multi-armed bandit strategy), leads to improvement over existing hyperparameter optimization methods. Experimental results on image datasets such as CIFAR-10 confirm the benefits of our approach. In the second part of this paper, we establish a connection between NAS and structured sparse recovery. Building upon "one-shot" approaches in NAS, we propose a novel algorithm that we call CoNAS by merging ideas from one-shot approaches with a technique for learning low-degree sparse Boolean polynomials. We provide theoretical analysis on the number of validation error measurements. Finally, we validate our approach on several datasets and discover novel architectures hitherto unreported, achieving competitive (or better) results in both performance and search time compared to the existing NAS approaches.
 [47] arXiv:2007.04091 (cross-list from cs.LG) [pdf, other]

Title: Bespoke vs. Prêt-à-Porter Lottery Tickets: Exploiting Mask Similarity for Trainable Sub-Network Finding
Comments: arXiv admin note: text overlap with arXiv:2001.05050
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The observation of sparse trainable sub-networks within over-parametrized networks, also known as Lottery Tickets (LTs), has prompted inquiries around their trainability, scaling, uniqueness, and generalization properties. Across 28 combinations of image classification tasks and architectures, we discover differences in the connectivity structure of LTs found through different iterative pruning techniques, thus disproving their uniqueness and connecting emergent mask structure to the choice of pruning. In addition, we propose a consensus-based method for generating refined lottery tickets. This lottery ticket denoising procedure, based on the principle that parameters that always go unpruned across different tasks more reliably identify important sub-networks, is capable of selecting a meaningful portion of the architecture in an embarrassingly parallel way, while quickly discarding extra parameters without the need for further pruning iterations. We successfully train these sub-networks to performance comparable to that of ordinary lottery tickets.
 [48] arXiv:2007.04146 (cross-list from cs.LG) [pdf, other]

Title: Few-Shot One-Class Classification via Meta-Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Although few-shot learning and one-class classification (OCC), i.e. learning a binary classifier with data from only one class, have been separately well studied, their intersection remains rather unexplored. Our work addresses the few-shot OCC problem and presents a method to modify the episodic data sampling strategy of the model-agnostic meta-learning (MAML) algorithm to learn a model initialization particularly suited for learning few-shot OCC tasks. This is done by explicitly optimizing for an initialization which only requires a few gradient steps with one-class minibatches to yield a performance increase on class-balanced test data. We provide a theoretical analysis that explains why our approach works in the few-shot OCC scenario, while other meta-learning algorithms, including MAML, fail. Our experiments on eight datasets from the image and time-series domains show that our method leads to better results than classical OCC and few-shot classification approaches, and demonstrate the ability to learn unseen tasks from only a few normal class samples. Moreover, we successfully train anomaly detectors for a real-world application on sensor readings recorded during industrial manufacturing of workpieces with a CNC milling machine using few normal examples. Finally, we empirically demonstrate that the proposed data sampling technique increases the performance of more recent meta-learning algorithms in few-shot OCC.
 [49] arXiv:2007.04154 (cross-list from q-fin.MF) [pdf, other]

Title: Robust pricing and hedging via neural SDEs
Subjects: Mathematical Finance (q-fin.MF); Machine Learning (cs.LG); Machine Learning (stat.ML)
Mathematical modelling is ubiquitous in the financial industry and drives key decision processes. Any given model provides only a crude approximation to reality and the risk of using an inadequate model is hard to detect and quantify. By contrast, modern data science techniques are opening the door to more robust and data-driven model selection mechanisms. However, most machine learning models are "black boxes" as individual parameters do not have meaningful interpretation. The aim of this paper is to combine the above approaches, achieving the best of both worlds. Combining neural networks with risk models based on classical stochastic differential equations (SDEs), we find robust bounds for prices of derivatives and the corresponding hedging strategies while incorporating relevant market data. The resulting model, called neural SDE, is an instantiation of generative models and is closely linked with the theory of causal optimal transport. Neural SDEs allow consistent calibration under both the risk-neutral and the real-world measures. Thus the model can be used to simulate market scenarios needed for assessing risk profiles and hedging strategies. We develop and analyse novel algorithms needed for efficient use of neural SDEs. We validate our approach with numerical experiments using both local and stochastic volatility models.
 [50] arXiv:2007.04169 (cross-list from cs.LG) [pdf, other]

Title: An exploration of the influence of path choice in game-theoretic attribution algorithms
Comments: 21 pages, 23 figures, submitted to JMLR 7/7/2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We compare machine learning explainability methods based on the theory of atomic (Shapley, 1953) and infinitesimal (Aumann and Shapley, 1974) games, in a theoretical and experimental investigation into how the model and choice of integration path can influence the resulting feature attributions. To gain insight into differences in attributions resulting from interventional Shapley values (Sundararajan and Najmi, 2019; Janzing et al., 2019; Chen et al., 2019) and Generalized Integrated Gradients (GIG) (Merrill et al., 2019), we note that interventional Shapley is equivalent to a multi-path integration along $n!$ paths, where $n$ is the number of model input features. Applying Stokes' theorem, we show that the path symmetry of these two methods results in the same attributions when the model is composed of a sum of separable functions of individual features and a sum of two-feature products. We then perform a series of experiments with varying degrees of data missingness to demonstrate how interventional Shapley's multi-path approach can yield less consistent attributions than the single straight-line path of Aumann-Shapley. We argue this is because the multiple paths employed by interventional Shapley extend away from the training data manifold and are therefore more likely to pass through regions where the model has little support. In the absence of a more meaningful path choice, we therefore advocate the straight-line path since it will almost always pass closer to the data manifold. Among straight-line path attribution algorithms, GIG is uniquely robust since it will still yield Shapley values for atomic games modeled by decision trees.
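The equivalence claimed above for models built from separable terms and two-feature products can be checked numerically. The sketch below makes several assumptions: a single reference (baseline) point stands in for the interventional background data, the straight-line attribution uses a plain midpoint rule with finite-difference gradients rather than GIG, and `toy_model`, the input and the baseline are hypothetical.

```python
import itertools
import math
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def single_reference_shapley(f: Model, x: Sequence[float],
                             baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values: mean marginal contribution over all n! orderings."""
    n = len(x)
    phi = [0.0] * n
    for order in itertools.permutations(range(n)):
        point = list(baseline)
        previous = f(point)
        for i in order:
            point[i] = x[i]  # switch feature i from its baseline to its input value
            current = f(point)
            phi[i] += current - previous
            previous = current
    return [v / math.factorial(n) for v in phi]


def straight_line_attribution(f: Model, x: Sequence[float],
                              baseline: Sequence[float],
                              steps: int = 2000, h: float = 1e-5) -> List[float]:
    """Integrate df/dx_i along the straight line from baseline to x (midpoint rule)."""
    n = len(x)
    attr = [0.0] * n
    for k in range(steps):
        t = (k + 0.5) / steps
        point = [b + t * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up, down = list(point), list(point)
            up[i] += h
            down[i] -= h
            grad_i = (f(up) - f(down)) / (2.0 * h)  # central finite difference
            attr[i] += grad_i * (x[i] - baseline[i]) / steps
    return attr


def toy_model(v: Sequence[float]) -> float:
    """Hypothetical model: separable terms plus two-feature products only."""
    return v[0] ** 2 + math.sin(v[1]) + 3.0 * v[2] + 2.0 * v[0] * v[1] - v[1] * v[2]


x = [1.0, 0.5, -2.0]
baseline = [0.2, -0.1, 0.3]
phi = single_reference_shapley(toy_model, x, baseline)
ig = straight_line_attribution(toy_model, x, baseline)
```

For this model class the two attribution vectors agree up to numerical error; the sketch says nothing about models with higher-order interactions or about behaviour under missing data.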
 [51] arXiv:2007.04176 (cross-list from astro-ph.IM) [pdf, other]

Title: Detection of Gravitational Waves Using Bayesian Neural Networks
Comments: 15 pages, 13 figures
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
We propose a new model of Bayesian Neural Networks to not only detect the events of compact binary coalescence in the observational data of gravitational waves (GW) but also identify the time periods of the associated GW waveforms before the events. This is achieved by incorporating the Bayesian approach into the CLDNN classifier, which integrates the Convolutional Neural Network (CNN) and the Long Short-Term Memory Recurrent Neural Network (LSTM). Our model successfully detects all seven BBH events in the LIGO Livingston O2 data, with the periods of their GW waveforms correctly labeled. The ability of a Bayesian approach to estimate uncertainty enables a newly defined `awareness' state for recognizing the possible presence of signals of unknown types, which are otherwise rejected in a non-Bayesian model. Such data chunks labeled with the awareness state can then be further investigated rather than overlooked. Performance tests show that our model recognizes 90% of the events when the optimal signal-to-noise ratio $\rho_\text{opt} >7$ (100% when $\rho_\text{opt} >8.5$) and successfully labels more than 95% of the waveform periods when $\rho_\text{opt} >8$. The latency between the arrival of the peak signal and the generation of an alert with the associated waveform period labeled is only about 20 seconds for unoptimized code on a moderate GPU-equipped personal computer. This makes our model suitable for nearly real-time detection and for forecasting the coalescence events when assisted with deeper training on a larger dataset using state-of-the-art HPCs.
 [52] arXiv:2007.04202 (cross-list from cs.LG) [pdf, other]

Title: Stochastic Hamiltonian Gradient Methods for Smooth Games
Authors: Nicolas Loizou, Hugo Berard, Alexia Jolicoeur-Martineau, Pascal Vincent, Simon Lacoste-Julien, Ioannis Mitliagkas
Comments: ICML 2020 - Proceedings of the 37th International Conference on Machine Learning
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC); Machine Learning (stat.ML)
The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using tools from the optimization literature we show that SHGD converges linearly to the neighbourhood of a stationary point. To guarantee convergence to the exact solution, we analyze SHGD with a decreasing step-size and we also present the first stochastic variance reduced Hamiltonian method. Our results provide the first global non-asymptotic last-iterate convergence guarantees for the class of stochastic unconstrained bilinear games and for the more general class of stochastic games that satisfy a "sufficiently bilinear" condition, notably including some non-convex non-concave problems. We supplement our analysis with experiments on stochastic bilinear and sufficiently bilinear games, where our theory is shown to be tight, and on simple adversarial machine learning formulations.
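To make the Hamiltonian idea concrete, here is a minimal deterministic sketch on an unconstrained bilinear game $\min_x \max_y x^\top A y$. It is an assumption-laden illustration, not the paper's SHGD: there is no stochasticity and no unbiased estimator, and the matrix, starting point and step size are hypothetical. The game's vector field is $\xi(x, y) = (A y, -A^\top x)$, the Hamiltonian is $H = \tfrac{1}{2}\|\xi\|^2$, and its gradients are $\nabla_x H = A A^\top x$ and $\nabla_y H = A^\top A y$.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for small dense matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def hamiltonian(a: Matrix, x: Vector, y: Vector) -> float:
    """H(x, y) = 0.5 * (||A y||^2 + ||A^T x||^2) for the game x^T A y."""
    ay = matvec(a, y)
    atx = matvec(transpose(a), x)
    return 0.5 * (sum(v * v for v in ay) + sum(v * v for v in atx))


def hamiltonian_gd(a: Matrix, x: Vector, y: Vector,
                   step: float, iters: int) -> Tuple[Vector, Vector]:
    """Plain gradient descent on H (deterministic; not the stochastic method)."""
    at = transpose(a)
    for _ in range(iters):
        grad_x = matvec(a, matvec(at, x))  # A A^T x
        grad_y = matvec(at, matvec(a, y))  # A^T A y
        x = [xi - step * g for xi, g in zip(x, grad_x)]
        y = [yi - step * g for yi, g in zip(y, grad_y)]
    return x, y


# Hypothetical full-rank game matrix and starting point.
A = [[2.0, 1.0], [0.0, 1.0]]
x0, y0 = [1.0, -1.0], [0.5, 2.0]
x_final, y_final = hamiltonian_gd(A, x0, y0, step=0.1, iters=500)
```

With a full-rank `A` and a sufficiently small step, the iterates contract towards the equilibrium at the origin, where the Hamiltonian vanishes.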
 [53] arXiv:2007.04203 (cross-list from cs.LG) [pdf, other]

Title: A Natural Actor-Critic Algorithm with Downside Risk Constraints
Comments: 14 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Finance (q-fin.CP); Portfolio Management (q-fin.PM); Machine Learning (stat.ML)
Existing work on risk-sensitive reinforcement learning, both for symmetric and downside risk measures, has typically used direct Monte-Carlo estimation of policy gradients. While this approach yields unbiased gradient estimates, it also suffers from high variance and decreased sample efficiency compared to temporal-difference methods. In this paper, we study prediction and control with aversion to downside risk which we gauge by the lower partial moment of the return. We introduce a new Bellman equation that upper bounds the lower partial moment, circumventing its non-linearity. We prove that this proxy for the lower partial moment is a contraction, and provide intuition into the stability of the algorithm by variance decomposition. This allows sample-efficient, online estimation of partial moments. For risk-sensitive control, we instantiate Reward Constrained Policy Optimization, a recent actor-critic method for finding constrained policies, with our proxy for the lower partial moment. We extend the method to use natural policy gradients and demonstrate the effectiveness of our approach on three benchmark problems for risk-sensitive reinforcement learning.
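For readers unfamiliar with the risk measure named above, the following sketch computes an empirical lower partial moment from sampled returns, using the standard definition $\mathrm{LPM}_k(\tau) = \mathbb{E}[\max(\tau - R, 0)^k]$. This is the direct Monte-Carlo estimate, not the paper's Bellman-equation proxy; the function name, target and sample returns are hypothetical.

```python
from typing import Sequence


def lower_partial_moment(returns: Sequence[float], target: float,
                         order: int = 2) -> float:
    """Empirical mean of max(target - R, 0) ** order over sampled returns R."""
    if not returns:
        raise ValueError("need at least one return sample")
    shortfalls = (max(target - r, 0.0) ** order for r in returns)
    return sum(shortfalls) / len(returns)


# Hypothetical episode returns; only those below the target contribute.
sampled_returns = [1.0, -2.0, 0.5, -1.0]
semi_variance_like = lower_partial_moment(sampled_returns, target=0.0, order=2)
expected_shortfall_like = lower_partial_moment(sampled_returns, target=0.0, order=1)
```

Returns at or above the target contribute nothing, which is what distinguishes a downside measure from a symmetric one such as variance.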
 [54] arXiv:2007.04205 (cross-list from cs.CL) [pdf, other]

Title: Analysis of Predictive Coding Models for Phonemic Representation Learning in Small Datasets
Comments: 7 pages, 5 figures, 5 tables. Accepted paper at the workshop on Self-supervision in Audio and Speech at ICML 2020
Subjects: Computation and Language (cs.CL); Machine Learning (stat.ML)
Neural network models using predictive coding are interesting from the viewpoint of computational modelling of human language acquisition, where the objective is to understand how linguistic units could be learned from speech without any labels. Even though several promising predictive coding based learning algorithms have been proposed in the literature, it is currently unclear how well they generalise to different languages and training dataset sizes. In addition, although such models have been shown to be effective phonemic feature learners, it is unclear whether minimisation of the predictive loss functions of these models also leads to optimal phoneme-like representations. The present study investigates the behaviour of two predictive coding models, Autoregressive Predictive Coding (APC) and Contrastive Predictive Coding (CPC), in a phoneme discrimination task (ABX task) for two languages with different dataset sizes. Our experiments show a strong correlation between the autoregressive loss and the phoneme discrimination scores with the two datasets. However, to our surprise, the CPC model shows rapid convergence already after one pass over the training data, and, on average, its representations outperform those of APC on both languages.
 [55] arXiv:2007.04206 (cross-list from cs.LG) [pdf, ps, other]

Title: Diverse Ensembles Improve Calibration
Comments: Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Modern deep neural networks can produce badly calibrated predictions, especially when train and test distributions are mismatched. Training an ensemble of models and averaging their predictions can help alleviate these issues. We propose a simple technique to improve calibration, using a different data augmentation for each ensemble member. We additionally use the idea of `mixing' unaugmented and augmented inputs to improve calibration when test and training distributions are the same. These simple techniques improve calibration and accuracy over strong baselines on the CIFAR-10 and CIFAR-100 benchmarks, and on out-of-domain data from their corrupted versions.
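The ensemble-averaging step mentioned above can be sketched in a few lines. The example assumes each member outputs a probability vector over the same classes for a single input and that averaging is an unweighted mean; the per-member augmentations and the `mixing' of inputs are not shown, and the numbers are hypothetical.

```python
from typing import List, Sequence


def average_predictions(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of per-member class probabilities for one input."""
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    n_classes = len(member_probs[0])
    if any(len(p) != n_classes for p in member_probs):
        raise ValueError("all members must predict the same number of classes")
    n_members = len(member_probs)
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


# Hypothetical two-class outputs from three members that disagree in confidence.
member_probs = [
    [0.9, 0.1],
    [0.6, 0.4],
    [0.3, 0.7],
]
ensemble_probs = average_predictions(member_probs)
```

When members disagree, the averaged distribution is less extreme than the most confident member, which is the usual intuition for why ensembling can help calibration.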
 [56] arXiv:2007.04212 (cross-list from cs.LG) [pdf, other]

Title: The Scattering Compositional Learner: Discovering Objects, Attributes, Relationships in Analogical Reasoning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Machine Learning (stat.ML)
In this work, we focus on an analogical reasoning task that contains rich compositional structures, Raven's Progressive Matrices (RPM). To discover compositional structures of the data, we propose the Scattering Compositional Learner (SCL), an architecture that composes neural networks in a sequence. Our SCL achieves state-of-the-art performance on two RPM datasets, with a 48.7% relative improvement on Balanced-RAVEN and 26.4% on PGM over the previous state-of-the-art. We additionally show that our model discovers compositional representations of objects' attributes (e.g., shape, color, size), and their relationships (e.g., progression, union). We also find that the compositional representation makes the SCL significantly more robust to test-time domain shifts and greatly improves zero-shot generalization to previously unseen analogies.
 [57] arXiv:2007.04214 (cross-list from cs.LG) [pdf, ps, other]

Title: Linear-Time Algorithms for Adaptive Submodular Maximization
Authors: Shaojie Tang
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we develop fast algorithms for two stochastic submodular maximization problems. We start with the well-studied adaptive submodular maximization problem subject to a cardinality constraint. We develop the first linear-time algorithm which achieves a $(1-1/e-\epsilon)$ approximation ratio. Notably, the time complexity of our algorithm is $O(n\log\frac{1}{\epsilon})$ (number of function evaluations), which is independent of the cardinality constraint, where $n$ is the size of the ground set. Then we introduce the concept of fully adaptive submodularity, and develop a linear-time algorithm for maximizing a fully adaptive submodular function subject to a partition matroid constraint. We show that our algorithm achieves a $\frac{1-1/e-\epsilon}{4-2/e-2\epsilon}$ approximation ratio using only $O(n\log\frac{1}{\epsilon})$ function evaluations.
 [58] arXiv:2007.04216 (cross-list from cs.LG) [pdf, other]

Title: RicciNets: Curvature-guided Pruning of High-performance Neural Networks Using Ricci Flow
Comments: To appear at ICML 2020, AutoML Workshop. Contains 11 pages, 5 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
A novel method to identify salient computational paths within randomly wired neural networks before training is proposed. The computational graph is pruned based on a node mass probability function defined by local graph measures and weighted by hyperparameters produced by a reinforcement learning-based controller neural network. We use the definition of Ricci curvature to remove edges of low importance before mapping the computational graph to a neural network. We show a reduction of almost $35\%$ in the number of floating-point operations (FLOPs) per pass, with no degradation in performance. Further, our method can successfully regularize randomly wired neural networks based on purely structural properties, and we also find that the favourable characteristics identified in one network generalise to other networks. The method produces networks with better performance under similar compression than those pruned by lowest-magnitude weights. To the best of our knowledge, this is the first work on pruning randomly wired neural networks, as well as the first to utilize the topological measure of Ricci curvature in the pruning mechanism.
 [59] arXiv:2007.04234 (cross-list from cs.CV) [pdf, other]

Title: Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Pretraining has become a standard technique in computer vision and natural language processing, which usually helps to improve performance substantially. Previously, the dominant pretraining method was transfer learning (TL), which uses labeled data to learn a good representation network. Recently, a new pretraining approach, self-supervised learning (SSL), has demonstrated promising results on a wide range of applications. SSL does not require annotated labels. It is conducted purely on input data by solving auxiliary tasks defined on the input data examples. Currently reported results show that SSL outperforms TL in certain applications, and the other way around in others. There is no clear understanding of what properties of data and tasks make one approach outperform the other. Without an informed guideline, ML researchers have to try both methods to find out empirically which one is better, which is usually time-consuming. In this work, we aim to address this problem. We perform a comprehensive comparative study between SSL and TL regarding which one works better under different properties of data and tasks, including domain difference between source and target tasks, the amount of pretraining data, class imbalance in source data, and usage of target data for additional pretraining. The insights distilled from our comparative studies can help ML researchers decide which method to use based on the properties of their applications.
 [60] arXiv:2007.04238 (cross-list from cs.LG) [pdf, other]

Title: Predicting the Accuracy of a Few-Shot Classifier
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
In the context of few-shot learning, one cannot measure the generalization ability of a trained classifier using validation sets, due to the small number of labeled samples. In this paper, we are interested in finding alternatives to answer the question: is my classifier generalizing well to previously unseen data? We first analyze the reasons for the variability of generalization performances. We then investigate the case of using transfer-based solutions, and consider three settings: i) supervised, where we only have access to a few labeled samples, ii) semi-supervised, where we have access to both a few labeled samples and a set of unlabeled samples, and iii) unsupervised, where we only have access to unlabeled samples. For each setting, we propose reasonable measures that we empirically demonstrate to be correlated with the generalization ability of the considered classifiers. We also show that these simple measures can be used to predict generalization up to a certain confidence. We conduct our experiments on standard few-shot vision datasets.
 [61] arXiv:2007.04239 (cross-list from cs.CL) [pdf, other]

Title: A Survey on Transfer Learning in Natural Language Processing
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Deep learning models usually require a huge amount of data. However, these large datasets are not always attainable. This is common in many challenging NLP tasks. Consider Neural Machine Translation, for instance, where curating such large datasets may not be possible, especially for low-resource languages. Another limitation of deep learning models is the demand for huge computing resources. These obstacles motivate research to question the possibility of knowledge transfer using large trained models. The demand for transfer learning is increasing as many large models are emerging. In this survey, we feature the recent transfer learning advances in the field of NLP. We also provide a taxonomy for categorizing different transfer learning approaches from the literature.
 [62] arXiv:2007.04249 (cross-list from cs.CL) [pdf, other]

Title: Cooking Is All About People: Comment Classification On Cookery Channels Using BERT and Classification Models (Malayalam-English Mix-Code)
Authors: Subramaniam Kazhuparambil (1), Abhishek Kaushik (1 and 2) ((1) Dublin Business School, (2) Dublin City University)
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
The scope of a lucrative career promoted by Google through its video distribution platform YouTube has attracted a large number of users to become content creators. An important aspect of this line of work is the feedback received in the form of comments, which show how well the content is being received by the audience. However, the volume of comments, coupled with spam and limited tools for comment classification, makes it virtually impossible for a creator to go through each and every comment and gather constructive feedback. Automatic classification of comments is a challenge even for established classification models, since comments are often of variable lengths and riddled with slang, symbols and abbreviations. This is a greater challenge where comments are multilingual, as the messages are often rife with the respective vernacular. In this work, we have evaluated top-performing classification models and four different vectorizers for classifying comments which are a mix of different combinations of English and Malayalam (only English, only Malayalam, and a mix of English and Malayalam). The statistical analysis of results indicates that Multinomial Naive Bayes, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest and Decision Trees offer a similar level of accuracy in comment classification. Further, we have also evaluated 3 multilingual subtypes of the novel NLP language model BERT and compared their performance to the conventional machine learning classification techniques. XLM was the top-performing BERT model with an accuracy of 67.31. Random Forest with Term Frequency Vectorizer was the top-performing model out of all the traditional classification models with an accuracy of 63.59.
 [63] arXiv:2007.04250 (cross-list from cs.LG) [pdf, other]

Title: A Benchmark of Medical Out of Distribution Detection
Comments: Oral presentation, Uncertainty & Robustness in Deep Learning workshop at ICML. 4 pages, 9 pages total
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
There is a rise in the use of deep learning for automated medical diagnosis, most notably in medical imaging. Such an automated system uses a set of images from a patient to diagnose whether they have a disease. However, systems trained for one particular domain of images cannot be expected to perform accurately on images of a different domain. These images should be filtered out by an Out-of-Distribution Detection (OoDD) method prior to diagnosis. This paper benchmarks popular OoDD methods in three domains of medical imaging: chest x-rays, fundus images, and histology slides. Our experiments show that despite methods yielding good results on some types of out-of-distribution samples, they fail to recognize images close to the training distribution.
 [64] arXiv:2007.04275 (cross-list from cs.LG) [pdf, other]

Title: Graph Neural Networks for the Prediction of Substrate-Specific Organic Reaction Conditions
Authors: Serim Ryou, Michael R. Maser, Alexander Y. Cui, Travis J. DeLano, Yisong Yue, Sarah E. Reisman
Comments: 23 pages, 10 tables, 13 figures, to appear in the ICML 2020 Workshop on Graph Representation Learning and Beyond (GRLB)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We present a systematic investigation using graph neural networks (GNNs) to model organic chemical reactions. To do so, we prepared a dataset collection of four ubiquitous reactions from the organic chemistry literature. We evaluate seven different GNN architectures for classification tasks pertaining to the identification of experimental reagents and conditions. We find that models are able to identify specific graph features that affect reaction conditions and lead to accurate predictions. The results herein show great promise in advancing molecular machine learning.
 [65] arXiv:2007.04285 (cross-list from stat.ME) [pdf, ps, other]

Title: Deep Fiducial Inference
Comments: 20 pages, 7 figures
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)
Since the mid-2000s, there has been a resurrection of interest in modern modifications of fiducial inference. To date, the main computational tool to extract a generalized fiducial distribution is Markov chain Monte Carlo (MCMC). We propose an alternative way of computing a generalized fiducial distribution that could be used in complex situations. In particular, to overcome difficulties with the unnormalized fiducial density (needed for MCMC), we design a fiducial autoencoder (FAE). The fitted autoencoder is used to generate generalized fiducial samples of the unknown parameters. To increase accuracy, we then apply an approximate fiducial computation (AFC) algorithm, rejecting samples that, when plugged into a decoder, do not replicate the observed data well enough. Our numerical experiments show the effectiveness of our FAE-based inverse solution and the excellent coverage performance of the AFC-corrected FAE solution.
 [66] arXiv:2007.04309 (cross-list from cs.LG) [pdf, other]

Title: Self-Supervised Policy Adaptation during Deployment
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)
In most real-world scenarios, a policy trained by reinforcement learning in one environment needs to be deployed in another, potentially quite different environment. However, generalization across different environments is known to be hard. A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal. Our work explores the use of self-supervision to allow the policy to continue training after deployment without using any rewards. While previous methods explicitly anticipate changes in the new environment, we assume no prior knowledge of those changes yet still obtain significant improvements. Empirical evaluations are performed on diverse environments from the DeepMind Control suite and ViZDoom. Our method improves generalization in 25 out of 30 environments across various tasks, and outperforms domain randomization on a majority of environments.
Replacements for Thu, 9 Jul 20
 [67] arXiv:1805.11845 (replaced) [pdf, other]

Title: An Information-Theoretic Analysis for Thompson Sampling with Many Actions
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
 [68] arXiv:2002.01600 (replaced) [pdf, other]

Title: Linearly Constrained Neural Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
 [69] arXiv:2002.11537 (replaced) [pdf, other]

Title: ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on Nonlinear ICA
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [70] arXiv:2004.11734 (replaced) [pdf, ps, other]

Title: Robust subgaussian estimation with VC-dimension
Authors: Jules Depersin
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [71] arXiv:2006.04295 (replaced) [pdf, other]

Title: Efficient MCMC Sampling for Bayesian Matrix Factorization by Breaking Posterior Symmetries
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [72] arXiv:2006.10921 (replaced) [pdf, other]

Title: Meta Learning in the Continuous Time Limit
Comments: 25 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [73] arXiv:2007.00810 (replaced) [pdf, other]

Title: On Linear Identifiability of Learned Representations
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [74] arXiv:2007.02794 (replaced) [pdf]

Title: Towards Efficient Connected and Automated Driving System via Multiagent Graph Reinforcement Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [75] arXiv:2007.03502 (replaced) [pdf, ps, other]

Title: srMO-BO-3GP: A sequential regularized multi-objective constrained Bayesian optimization for design applications
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
 [76] arXiv:1810.04472 (replaced) [src]

Title: Domain Confusion with Self Ensembling for Unsupervised Adaptation
Comments: The expression is ambiguous, which is not convenient for readers to understand, and in today's view, the conclusion of the paper is of little significance, so it is no longer open
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [77] arXiv:1811.02141 (replaced) [pdf, other]

Title: Extended Isolation Forest
Comments: 12 pages; 21 figures, Published. Open source code in this https URL
Subjects: Machine Learning (cs.LG); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (stat.ML)
 [78] arXiv:1812.11954 (replaced) [pdf, other]

Title: Exact Cluster Recovery via Classical Multidimensional Scaling
Comments: 42 pages including appendix
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
 [79] arXiv:1901.11141 (replaced) [pdf, other]

Title: On the Consistency of Top-k Surrogate Losses
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [80] arXiv:1905.04497 (replaced) [pdf, other]

Title: Stability Properties of Graph Neural Networks
Comments: Submitted to IEEE Transactions on Signal Processing
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [81] arXiv:1906.01687 (replaced) [pdf, other]

Title: Stochastic Gradients for Large-Scale Tensor Decomposition
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [82] arXiv:1907.01343 (replaced) [pdf, other]

Title: Low-Rank Subspace Override for Unsupervised Domain Adaptation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [83] arXiv:1907.01739 (replaced) [pdf, other]

Title: Solving Partial Assignment Problems using Random Clique Complexes
Comments: Accepted as a long talk at ICML 2018
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [84] arXiv:1908.00709 (replaced) [pdf, other]

Title: AutoML: A Survey of the State-of-the-Art
Comments: automated machine learning (AutoML), Submitted to Knowledge Based Systems for review
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [85] arXiv:1909.02261 (replaced) [pdf, other]

Title: Population-based Gradient Descent Weight Learning for Graph Coloring Problems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [86] arXiv:1909.13188 (replaced) [pdf, other]

Title: Understanding and Stabilizing GANs' Training Dynamics with Control Theory
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [87] arXiv:1910.08828 (replaced) [pdf, ps, other]

Title: Dictionary Learning with Almost Sure Error Constraints
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [88] arXiv:1912.04981 (replaced) [pdf, other]

Title: Phase Retrieval Using Conditional Generative Adversarial Networks
Comments: Accepted at the 25th International Conference on Pattern Recognition 2020 (ICPR)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [89] arXiv:2002.02730 (replaced) [pdf, other]

Title: Machine Unlearning: Linear Filtration for Logit-based Classifiers
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [90] arXiv:2002.03704 (replaced) [pdf, other]

Title: Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [91] arXiv:2002.04131 (replaced) [pdf, other]

Title: Q-Learning Algorithm for Mean-Field Controls, with Convergence and Complexity Analysis
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [92] arXiv:2002.08423 (replaced) [pdf, other]

Title: PrivacyFL: A simulator for privacy-preserving and secure federated learning
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
 [93] arXiv:2002.09089 (replaced) [pdf, other]

Title: Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences
Comments: In proceedings ICML 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [94] arXiv:2002.09928 (replaced) [pdf, other]

Title: Predictive Sampling with Forecasting Autoregressive Models
Comments: Accepted at the 37th International Conference on Machine Learning (ICML 2020). 14 pages, 13 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [95] arXiv:2002.10301 (replaced) [pdf, other]

Title: Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning
Comments: 33 pages, 4 figures
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
 [96] arXiv:2003.01794 (replaced) [pdf, other]

Title: Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [97] arXiv:2003.01820 (replaced) [pdf, other]

Title: Robust Market Making via Adversarial Reinforcement Learning
Comments: 7 pages, 3 figures; IJCAI-PRICAI '20 Conference Proceedings
Subjects: Trading and Market Microstructure (q-fin.TR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [98] arXiv:2003.07521 (replaced) [pdf, other]

Title: Energy-Based Processes for Exchangeable Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [99] arXiv:2004.03329 (replaced) [pdf, other]

Title: MedDialog: Two Large-scale Medical Dialogue Datasets
Authors: Xuehai He, Shu Chen, Zeqian Ju, Xiangyu Dong, Hongchao Fang, Sicheng Wang, Yue Yang, Jiaqi Zeng, Ruisi Zhang, Ruoyu Zhang, Meng Zhou, Penghui Zhu, Pengtao Xie
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
 [100] arXiv:2004.09691 (replaced) [pdf, other]

Title: A Data and Compute Efficient Design for Limited-Resources Deep Learning
Comments: Accepted for poster presentation at the Practical Machine Learning for Developing Countries (PML4DC) workshop, ICLR 2020
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
 [101] arXiv:2005.11418 (replaced) [pdf, other]

Title: FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [102] arXiv:2005.13625 (replaced) [pdf, other]

Title: Parameter Sharing is Surprisingly Useful for Multi-Agent Deep Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
 [103] arXiv:2006.06889 (replaced) [pdf, ps, other]

Title: Fast Objective and Duality Gap Convergence for Nonconvex Strongly-concave Min-max Problems
Comments: Zhishuai Guo, Zhuoning Yuan and Yan Yan contributed equally to this work
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [104] arXiv:2006.08386 (replaced) [pdf, other]

Title: COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations
Comments: 8 pages, 1 figure, workshop on Self-supervision in Audio and Speech at the 37th International Conference on Machine Learning (ICML), 2020, Vienna, Austria
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
 [105] arXiv:2006.09437 (replaced) [pdf, other]

Title: A Study of Compositional Generalization in Neural Models
Authors: Tim Klinger, Dhaval Adjodah, Vincent Marois, Josh Joseph, Matthew Riemer, Alex 'Sandy' Pentland, Murray Campbell
Comments: 28 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [106] arXiv:2006.15396 (replaced) [pdf, other]

Title: Approximating Posterior Predictive Distributions by Averaging Output From Many Particle Filters
Authors: Taylor R. Brown
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
 [107] arXiv:2006.16811 (replaced) [pdf, other]

Title: Path Integral Based Convolution and Pooling for Graph Neural Networks
Comments: 15 pages, 4 figures, 6 tables. arXiv admin note: text overlap with arXiv:1904.10996
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Networking and Internet Architecture (cs.NI); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
 [108] arXiv:2007.01888 (replaced) [pdf, other]

Title: Inference on the change point in high dimensional time series models via plug in least square
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
 [109] arXiv:2007.02126 (replaced) [pdf, other]

Title: Deep Graph Random Process for Relational-Thinking-Based Speech Recognition
Comments: Accepted at ICML 2020
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
 [110] arXiv:2007.02334 (replaced) [pdf, other]

Title: Multi-Manifold Learning for Large-scale Targeted Advertising System
Comments: Accepted at AdKDD 2020
Journal-ref: AdKDD 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [111] arXiv:2007.02500 (replaced) [pdf, other]

Title: Deep Learning for Anomaly Detection: A Review
Comments: Survey paper, 36 pages, 180 references, 2 figures, 3 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [112] arXiv:2007.02520 (replaced) [pdf, other]

Title: Explaining Fast Improvement in Online Policy Optimization
Comments: 20 pages, 2 figures; typos corrected
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)
 [113] arXiv:2007.02719 (replaced) [pdf, other]

Title: Splintering with distributions: A stochastic decoy scheme for private computation
Comments: 28 pages, 6 figures
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
 [114] arXiv:2007.02777 (replaced) [pdf, other]

Title: Parametric machines: a fresh approach to architecture search
Comments: 31 pages, 4 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [115] arXiv:2007.03117 (replaced) [pdf, ps, other]

Title: Multi-Fidelity Bayesian Optimization via Deep Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [116] arXiv:2007.03629 (replaced) [pdf, other]

Title: Strong Generalization and Efficiency in Neural Programs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)