
Machine Learning

New submissions

[ total of 116 entries: 1-116 ]

New submissions for Thu, 9 Jul 20

[1]  arXiv:2007.03814 [pdf, ps, other]
Title: A Variational Formula for Rényi Divergences
Comments: 11 pages, 2 figures
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Probability (math.PR)

We derive a new variational formula for the Rényi family of divergences, $R_\alpha(Q\|P)$, generalizing the classical Donsker-Varadhan variational formula for the Kullback-Leibler divergence. The objective functional in this new variational representation is expressed in terms of expectations under $Q$ and $P$, and hence can be estimated using samples from the two distributions. We illustrate the utility of such a variational formula by constructing neural-network estimators for the Rényi divergences.
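
For orientation, the classical Donsker-Varadhan formula that this abstract generalizes can be stated as below. This is the standard textbook statement, with the supremum taken over bounded measurable functions; the symbol $g$ for the test function is our notation, not the paper's. The abstract does not give the Rényi counterpart, so it is not reproduced here.

```latex
D_{\mathrm{KL}}(Q\|P) \;=\; \sup_{g}\,\Bigl\{ \mathbb{E}_{Q}[g] \;-\; \log \mathbb{E}_{P}\bigl[e^{g}\bigr] \Bigr\}
```

Both terms are expectations, which is why an objective of this kind can be estimated from samples of $Q$ and $P$.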

[2]  arXiv:2007.03898 [pdf, other]
Title: NVAE: A Deep Hierarchical Variational Autoencoder
Comments: Some images are downsized to meet arXiv requirements. Check this https URL for a high-resolution version (24 MB)
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Normalizing flows, autoregressive models, variational autoencoders (VAEs), and deep energy-based models are among competing likelihood-based frameworks for deep generative learning. Among them, VAEs have the advantage of fast and tractable sampling and easy-to-access encoding networks. However, they are currently outperformed by other models such as normalizing flows and autoregressive models. While the majority of the research in VAEs is focused on the statistical challenges, we explore the orthogonal direction of carefully designing neural architectures for hierarchical VAEs. We propose Nouveau VAE (NVAE), a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization. NVAE is equipped with a residual parameterization of Normal distributions and its training is stabilized by spectral regularization. We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, and CelebA HQ datasets and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ as shown in Fig. 1. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256$\times$256 pixels.

[3]  arXiv:2007.04005 [pdf, other]
Title: Statistical post-processing of wind speed forecasts using convolutional neural networks
Comments: 44 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Applications (stat.AP)

Current statistical post-processing methods for probabilistic weather forecasting are not capable of using full spatial patterns from the numerical weather prediction (NWP) model. In this paper we incorporate spatial wind speed information by using convolutional neural networks (CNNs) and obtain probabilistic wind speed forecasts in the Netherlands for 48 hours ahead, based on KNMI's Harmonie-Arome NWP model. The CNNs are shown to have higher Brier skill scores for medium to higher wind speeds, as well as a better continuous ranked probability score (CRPS), than fully connected neural networks and quantile regression forests.
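
As a reminder of how the scores named above are usually computed, here is a minimal sketch of the Brier score and the Brier skill score for a binary event such as "wind speed exceeds a threshold". The function names and the toy numbers are illustrative assumptions and are not taken from the paper.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, ref_probs, outcomes):
    """Skill relative to a reference forecast: 1 is perfect, 0 matches the reference."""
    return 1.0 - brier_score(probs, outcomes) / brier_score(ref_probs, outcomes)


# Toy data (hypothetical): four cases, event observed in the first and third.
outcomes = [1, 0, 1, 0]
forecast = [1.0, 0.0, 0.5, 0.5]   # forecast under evaluation
reference = [0.5, 0.5, 0.5, 0.5]  # uninformative reference forecast

bs = brier_score(forecast, outcomes)
bss = brier_skill_score(forecast, reference, outcomes)
```

A positive skill score means the evaluated forecast has a lower Brier score than the reference.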

[4]  arXiv:2007.04006 [pdf, other]
Title: Accelerated Sparse Bayesian Learning via Screening Test and Its Applications
Comments: 15 pages, 23 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In high-dimensional settings, sparse structures are critical for efficiency in terms of memory and computational complexity. For a linear system, directly finding the sparsest solution given an over-complete dictionary of features is typically NP-hard, and thus alternative approximate methods should be considered. In this paper, our choice of alternative method is sparse Bayesian learning, which, as an empirical Bayesian approach, uses a parameterized prior to encourage sparsity in the solution, in contrast to methods with fixed priors such as LASSO. A screening test, in turn, aims at quickly identifying a subset of features whose coefficients are guaranteed to be zero in the optimal solution, and which can then be safely removed from the complete dictionary to obtain a smaller, more easily solved problem. Next, we solve the smaller problem, after which the solution of the original problem can be recovered by padding the smaller solution with zeros. The performance of the proposed method is examined on various data sets and applications.
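
The last step described above, recovering the full solution by padding the reduced one with zeros, can be sketched as follows. The function name, argument names, and example values are hypothetical, and the screening test itself is assumed to have already produced the list of surviving feature indices.

```python
def pad_with_zeros(reduced_solution, kept_indices, n_features):
    """Place each reduced coefficient at its original index; all others stay zero."""
    if len(reduced_solution) != len(kept_indices):
        raise ValueError("one coefficient is required per kept feature index")
    full = [0.0] * n_features
    for idx, coef in zip(kept_indices, reduced_solution):
        full[idx] = coef
    return full


# Hypothetical example: features 1 and 4 survived screening out of 6.
full_solution = pad_with_zeros([0.7, -1.2], [1, 4], 6)
```

Because the screened features are guaranteed to have zero coefficients, this padding loses no information relative to solving the full problem.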

[5]  arXiv:2007.04131 [pdf, other]
Title: Pitfalls to Avoid when Interpreting Machine Learning Models
Comments: This article was accepted at the ICML 2020 workshop XXAI: Extending Explainable AI Beyond Deep Models and Classifiers (see this http URL )
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Modern requirements for machine learning (ML) models include both high predictive performance and model interpretability. A growing number of techniques provide model interpretations, but can lead to wrong conclusions if applied incorrectly. We illustrate pitfalls of ML model interpretation such as bad model generalization, dependent features, feature interactions or unjustified causal interpretations. Our paper addresses ML practitioners by raising awareness of pitfalls and pointing out solutions for correct model interpretation, as well as ML researchers by discussing open issues for further research.

[6]  arXiv:2007.04287 [pdf, other]
Title: Learning from DPPs via Sampling: Beyond HKPV and symmetry
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Determinantal point processes (DPPs) have become a significant tool for recommendation systems, feature selection, or summary extraction, harnessing the intrinsic ability of these probabilistic models to facilitate sample diversity. The ability to sample from DPPs is paramount to the empirical investigation of these models. Most exact samplers are variants of a spectral meta-algorithm due to Hough, Krishnapur, Peres and Virág (henceforth HKPV), which is in general time and resource intensive. For DPPs with symmetric kernels, scalable HKPV samplers have been proposed that either first downsample the ground set of items, or force the kernel to be low-rank, using e.g. Nyström-type decompositions.
In the present work, we contribute an approach radically different from HKPV. Exploiting the fact that many statistical and learning objectives can be effectively accomplished by only sampling certain key observables of a DPP (so-called linear statistics), we invoke an expression for the Laplace transform of such an observable as a single determinant, which holds in complete generality. Combining traditional low-rank approximation techniques with Laplace inversion algorithms from numerical analysis, we show how to directly approximate the distribution function of a linear statistic of a DPP. This distribution function can then be used in hypothesis testing or to actually sample the linear statistic, as required. Our approach is scalable and applies to very general DPPs, beyond traditional symmetric kernels.

Cross-lists for Thu, 9 Jul 20

[7]  arXiv:2007.03681 (cross-list from stat.ME) [pdf, other]
Title: Fast Bayesian Estimation of Spatial Count Data Models
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)

Spatial count data models are used to explain and predict the frequency of phenomena such as traffic accidents in geographically distinct entities such as census tracts or road segments. These models are typically estimated using Bayesian Markov chain Monte Carlo (MCMC) simulation methods, which, however, are computationally expensive and do not scale well to large datasets. Variational Bayes (VB), a method from machine learning, addresses the shortcomings of MCMC by casting Bayesian estimation as an optimisation problem instead of a simulation problem. In this paper, we derive a VB method for posterior inference in negative binomial models with unobserved parameter heterogeneity and spatial dependence. The proposed method uses Polya-Gamma augmentation to deal with the non-conjugacy of the negative binomial likelihood and an integrated non-factorised specification of the variational distribution to capture posterior dependencies. We demonstrate the benefits of the approach using simulated data and real data on youth pedestrian injury counts in the census tracts of New York City boroughs Bronx and Manhattan. The empirical analysis suggests that the VB approach is between 7 and 13 times faster than MCMC on a regular eight-core processor, while offering similar estimation and predictive accuracy. Conditional on the availability of computational resources, the embarrassingly parallel architecture of the proposed VB method can be exploited to further accelerate the estimation by up to 100 times.

[8]  arXiv:2007.03714 (cross-list from cs.LG) [pdf, other]
Title: Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH)
Comments: 72 pages, 1 figure
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)

Gradient descent yields zero training loss in polynomial time for deep neural networks despite the non-convex nature of the objective function. The behavior of a network in the infinite-width limit trained by gradient descent can be described by the Neural Tangent Kernel (NTK) introduced by Jacot et al. (2018). In this paper, we study the dynamics of the NTK for a finite-width deep residual network (ResNet) using the neural tangent hierarchy (NTH) proposed by Huang and Yau (2019). For a ResNet with a smooth and Lipschitz activation function, we reduce the requirement on the layer width $m$ with respect to the number of training samples $n$ from quartic to cubic. Our analysis strongly suggests that the particular skip-connection structure of ResNet is the main reason for its triumph over fully-connected networks.

[9]  arXiv:2007.03722 (cross-list from stat.AP) [pdf, other]
Title: Learning excursion sets of vector-valued Gaussian random fields for autonomous ocean sampling
Subjects: Applications (stat.AP); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)

Improving and optimizing oceanographic sampling is a crucial task for marine science and maritime resource management. Faced with limited resources in understanding processes in the water column, the combination of statistics and autonomous systems provides new opportunities for experimental design. In this work we develop efficient spatial sampling methods for characterizing regions defined by simultaneous exceedances above prescribed thresholds of several responses, with an application focus on mapping coastal ocean phenomena based on temperature and salinity measurements. Specifically, we define a design criterion based on uncertainty in the excursions of vector-valued Gaussian random fields, and derive tractable expressions for the expected integrated Bernoulli variance reduction in such a framework. We demonstrate how this criterion can be used to prioritize sampling efforts at locations that are ambiguous, making exploration more effective. We use simulations to study and compare properties of the considered approaches, followed by results from field deployments with an autonomous underwater vehicle as part of a study mapping the boundary of a river plume. The results demonstrate the potential of combining statistical methods and robotic platforms to effectively inform and execute data-driven environmental sampling.

[10]  arXiv:2007.03742 (cross-list from cs.LG) [pdf, other]
Title: Meta-active Learning in Probabilistically-Safe Optimization
Comments: 9 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Learning to control a safety-critical system with latent dynamics (e.g. for deep brain stimulation) requires taking calculated risks to gain information as efficiently as possible. To address this problem, we present a probabilistically-safe, meta-active learning approach to efficiently learn system dynamics and optimal configurations. We cast this problem as meta-learning an acquisition function, which is represented by a Long-Short Term Memory Network (LSTM) encoding sampling history. This acquisition function is meta-learned offline to learn high quality sampling strategies. We employ a mixed-integer linear program as our policy with the final, linearized layers of our LSTM acquisition function directly encoded into the objective to trade off expected information gain (e.g., improvement in the accuracy of the model of system dynamics) with the likelihood of safe control. We set a new state-of-the-art in active learning for control of a high-dimensional system with altered dynamics (i.e., a damaged aircraft), achieving a 46% increase in information gain and a 20% speedup in computation time over baselines. Furthermore, we demonstrate our system's ability to learn the optimal parameter settings for deep brain stimulation in a rat's brain while avoiding unwanted side effects (i.e., triggering seizures), outperforming prior state-of-the-art approaches with a 58% increase in information gain. Additionally, our algorithm achieves a 97% likelihood of terminating in a safe state while losing only 15% of information gain.

[11]  arXiv:2007.03744 (cross-list from eess.SP) [pdf, other]
Title: Predictive Analytics for Water Asset Management: Machine Learning and Survival Analysis
Comments: 19 pages, 7 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)

Understanding performance and prioritizing resources for the maintenance of the drinking-water pipe network throughout its life-cycle is a key part of water asset management. Renovation of this vital network is generally hindered by the difficulty or impossibility of gaining physical access to the pipes. We study a statistical and machine learning framework for the prediction of water pipe failures. We employ classical and modern classifiers for short-term prediction, and survival analysis to provide a broader perspective and the long-term forecast usually needed for the economic analysis of the renovation. To enrich these models, we introduce new predictors based on water distribution domain knowledge and employ a modern oversampling technique to remedy the high imbalance coming from the few failures observed each year. For our case study, we use a dataset containing the failure records of all pipes within the water distribution network in Barcelona, Spain. The results shed light on the effect of important risk factors, such as pipe geometry, age, material, and soil cover, among others, and can help utility managers conduct more informed predictive maintenance tasks.

[12]  arXiv:2007.03746 (cross-list from eess.SP) [pdf, ps, other]
Title: Transfer Learning for Brain-Computer Interfaces: A Complete Pipeline
Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Machine Learning (stat.ML)

Transfer learning (TL) has been widely used in electroencephalogram (EEG) based brain-computer interfaces (BCIs) to reduce the calibration effort for a new subject, and has demonstrated promising performance. After EEG signal acquisition, a closed-loop EEG-based BCI system also includes signal processing, feature engineering, and classification/regression blocks before sending out the control signal, whereas previous approaches only considered TL in one or two such components. This paper proposes that TL could be considered in all three components (signal processing, feature engineering, and classification/regression). Furthermore, it is also very important to specifically add a data alignment component before signal processing to make the data from different subjects more consistent, and hence to facilitate subsequent TL. Offline calibration experiments on two MI datasets verified our proposal. In particular, integrating data alignment and sophisticated TL approaches can significantly improve the classification performance, and hence greatly reduce the calibration effort.

[13]  arXiv:2007.03747 (cross-list from eess.SP) [pdf, ps, other]
Title: On Cokriging, Neural Networks, and Spatial Blind Source Separation for Multivariate Spatial Prediction
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)

Multivariate measurements taken at irregularly sampled locations are a common form of data, for example in geochemical analysis of soil. In practice, predictions of these measurements at unobserved locations are of great interest. Standard multivariate spatial prediction methods must model not only spatial dependencies but also cross-dependencies, which makes prediction a demanding task. Recently, a blind source separation approach for spatial data was suggested. When this spatial blind source separation method is applied prior to the actual spatial prediction, modelling of spatial cross-dependencies is avoided, which in turn simplifies the spatial prediction task significantly. In this paper we investigate the use of spatial blind source separation as a pre-processing tool for spatial prediction and compare it with predictions from Cokriging and neural networks in an extensive simulation study as well as on a geochemical dataset.

[14]  arXiv:2007.03749 (cross-list from cs.LG) [pdf, ps, other]
Title: Sharp Analysis of Smoothed Bellman Error Embedding
Comments: Accepted at the ICML 2020 Workshop on Theoretical Foundations of Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The Smoothed Bellman Error Embedding algorithm (Dai et al., 2018), known as SBEED, was proposed as a provably convergent reinforcement learning algorithm with general nonlinear function approximation. It has been successfully implemented with neural networks and achieved strong empirical results. In this work, we study the theoretical behavior of SBEED in batch-mode reinforcement learning. We prove a near-optimal performance guarantee that depends on the representation power of the used function classes and a tight notion of the distribution shift. Our results improve upon prior guarantees for SBEED in Dai et al. (2018) in terms of the dependence on the planning horizon and on the sample size. Our analysis builds on the recent work of Xie and Jiang (2020), which studies a related algorithm, MSBO, that can be interpreted as a non-smooth counterpart of SBEED.

[15]  arXiv:2007.03758 (cross-list from cs.CE) [pdf, other]
Title: Deep learning of thermodynamics-aware reduced-order models from data
Comments: 16 pages, 7 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Machine Learning (stat.ML)

We present an algorithm to learn the relevant latent variables of a large-scale discretized physical system and predict its time evolution using thermodynamically-consistent deep neural networks. Our method relies on sparse autoencoders, which reduce the dimensionality of the full order model to a set of sparse latent variables with no prior knowledge of the coded space dimensionality. Then, a second neural network is trained to learn the metriplectic structure of those reduced physical variables and predict their time evolution with a so-called structure-preserving neural network. This data-based integrator is guaranteed to conserve the total energy of the system and to satisfy the entropy inequality, and can be applied to both conservative and dissipative systems. The integrated paths can then be decoded to the original full-dimensional manifold and be compared to the ground truth solution. This method is tested with two examples applied to fluid and solid mechanics.

[16]  arXiv:2007.03760 (cross-list from cs.LG) [pdf, ps, other]
Title: Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning
Comments: Appendix included
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Off-policy evaluation (OPE) aims at estimating the performance of a target policy $\pi$ using offline data rolled in by a logging policy $\mu$. Intensive studies have been conducted, and the recent marginalized importance sampling (MIS) approach achieves sample efficiency for OPE. However, little is known about whether uniform convergence guarantees in OPE can be obtained efficiently. In this paper, we consider this new question and reveal the comprehensive relationship between OPE and offline learning for the first time. For the global policy class, by using the fully model-based OPE estimator, our best result is able to achieve $\epsilon$-uniform convergence with complexity $\widetilde{O}(H^3\cdot\min(S,H)/d_m\epsilon^2)$, where $d_m$ is an instance-dependent quantity decided by $\mu$. This result is only one factor away from our uniform convergence lower bound up to a logarithmic factor. For the local policy class, $\epsilon$-uniform convergence is achieved with the optimal complexity $\widetilde{O}(H^3/d_m\epsilon^2)$ in the off-policy setting. This result complements the work on sparse model-based planning (Agarwal et al. 2019) with a generative model. Lastly, one interesting corollary of our intermediate result implies a refined analysis of the simulation lemma.

[17]  arXiv:2007.03762 (cross-list from eess.SP) [pdf, other]
Title: Transfer Learning for Electricity Price Forecasting
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)

Electricity price forecasting is an essential task for all the deregulated markets of the world. The accurate prediction of day-ahead electricity prices is an active research field, and available data from various markets can be used as input for forecasting. A collection of models has been proposed for this task, but the fundamental question of how to use the available big data is often neglected. In this paper, we propose to use transfer learning as a tool for utilizing information from other electricity price markets for forecasting. We pre-train a bidirectional Gated Recurrent Unit (BGRU) network on source markets and then fine-tune it for the target market. Moreover, we test different ways to use the input data from various markets in the models. Our experiments on five different day-ahead markets indicate that transfer learning improves the performance of electricity price forecasting in a statistically significant manner.

[18]  arXiv:2007.03767 (cross-list from cs.LG) [pdf, other]
Title: Defending Against Backdoors in Federated Learning with Robust Learning Rate
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

Federated Learning (FL) allows a set of agents to collaboratively train a model in a decentralized fashion without sharing their potentially sensitive data. This makes FL suitable for privacy-preserving applications. At the same time, FL is susceptible to adversarial attacks due to decentralized and unvetted data. One important line of attacks against FL is backdoor attacks. In a backdoor attack, an adversary tries to embed a backdoor trigger functionality into the model during training, which can later be activated to cause a desired misclassification. To prevent such backdoor attacks, we propose a lightweight defense that requires no change to the FL structure. At a high level, our defense is based on carefully adjusting the server's learning rate, per dimension, at each round based on the sign information of the agents' updates. We first conjecture the necessary steps to carry out a successful backdoor attack in the FL setting, and then explicitly formulate the defense based on our conjecture. Through experiments, we provide empirical evidence in support of our conjecture. We test our defense against backdoor attacks under different settings and observe that either the backdoor is completely eliminated or its accuracy is significantly reduced. Overall, our experiments suggest that our approach significantly outperforms some of the recently proposed defenses in the literature. We achieve this while having minimal influence on the accuracy of the trained models.
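
One possible reading of "adjusting the server's learning rate, per dimension, based on the sign information of the agents' updates" is sketched below. The abstract does not specify the rule, so the agreement threshold, the choice to flip the learning rate's sign when agreement is low, and all function and parameter names are assumptions made for illustration.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def robust_learning_rates(agent_updates, threshold, base_lr=1.0):
    """Per-dimension server learning rate (assumed rule, not from the abstract).

    A dimension keeps +base_lr when the absolute sum of the agents' update
    signs reaches `threshold`; otherwise it gets -base_lr.
    """
    n_dims = len(agent_updates[0])
    rates = []
    for d in range(n_dims):
        agreement = abs(sum(sign(update[d]) for update in agent_updates))
        rates.append(base_lr if agreement >= threshold else -base_lr)
    return rates


def aggregate(global_model, agent_updates, threshold, base_lr=1.0):
    """Apply the averaged agent update, scaled by the per-dimension rate."""
    rates = robust_learning_rates(agent_updates, threshold, base_lr)
    n_agents = len(agent_updates)
    new_model = []
    for d, weight in enumerate(global_model):
        mean_update = sum(update[d] for update in agent_updates) / n_agents
        new_model.append(weight + rates[d] * mean_update)
    return new_model


# Hypothetical round: three agents, two parameters.
updates = [[1.0, 2.0], [1.0, -1.0], [1.0, 2.0]]
rates = robust_learning_rates(updates, threshold=3)
new_model = aggregate([0.0, 0.0], updates, threshold=3)
```

In this toy round all agents agree on the first dimension, so it is updated normally, while the disagreement on the second dimension reverses the direction of its update.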

[19]  arXiv:2007.03774 (cross-list from cs.CL) [pdf, other]
Title: The curious case of developmental BERTology: On sparsity, transfer learning, generalization and the brain
Authors: Xin Wang
Comments: 9 pages, 5 figures, 1 table
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

In this essay, we explore a point of intersection between deep learning and neuroscience, through the lens of large language models, transfer learning and network compression. Just as perceptual and cognitive neurophysiology have inspired effective deep neural network architectures, which in turn make a useful model for understanding the brain, here we explore how biological neural development might inspire efficient and robust optimization procedures, which in turn serve as a useful model for the maturation and aging of the brain.

[20]  arXiv:2007.03775 (cross-list from cs.LG) [pdf, other]
Title: README: REpresentation learning by fairness-Aware Disentangling MEthod
Comments: 8 pages, 3 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Fair representation learning aims to encode a representation that is invariant with respect to a protected attribute, such as gender or age. In this paper, we design the Fairness-aware Disentangling Variational AutoEncoder (FD-VAE) for fair representation learning. This network disentangles the latent space into three subspaces with a decorrelation loss that encourages each subspace to contain independent information: 1) target attribute information, 2) protected attribute information, 3) mutual attribute information. After the representation learning, this disentangled representation is leveraged for fairer downstream classification by excluding the subspace with the protected attribute information. We demonstrate the effectiveness of our model through extensive experiments on the CelebA and UTK Face datasets. Our method outperforms the previous state-of-the-art method by large margins in terms of equal opportunity and equalized odds.

[21]  arXiv:2007.03795 (cross-list from cs.LG) [pdf, other]
Title: Conditional gradient methods for stochastically constrained convex minimization
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

We propose two novel conditional gradient-based methods for solving structured stochastic convex optimization problems with a large number of linear constraints. Instances of this template naturally arise from SDP-relaxations of combinatorial problems, which involve a number of constraints that is polynomial in the problem dimension. The most important feature of our framework is that only a subset of the constraints is processed at each iteration, thus gaining a computational advantage over prior works that require full passes. Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees. Preliminary numerical experiments are provided for illustrating the practical performance of the methods.

[22]  arXiv:2007.03797 (cross-list from cs.LG) [pdf, other]
Title: Personalized Federated Learning: An Attentive Collaboration Approach
Comments: Under review
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

For the challenging computational environment of IOT/edge computing, personalized federated learning allows every client to train a strong personalized cloud model by effectively collaborating with the other clients in a privacy-preserving manner. The performance of personalized federated learning is largely determined by the effectiveness of inter-client collaboration. However, when the data is non-IID across all clients, it is challenging to infer the collaboration relationships between clients without knowing their data distributions. In this paper, we propose to tackle this problem by a novel framework named federated attentive message passing (FedAMP) that allows each client to collaboratively train its own personalized cloud model without using a global model. FedAMP implements an attentive collaboration mechanism by iteratively encouraging clients with more similar model parameters to have stronger collaborations. This adaptively discovers the underlying collaboration relationships between clients, which significantly boosts effectiveness of collaboration and leads to the outstanding performance of FedAMP. We establish the convergence of FedAMP for both convex and non-convex models, and further propose a heuristic method that resembles the FedAMP framework to further improve its performance for federated learning with deep neural networks. Extensive experiments demonstrate the superior performance of our methods in handling non-IID data, dirty data and dropped clients.

[23]  arXiv:2007.03800 (cross-list from cs.LG) [pdf, ps, other]
Title: Efficient and Parallel Separable Dictionary Learning
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Numerical Analysis (math.NA); Machine Learning (stat.ML)

Separable, or Kronecker product, dictionaries provide natural decompositions for 2D signals, such as images. In this paper, we describe an algorithm to learn such dictionaries which is highly parallelizable and which reaches sparse representations competitive with the previous state of the art dictionary learning algorithms from the literature. We highlight the performance of the proposed method to sparsely represent image data and for image denoising applications.

[24]  arXiv:2007.03807 (cross-list from cs.LG) [pdf, other]
Title: Towards a practical measure of interference for reinforcement learning
Comments: 18 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. But, before we overcome interference we must understand it better. In this work, we provide a definition of interference for control in reinforcement learning. We systematically evaluate our new measures, by assessing correlation with several measures of learning performance, including stability, sample efficiency, and online and offline control performance across a variety of learning architectures. Our new interference measure allows us to ask novel scientific questions about commonly used deep learning architectures. In particular we show that target network frequency is a dominating factor for interference, and that updates on the last layer result in significantly higher interference than updates internal to the network. This new measure can be expensive to compute; we conclude with motivation for an efficient proxy measure and empirically demonstrate it is correlated with our definition of interference.

[25]  arXiv:2007.03812 (cross-list from cs.LG) [pdf, other]
Title: Robust Multi-Agent Multi-Armed Bandits
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

There has been recent interest in collaborative multi-agent bandits, where groups of agents share recommendations to decrease per-agent regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include honest and malicious agents who recommend best-arm estimates and arbitrary arms, respectively. We show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with them, i.e., "blacklist" them. We show that collaboration indeed decreases regret for this algorithm, when the number of malicious agents is small compared to the number of arms, and crucially without assumptions on the malicious agents' behavior. Thus, our algorithm is robust against any malicious recommendation strategy.

[26]  arXiv:2007.03813 (cross-list from cs.LG) [pdf, other]
Title: Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM). Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model. Such dependence can be problematic for over-parameterized models where $p \gg n$, the number of training samples. Existing lower bounds on private ERM show that such dependence on $p$ is inevitable in the worst case. In this paper, we circumvent the dependence on the ambient dimension by leveraging a low-dimensional structure of gradient space in deep networks---that is, the stochastic gradients for deep nets usually stay in a low dimensional subspace in the training process. We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace, which is given by the top gradient eigenspace on a small public dataset. We provide a general sample complexity analysis on the public dataset for the gradient subspace identification problem and demonstrate that under certain low-dimensional assumptions the public sample complexity only grows logarithmically in $p$. Finally, we provide a theoretical analysis and empirical evaluations to show that our method can substantially improve the accuracy of DP-SGD.
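
The projection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an orthonormal basis for the low-dimensional subspace is already available (the abstract obtains it from the top gradient eigenspace on a public dataset, which is not computed here), it clips a single aggregate gradient rather than per-example gradients, and the names `clip`, `project`, and `projected_noisy_gradient` are hypothetical. It makes no claim about privacy accounting.

```python
import math
import random

def clip(grad, max_norm):
    """Scale grad so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def add_gaussian_noise(grad, sigma, rng):
    """Add independent Gaussian noise with standard deviation sigma."""
    return [g + rng.gauss(0.0, sigma) for g in grad]

def project(vec, basis):
    """Project vec onto the span of the given orthonormal basis vectors."""
    out = [0.0] * len(vec)
    for b in basis:
        coeff = sum(v * bi for v, bi in zip(vec, b))
        for i, bi in enumerate(b):
            out[i] += coeff * bi
    return out

def projected_noisy_gradient(grad, basis, max_norm, sigma, rng):
    """Clip, perturb, then project onto the assumed low-dimensional subspace."""
    noisy = add_gaussian_noise(clip(grad, max_norm), sigma, rng)
    return project(noisy, basis)

# Usage: a 3-parameter model whose assumed subspace is the first coordinate axis.
rng = random.Random(0)
basis = [[1.0, 0.0, 0.0]]  # hypothetical orthonormal basis
update = projected_noisy_gradient([3.0, 4.0, 0.0], basis, 1.0, 0.5, rng)
```

After projection, the noise components orthogonal to the subspace are discarded, which is the noise-reduction effect the abstract refers to.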

[27]  arXiv:2007.03828 (cross-list from astro-ph.IM) [pdf, other]
Title: Deep Ensemble Analysis for Imaging X-ray Polarimetry
Comments: 14 pages, 9 figures. Submitted to Nuclear Instruments and Methods in Physics Research Section A on 3rd July 2020
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); High Energy Astrophysical Phenomena (astro-ph.HE); Machine Learning (cs.LG); Machine Learning (stat.ML)

We present a method for enhancing the sensitivity of X-ray telescopic observations with imaging polarimeters, with a focus on the gas pixel detectors (GPDs) to be flown on the Imaging X-ray Polarimetry Explorer (IXPE). Our analysis measures photoelectron directions, X-ray absorption points and X-ray energies for 2-8keV event tracks, with estimates for both the statistical and systematic (reconstruction) uncertainties. We use a weighted maximum likelihood combination of predictions from a deep ensemble of ResNet convolutional neural networks, trained on Monte Carlo event simulations. We define a figure of merit to compare the polarization bias-variance trade-off in track reconstruction algorithms. For power-law source spectra, our method improves on current state-of-the-art (and previous deep learning approaches), providing ~45% increase in effective exposure times. For individual energies, our method produces 20-30% absolute improvements in modulation factor for simulated 100% polarized events, while keeping residual systematic modulation within 1 sigma of the finite sample minimum. Absorption point location and photon energy estimates are also significantly improved. We have validated our method with sample data from real GPD detectors.

[28]  arXiv:2007.03832 (cross-list from cs.LG) [pdf, other]
Title: Fast Training of Deep Neural Networks Robust to Adversarial Perturbations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Deep neural networks are capable of training fast and generalizing well within many domains. Despite their promising performance, deep networks have shown sensitivities to perturbations of their inputs (e.g., adversarial examples) and their learned feature representations are often difficult to interpret, raising concerns about their true capability and trustworthiness. Recent work in adversarial training, a form of robust optimization in which the model is optimized against adversarial examples, demonstrates the ability to reduce sensitivity to perturbations and yield feature representations that are more interpretable. Adversarial training, however, comes with an increased computational cost over that of standard (i.e., nonrobust) training, rendering it impractical for use in large-scale problems. Recent work suggests that a fast approximation to adversarial training shows promise for reducing training time and maintaining robustness in the presence of perturbations bounded by the infinity norm. In this work, we demonstrate that this approach extends to the Euclidean norm and preserves the human-aligned feature representations that are common for robust models. Additionally, we show that using a distributed training scheme can further reduce the time to train robust deep networks. Fast adversarial training is a promising approach that will provide increased security and explainability in machine learning applications for which robust optimization was previously thought to be impractical.

[29]  arXiv:2007.03844 (cross-list from cs.LG) [pdf, other]
Title: Consistency Regularization with Generative Adversarial Networks for Semi-Supervised Image Classification
Comments: 10 pages, 5 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Generative Adversarial Networks (GANs) based semi-supervised learning (SSL) approaches are shown to improve classification performance by utilizing a large number of unlabeled samples in conjunction with limited labeled samples. However, their performance still lags behind the state-of-the-art non-GAN based SSL approaches. One main reason we identify is the lack of consistency in class probability predictions on the same image under local perturbations. This problem was addressed in the past in a generic setting using the label consistency regularization, which enforces the class probability predictions for an input image to be unchanged under various semantic-preserving perturbations. In this work, we incorporate the consistency regularization in the vanilla semi-GAN to address this critical limitation. In particular, we present a new composite consistency regularization method which, in spirit, combines two well-known consistency-based techniques -- Mean Teacher and Interpolation Consistency Training. We demonstrate the efficacy of our approach on two SSL image classification benchmark datasets, SVHN and CIFAR-10. Our experiments show that this new composite consistency regularization based semi-GAN significantly improves its performance and achieves new state-of-the-art performance among GAN-based SSL approaches.

[30]  arXiv:2007.03856 (cross-list from cs.LG) [pdf, other]
Title: BlockFLow: An Accountable and Privacy-Preserving Solution for Federated Learning
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

Federated learning enables the development of a machine learning model among collaborating agents without requiring them to share their underlying data. However, malicious agents who train on random data, or worse, on datasets with the result classes inverted, can weaken the combined model. BlockFLow is an accountable federated learning system that is fully decentralized and privacy-preserving. Its primary goal is to reward agents proportional to the quality of their contribution while protecting the privacy of the underlying datasets and being resilient to malicious adversaries. Specifically, BlockFLow incorporates differential privacy, introduces a novel auditing mechanism for model contribution, and uses Ethereum smart contracts to incentivize good behavior. Unlike existing auditing and accountability methods for federated learning systems, our system does not require a centralized test dataset, sharing of datasets between the agents, or one or more trusted auditors; it is fully decentralized and resilient up to a 50% collusion attack in a malicious trust model. When run on the public Ethereum blockchain, BlockFLow uses the results from the audit to reward parties with cryptocurrency based on the quality of their contribution. We evaluated BlockFLow on two datasets that offer classification tasks solvable via logistic regression models. Our results show that the resultant auditing scores reflect the quality of the honest agents' datasets. Moreover, the scores from dishonest agents are statistically lower than those from the honest agents. These results, along with the reasonable blockchain costs, demonstrate the effectiveness of BlockFLow as an accountable federated learning system.

[31]  arXiv:2007.03899 (cross-list from cs.LG) [pdf, other]
Title: Density Fixing: Simple yet Effective Regularization Method based on the Class Prior
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Machine learning models suffer from overfitting, which is caused by a lack of labeled data. To tackle this problem, we propose a framework of regularization methods, called density-fixing, that can be used for both supervised and semi-supervised learning. Our proposed regularization method improves the generalization performance by forcing the model to approximate the class's prior distribution or the frequency of occurrence. This regularization term is naturally derived from the formula of maximum likelihood estimation and is theoretically justified. We further investigate the asymptotic behavior of the proposed method and how the regularization terms behave when assuming a prior distribution of several classes in practice. Experimental results on multiple benchmark datasets support our argument, and we suggest that this simple and effective regularization method is useful in real-world machine learning problems.

[32]  arXiv:2007.03912 (cross-list from cs.LG) [pdf, other]
Title: Linear Tensor Projection Revealing Nonlinearity
Comments: 13 pages, 6 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Dimensionality reduction is an effective method for learning from high-dimensional data, and it can provide a better understanding of decision boundaries in a human-readable low-dimensional subspace. Linear methods, such as principal component analysis and linear discriminant analysis, make it possible to capture the correlation between many variables; however, there is no guarantee that the correlations that are important in predicting data can be captured. Moreover, if the decision boundary has strong nonlinearity, such a guarantee becomes increasingly difficult to obtain. This problem is exacerbated when the data are matrices or tensors that represent relationships between variables. We propose a learning method that searches for a subspace that maximizes the prediction accuracy while retaining as much of the original data information as possible, even if the prediction model in the subspace has strong nonlinearity. This makes it easier to interpret the mechanism of the group of variables behind the prediction problem that the user wants to know. We show the effectiveness of our method by applying it to various types of data including matrices and tensors.

[33]  arXiv:2007.03920 (cross-list from cs.LG) [pdf, other]
Title: Binary Stochastic Filtering: feature selection and beyond
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Feature selection is one of the most decisive tools in understanding data and machine learning models. Among other methods, sparsity induced by an $L^{1}$ penalty is one of the simplest and best-studied approaches to this problem. Although such regularization is frequently used in neural networks to achieve sparsity of weights or unit activations, it is unclear how it can be employed in the feature selection problem. This work aims at extending neural networks with the ability to automatically select features by rethinking how the sparsity regularization can be used, namely, by stochastically penalizing feature involvement instead of the layer weights. The proposed method has demonstrated superior efficiency when compared to a few classical methods, achieved with minimal or no computational overhead, and can be directly applied to any existing architecture. Furthermore, the method is easily generalizable to neuron pruning and selection of regions of importance for spectral data.

[34]  arXiv:2007.03937 (cross-list from cs.LG) [pdf, ps, other]
Title: A Nearest Neighbor Characterization of Lebesgue Points in Metric Measure Spaces
Authors: Tommaso Cesari (ANITI, TSE), Roberto Colomboni
Subjects: Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)

The property of almost every point being a Lebesgue point has proven to be crucial for the consistency of several classification algorithms based on nearest neighbors. We characterize Lebesgue points in terms of a 1-Nearest Neighbor regression algorithm for pointwise estimation, fleshing out the role played by tie-breaking rules in the corresponding convergence problem. We then give an application of our results, proving the convergence of the risk of a large class of 1-Nearest Neighbor classification algorithms in general metric spaces where almost every point is a Lebesgue point.

[35]  arXiv:2007.03938 (cross-list from cs.LG) [pdf, other]
Title: Operation-Aware Soft Channel Pruning using Differentiable Masks
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We propose a simple but effective data-driven channel pruning algorithm, which compresses deep neural networks in a differentiable way by exploiting the characteristics of operations. The proposed approach jointly considers batch normalization (BN) and rectified linear unit (ReLU) for channel pruning; it estimates how likely the two successive operations are to deactivate each feature map and prunes the channels with high probabilities. To this end, we learn differentiable masks for individual channels and make soft decisions throughout the optimization procedure, which facilitates exploring a larger search space and training more stable networks. The proposed framework enables us to identify compressed models via joint learning of model parameters and channel pruning without an extra fine-tuning procedure. We perform extensive experiments and achieve outstanding performance in terms of the accuracy of output networks given the same amount of resources when compared with the state-of-the-art methods.

[36]  arXiv:2007.03961 (cross-list from cs.LG) [pdf, other]
Title: Double Prioritized State Recycled Experience Replay
Authors: Fanchen Bu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Experience replay enables online reinforcement learning agents to store and reuse the experiences generated in previous interactions with the environment. In the original method, the experiences are sampled and replayed to train the Q-network with the same probability, i.e., uniformly. In prior work, a method called prioritized experience replay was developed where experiences in the memory are prioritized, so that experiences which seem to be more important are replayed at higher frequencies to train the Q-network more efficiently. In this paper, we develop a method called double-prioritized state-recycled (DPSR) experience replay, which prioritizes experiences in both the training stage and the storing stage, and replaces experiences in the memory with state recycling to make the best of experiences which temporarily seem to have low priorities. We use this method in Deep Q-Networks (DQN), and achieve a state-of-the-art result, outperforming the original method and prioritized experience replay on many Atari games.
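
The contrast between uniform and prioritized sampling mentioned in this abstract can be sketched as follows. This illustrates only the proportional prioritization used in prior prioritized-replay work, not the DPSR method itself; the exponent `alpha` and the function names are assumptions introduced for illustration.

```python
import random

def sampling_probabilities(priorities, alpha=0.6):
    """Probability of replaying each stored experience.

    alpha = 0 recovers uniform sampling; alpha = 1 samples in direct
    proportion to priority. (alpha is an assumed hyperparameter.)
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]

def sample_indices(priorities, batch_size, alpha=0.6, rng=None):
    """Draw a replay batch (with replacement) according to priority."""
    rng = rng or random.Random()
    probs = sampling_probabilities(priorities, alpha)
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)

# Usage: four stored experiences with hypothetical priorities.
batch = sample_indices([0.1, 2.0, 0.5, 1.0], batch_size=8, rng=random.Random(0))
```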

[37]  arXiv:2007.03966 (cross-list from cs.LG) [pdf, other]
Title: Semi-Supervised Learning with Meta-Gradient
Comments: 17 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In this work, we propose a simple yet effective meta-learning algorithm in the semi-supervised settings. We notice that existing consistency-based approaches mostly do not consider the essential role of the label information for consistency regularization. To alleviate this issue, we bridge the relationship between the consistency loss and label information by unfolding and differentiating through one optimization step. Specifically, we exploit the pseudo labels of the unlabeled examples which are guided by the meta-gradients of the labeled data loss so that the model can generalize well on the labeled examples. In addition, we introduce a simple first-order approximation to avoid computing higher-order derivatives and guarantee scalability. Extensive evaluations on the SVHN, CIFAR, and ImageNet datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods.

[38]  arXiv:2007.03995 (cross-list from cs.LG) [pdf, other]
Title: MCU-Net: A framework towards uncertainty representations for decision support system patient referrals in healthcare contexts
Authors: Nabeel Seedat
Comments: 4 pages, 4 figures, Accepted to ICML 2020 Workshop on Uncertainty & Robustness in Deep Learning & Machine Learning for Global Health
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Incorporating a human-in-the-loop system when deploying automated decision support is critical in healthcare contexts to create trust, as well as provide reliable performance on a patient-to-patient basis. Deep learning methods, while having high performance, do not allow for this patient-centered approach due to the lack of uncertainty representation. Thus, we present a framework of uncertainty representation evaluated for medical image segmentation, using MCU-Net, which combines a U-Net with Monte Carlo Dropout and is evaluated with four different uncertainty metrics. The framework augments this by adding a human-in-the-loop aspect based on an uncertainty threshold for automated referral of uncertain cases to a medical professional. We demonstrate that MCU-Net combined with epistemic uncertainty and an uncertainty threshold tuned for this application maximizes automated performance on an individual patient level, yet refers truly uncertain cases. This is a step towards uncertainty representations when deploying machine learning based decision support in healthcare settings.
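
The threshold-based referral idea in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the MCU-Net framework: it treats the variance across repeated stochastic (dropout-enabled) forward passes as the uncertainty score, whereas the abstract evaluates four metrics that are not specified here, and it works on one scalar prediction per case rather than a segmentation map. The function name and the threshold value are hypothetical.

```python
import statistics

def refer_uncertain(mc_predictions, threshold):
    """Flag cases whose Monte Carlo predictions disagree too much.

    mc_predictions: one list per case, holding the outputs of T stochastic
    forward passes. Variance across passes is the assumed uncertainty score.
    """
    decisions = []
    for samples in mc_predictions:
        mean = statistics.fmean(samples)
        uncertainty = statistics.pvariance(samples)
        decisions.append({
            "mean": mean,
            "uncertainty": uncertainty,
            "refer": uncertainty > threshold,  # send to a medical professional
        })
    return decisions

# Usage: the first case is stable across passes, the second is not.
results = refer_uncertain([[0.9, 0.9, 0.9], [0.1, 0.9, 0.5]], threshold=0.05)
```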

[39]  arXiv:2007.04001 (cross-list from cs.LG) [pdf, other]
Title: Supervised machine learning techniques for data matching based on similarity metrics
Subjects: Machine Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)

Businesses, governmental bodies and NGOs have an ever-increasing amount of data at their disposal from which they try to extract valuable information. Often, this needs to be done not only accurately but also within a short time frame. Clean and consistent data is therefore crucial. Data matching is the field that tries to identify instances in data that refer to the same real-world entity. In this study, machine learning techniques are combined with string similarity functions and applied to the field of data matching. A dataset of invoices from a variety of businesses and organizations was preprocessed with a grouping scheme to reduce pair dimensionality, and a set of similarity functions was used to quantify similarity between invoice pairs. The resulting invoice pair dataset was then used to train and validate a neural network and a boosted decision tree. The performance was compared with a solution from FISCAL Technologies as a benchmark against currently available deduplication solutions. Both the neural network and the boosted decision tree showed equal or better performance.

[40]  arXiv:2007.04002 (cross-list from cs.LG) [pdf, other]
Title: Unbiased Lift-based Bidding System
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)

Conventional bidding strategies for online display ad auctions rely heavily on observed performance indicators such as clicks or conversions. A bidding strategy naively pursuing these easily observable metrics, however, fails to optimize the profitability of the advertisers. Rather, the bidding strategy that leads to the maximum revenue is a strategy pursuing the performance \textit{lift} of showing ads to a specific user. Therefore, it is essential to predict the lift-effect of showing ads to each user on their target variables from observed log data. However, predicting the lift-effect is difficult, as the training data gathered by a past bidding strategy may have a strong bias towards the winning impressions. In this study, we develop the \textit{Unbiased Lift-based Bidding System}, which maximizes the advertisers' profit by accurately predicting the lift-effect from biased log data. Our system is the first to enable a high-performing lift-based bidding strategy by theoretically alleviating the inherent bias in the log. Real-world, large-scale A/B testing successfully demonstrates the superiority and practicability of the proposed system.

[41]  arXiv:2007.04028 (cross-list from cs.LG) [pdf, other]
Title: How benign is benign overfitting?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We investigate two causes for adversarial vulnerability in deep neural networks: bad data and (poorly) trained models. When trained with SGD, deep neural networks essentially achieve zero training error, even in the presence of label noise, while also exhibiting good generalization on natural test data, something referred to as benign overfitting [2, 10]. However, these models are vulnerable to adversarial attacks. We identify label noise as one of the causes for adversarial vulnerability, and provide theoretical and empirical evidence in support of this. Surprisingly, we find several instances of label noise in datasets such as MNIST and CIFAR, and that robustly trained models incur training error on some of these, i.e. they don't fit the noise. However, removing noisy labels alone does not suffice to achieve adversarial robustness. Standard training procedures bias neural networks towards learning "simple" classification boundaries, which may be less robust than more complex ones. We observe that adversarial training does produce more complex decision boundaries. We conjecture that in part the need for complex decision boundaries arises from sub-optimal representation learning. By means of simple toy examples, we show theoretically how the choice of representation can drastically affect adversarial robustness.

[42]  arXiv:2007.04030 (cross-list from cs.LG) [pdf, other]
Title: Incorporating prior knowledge about structural constraints in model identification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Model identification is a crucial problem in chemical industries. In recent years, there has been increasing interest in learning data-driven models utilizing partial knowledge about the system of interest. Most techniques for model identification do not provide the freedom to incorporate any partial information such as the structure of the model. In this article, we propose model identification techniques that could leverage such partial information to produce better estimates. Specifically, we propose Structural Principal Component Analysis (SPCA), which improves upon existing methods like PCA by utilizing the essential structural information about the model. Most of the existing methods or closely related methods use sparsity constraints which could be computationally expensive. Our proposed method is a wise modification of PCA to utilize structural information. The efficacy of the proposed approach is demonstrated using synthetic and industrial case studies.

[43]  arXiv:2007.04043 (cross-list from cs.LG) [pdf, ps, other]
Title: A One-step Approach to Covariate Shift Adaptation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution. However, such an assumption is often violated in the real world due to non-stationarity of the environment or bias in sample selection. In this work, we consider a prevalent setting called covariate shift, where the input distribution differs between the training and test stages while the conditional distribution of the output given the input remains unchanged. Most of the existing methods for covariate shift adaptation are two-step approaches, which first calculate the importance weights and then conduct importance-weighted empirical risk minimization. In this paper, we propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization by minimizing an upper bound of the test risk. We theoretically analyze the proposed method and provide a generalization error bound. We also empirically demonstrate the effectiveness of the proposed method.
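
The two-step baseline that this abstract contrasts itself with (first compute importance weights, then run importance-weighted empirical risk minimization) can be sketched as below. This is not the proposed one-step method. The sketch assumes the training and test input densities are known one-dimensional Gaussians, which in practice would have to be estimated, and it fits a single slope parameter by weighted least squares; all names and default parameters are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def importance_weights(xs, train=(0.0, 1.0), test=(1.0, 1.0)):
    """Step 1: density ratio p_test(x) / p_train(x) for each training input.

    train and test are assumed (mean, std) pairs of the input densities.
    """
    return [gaussian_pdf(x, *test) / gaussian_pdf(x, *train) for x in xs]

def weighted_slope(xs, ys, weights):
    """Step 2: minimize sum_i w_i * (y_i - a * x_i)^2 over the slope a."""
    num = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    den = sum(w * x * x for w, x in zip(weights, xs))
    return num / den

# Usage: training inputs with noiseless labels y = 2x.
xs = [-1.0, 0.5, 1.0, 2.0]
ys = [2.0 * x for x in xs]
ws = importance_weights(xs)
slope = weighted_slope(xs, ys, ws)
```

Training points that are more typical of the test distribution receive larger weights, so the fitted model concentrates on the region where it will be evaluated.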

[44]  arXiv:2007.04068 (cross-list from cs.CY) [pdf, ps, other]
Title: Decolonial AI: Decolonial Theory as Sociotechnical Foresight in Artificial Intelligence
Comments: 28 Pages. Accepted, to appear in: Philosophy and Technology (405), Springer. Submitted 16 January, Accepted 26 May 2020
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper explores the important role of critical science, and in particular of post-colonial and decolonial theories, in understanding and shaping the ongoing advances in artificial intelligence. Artificial Intelligence (AI) is viewed as amongst the technological advances that will reshape modern societies and their relations. Whilst the design and deployment of systems that continually adapt holds the promise of far-reaching positive change, they simultaneously pose significant risks, especially to already vulnerable peoples. Values and power are central to this discussion. Decolonial theories use historical hindsight to explain patterns of power that shape our intellectual, political, economic, and social world. By embedding a decolonial critical approach within its technical practice, AI communities can develop foresight and tactics that can better align research and technology development with established ethical principles, centring vulnerable peoples who continue to bear the brunt of negative impacts of innovation and scientific progress. We highlight problematic applications that are instances of coloniality, and using a decolonial lens, submit three tactics that can form a decolonial field of artificial intelligence: creating a critical technical practice of AI, seeking reverse tutelage and reverse pedagogies, and the renewal of affective and political communities. The years ahead will usher in a wave of new scientific breakthroughs and technologies driven by AI research, making it incumbent upon AI communities to strengthen the social contract through ethical foresight and the multiplicity of intellectual perspectives available to us; ultimately supporting future technologies that enable greater well-being, with the goal of beneficence and justice for all.

[45]  arXiv:2007.04074 (cross-list from cs.LG) [pdf, other]
Title: Auto-Sklearn 2.0: The Next Generation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Automated Machine Learning, which supports practitioners and researchers with the tedious task of manually designing machine learning pipelines, has recently achieved substantial success. In this paper we introduce new Automated Machine Learning (AutoML) techniques motivated by our winning submission to the second ChaLearn AutoML challenge, PoSH Auto-sklearn. For this, we extend Auto-sklearn with a new, simpler meta-learning technique, improve its way of handling iterative algorithms and enhance it with a successful bandit strategy for budget allocation. Furthermore, we go one step further and study the design space of AutoML itself and propose a solution towards truly hand-free AutoML. Together, these changes give rise to the next generation of our AutoML system, Auto-sklearn (2.0). We verify the improvement by these additions in a large experimental study on 39 AutoML benchmark datasets and conclude the paper by comparing to Auto-sklearn (1.0), reducing the regret by up to a factor of five.

[46]  arXiv:2007.04087 (cross-list from cs.LG) [pdf, other]
Title: Hyperparameter Optimization in Neural Networks via Structured Sparse Recovery
Comments: arXiv admin note: text overlap with arXiv:1906.02869
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we study two important problems in the automated design of neural networks -- Hyper-parameter Optimization (HPO), and Neural Architecture Search (NAS) -- through the lens of sparse recovery methods. In the first part of this paper, we establish a novel connection between HPO and structured sparse recovery. In particular, we show that a special encoding of the hyperparameter space enables a natural group-sparse recovery formulation, which when coupled with HyperBand (a multi-armed bandit strategy), leads to improvement over existing hyperparameter optimization methods. Experimental results on image datasets such as CIFAR-10 confirm the benefits of our approach. In the second part of this paper, we establish a connection between NAS and structured sparse recovery. Building upon ``one-shot'' approaches in NAS, we propose a novel algorithm that we call CoNAS by merging ideas from one-shot approaches with techniques for learning low-degree sparse Boolean polynomials. We provide theoretical analysis on the number of validation error measurements. Finally, we validate our approach on several datasets and discover novel architectures hitherto unreported, achieving competitive (or better) results in both performance and search time compared to the existing NAS approaches.

[47]  arXiv:2007.04091 (cross-list from cs.LG) [pdf, other]
Title: Bespoke vs. Prêt-à-Porter Lottery Tickets: Exploiting Mask Similarity for Trainable Sub-Network Finding
Comments: arXiv admin note: text overlap with arXiv:2001.05050
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The observation of sparse trainable sub-networks within over-parametrized networks - also known as Lottery Tickets (LTs) - has prompted inquiries around their trainability, scaling, uniqueness, and generalization properties. Across 28 combinations of image classification tasks and architectures, we discover differences in the connectivity structure of LTs found through different iterative pruning techniques, thus disproving their uniqueness and connecting emergent mask structure to the choice of pruning. In addition, we propose a consensus-based method for generating refined lottery tickets. This lottery ticket denoising procedure, based on the principle that parameters that always go unpruned across different tasks more reliably identify important sub-networks, is capable of selecting a meaningful portion of the architecture in an embarrassingly parallel way, while quickly discarding extra parameters without the need for further pruning iterations. We successfully train these sub-networks to performance comparable to that of ordinary lottery tickets.

[48]  arXiv:2007.04146 (cross-list from cs.LG) [pdf, other]
Title: Few-Shot One-Class Classification via Meta-Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Although few-shot learning and one-class classification (OCC), i.e. learning a binary classifier with data from only one class, have each been well studied separately, their intersection remains rather unexplored. Our work addresses the few-shot OCC problem and presents a method to modify the episodic data sampling strategy of the model-agnostic meta-learning (MAML) algorithm to learn a model initialization particularly suited for learning few-shot OCC tasks. This is done by explicitly optimizing for an initialization which requires only a few gradient steps with one-class minibatches to yield a performance increase on class-balanced test data. We provide a theoretical analysis that explains why our approach works in the few-shot OCC scenario, while other meta-learning algorithms, including MAML, fail. Our experiments on eight datasets from the image and time-series domains show that our method achieves better results than classical OCC and few-shot classification approaches, and demonstrate the ability to learn unseen tasks from only a few normal-class samples. Moreover, we successfully train anomaly detectors for a real-world application on sensor readings recorded during industrial manufacturing of workpieces with a CNC milling machine using few normal examples. Finally, we empirically demonstrate that the proposed data sampling technique increases the performance of more recent meta-learning algorithms in few-shot OCC.

[49]  arXiv:2007.04154 (cross-list from q-fin.MF) [pdf, other]
Title: Robust pricing and hedging via neural SDEs
Subjects: Mathematical Finance (q-fin.MF); Machine Learning (cs.LG); Machine Learning (stat.ML)

Mathematical modelling is ubiquitous in the financial industry and drives key decision processes. Any given model provides only a crude approximation to reality and the risk of using an inadequate model is hard to detect and quantify. By contrast, modern data science techniques are opening the door to more robust and data-driven model selection mechanisms. However, most machine learning models are "black-boxes" as individual parameters do not have meaningful interpretation. The aim of this paper is to combine the above approaches achieving the best of both worlds. Combining neural networks with risk models based on classical stochastic differential equations (SDEs), we find robust bounds for prices of derivatives and the corresponding hedging strategies while incorporating relevant market data. The resulting model called neural SDE is an instantiation of generative models and is closely linked with the theory of causal optimal transport. Neural SDEs allow consistent calibration under both the risk-neutral and the real-world measures. Thus the model can be used to simulate market scenarios needed for assessing risk profiles and hedging strategies. We develop and analyse novel algorithms needed for efficient use of neural SDEs. We validate our approach with numerical experiments using both local and stochastic volatility models.

[50]  arXiv:2007.04169 (cross-list from cs.LG) [pdf, other]
Title: An exploration of the influence of path choice in game-theoretic attribution algorithms
Comments: 21 pages, 23 figures, submitted to JMLR 7/7/2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We compare machine learning explainability methods based on the theory of atomic (Shapley, 1953) and infinitesimal (Aumann and Shapley, 1974) games, in a theoretical and experimental investigation into how the model and choice of integration path can influence the resulting feature attributions. To gain insight into differences in attributions resulting from interventional Shapley values (Sundararajan and Najmi, 2019; Janzing et al., 2019; Chen et al., 2019) and Generalized Integrated Gradients (GIG) (Merrill et al., 2019), we note that interventional Shapley is equivalent to a multi-path integration along $n!$ paths, where $n$ is the number of model input features. Applying Stokes' theorem, we show that the path symmetry of these two methods results in the same attributions when the model is composed of a sum of separable functions of individual features and a sum of two-feature products. We then perform a series of experiments with varying degrees of data missingness to demonstrate how interventional Shapley's multi-path approach can yield less consistent attributions than the single straight-line path of Aumann-Shapley. We argue this is because the multiple paths employed by interventional Shapley extend away from the training data manifold and are therefore more likely to pass through regions where the model has little support. In the absence of a more meaningful path choice, we therefore advocate the straight-line path since it will almost always pass closer to the data manifold. Among straight-line path attribution algorithms, GIG is uniquely robust since it will still yield Shapley values for atomic games modeled by decision trees.
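The equivalence claimed for models built from separable terms and two-feature products can be checked numerically on a toy case. The sketch below is illustrative only: it assumes a single baseline point `b` (rather than averaging over a background dataset), a hand-written model `f` with an analytic gradient, and a midpoint-rule integral for the straight-line path. All names and numbers are hypothetical.

```python
from itertools import permutations
from math import factorial


def f(x):
    # Separable terms plus one two-feature product (the model class in the text).
    return x[0] ** 2 + 2.0 * x[1] + x[0] * x[2]


def grad_f(x):
    # Analytic gradient of f.
    return [2.0 * x[0] + x[2], 2.0, x[0]]


def shapley_single_baseline(f, x, b):
    """Average marginal contributions over all n! feature orderings."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        z = list(b)
        prev = f(z)
        for i in order:
            z[i] = x[i]  # one axis-aligned segment of this ordering's path
            cur = f(z)
            phi[i] += cur - prev
            prev = cur
    return [p / factorial(n) for p in phi]


def straight_line_attribution(grad_f, x, b, steps=1000):
    """Midpoint-rule integral of the gradient along the segment from b to x."""
    n = len(x)
    phi = [0.0] * n
    for k in range(steps):
        t = (k + 0.5) / steps
        point = [b[i] + t * (x[i] - b[i]) for i in range(n)]
        g = grad_f(point)
        for i in range(n):
            phi[i] += g[i] * (x[i] - b[i]) / steps
    return phi


x = [1.5, -2.0, 0.5]
b = [0.2, 1.0, -1.0]
shap = shapley_single_baseline(f, x, b)
line = straight_line_attribution(grad_f, x, b)
print(shap)
print(line)
```

For this model the two attribution vectors agree up to floating-point error, and both sum to `f(x) - f(b)`. Adding a three-feature product such as `x[0] * x[1] * x[2]` to `f` (and updating `grad_f` to match) takes the model outside the class named in the abstract, so the agreement is no longer guaranteed.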

[51]  arXiv:2007.04176 (cross-list from astro-ph.IM) [pdf, other]
Title: Detection of Gravitational Waves Using Bayesian Neural Networks
Comments: 15 pages, 13 figures
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

We propose a new model of Bayesian Neural Networks to not only detect the events of compact binary coalescence in the observational data of gravitational waves (GW) but also identify the time periods of the associated GW waveforms before the events. This is achieved by incorporating the Bayesian approach into the CLDNN classifier, which integrates the Convolutional Neural Network (CNN) and the Long Short-Term Memory Recurrent Neural Network (LSTM). Our model successfully detects all seven BBH events in the LIGO Livingston O2 data, with the periods of their GW waveforms correctly labeled. The ability of a Bayesian approach to estimate uncertainty enables a newly defined `awareness' state for recognizing the possible presence of signals of unknown types, which would otherwise be rejected by a non-Bayesian model. Such data chunks labeled with the awareness state can then be further investigated rather than overlooked. Performance tests show that our model recognizes 90% of the events when the optimal signal-to-noise ratio $\rho_\text{opt} >7$ (100% when $\rho_\text{opt} >8.5$) and successfully labels more than 95% of the waveform periods when $\rho_\text{opt} >8$. The latency between the arrival of the peak signal and the generation of an alert with the associated waveform period labeled is only about 20 seconds for unoptimized code on a moderate GPU-equipped personal computer. This makes our model suitable for nearly real-time detection and for forecasting coalescence events when assisted by deeper training on a larger dataset using state-of-the-art HPCs.

[52]  arXiv:2007.04202 (cross-list from cs.LG) [pdf, other]
Title: Stochastic Hamiltonian Gradient Methods for Smooth Games
Comments: ICML 2020 - Proceedings of the 37th International Conference on Machine Learning
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC); Machine Learning (stat.ML)

The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using tools from the optimization literature we show that SHGD converges linearly to the neighbourhood of a stationary point. To guarantee convergence to the exact solution, we analyze SHGD with a decreasing step-size and we also present the first stochastic variance reduced Hamiltonian method. Our results provide the first global non-asymptotic last-iterate convergence guarantees for the class of stochastic unconstrained bilinear games and for the more general class of stochastic games that satisfy a "sufficiently bilinear" condition, notably including some non-convex non-concave problems. We supplement our analysis with experiments on stochastic bilinear and sufficiently bilinear games, where our theory is shown to be tight, and on simple adversarial machine learning formulations.
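To see why descending the Hamiltonian helps on bilinear games, consider the following deterministic toy sketch. It assumes the scalar game f(x, y) = A * x * y and takes the Hamiltonian to be half the squared norm of the vector field (df/dx, -df/dy); the stochastic estimator and variance-reduced variants discussed in the abstract are not reproduced here. The constants `A` and `ETA` are illustrative choices.

```python
import math

A = 1.0    # coupling in the toy game f(x, y) = A * x * y
ETA = 0.1  # step size (illustrative choice)


def vector_field(x, y):
    """xi = (df/dx, -df/dy) for f(x, y) = A * x * y."""
    return A * y, -A * x


def hamiltonian_grad(x, y):
    """Gradient of H = 0.5 * ||xi||^2 = 0.5 * A^2 * (x^2 + y^2)."""
    return A * A * x, A * A * y


def run_hgd(x, y, steps):
    """Plain gradient descent on the Hamiltonian H."""
    for _ in range(steps):
        gx, gy = hamiltonian_grad(x, y)
        x, y = x - ETA * gx, y - ETA * gy
    return x, y


def run_gda(x, y, steps):
    """Simultaneous gradient descent-ascent on f, for comparison."""
    for _ in range(steps):
        vx, vy = vector_field(x, y)
        x, y = x - ETA * vx, y - ETA * vy
    return x, y


hgd_norm = math.hypot(*run_hgd(1.0, 1.0, 200))
gda_norm = math.hypot(*run_gda(1.0, 1.0, 200))
print(hgd_norm, gda_norm)
```

In this toy case each Hamiltonian step scales the iterate by (1 - ETA * A^2), so the distance to the equilibrium (0, 0) shrinks geometrically, whereas each simultaneous descent-ascent step scales it by sqrt(1 + ETA^2 * A^2) and the iterates spiral outward.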

[53]  arXiv:2007.04203 (cross-list from cs.LG) [pdf, other]
Title: A Natural Actor-Critic Algorithm with Downside Risk Constraints
Comments: 14 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Finance (q-fin.CP); Portfolio Management (q-fin.PM); Machine Learning (stat.ML)

Existing work on risk-sensitive reinforcement learning - both for symmetric and downside risk measures - has typically used direct Monte-Carlo estimation of policy gradients. While this approach yields unbiased gradient estimates, it also suffers from high variance and decreased sample efficiency compared to temporal-difference methods. In this paper, we study prediction and control with aversion to downside risk which we gauge by the lower partial moment of the return. We introduce a new Bellman equation that upper bounds the lower partial moment, circumventing its non-linearity. We prove that this proxy for the lower partial moment is a contraction, and provide intuition into the stability of the algorithm by variance decomposition. This allows sample-efficient, on-line estimation of partial moments. For risk-sensitive control, we instantiate Reward Constrained Policy Optimization, a recent actor-critic method for finding constrained policies, with our proxy for the lower partial moment. We extend the method to use natural policy gradients and demonstrate the effectiveness of our approach on three benchmark problems for risk-sensitive reinforcement learning.
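For readers unfamiliar with the risk measure, the following sketch estimates a lower partial moment from sampled returns. It assumes the standard definition E[max(target - R, 0)^order]; the target, order, and sample values are illustrative and are not taken from the paper, and the Bellman-style proxy described in the abstract is not implemented here.

```python
def lower_partial_moment(returns, target, order=2):
    """Sample estimate of E[max(target - R, 0) ** order]."""
    if not returns:
        raise ValueError("need at least one return sample")
    if order < 1:
        raise ValueError("order must be at least 1")
    # Only returns below the target contribute to the moment.
    shortfalls = [max(target - r, 0.0) for r in returns]
    return sum(s ** order for s in shortfalls) / len(returns)


# Hypothetical episode returns.
returns = [1.0, -0.5, 0.2, -1.5, 0.8]
lpm2 = lower_partial_moment(returns, target=0.0, order=2)
lpm1 = lower_partial_moment(returns, target=0.0, order=1)
print(lpm1, lpm2)
```

Unlike variance, this quantity ignores outcomes above the target, which is why it is described as a downside risk measure; the `max` is also the source of the non-linearity mentioned in the abstract.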

[54]  arXiv:2007.04205 (cross-list from cs.CL) [pdf, other]
Title: Analysis of Predictive Coding Models for Phonemic Representation Learning in Small Datasets
Comments: 7 pages, 5 figures, 5 tables. Accepted paper at the workshop on Self-supervision in Audio and Speech at ICML 2020
Subjects: Computation and Language (cs.CL); Machine Learning (stat.ML)

Neural network models using predictive coding are interesting from the viewpoint of computational modelling of human language acquisition, where the objective is to understand how linguistic units could be learned from speech without any labels. Even though several promising predictive coding-based learning algorithms have been proposed in the literature, it is currently unclear how well they generalise to different languages and training dataset sizes. In addition, although such models have been shown to be effective phonemic feature learners, it is unclear whether minimisation of the predictive loss functions of these models also leads to optimal phoneme-like representations. The present study investigates the behaviour of two predictive coding models, Autoregressive Predictive Coding (APC) and Contrastive Predictive Coding (CPC), in a phoneme discrimination task (ABX task) for two languages with different dataset sizes. Our experiments show a strong correlation between the autoregressive loss and the phoneme discrimination scores with the two datasets. However, to our surprise, the CPC model shows rapid convergence already after one pass over the training data, and, on average, its representations outperform those of APC on both languages.

[55]  arXiv:2007.04206 (cross-list from cs.LG) [pdf, ps, other]
Title: Diverse Ensembles Improve Calibration
Comments: Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Modern deep neural networks can produce badly calibrated predictions, especially when train and test distributions are mismatched. Training an ensemble of models and averaging their predictions can help alleviate these issues. We propose a simple technique to improve calibration, using a different data augmentation for each ensemble member. We additionally use the idea of `mixing' un-augmented and augmented inputs to improve calibration when test and training distributions are the same. These simple techniques improve calibration and accuracy over strong baselines on the CIFAR10 and CIFAR100 benchmarks, and out-of-domain data from their corrupted versions.

[56]  arXiv:2007.04212 (cross-list from cs.LG) [pdf, other]
Title: The Scattering Compositional Learner: Discovering Objects, Attributes, Relationships in Analogical Reasoning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Machine Learning (stat.ML)

In this work, we focus on an analogical reasoning task that contains rich compositional structures, Raven's Progressive Matrices (RPM). To discover compositional structures of the data, we propose the Scattering Compositional Learner (SCL), an architecture that composes neural networks in a sequence. Our SCL achieves state-of-the-art performance on two RPM datasets, with a 48.7% relative improvement on Balanced-RAVEN and 26.4% on PGM over the previous state-of-the-art. We additionally show that our model discovers compositional representations of objects' attributes (e.g., shape, color, size), and their relationships (e.g., progression, union). We also find that the compositional representation makes the SCL significantly more robust to test-time domain shifts and greatly improves zero-shot generalization to previously unseen analogies.

[57]  arXiv:2007.04214 (cross-list from cs.LG) [pdf, ps, other]
Title: Linear-Time Algorithms for Adaptive Submodular Maximization
Authors: Shaojie Tang
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we develop fast algorithms for two stochastic submodular maximization problems. We start with the well-studied adaptive submodular maximization problem subject to a cardinality constraint. We develop the first linear-time algorithm which achieves a $(1-1/e-\epsilon)$ approximation ratio. Notably, the time complexity of our algorithm is $O(n\log\frac{1}{\epsilon})$ (number of function evaluations), which is independent of the cardinality constraint, where $n$ is the size of the ground set. Then we introduce the concept of fully adaptive submodularity, and develop a linear-time algorithm for maximizing a fully adaptive submodular function subject to a partition matroid constraint. We show that our algorithm achieves a $\frac{1-1/e-\epsilon}{4-2/e-2\epsilon}$ approximation ratio using only $O(n\log\frac{1}{\epsilon})$ function evaluations.
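The abstract does not spell out the algorithm, so the sketch below is not the paper's method. It shows a known non-adaptive technique, greedy selection over a random subset of the remaining items in each round, purely to illustrate how an evaluation count of roughly $n\log\frac{1}{\epsilon}$ can be independent of the cardinality constraint $k$. The coverage objective, the function names, and the parameter values are all hypothetical.

```python
import math
import random


def random_subset_greedy(ground, gain, k, eps, rng):
    """Pick k items; each round scans only a random sample of the remaining items.

    gain(selected, item) must return the marginal value of adding item.
    Returns the chosen items and the number of gain evaluations used.
    """
    n = len(ground)
    # About (n / k) * ln(1 / eps) candidates per round, so ~n * ln(1 / eps) in total.
    sample_size = min(n, math.ceil((n / k) * math.log(1.0 / eps)))
    selected, remaining, evals = [], list(ground), 0
    for _ in range(k):
        if not remaining:
            break
        candidates = rng.sample(remaining, min(sample_size, len(remaining)))
        best, best_gain = None, -math.inf
        for item in candidates:
            g = gain(selected, item)
            evals += 1
            if g > best_gain:
                best, best_gain = item, g
        selected.append(best)
        remaining.remove(best)
    return selected, evals


# Toy coverage objective: each item covers a small set of points.
covers = {i: {i, (i * 3) % 20, (i * 7) % 20} for i in range(20)}


def coverage_gain(selected, item):
    """Number of new points that item covers beyond the selected items."""
    covered = set()
    for s in selected:
        covered |= covers[s]
    return len(covers[item] - covered)


chosen, evals = random_subset_greedy(
    list(covers), coverage_gain, k=5, eps=0.1, rng=random.Random(0)
)
print(chosen, evals)
```

The adaptive setting studied in the paper additionally observes a random state after each selection before choosing the next item; that feedback loop is omitted from this sketch.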

[58]  arXiv:2007.04216 (cross-list from cs.LG) [pdf, other]
Title: RicciNets: Curvature-guided Pruning of High-performance Neural Networks Using Ricci Flow
Comments: To appear at ICML 2020, AutoML Workshop. Contains 11 pages, 5 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A novel method to identify salient computational paths within randomly wired neural networks before training is proposed. The computational graph is pruned based on a node mass probability function defined by local graph measures and weighted by hyperparameters produced by a reinforcement learning-based controller neural network. We use the definition of Ricci curvature to remove edges of low importance before mapping the computational graph to a neural network. We show a reduction of almost $35\%$ in the number of floating-point operations (FLOPs) per pass, with no degradation in performance. Further, our method can successfully regularize randomly wired neural networks based on purely structural properties, and we also find that the favourable characteristics identified in one network generalise to other networks. The method produces networks with better performance under similar compression to those pruned by lowest-magnitude weights. To the best of our knowledge, this is the first work on pruning randomly wired neural networks, as well as the first to utilize the topological measure of Ricci curvature in the pruning mechanism.

[59]  arXiv:2007.04234 (cross-list from cs.CV) [pdf, other]
Title: Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Pretraining has become a standard technique in computer vision and natural language processing, which usually helps to improve performance substantially. Previously, the dominant pretraining method was transfer learning (TL), which uses labeled data to learn a good representation network. Recently, a new pretraining approach -- self-supervised learning (SSL) -- has demonstrated promising results on a wide range of applications. SSL does not require annotated labels. It is conducted purely on input data by solving auxiliary tasks defined on the input data examples. Currently reported results show that SSL outperforms TL in certain applications, while TL outperforms SSL in others. There is no clear understanding of what properties of data and tasks make one approach outperform the other. Without an informed guideline, ML researchers have to try both methods to find out empirically which one is better, which is usually time-consuming. In this work, we aim to address this problem. We perform a comprehensive comparative study between SSL and TL regarding which one works better under different properties of data and tasks, including domain difference between source and target tasks, the amount of pretraining data, class imbalance in source data, and usage of target data for additional pretraining. The insights distilled from our comparative studies can help ML researchers decide which method to use based on the properties of their applications.

[60]  arXiv:2007.04238 (cross-list from cs.LG) [pdf, other]
Title: Predicting the Accuracy of a Few-Shot Classifier
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

In the context of few-shot learning, one cannot measure the generalization ability of a trained classifier using validation sets, due to the small number of labeled samples. In this paper, we are interested in finding alternatives to answer the question: is my classifier generalizing well to previously unseen data? We first analyze the reasons for the variability of generalization performances. We then investigate the case of using transfer-based solutions, and consider three settings: i) supervised where we only have access to a few labeled samples, ii) semi-supervised where we have access to both a few labeled samples and a set of unlabeled samples and iii) unsupervised where we only have access to unlabeled samples. For each setting, we propose reasonable measures that we empirically demonstrate to be correlated with the generalization ability of considered classifiers. We also show that these simple measures can be used to predict generalization up to a certain confidence. We conduct our experiments on standard few-shot vision datasets.

[61]  arXiv:2007.04239 (cross-list from cs.CL) [pdf, other]
Title: A Survey on Transfer Learning in Natural Language Processing
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)

Deep learning models usually require a huge amount of data. However, such large datasets are not always attainable. This is common in many challenging NLP tasks. Consider Neural Machine Translation, for instance, where curating such large datasets may not be possible, especially for low-resource languages. Another limitation of deep learning models is the demand for huge computing resources. These obstacles motivate research into the possibility of knowledge transfer using large trained models. The demand for transfer learning is increasing as many large models are emerging. In this survey, we feature the recent transfer learning advances in the field of NLP. We also provide a taxonomy for categorizing different transfer learning approaches from the literature.

[62]  arXiv:2007.04249 (cross-list from cs.CL) [pdf, other]
Title: Cooking Is All About People: Comment Classification On Cookery Channels Using BERT and Classification Models (Malayalam-English Mix-Code)
Authors: Subramaniam Kazhuparambil (1), Abhishek Kaushik (1 and 2) ((1) Dublin Business School, (2) Dublin City University)
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)

The scope of a lucrative career promoted by Google through its video distribution platform YouTube has attracted a large number of users to become content creators. An important aspect of this line of work is the feedback received in the form of comments, which show how well the content is being received by the audience. However, the volume of comments, coupled with spam and limited tools for comment classification, makes it virtually impossible for a creator to go through each and every comment and gather constructive feedback. Automatic classification of comments is a challenge even for established classification models, since comments are often of variable length and riddled with slang, symbols and abbreviations. This is a greater challenge where comments are multilingual, as the messages are often rife with the respective vernacular. In this work, we have evaluated top-performing classification models and four different vectorizers for classifying comments which are a mix of different combinations of English and Malayalam (only English, only Malayalam, and a mix of English and Malayalam). The statistical analysis of results indicates that Multinomial Naive Bayes, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest and Decision Trees offer a similar level of accuracy in comment classification. Further, we have also evaluated 3 multilingual sub-types of the novel NLP language model BERT and compared their performance to the conventional machine learning classification techniques. XLM was the top-performing BERT model with an accuracy of 67.31. Random Forest with Term Frequency Vectorizer was the top-performing model among the traditional classification models, with an accuracy of 63.59.

[63]  arXiv:2007.04250 (cross-list from cs.LG) [pdf, other]
Title: A Benchmark of Medical Out of Distribution Detection
Comments: Oral presentation, Uncertainty & Robustness in Deep Learning workshop at ICML. 4 pages, 9 pages total
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

There is a rise in the use of deep learning for automated medical diagnosis, most notably in medical imaging. Such an automated system uses a set of images from a patient to diagnose whether they have a disease. However, systems trained for one particular domain of images cannot be expected to perform accurately on images of a different domain. These images should be filtered out by an Out-of-Distribution Detection (OoDD) method prior to diagnosis. This paper benchmarks popular OoDD methods in three domains of medical imaging: chest x-rays, fundus images, and histology slides. Our experiments show that despite methods yielding good results on some types of out-of-distribution samples, they fail to recognize images close to the training distribution.

[64]  arXiv:2007.04275 (cross-list from cs.LG) [pdf, other]
Title: Graph Neural Networks for the Prediction of Substrate-Specific Organic Reaction Conditions
Comments: 23 pages, 10 tables, 13 figures, to appear in the ICML 2020 Workshop on Graph Representation Learning and Beyond (GRLB)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We present a systematic investigation using graph neural networks (GNNs) to model organic chemical reactions. To do so, we prepared a dataset collection of four ubiquitous reactions from the organic chemistry literature. We evaluate seven different GNN architectures for classification tasks pertaining to the identification of experimental reagents and conditions. We find that models are able to identify specific graph features that affect reaction conditions and lead to accurate predictions. The results herein show great promise in advancing molecular machine learning.

[65]  arXiv:2007.04285 (cross-list from stat.ME) [pdf, ps, other]
Title: Deep Fiducial Inference
Authors: Gang Li, Jan Hannig
Comments: 20 pages, 7 figures
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)

Since the mid-2000s, there has been a resurrection of interest in modern modifications of fiducial inference. To date, the main computational tool to extract a generalized fiducial distribution is Markov chain Monte Carlo (MCMC). We propose an alternative way of computing a generalized fiducial distribution that could be used in complex situations. In particular, to overcome the difficulty that arises when the unnormalized fiducial density (needed for MCMC) is not available, we design a fiducial autoencoder (FAE). The fitted autoencoder is used to generate generalized fiducial samples of the unknown parameters. To increase accuracy, we then apply an approximate fiducial computation (AFC) algorithm, rejecting samples that, when plugged into a decoder, do not replicate the observed data well enough. Our numerical experiments show the effectiveness of our FAE-based inverse solution and the excellent coverage performance of the AFC corrected FAE solution.
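The rejection step described above resembles approximate Bayesian computation, and can be sketched on a toy problem. Everything here is an assumption made for illustration: the decoder is a hypothetical location model (observation = parameter + noise), the candidate samples are plain random draws standing in for the output of a fitted autoencoder, and the tolerance is arbitrary.

```python
import random


def decoder(theta, u):
    """Hypothetical data-generating map: observation = theta + noise."""
    return theta + u


def afc_reject(candidates, y_obs, tol):
    """Keep (theta, u) pairs whose decoded value lies within tol of y_obs."""
    return [(t, u) for t, u in candidates if abs(decoder(t, u) - y_obs) <= tol]


rng = random.Random(0)
y_obs = 1.0

# Stand-in for samples produced by a fitted autoencoder (here just random draws).
candidates = [(rng.uniform(-3.0, 5.0), rng.gauss(0.0, 1.0)) for _ in range(5000)]

accepted = afc_reject(candidates, y_obs, tol=0.05)
thetas = [t for t, _ in accepted]
print(len(accepted), sum(thetas) / len(thetas))
```

Tightening `tol` makes the accepted samples reproduce the observed data more closely at the cost of a lower acceptance rate, which is the usual trade-off in rejection-based schemes of this kind.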

[66]  arXiv:2007.04309 (cross-list from cs.LG) [pdf, other]
Title: Self-Supervised Policy Adaptation during Deployment
Comments: Project page: this https URL , Code: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)

In most real world scenarios, a policy trained by reinforcement learning in one environment needs to be deployed in another, potentially quite different environment. However, generalization across different environments is known to be hard. A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal. Our work explores the use of self-supervision to allow the policy to continue training after deployment without using any rewards. While previous methods explicitly anticipate changes in the new environment, we assume no prior knowledge of those changes yet still obtain significant improvements. Empirical evaluations are performed on diverse environments from DeepMind Control suite and ViZDoom. Our method improves generalization in 25 out of 30 environments across various tasks, and outperforms domain randomization on a majority of environments.

Replacements for Thu, 9 Jul 20

[67]  arXiv:1805.11845 (replaced) [pdf, other]
Title: An Information-Theoretic Analysis for Thompson Sampling with Many Actions
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
[68]  arXiv:2002.01600 (replaced) [pdf, other]
Title: Linearly Constrained Neural Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
[69]  arXiv:2002.11537 (replaced) [pdf, other]
Title: ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on Nonlinear ICA
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[70]  arXiv:2004.11734 (replaced) [pdf, ps, other]
Title: Robust subgaussian estimation with VC-dimension
Authors: Jules Depersin
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[71]  arXiv:2006.04295 (replaced) [pdf, other]
Title: Efficient MCMC Sampling for Bayesian Matrix Factorization by Breaking Posterior Symmetries
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[72]  arXiv:2006.10921 (replaced) [pdf, other]
Title: Meta Learning in the Continuous Time Limit
Comments: 25 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[73]  arXiv:2007.00810 (replaced) [pdf, other]
Title: On Linear Identifiability of Learned Representations
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[74]  arXiv:2007.02794 (replaced) [pdf]
Title: Towards Efficient Connected and Automated Driving System via Multi-agent Graph Reinforcement Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[75]  arXiv:2007.03502 (replaced) [pdf, ps, other]
Title: srMO-BO-3GP: A sequential regularized multi-objective constrained Bayesian optimization for design applications
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[76]  arXiv:1810.04472 (replaced) [src]
Title: Domain Confusion with Self Ensembling for Unsupervised Adaptation
Comments: The expression is ambiguous, which is not convenient for readers to understand, and in today's view, the conclusion of the paper is of little significance, so it is no longer open
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[77]  arXiv:1811.02141 (replaced) [pdf, other]
Title: Extended Isolation Forest
Comments: 12 pages; 21 figures, Published. Open source code in this https URL
Subjects: Machine Learning (cs.LG); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (stat.ML)
[78]  arXiv:1812.11954 (replaced) [pdf, other]
Title: Exact Cluster Recovery via Classical Multidimensional Scaling
Comments: 42 pages including appendix
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
[79]  arXiv:1901.11141 (replaced) [pdf, other]
Title: On the Consistency of Top-k Surrogate Losses
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[80]  arXiv:1905.04497 (replaced) [pdf, other]
Title: Stability Properties of Graph Neural Networks
Comments: Submitted to IEEE Transactions on Signal Processing
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[81]  arXiv:1906.01687 (replaced) [pdf, other]
Title: Stochastic Gradients for Large-Scale Tensor Decomposition
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Machine Learning (stat.ML)
[82]  arXiv:1907.01343 (replaced) [pdf, other]
Title: Low-Rank Subspace Override for Unsupervised Domain Adaptation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[83]  arXiv:1907.01739 (replaced) [pdf, other]
Title: Solving Partial Assignment Problems using Random Clique Complexes
Comments: Accepted as a long talk at ICML 2018
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[84]  arXiv:1908.00709 (replaced) [pdf, other]
Title: AutoML: A Survey of the State-of-the-Art
Comments: automated machine learning (AutoML), Submitted to Knowledge Based Systems for review
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[85]  arXiv:1909.02261 (replaced) [pdf, other]
Title: Population-based Gradient Descent Weight Learning for Graph Coloring Problems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[86]  arXiv:1909.13188 (replaced) [pdf, other]
Title: Understanding and Stabilizing GANs' Training Dynamics with Control Theory
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[87]  arXiv:1910.08828 (replaced) [pdf, ps, other]
Title: Dictionary Learning with Almost Sure Error Constraints
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)
[88]  arXiv:1912.04981 (replaced) [pdf, other]
Title: Phase Retrieval Using Conditional Generative Adversarial Networks
Comments: Accepted at the 25th International Conference on Pattern Recognition 2020 (ICPR)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[89]  arXiv:2002.02730 (replaced) [pdf, other]
Title: Machine Unlearning: Linear Filtration for Logit-based Classifiers
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[90]  arXiv:2002.03704 (replaced) [pdf, other]
Title: Liberty or Depth: Deep Bayesian Neural Nets Do Not Need Complex Weight Posterior Approximations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[91]  arXiv:2002.04131 (replaced) [pdf, other]
Title: Q-Learning Algorithm for Mean-Field Controls, with Convergence and Complexity Analysis
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[92]  arXiv:2002.08423 (replaced) [pdf, other]
Title: PrivacyFL: A simulator for privacy-preserving and secure federated learning
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
[93]  arXiv:2002.09089 (replaced) [pdf, other]
Title: Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences
Comments: In proceedings ICML 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[94]  arXiv:2002.09928 (replaced) [pdf, other]
Title: Predictive Sampling with Forecasting Autoregressive Models
Comments: Accepted at the 37th International Conference on Machine Learning (ICML 2020). 14 pages, 13 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[95]  arXiv:2002.10301 (replaced) [pdf, other]
Title: Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning
Comments: 33 pages, 4 figures
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
[96]  arXiv:2003.01794 (replaced) [pdf, other]
Title: Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[97]  arXiv:2003.01820 (replaced) [pdf, other]
Title: Robust Market Making via Adversarial Reinforcement Learning
Comments: 7 pages, 3 figures; IJCAI-PRICAI '20 Conference Proceedings
Subjects: Trading and Market Microstructure (q-fin.TR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
[98]  arXiv:2003.07521 (replaced) [pdf, other]
Title: Energy-Based Processes for Exchangeable Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[99]  arXiv:2004.03329 (replaced) [pdf, other]
Title: MedDialog: Two Large-scale Medical Dialogue Datasets
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
[100]  arXiv:2004.09691 (replaced) [pdf, other]
Title: A Data and Compute Efficient Design for Limited-Resources Deep Learning
Comments: Accepted for poster presentation at the Practical Machine Learning for Developing Countries (PML4DC) workshop, ICLR 2020
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[101]  arXiv:2005.11418 (replaced) [pdf, other]
Title: FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[102]  arXiv:2005.13625 (replaced) [pdf, other]
Title: Parameter Sharing is Surprisingly Useful for Multi-Agent Deep Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
[103]  arXiv:2006.06889 (replaced) [pdf, ps, other]
Title: Fast Objective and Duality Gap Convergence for Non-convex Strongly-concave Min-max Problems
Comments: Zhishuai Guo, Zhuoning Yuan and Yan Yan contributed equally to this work
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[104]  arXiv:2006.08386 (replaced) [pdf, other]
Title: COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations
Comments: 8 pages, 1 figure, workshop on Self-supervision in Audio and Speech at the 37th International Conference on Machine Learning (ICML), 2020, Vienna, Austria
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[105]  arXiv:2006.09437 (replaced) [pdf, other]
Title: A Study of Compositional Generalization in Neural Models
Comments: 28 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[106]  arXiv:2006.15396 (replaced) [pdf, other]
Title: Approximating Posterior Predictive Distributions by Averaging Output From Many Particle Filters
Authors: Taylor R. Brown
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
[107]  arXiv:2006.16811 (replaced) [pdf, other]
Title: Path Integral Based Convolution and Pooling for Graph Neural Networks
Comments: 15 pages, 4 figures, 6 tables. arXiv admin note: text overlap with arXiv:1904.10996
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Networking and Internet Architecture (cs.NI); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
[108]  arXiv:2007.01888 (replaced) [pdf, other]
Title: Inference on the change point in high dimensional time series models via plug in least square
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
[109]  arXiv:2007.02126 (replaced) [pdf, other]
Title: Deep Graph Random Process for Relational-Thinking-Based Speech Recognition
Comments: Accepted at ICML 2020
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[110]  arXiv:2007.02334 (replaced) [pdf, other]
Title: Multi-Manifold Learning for Large-scale Targeted Advertising System
Comments: Accepted at AdKDD 2020
Journal-ref: AdKDD 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[111]  arXiv:2007.02500 (replaced) [pdf, other]
Title: Deep Learning for Anomaly Detection: A Review
Comments: Survey paper, 36 pages, 180 references, 2 figures, 3 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[112]  arXiv:2007.02520 (replaced) [pdf, other]
Title: Explaining Fast Improvement in Online Policy Optimization
Comments: 20 pages, 2 figures; typos corrected
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)
[113]  arXiv:2007.02719 (replaced) [pdf, other]
Title: Splintering with distributions: A stochastic decoy scheme for private computation
Comments: 28 pages, 6 figures
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[114]  arXiv:2007.02777 (replaced) [pdf, other]
Title: Parametric machines: a fresh approach to architecture search
Comments: 31 pages, 4 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[115]  arXiv:2007.03117 (replaced) [pdf, ps, other]
Title: Multi-Fidelity Bayesian Optimization via Deep Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[116]  arXiv:2007.03629 (replaced) [pdf, other]
Title: Strong Generalization and Efficiency in Neural Programs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)