Machine Learning

New submissions

New submissions for Thu, 20 Feb 20

[1]  arXiv:2002.07836 [pdf, other]
Title: Multi-Step Model-Agnostic Meta-Learning: Convergence and Improved Algorithms
Comments: 67 pages, 8 figures
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

As a popular meta-learning approach, the model-agnostic meta-learning (MAML) algorithm has been widely used due to its simplicity and effectiveness. However, the convergence of the general multi-step MAML still remains unexplored. In this paper, we develop a new theoretical framework, under which we characterize the convergence rate and the computational complexity of multi-step MAML. Our results indicate that although the estimation bias and variance of the stochastic meta gradient involve exponential factors of $N$ (the number of the inner-stage gradient updates), MAML still attains the convergence with complexity increasing only linearly with $N$ with a properly chosen inner stepsize. We then take a further step to develop a more efficient Hessian-free MAML. We first show that the existing zeroth-order Hessian estimator contains a constant-level estimation error so that the MAML algorithm can perform unstably. To address this issue, we propose a novel Hessian estimator via a gradient-based Gaussian smoothing method, and show that it achieves a much smaller estimation bias and variance, and the resulting algorithm achieves the same performance guarantee as the original MAML under mild conditions. Our experiments validate our theory and demonstrate the effectiveness of the proposed Hessian estimator.
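
To make the role of $N$ concrete, here is a minimal sketch of multi-step MAML on toy one-dimensional quadratic tasks. The task family, the step sizes, and all function names are illustrative assumptions and are not taken from the paper. The meta-gradient is computed exactly by the chain rule, which is only possible because the toy losses are quadratic; it is not the stochastic or Hessian-free estimator the abstract discusses.

```python
# Toy multi-step MAML. Each task i has loss f_i(w) = 0.5 * a_i * (w - c_i)**2.
# Tasks, step sizes, and names are hypothetical; this is not the paper's code.

def inner_adapt(w, a, c, alpha, n_steps):
    """Run n_steps of gradient descent on one task, starting from w."""
    for _ in range(n_steps):
        w = w - alpha * a * (w - c)
    return w

def meta_objective(w, tasks, alpha, n_steps):
    """Average task loss evaluated after the inner adaptation."""
    total = 0.0
    for a, c in tasks:
        w_adapted = inner_adapt(w, a, c, alpha, n_steps)
        total += 0.5 * a * (w_adapted - c) ** 2
    return total / len(tasks)

def meta_gradient(w, tasks, alpha, n_steps):
    """Exact gradient of the meta objective with respect to the initial w."""
    total = 0.0
    for a, c in tasks:
        w_adapted = inner_adapt(w, a, c, alpha, n_steps)
        # Each inner step multiplies d(w_k)/d(w) by (1 - alpha * a).
        jacobian = (1.0 - alpha * a) ** n_steps
        total += a * (w_adapted - c) * jacobian
    return total / len(tasks)

def maml(tasks, alpha=0.1, beta=0.5, n_steps=3, meta_iters=200, w0=0.0):
    """Outer loop: gradient descent on the meta objective."""
    w = w0
    for _ in range(meta_iters):
        w = w - beta * meta_gradient(w, tasks, alpha, n_steps)
    return w

tasks = [(1.0, -1.0), (2.0, 0.5), (0.5, 3.0)]  # (curvature a_i, optimum c_i)
w_meta = maml(tasks)
```

In this toy setting the Jacobian factor `(1 - alpha * a) ** n_steps` shows directly how the meta-gradient depends exponentially on the number of inner steps.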

[2]  arXiv:2002.07839 [pdf, other]
Title: Is Local SGD Better than Minibatch SGD?
Comments: 29 pages
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal for quadratics; (2) For general convex objectives we provide the first guarantee that at least sometimes improves over minibatch SGD; (3) We show that indeed local SGD does not dominate minibatch SGD by presenting a lower bound on the performance of local SGD that is worse than the minibatch SGD guarantee.
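
The two methods being compared can be sketched side by side under the same gradient budget per communication round. The objective (a one-dimensional quadratic with Gaussian gradient noise), the hyperparameters, and the function names below are assumptions chosen for illustration; the sketch does not reproduce any bound from the paper.

```python
import random

def noisy_grad(x, rng, target=3.0, noise=1.0):
    """Stochastic gradient of f(x) = 0.5 * (x - target)**2 with Gaussian noise."""
    return (x - target) + rng.gauss(0.0, noise)

def minibatch_sgd(x0, rounds, machines, local_steps, lr, rng):
    """One update per round, averaging machines * local_steps gradients at one point."""
    x = x0
    batch = machines * local_steps
    for _ in range(rounds):
        g = sum(noisy_grad(x, rng) for _ in range(batch)) / batch
        x -= lr * g
    return x

def local_sgd(x0, rounds, machines, local_steps, lr, rng):
    """Each machine takes local_steps SGD steps, then the iterates are averaged."""
    x = x0
    for _ in range(rounds):
        local_iterates = []
        for _ in range(machines):
            xm = x
            for _ in range(local_steps):
                xm -= lr * noisy_grad(xm, rng)
            local_iterates.append(xm)
        x = sum(local_iterates) / machines
    return x

rng = random.Random(0)
x_minibatch = minibatch_sgd(0.0, rounds=50, machines=4, local_steps=5, lr=0.1, rng=rng)
x_local = local_sgd(0.0, rounds=50, machines=4, local_steps=5, lr=0.1, rng=rng)
```

Both routines consume `machines * local_steps` stochastic gradients per round; they differ only in whether those gradients are evaluated at a shared point or along separate local trajectories.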

[3]  arXiv:2002.07863 [pdf, other]
Title: Learning Similarity Metrics for Numerical Simulations
Comments: Main paper: 8 pages, Appendix: 20 pages. Further information at this https URL
Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Fluid Dynamics (physics.flu-dyn); Machine Learning (stat.ML)

We propose a neural network-based approach that computes a stable and generalizing metric (LSiM), to compare field data from a variety of numerical simulation sources. Our method employs a Siamese network architecture that is motivated by the mathematical properties of a metric. We leverage a controllable data generation setup with partial differential equation (PDE) solvers to create increasingly different outputs from a reference simulation in a controlled environment. A central component of our learned metric is a specialized loss function that introduces knowledge about the correlation between single data samples into the training process. To demonstrate that the proposed approach outperforms existing simple metrics for vector spaces and other learned, image-based metrics, we evaluate the different methods on a large range of test data. Additionally, we analyze benefits for generalization and the impact of an adjustable training data difficulty. The robustness of LSiM is demonstrated via an evaluation on three real-world data sets.

[4]  arXiv:2002.07867 [pdf, other]
Title: Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A recent line of research has provided convergence guarantees for gradient descent algorithms in the excessive over-parameterization regime where the widths of all the hidden layers are required to be polynomially large in the number of training samples. However, the widths of practical deep networks are often only large in the first layer(s) and then start to decrease towards the output layer. This raises an interesting open question whether similar results also hold under this empirically relevant setting. Existing theoretical insights suggest that the loss surface of this class of networks is well-behaved, but these results usually do not provide direct algorithmic guarantees for optimization. In this paper, we close the gap by showing that one wide layer followed by pyramidal deep network topology suffices for gradient descent to find a global minimum with a geometric rate. Our proof is based on a weak form of Polyak-Lojasiewicz inequality which holds for deep pyramidal networks in the manifold of full-rank weight matrices.

[5]  arXiv:2002.07891 [pdf, other]
Title: Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural Gradient Descent
Comments: accepted by AAAI 2020
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Despite the great achievements of modern deep neural networks (DNNs), the vulnerability/robustness of state-of-the-art DNNs raises security concerns in many application domains requiring high reliability. Various adversarial attacks have been proposed to sabotage the learning performance of DNN models. Among those, black-box adversarial attack methods have received special attention owing to their practicality and simplicity. Black-box attacks usually prefer fewer queries in order to remain stealthy and keep costs low. However, most of the current black-box attack methods adopt the first-order gradient descent method, which may come with certain deficiencies such as relatively slow convergence and high sensitivity to hyper-parameter settings. In this paper, we propose a zeroth-order natural gradient descent (ZO-NGD) method to design adversarial attacks, which incorporates the zeroth-order gradient estimation technique catering to the black-box attack scenario and the second-order natural gradient descent to achieve higher query efficiency. The empirical evaluations on image classification datasets demonstrate that ZO-NGD can obtain significantly lower model query complexities compared with state-of-the-art attack methods.
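
The zeroth-order ingredient can be illustrated with a generic random-direction finite-difference estimator, which recovers a gradient from function values alone. This is a sketch of the general technique only: the smoothing radius `mu`, the query count, and the toy loss are assumptions, the natural-gradient (second-order) step of ZO-NGD is not shown, and the large query count is chosen for a clean illustration rather than for query efficiency.

```python
import random

def zo_gradient(f, x, num_queries=5000, mu=1e-3, seed=0):
    """Estimate grad f(x) from function values, using random Gaussian directions."""
    rng = random.Random(seed)
    d = len(x)
    fx = f(x)
    grad = [0.0] * d
    for _ in range(num_queries):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_pert = [xi + mu * ui for xi, ui in zip(x, u)]
        # Forward difference along u approximates the directional derivative.
        slope = (f(x_pert) - fx) / mu
        for i in range(d):
            grad[i] += slope * u[i] / num_queries
    return grad

def loss(x):
    """Toy black-box objective; its true gradient is 2 * x."""
    return sum(xi * xi for xi in x)

x = [1.0, -2.0, 0.5]
estimate = zo_gradient(loss, x)
true_grad = [2.0 * xi for xi in x]
```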

[6]  arXiv:2002.07898 [pdf, other]
Title: Deep Transform and Metric Learning Network: Wedding Deep Dictionary Learning and Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

On account of their many successes in inference tasks and denoising applications, Dictionary Learning (DL) and its related sparse optimization problems have garnered a lot of research interest. While most solutions have focused on single-layer dictionaries, the improved, recently proposed Deep DL (DDL) methods have also fallen short on a number of issues. We propose herein a novel DDL approach where each DL layer can be formulated as a combination of one linear layer and a Recurrent Neural Network (RNN). The RNN is shown to flexibly account for the layer-associated and learned metric. Our proposed work unveils new insights into Neural Networks and DDL and provides a new, efficient and competitive approach to jointly learn a deep transform and a metric for inference applications. Extensive experiments are carried out to demonstrate that the proposed method can outperform not only existing DDL but also state-of-the-art generic CNNs.

[7]  arXiv:2002.07905 [pdf, other]
Title: Empirical Policy Evaluation with Supergraphs
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

We devise and analyze algorithms for the empirical policy evaluation problem in reinforcement learning. Our algorithms explore backward from high-cost states to find high-value ones, in contrast to forward approaches that work forward from all states. While several papers have demonstrated the utility of backward exploration empirically, we conduct rigorous analyses which show that our algorithms can reduce average-case sample complexity from $O(S \log S)$ to as low as $O(\log S)$.

[8]  arXiv:2002.07906 [pdf, other]
Title: CAUSE: Learning Granger Causality from Event Sequences using Attribution Methods
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the problem of learning Granger causality between event types from asynchronous, interdependent, multi-type event sequences. Existing work suffers from either limited model flexibility or poor model explainability and thus fails to uncover Granger causality across a wide variety of event sequences with diverse event interdependency. To address these weaknesses, we propose CAUSE (Causality from AttribUtions on Sequence of Events), a novel framework for the studied task. The key idea of CAUSE is to first implicitly capture the underlying event interdependency by fitting a neural point process, and then extract from the process a Granger causality statistic using an axiomatic attribution method. Across multiple datasets riddled with diverse event interdependency, we demonstrate that CAUSE achieves superior performance on correctly inferring the inter-type Granger causality over a range of state-of-the-art methods.

[9]  arXiv:2002.07911 [pdf, other]
Title: Generating Automatic Curricula via Self-Supervised Active Domain Randomization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)

Goal-directed Reinforcement Learning (RL) traditionally considers an agent interacting with an environment, prescribing a real-valued reward to an agent proportional to the completion of some goal. Goal-directed RL has seen large gains in sample efficiency, due to the ease of reusing or generating new experience by proposing goals. In this work, we build on the framework of self-play, allowing an agent to interact with itself in order to make progress on some unknown task. We use Active Domain Randomization and self-play to create a novel, coupled environment-goal curriculum, where agents learn through progressively more difficult tasks and environment variations. Our method, Self-Supervised Active Domain Randomization (SS-ADR), generates a growing curriculum, encouraging the agent to try tasks that are just outside of its current capabilities, while building a domain-randomization curriculum that enables state-of-the-art results on various sim2real transfer tasks. Our results show that a curriculum of co-evolving the environment difficulty along with the difficulty of goals set in each environment provides practical benefits in the goal-directed tasks tested.

[10]  arXiv:2002.07916 [pdf, other]
Title: Information Condensing Active Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce Information Condensing Active Learning (ICAL), a batch mode model agnostic Active Learning (AL) method targeted at Deep Bayesian Active Learning that focuses on acquiring labels for points which have as much information as possible about the still unacquired points. ICAL uses the Hilbert Schmidt Independence Criterion (HSIC) to measure the strength of the dependency between a candidate batch of points and the unlabeled set. We develop key optimizations that allow us to scale our method to large unlabeled sets. We show significant improvements in terms of model accuracy and negative log likelihood (NLL) on several image datasets compared to state of the art batch mode AL methods for deep learning.
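
As background for the dependency measure named above, the following sketch computes the standard biased empirical HSIC estimate, trace(KHLH) / (n - 1)^2, with Gaussian kernels on scalar samples. The kernel choice, bandwidth, and synthetic data are assumptions; the sketch shows only the dependence statistic itself, not ICAL's batch acquisition or its scaling optimizations.

```python
import math
import random

def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]

def center(gram):
    """Return H K H, where H = I - (1/n) * ones is the centering matrix."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

def hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)**2."""
    n = len(xs)
    k_centered = center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    # trace(K H L H) equals trace((H K H) L) by cyclicity of the trace.
    trace = sum(k_centered[i][j] * l_gram[j][i]
                for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(60)]
ys_dependent = [x + 0.1 * rng.gauss(0.0, 1.0) for x in xs]
ys_independent = [rng.gauss(0.0, 1.0) for _ in range(60)]
hsic_dependent = hsic(xs, ys_dependent)
hsic_independent = hsic(xs, ys_independent)
```

A larger value indicates stronger statistical dependence, so `hsic_dependent` should clearly exceed `hsic_independent` on this synthetic data.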

[11]  arXiv:2002.07920 [pdf, other]
Title: Block Switching: A Stochastic Approach for Deep Learning Security
Comments: Accepted by AdvML19: Workshop on Adversarial Learning Methods for Machine Learning and Data Mining at KDD, Anchorage, Alaska, USA, August 5th, 2019, 5 pages
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Recent studies of adversarial attacks have revealed the vulnerability of modern deep learning models. That is, subtly crafted perturbations of the input can make a trained network with high accuracy produce arbitrary incorrect predictions, while remaining imperceptible to the human vision system. In this paper, we introduce Block Switching (BS), a defense strategy against adversarial attacks based on stochasticity. BS replaces a block of model layers with multiple parallel channels, and the active channel is randomly assigned at run time and is hence unpredictable to the adversary. We show empirically that BS leads to a more dispersed input gradient distribution and superior defense effectiveness compared with other stochastic defenses such as stochastic activation pruning (SAP). Compared to other defenses, BS is also characterized by the following features: (i) BS causes a smaller drop in test accuracy; (ii) BS is attack-independent; and (iii) BS is compatible with other defenses and can be used jointly with them.
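
The switching mechanism described above can be sketched in a few lines. The class name `BlockSwitch` and the scalar lambda "channels" are hypothetical stand-ins for separately trained sub-networks; how the channels are trained and where the block sits in the model are not shown and are not specified by this sketch.

```python
import random

class BlockSwitch:
    """Holds several parallel channels and activates one at random per call."""

    def __init__(self, channels, seed=None):
        self.channels = list(channels)
        self.rng = random.Random(seed)

    def __call__(self, x):
        # The active channel is drawn at run time, so a caller cannot
        # know in advance which channel will process the input.
        channel = self.rng.choice(self.channels)
        return channel(x)

# Hypothetical channels that compute nearly, but not exactly, the same function.
channels = [
    lambda x: 2.0 * x + 0.01,
    lambda x: 2.0 * x - 0.02,
    lambda x: 2.0 * x + 0.03,
]
block = BlockSwitch(channels, seed=0)
outputs = [block(1.0) for _ in range(20)]
```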

[12]  arXiv:2002.07922 [pdf, other]
Title: Short-Term Traffic Flow Prediction Using Variational LSTM Networks
Comments: 18 pages, 13 figures
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Signal Processing (eess.SP)

Traffic flow characteristics are among the most critical decision-making and traffic policing factors in a region. Awareness of the predicted status of the traffic flow is of prime importance in traffic management and traffic information divisions. The purpose of this research is to suggest a forecasting model for traffic flow by using deep learning techniques based on historical data in the Intelligent Transportation Systems area. The historical data were collected from the Caltrans Performance Measurement Systems (PeMS) for six months in 2019. The proposed prediction model is a Variational Long Short-Term Memory Encoder (VLSTM-E for short), which tries to estimate the flow accurately, in contrast to other conventional methods. VLSTM-E can provide more reliable short-term traffic flow predictions by considering the distribution and missing values.

[13]  arXiv:2002.07933 [pdf, other]
Title: Improving Generalization by Controlling Label-Noise Information in Neural Network Weights
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In the presence of noisy or incorrect labels, neural networks have the undesirable tendency to memorize information about the noise. Standard regularization techniques such as dropout, weight decay or data augmentation sometimes help, but do not prevent this behavior. If one considers neural network weights as random variables that depend on the data and stochasticity of training, the amount of memorized information can be quantified with the Shannon mutual information between weights and the vector of all training labels given inputs, $I(w : \mathbf{y} \mid \mathbf{x})$. We show that for any training algorithm, low values of this term correspond to reduction in memorization of label-noise and better generalization bounds. To obtain these low values, we propose training algorithms that employ an auxiliary network that predicts gradients in the final layers of a classifier without accessing labels. We illustrate the effectiveness of our approach on versions of MNIST, CIFAR-10, and CIFAR-100 corrupted with various noise models, and on a large-scale dataset Clothing1M that has noisy labels.

[14]  arXiv:2002.07942 [pdf, other]
Title: Source Separation with Deep Generative Priors
Comments: 18 pages, 15 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Despite substantial progress in signal source separation, results for richly structured data continue to contain perceptible artifacts. In contrast, recent deep generative models can produce authentic samples in a variety of domains that are indistinguishable from samples of the data distribution. This paper introduces a Bayesian approach to source separation that uses generative models as priors over the components of a mixture of sources, and Langevin dynamics to sample from the posterior distribution of sources given a mixture. This decouples the source separation problem from generative modeling, enabling us to directly use cutting-edge generative models as priors. The method achieves state-of-the-art performance for MNIST digit separation. We introduce new methodology for evaluating separation quality on richer datasets, providing quantitative evaluation of separation results on CIFAR-10. We also provide qualitative results on LSUN.

[15]  arXiv:2002.07948 [pdf, other]
Title: Personalized Federated Learning: A Meta-Learning Approach
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

The goal of federated learning is to design algorithms in which several agents communicate with a central node, in a privacy-protecting manner, to minimize the average of their loss functions. In this approach, each node not only shares the required computational budget but also has access to a larger data set, which improves the quality of the resulting model. However, this method only develops a common output for all the agents, and therefore does not adapt the model to each user's data. This is an important missing feature, especially given the heterogeneity of the underlying data distribution for various agents. In this paper, we study a personalized variant of federated learning in which our goal is to find a shared initial model in a distributed manner that can be slightly updated by either a current or a new user by performing one or a few steps of gradient descent with respect to its own loss function. This approach keeps all the benefits of the federated learning architecture while leading to a more personalized model for each user. We show this problem can be studied within the Model-Agnostic Meta-Learning (MAML) framework. Inspired by this connection, we propose a personalized variant of the well-known Federated Averaging algorithm and evaluate its performance in terms of gradient norm for non-convex loss functions. Further, we characterize how this performance is affected by the closeness of underlying distributions of user data, measured in terms of distribution distances such as Total Variation and 1-Wasserstein metric.

[16]  arXiv:2002.07956 [pdf, other]
Title: Curriculum in Gradient-Based Meta-Reinforcement Learning
Comments: 11 pages, 10 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Gradient-based meta-learners such as Model-Agnostic Meta-Learning (MAML) have shown strong few-shot performance in supervised and reinforcement learning settings. However, specifically in the case of meta-reinforcement learning (meta-RL), we can show that gradient-based meta-learners are sensitive to task distributions. With the wrong curriculum, agents suffer the effects of meta-overfitting, shallow adaptation, and adaptation instability. In this work, we begin by highlighting intriguing failure cases of gradient-based meta-RL and show that task distributions can wildly affect algorithmic outputs, stability, and performance. To address this problem, we leverage insights from recent literature on domain randomization and propose meta Active Domain Randomization (meta-ADR), which learns a curriculum of tasks for gradient-based meta-RL in a similar way as ADR does for sim2real transfer. We show that this approach induces more stable policies on a variety of simulated locomotion and navigation tasks. We assess in- and out-of-distribution generalization and find that the learned task distributions, even in an unstructured task space, greatly improve the adaptation performance of MAML. Finally, we motivate the need for better benchmarking in meta-RL that prioritizes generalization over single-task adaptation performance.

[17]  arXiv:2002.07962 [pdf, other]
Title: Inductive Representation Learning on Temporal Graphs
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Inductive representation learning on temporal graphs is an important step toward scalable machine learning on real-world dynamic networks. The evolving nature of temporal dynamic graphs requires handling new nodes as well as capturing temporal patterns. The node embeddings, which are now functions of time, should represent both the static node features and the evolving topological structures. Moreover, node and topological features can be temporal as well, and the node embeddings should also capture their patterns. We propose the temporal graph attention (TGAT) layer to efficiently aggregate temporal-topological neighborhood features as well as to learn the time-feature interactions. For TGAT, we use the self-attention mechanism as a building block and develop a novel functional time encoding technique based on the classical Bochner's theorem from harmonic analysis. By stacking TGAT layers, the network recognizes the node embeddings as functions of time and is able to inductively infer embeddings for both new and observed nodes as the graph evolves. The proposed approach handles both node classification and link prediction tasks, and can be naturally extended to include temporal edge features. We evaluate our method on transductive and inductive tasks under temporal settings with two benchmark datasets and one industrial dataset. Our TGAT model compares favorably to state-of-the-art baselines as well as previous temporal graph embedding approaches.
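
The abstract does not spell out the form of the time encoding. A common construction suggested by Bochner's theorem is a bank of cosine and sine features whose inner product depends only on the time difference, sketched below. The feature dimension, the Gaussian-sampled frequencies (which could instead be learned), and the helper names are assumptions, not the paper's exact design.

```python
import math
import random

def make_time_encoder(dim, seed=0):
    """Build Phi(t) = sqrt(1/dim) * [cos(w_1 t), sin(w_1 t), ..., cos(w_dim t), sin(w_dim t)]."""
    rng = random.Random(seed)
    # Frequencies are sampled once and then fixed (an assumption for this sketch).
    freqs = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    def encode(t):
        scale = math.sqrt(1.0 / dim)
        features = []
        for w in freqs:
            features.append(scale * math.cos(w * t))
            features.append(scale * math.sin(w * t))
        return features

    return encode

def dot(u, v):
    """Plain inner product of two equal-length feature lists."""
    return sum(a * b for a, b in zip(u, v))

encode = make_time_encoder(dim=8)
k_a = dot(encode(1.0), encode(3.0))  # time gap of 2
k_b = dot(encode(5.0), encode(7.0))  # same gap, shifted in time
```

Because cos(a)cos(b) + sin(a)sin(b) = cos(a - b), the induced kernel is translation-invariant: `k_a` and `k_b` agree up to floating-point error.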

[18]  arXiv:2002.07965 [pdf, other]
Title: Being Bayesian about Categorical Probability
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Neural networks utilize the softmax as a building block in classification tasks, which suffers from an overconfidence problem and lacks the ability to represent uncertainty. As a Bayesian alternative to the softmax, we consider a random variable of a categorical probability over class labels. In this framework, the prior distribution explicitly models the presumed noise inherent in the observed label, which provides consistent gains in generalization performance in multiple challenging tasks. The proposed method inherits advantages of Bayesian approaches that achieve better uncertainty estimation and model calibration. Our method can be implemented as a plug-and-play loss function with negligible computational overhead compared to the softmax with the cross-entropy loss function.

[19]  arXiv:2002.07971 [pdf, other]
Title: Gradient Boosting Neural Networks: GrowNet
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A novel gradient boosting framework is proposed where shallow neural networks are employed as "weak learners". General loss functions are considered under this unified framework with specific examples presented for classification, regression and learning to rank. A fully corrective step is incorporated to remedy the pitfall of greedy function approximation of classic gradient boosting decision trees. The proposed model rendered state-of-the-art results in all three tasks on multiple datasets. An ablation study is performed to shed light on the effect of each model component and of the model hyperparameters.

[20]  arXiv:2002.07994 [pdf, other]
Title: Best-item Learning in Random Utility Models with Subset Choices
Comments: Accepted to 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities. We identify a new property of such a RUM, termed the minimum advantage, that helps in characterizing the complexity of separating pairs of items based on their relative win/loss empirical counts, and can be bounded as a function of the noise distribution alone. We give a learning algorithm for general RUMs, based on pairwise relative counts of items and hierarchical elimination, along with a new PAC sample complexity guarantee of $O(\frac{n}{c^2\epsilon^2} \log \frac{k}{\delta})$ rounds to identify an $\epsilon$-optimal item with confidence $1-\delta$, when the worst case pairwise advantage in the RUM has sensitivity at least $c$ to the parameter gaps of items. Fundamental lower bounds on PAC sample complexity show that this is near-optimal in terms of its dependence on $n,k$ and $c$.

[21]  arXiv:2002.08000 [pdf, other]
Title: Action-Manipulation Attacks Against Stochastic Bandits: Attacks and Defense
Comments: 13 pages, 7 figures, submitted to IEEE Transaction on Signal Processing
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Optimization and Control (math.OC); Machine Learning (stat.ML)

Due to the broad range of applications of the stochastic multi-armed bandit model, understanding the effects of adversarial attacks and designing bandit algorithms robust to attacks are essential for the safe application of this model. In this paper, we introduce a new class of attack named the action-manipulation attack. In this attack, an adversary can change the action signal selected by the user. We show that without knowledge of the mean rewards of the arms, our proposed attack can manipulate the Upper Confidence Bound (UCB) algorithm, a widely used bandit algorithm, into pulling a target arm very frequently by spending only logarithmic cost. To defend against this class of attacks, we introduce a novel algorithm that is robust to action-manipulation attacks when an upper bound for the total attack cost is given. We prove that our algorithm has a pseudo-regret upper bounded by $\mathcal{O}(\max\{\log T,A\})$, where $T$ is the total number of rounds and $A$ is the upper bound of the total attack cost.
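
For readers unfamiliar with the algorithm under attack, the following is a minimal sketch of the textbook UCB1 rule on Bernoulli arms, without any adversary. The arm means, the horizon, and the exploration constant are assumptions; neither the action-manipulation attack nor the proposed robust algorithm is implemented here.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms and return the number of pulls per arm."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull every arm once
        else:
            # Pick the arm with the largest empirical mean plus confidence bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.9], horizon=5000)
```

Without an attacker, the pull counts concentrate on the arm with the highest mean. An action-manipulation adversary, as described in the abstract, would intervene between the `arm` chosen by the learner and the arm actually pulled.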

[22]  arXiv:2002.08012 [pdf, other]
Title: Indirect Adversarial Attacks via Poisoning Neighbors for Graph Convolutional Networks
Comments: Accepted in IEEE BigData 2019
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

Graph convolutional neural networks, which learn aggregations over neighbor nodes, have achieved great performance in node classification tasks. However, recent studies reported that such graph convolutional node classifiers can be deceived by adversarial perturbations on graphs. By abusing graph convolutions, a node's classification result can be influenced by poisoning its neighbors. Given an attributed graph and a node classifier, how can we evaluate robustness against such indirect adversarial attacks? Can we generate strong adversarial perturbations that are effective not only on one-hop neighbors, but also on nodes farther from the target? In this paper, we demonstrate that the node classifier can be deceived with high confidence by poisoning just a single node, even one that is two hops or more away from the target. To achieve the attack, we propose a new approach which searches for smaller perturbations on just a single node far from the target. In our experiments, our proposed method shows a 99% attack success rate within two hops from the target in two datasets. We also demonstrate that m-layer graph convolutional neural networks can be deceived by our indirect attack within m-hop neighbors. The proposed attack can be used as a benchmark in future defense attempts to develop graph convolutional neural networks with adversarial robustness.

[23]  arXiv:2002.08032 [pdf, other]
Title: A Fixed point view: A Model-Based Clustering Framework
Comments: 10 pages, 2 figures
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

With the rapid growth of data, clustering analysis, as a branch of unsupervised learning, lacks a unified understanding and application of its mathematical laws. From a fixed-point viewpoint, this paper restates model-based clustering and proposes a unified clustering framework. In order to find fixed points as cluster centers, the framework iteratively constructs a contraction map, which reveals the convergence mechanism and the interconnections among algorithms. By specifying a contraction map, the Gaussian mixture model (GMM) can be mapped to the framework as an application. We hope the fixed-point framework will help the design of future clustering algorithms.

[24]  arXiv:2002.08037 [pdf, other]
Title: Efficient Deep Reinforcement Learning through Policy Transfer
Comments: Accepted by AAMAS'2020 as an EXTENDED ABSTRACT
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Transfer Learning (TL) has shown great potential to accelerate Reinforcement Learning (RL) by leveraging prior knowledge from past learned policies of relevant tasks. Existing transfer approaches either explicitly compute the similarity between tasks or select appropriate source policies to provide guided explorations for the target task. However, a method that directly optimizes the target policy by alternatively utilizing knowledge from appropriate source policies, without explicitly measuring the similarity, is currently missing. In this paper, we propose a novel Policy Transfer Framework (PTF) to accelerate RL by taking advantage of this idea. Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it by modeling multi-policy transfer as the option learning problem. PTF can be easily combined with existing deep RL approaches. Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods in terms of learning efficiency and final performance in both discrete and continuous action spaces.

[25]  arXiv:2002.08041 [pdf, other]
Title: Enlarging Discriminative Power by Adding an Extra Class in Unsupervised Domain Adaptation
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

In this paper, we study the problem of unsupervised domain adaptation that aims at obtaining a prediction model for the target domain using labeled data from the source domain and unlabeled data from the target domain. There exists an array of recent research based on the idea of extracting features that are not only invariant for both domains but also provide high discriminative power for the target domain. In this paper, we propose an idea of empowering the discriminativeness: Adding a new, artificial class and training the model on the data together with the GAN-generated samples of the new class. The trained model based on the new class samples is capable of extracting the features that are more discriminative by repositioning data of current classes in the target domain and therefore drawing the decision boundaries more effectively. Our idea is highly generic so that it is compatible with many existing methods such as DANN, VADA, and DIRT-T. We conduct various experiments for the standard data commonly used for the evaluation of unsupervised domain adaptations and demonstrate that our algorithm achieves the SOTA performance for many scenarios.

[26]  arXiv:2002.08053 [pdf, other]
Title: Progressive Identification of True Labels for Partial-Label Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Partial-label learning is one of the important weakly supervised learning problems, where each training example is equipped with a set of candidate labels that contains the true label. Most existing methods elaborately designed learning objectives as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data. The goal of this paper is to propose a novel framework of partial-label learning without implicit assumptions on the model or optimization algorithm. More specifically, we propose a general estimator of the classification risk, theoretically analyze the classifier-consistency, and establish an estimation error bound. We then explore a progressive identification method for approximately minimizing the proposed risk estimator, where the update of the model and identification of true labels are conducted in a seamless manner. The resulting algorithm is model-independent and loss-independent, and compatible with stochastic optimization. Thorough experiments demonstrate it sets the new state of the art.

[27]  arXiv:2002.08056 [pdf, other]
Title: The Geometry of Sign Gradient Descent
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Sign-based optimization methods have become popular in machine learning due to their favorable communication cost in distributed optimization and their surprisingly good performance in neural network training. Furthermore, they are closely connected to so-called adaptive gradient methods like Adam. Recent works on signSGD have used a non-standard "separable smoothness" assumption, whereas some older works study sign gradient descent as steepest descent with respect to the $\ell_\infty$-norm. In this work, we unify these existing results by showing a close connection between separable smoothness and $\ell_\infty$-smoothness and argue that the latter is the weaker and more natural assumption. We then proceed to study the smoothness constant with respect to the $\ell_\infty$-norm and thereby isolate geometric properties of the objective function which affect the performance of sign-based methods. In short, we find sign-based methods to be preferable over gradient descent if (i) the Hessian is to some degree concentrated on its diagonal, and (ii) its maximal eigenvalue is much larger than the average eigenvalue. Both properties are common in deep networks.
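
The sign-based update discussed in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the fixed step size, the helper names, and the toy quadratic objective (diagonal Hessian with one dominant eigenvalue) are all assumptions chosen for illustration. Each coordinate moves by the same amount regardless of gradient magnitude, which is the behavior the abstract relates to steepest descent in the $\ell_\infty$-norm.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_gradient_descent(grad, x0, step, num_steps):
    """Iterate x <- x - step * sign(grad(x)) coordinate-wise."""
    x = list(x0)
    for _ in range(num_steps):
        g = grad(x)
        x = [xi - step * sign(gi) for xi, gi in zip(x, g)]
    return x


# Hypothetical objective f(x) = 0.5 * (100 * x[0]**2 + x[1]**2):
# its Hessian is diagonal and the largest eigenvalue dominates the average.
def quad_grad(x):
    return [100.0 * x[0], 1.0 * x[1]]


x_final = sign_gradient_descent(quad_grad, x0=[1.0, 1.0], step=0.01, num_steps=200)
```

With a constant step the iterates end up oscillating within one step size of the minimizer, so in practice a decaying step size would usually be used.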

[28]  arXiv:2002.08071 [pdf, other]
Title: Dissecting Neural ODEs
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Continuous deep learning architectures have recently re-emerged as variants of Neural Ordinary Differential Equations (Neural ODEs). The infinite-depth approach offered by these models theoretically bridges the gap between deep learning and dynamical systems; however, deciphering their inner working is still an open challenge and most of their applications are currently limited to the inclusion as generic black-box modules. In this work, we "open the box" and offer a system-theoretic perspective, including state augmentation strategies and robustness, with the aim of clarifying the influence of several design choices on the underlying dynamics. We also introduce novel architectures: among them, a Galerkin-inspired depth-varying parameter model and neural ODEs with data-controlled vector fields.

[29]  arXiv:2002.08095 [pdf, ps, other]
Title: Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently
Authors: Asaf Cassel (1), Alon Cohen (2), Tomer Koren (1) ((1) School of Computer Science, Tel Aviv University, (2) Google Research, Tel Aviv)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We consider the problem of learning in Linear Quadratic Control systems whose transition parameters are initially unknown. Recent results in this setting have demonstrated efficient learning algorithms with regret growing with the square root of the number of decision steps. We present new efficient algorithms that achieve, perhaps surprisingly, regret that scales only (poly)logarithmically with the number of steps in two scenarios: when only the state transition matrix $A$ is unknown, and when only the state-action transition matrix $B$ is unknown and the optimal policy satisfies a certain non-degeneracy condition. On the other hand, we give a lower bound that shows that when the latter condition is violated, square root regret is unavoidable.

[30]  arXiv:2002.08104 [pdf, other]
Title: Neural Networks on Random Graphs
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We performed a massive evaluation of neural networks with architectures corresponding to random graphs of various types. Apart from the classical random graph families including random, scale-free and small world graphs, we introduced a novel and flexible algorithm for directly generating random directed acyclic graphs (DAG) and studied a class of graphs derived from functional resting state fMRI networks. A majority of the best performing networks were indeed in these new families. We also proposed a general procedure for turning a graph into a DAG necessary for a feed-forward neural network. We investigated various structural and numerical properties of the graphs in relation to neural network test accuracy. Since none of the classical numerical graph invariants by itself seems sufficient to single out the best networks, we introduced new numerical characteristics that selected a set of quasi-1-dimensional graphs, which were the majority among the best performing networks.

[31]  arXiv:2002.08111 [pdf, other]
Title: Hierarchical Quantized Autoencoders
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Despite progress in training neural networks for lossy image compression, current approaches fail to maintain both perceptual quality and high-level features at very low bitrates. Encouraged by recent success in learning discrete representations with Vector Quantized Variational AutoEncoders (VQ-VAEs), we motivate the use of a hierarchy of VQ-VAEs to attain high factors of compression. We show that the combination of quantization and hierarchical latent structure aids likelihood-based image compression. This leads us to introduce a more probabilistic framing of the VQ-VAE, of which previous work is a limiting case. Our hierarchy produces a Markovian series of latent variables that reconstruct high-quality images which retain semantically meaningful features. These latents can then be further used to generate realistic samples. We provide qualitative and quantitative evaluations of reconstructions and samples on the CelebA and MNIST datasets.

[32]  arXiv:2002.08118 [pdf, other]
Title: Randomized Smoothing of All Shapes and Sizes
Comments: 9 pages main text, 40 pages total
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Randomized smoothing is a recently proposed defense against adversarial attacks that has achieved state-of-the-art provable robustness against $\ell_2$ perturbations. Soon after, a number of works devised new randomized smoothing schemes for other metrics, such as $\ell_1$ or $\ell_\infty$; however, for each geometry, substantial effort was needed to derive new robustness guarantees. This begs the question: can we find a general theory for randomized smoothing?
In this work we propose a novel framework for devising and analyzing randomized smoothing schemes, and validate its effectiveness in practice. Our theoretical contributions are as follows: (1) We show that for an appropriate notion of "optimal", the optimal smoothing distributions for any "nice" norm have level sets given by the *Wulff Crystal* of that norm. (2) We propose two novel and complementary methods for deriving provably robust radii for any smoothing distribution. Finally, (3) we show fundamental limits to current randomized smoothing techniques via the theory of *Banach space cotypes*. By combining (1) and (2), we significantly improve the state-of-the-art certified accuracy in $\ell_1$ on standard datasets. On the other hand, using (3), we show that, without more information than label statistics under random input perturbations, randomized smoothing cannot achieve nontrivial certified accuracy against perturbations of $\ell_\infty$-norm $\Omega(1/\sqrt d)$, when the input dimension $d$ is large. We provide code at github.com/tonyduan/rs4a.
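
For readers unfamiliar with randomized smoothing, the basic prediction step can be sketched as a majority vote over noisy copies of the input. The sketch below is not taken from the linked repository: it assumes isotropic Gaussian noise (the familiar $\ell_2$ setting, whereas the abstract concerns more general smoothing distributions), and `base_classifier`, `toy_classifier`, `sigma`, and `num_samples` are hypothetical names. It returns only the predicted label and does not compute a certified radius.

```python
import random
from collections import Counter


def smoothed_predict(base_classifier, x, sigma, num_samples, seed=0):
    """Return the label the base classifier outputs most often under Gaussian input noise."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(num_samples):
        # Perturb every coordinate independently with N(0, sigma^2) noise.
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical base classifier: label 1 if the coordinates sum to a positive value.
def toy_classifier(x):
    return 1 if sum(x) > 0 else 0


label = smoothed_predict(toy_classifier, [2.0, 2.0], sigma=0.5, num_samples=100)
```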

[33]  arXiv:2002.08125 [pdf, other]
Title: Gradient-Adjusted Neuron Activation Profiles for Comprehensive Introspection of Convolutional Speech Recognition Models
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)

Deep Learning based Automatic Speech Recognition (ASR) models are very successful, but hard to interpret. To gain a better understanding of how Artificial Neural Networks (ANNs) accomplish their tasks, introspection methods have been proposed. Adapting such techniques from computer vision to speech recognition is not straightforward, because speech data is more complex and less interpretable than image data. In this work, we introduce Gradient-adjusted Neuron Activation Profiles (GradNAPs) as a means to interpret features and representations in Deep Neural Networks. GradNAPs are characteristic responses of ANNs to particular groups of inputs, which incorporate the relevance of neurons for prediction. We show how to utilize GradNAPs to gain insight about how data is processed in ANNs. This includes different ways of visualizing features and clustering of GradNAPs to compare embeddings of different groups of inputs in any layer of a given network. We demonstrate our proposed techniques using a fully-convolutional ASR model.

[34]  arXiv:2002.08165 [pdf, other]
Title: Using Hindsight to Anchor Past Knowledge in Continual Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In continual learning, the learner faces a stream of data whose distribution changes over time. Modern neural networks are known to suffer under this setting, as they quickly forget previously acquired knowledge. To address such catastrophic forgetting, many continual learning methods implement different types of experience replay, re-learning on past data stored in a small buffer known as episodic memory. In this work, we complement experience replay with a new objective that we call anchoring, where the learner uses bilevel optimization to update its knowledge on the current task, while keeping intact the predictions on some anchor points of past tasks. These anchor points are learned using gradient-based optimization to maximize forgetting, which is approximated by fine-tuning the currently trained model on the episodic memory of past tasks. Experiments on several supervised learning benchmarks for continual learning demonstrate that our approach improves the standard experience replay in terms of both accuracy and forgetting metrics and for various sizes of episodic memories.

[35]  arXiv:2002.08196 [pdf, other]
Title: Federated Learning in the Sky: Joint Power Allocation and Scheduling with UAV Swarms
Comments: 8 pages, 4 figures
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Robotics (cs.RO); Signal Processing (eess.SP); Machine Learning (stat.ML)

Unmanned aerial vehicle (UAV) swarms must exploit machine learning (ML) in order to execute various tasks ranging from coordinated trajectory planning to cooperative target recognition. However, due to the lack of continuous connections between the UAV swarm and ground base stations (BSs), using centralized ML will be challenging, particularly when dealing with a large volume of data. In this paper, a novel framework is proposed to implement distributed federated learning (FL) algorithms within a UAV swarm that consists of a leading UAV and several following UAVs. Each following UAV trains a local FL model based on its collected data and then sends this trained local model to the leading UAV who will aggregate the received models, generate a global FL model, and transmit it to followers over the intra-swarm network. To identify how wireless factors, like fading, transmission delay, and UAV antenna angle deviations resulting from wind and mechanical vibrations, impact the performance of FL, a rigorous convergence analysis for FL is performed. Then, a joint power allocation and scheduling design is proposed to optimize the convergence rate of FL while taking into account the energy consumption during convergence and the delay requirement imposed by the swarm's control system. Simulation results validate the effectiveness of the FL convergence analysis and show that the joint design strategy can reduce the number of communication rounds needed for convergence by as much as 35% compared with the baseline design.

[36]  arXiv:2002.08204 [pdf]
Title: SYMOG: learning symmetric mixture of Gaussian modes for improved fixed-point quantization
Comments: Preprint submitted to Neurocomputing
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Deep neural networks (DNNs) have been proven to outperform classical methods on several machine learning benchmarks. However, they have high computational complexity and require powerful processing units. Especially when deployed on embedded systems, model size and inference time must be significantly reduced. We propose SYMOG (symmetric mixture of Gaussian modes), which significantly decreases the complexity of DNNs through low-bit fixed-point quantization. SYMOG is a novel soft quantization method such that the learning task and the quantization are solved simultaneously. During training the weight distribution changes from a unimodal Gaussian distribution to a symmetric mixture of Gaussians, where each mean value belongs to a particular fixed-point mode. We evaluate our approach with different architectures (LeNet5, VGG7, VGG11, DenseNet) on common benchmark data sets (MNIST, CIFAR-10, CIFAR-100) and we compare with state-of-the-art quantization approaches. We achieve excellent results and outperform 2-bit state-of-the-art performance with an error rate of only 5.71% on CIFAR-10 and 27.65% on CIFAR-100.

[37]  arXiv:2002.08224 [pdf, other]
Title: A Survey on Predictive Maintenance for Industry 4.0
Authors: Christian Krupitzer (1), Tim Wagenhals (2), Marwin Züfle (1), Veronika Lesch (1), Dominik Schäfer (3), Amin Mozaffarin (4), Janick Edinger (2), Christian Becker (2), Samuel Kounev (1) ((1) University of Würzburg, Würzburg, Germany, (2) University of Mannheim, Mannheim, Germany, (3) Syntax Systems GmbH, Weinheim, Germany, (4) MOZYS Engineering GmbH, Würzburg)
Subjects: Machine Learning (cs.LG)

Production issues at Volkswagen in 2016 led to dramatic losses in sales of up to 400 million Euros per week. This example shows the huge financial impact of a working production facility for companies. Especially in the data-driven domains of Industry 4.0 and Industrial IoT with intelligent, connected machines, a conventional, static maintenance schedule seems to be old-fashioned. In this paper, we present a survey on the current state of the art in predictive maintenance for Industry 4.0. Based on a structured literature survey, we present a classification of predictive maintenance in the context of Industry 4.0 and discuss recent developments in this area.

[38]  arXiv:2002.08243 [pdf, ps, other]
Title: Optimistic Policy Optimization with Bandit Feedback
Comments: 34 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms. Yet, so far, such methods have been mostly analyzed from an optimization perspective, without addressing the problem of exploration, or by making strong assumptions on the interaction with the environment. In this paper we consider model-based RL in the tabular finite-horizon MDP setting with unknown transitions and bandit feedback. For this setting, we propose an optimistic trust region policy optimization (TRPO) algorithm for which we establish $\tilde O(\sqrt{S^2 A H^4 K})$ regret for stochastic rewards. Furthermore, we prove $\tilde O( \sqrt{ S^2 A H^4 } K^{2/3} ) $ regret for adversarial rewards. Interestingly, this result matches previous bounds derived for the bandit feedback case, yet with known transitions. To the best of our knowledge, the two results are the first sub-linear regret bounds obtained for policy optimization algorithms with unknown transitions and bandit feedback.

[39]  arXiv:2002.08247 [pdf, other]
Title: Learning Global Transparent Models from Local Contrastive Explanations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

There is a rich and growing literature on producing local pointwise contrastive/counterfactual explanations for complex models. These methods highlight what is important to justify the classification and/or produce a contrast point that alters the final classification. Other works try to build globally interpretable models like decision trees and rule lists directly by efficient model search using the data or by transferring information from a complex model using distillation-like methods. Although these interpretable global models can be useful, they may not be consistent with local explanations from a specific complex model of choice. In this work, we explore the question: Can we produce a transparent global model that is consistent with/derivable from local explanations? Based on a key insight we provide a novel method where every local contrastive/counterfactual explanation can be turned into a Boolean feature. These Boolean features are sparse conjunctions of binarized features. The dataset thus constructed is consistent with local explanations by design and one can train an interpretable model like a decision tree on it. We note that this approach strictly loses information due to reliance only on sparse local explanations; nonetheless, we demonstrate empirically that in many cases it can still be competitive with respect to the complex model's performance and also other methods that learn directly from the original dataset. Our approach also provides an avenue to benchmark local explanation methods in a quantitative manner.

[40]  arXiv:2002.08258 [pdf, ps, other]
Title: Knapsack Pruning with Inner Distillation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Neural network pruning reduces the computational cost of an over-parameterized network to improve its efficiency. Popular methods vary from $\ell_1$-norm sparsification to Neural Architecture Search (NAS). In this work, we propose a novel pruning method that optimizes the final accuracy of the pruned network and distills knowledge from the over-parameterized parent network's inner layers. To enable this approach, we formulate the network pruning as a Knapsack Problem which optimizes the trade-off between the importance of neurons and their associated computational cost. Then we prune the network channels while maintaining the high-level structure of the network. The pruned network is fine-tuned under the supervision of the parent network using its inner network knowledge, a technique we refer to as the Inner Knowledge Distillation. Our method leads to state-of-the-art pruning results on ImageNet, CIFAR-10 and CIFAR-100 using ResNet backbones. To prune complex network structures such as convolutions with skip-links and depth-wise convolutions, we propose a block grouping approach to cope with these structures. Through this we produce compact architectures with the same FLOPs as EfficientNet-B0 and MobileNetV3 but with higher accuracy, by $1\%$ and $0.3\%$ respectively on ImageNet, and faster runtime on GPU.
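
The knapsack formulation mentioned in this abstract can be made concrete with a textbook 0/1 knapsack solver. This is a minimal sketch under stated assumptions, not the paper's algorithm: the importance scores, the integer costs (standing in for per-channel compute), and the budget are hypothetical, and an exact dynamic program is used only because the toy instance is tiny.

```python
def knapsack_select(importance, cost, budget):
    """Choose items maximizing total importance subject to total integer cost <= budget."""
    # best[c] holds (best total importance, chosen indices) for capacity c.
    best = [(0.0, [])] * (budget + 1)
    for i in range(len(importance)):
        new_best = list(best)
        for c in range(cost[i], budget + 1):
            candidate = best[c - cost[i]][0] + importance[i]
            if candidate > new_best[c][0]:
                new_best[c] = (candidate, best[c - cost[i]][1] + [i])
        best = new_best
    return best[budget]


# Hypothetical channels: importance scores and integer compute costs.
importance = [0.9, 0.5, 0.4, 0.3]
cost = [5, 3, 2, 2]
total_importance, kept = knapsack_select(importance, cost, budget=7)
```

Here `kept` lists the indices of the channels retained under the budget; the remaining channels would be the ones pruned.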

[41]  arXiv:2002.08264 [pdf, other]
Title: Molecule Attention Transformer
Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)

Designing a single neural network architecture that performs competitively across a range of molecule property prediction tasks remains largely an open challenge, and its solution may unlock a widespread use of deep learning in the drug discovery industry. To move towards this goal, we propose Molecule Attention Transformer (MAT). Our key innovation is to augment the attention mechanism in Transformer using inter-atomic distances and the molecular graph structure. Experiments show that MAT performs competitively on a diverse set of molecular prediction tasks. Most importantly, with a simple self-supervised pretraining, MAT requires tuning of only a few hyperparameter values to achieve state-of-the-art performance on downstream tasks. Finally, we show that attention weights learned by MAT are interpretable from the chemical point of view.

[42]  arXiv:2002.08274 [pdf, other]
Title: Outcome Correlation in Graph Neural Network Regression
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Graph neural networks aggregate features in vertex neighborhoods to learn vector representations of all vertices, using supervision from some labeled vertices during training. The predictor is then a function of the vector representation, and predictions are made independently on unlabeled nodes. This widely-adopted approach implicitly assumes that vertex labels are independent after conditioning on their neighborhoods. We show that this strong assumption is far from true on many real-world graph datasets and severely limits predictive power on a number of regression tasks. Given that traditional graph-based semi-supervised learning methods operate in the opposite manner by explicitly modeling the correlation in predicted outcomes, this limitation may not be all that surprising.
Here, we address this issue with a simple and interpretable framework that can improve any graph neural network architecture by modeling correlation structure in regression outcome residuals. Specifically, we model the joint distribution of outcome residuals on vertices with a parameterized multivariate Gaussian, where the parameters are estimated by maximizing the marginal likelihood of the observed labels. Our model substantially boosts the performance of graph neural networks, and the learned parameters can also be interpreted as the strength of correlation among connected vertices. To allow us to scale to large networks, we design linear time algorithms for low-variance, unbiased model parameter estimates based on stochastic trace estimation. We also provide a simplified version of our method that makes stronger assumptions on correlation structure but is extremely easy to implement and provides great practical performance in several cases.
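
Stochastic trace estimation, which this abstract relies on for scalability, is commonly done with a Hutchinson-style estimator: average $z^\top A z$ over random sign vectors $z$, using only matrix-vector products. The sketch below shows that generic estimator and is not the authors' implementation; `matvec`, `dim`, `num_probes`, and the toy matrices are hypothetical.

```python
import random


def hutchinson_trace(matvec, dim, num_probes, seed=0):
    """Estimate trace(A) as the mean of z^T A z over random +/-1 probe vectors z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)  # only a matrix-vector product is required
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_probes


# Hypothetical diagonal matrix diag(1, 2, 3); the estimate is exact for diagonal A.
def diag_matvec(z):
    return [1.0 * z[0], 2.0 * z[1], 3.0 * z[2]]


# Hypothetical symmetric matrix [[2, 1], [1, 2]]; the estimate is unbiased but noisy.
def dense_matvec(z):
    return [2.0 * z[0] + 1.0 * z[1], 1.0 * z[0] + 2.0 * z[1]]


exact_case = hutchinson_trace(diag_matvec, dim=3, num_probes=10)
noisy_case = hutchinson_trace(dense_matvec, dim=2, num_probes=1000)
```

The estimator is unbiased because the off-diagonal terms $z_i z_j A_{ij}$ have zero mean, while the diagonal terms contribute $A_{ii}$ exactly since $z_i^2 = 1$.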

[43]  arXiv:2002.08289 [pdf, other]
Title: Variational Encoder-based Reliable Classification
Comments: 7 pages, 6 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Machine learning models provide statistically impressive results which might be individually unreliable. To provide reliability, we propose an Epistemic Classifier (EC) that can provide justification of its belief using support from the training dataset as well as quality of reconstruction. Our approach is based on modified variational auto-encoders that can identify a semantically meaningful low-dimensional space where perceptually similar instances are close in $\ell_2$-distance too. Our results demonstrate improved reliability of predictions and robust identification of samples with adversarial attacks as compared to baseline of softmax-based thresholding.

[44]  arXiv:2002.08329 [pdf, other]
Title: Value-driven Hindsight Modelling
Comments: 8 pages + reference + appendix
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Value estimation is a critical component of the reinforcement learning (RL) paradigm. The question of how to effectively learn predictors for value from data is one of the major problems studied by the RL community, and different approaches exploit structure in the problem domain in different ways. Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function. In contrast, model-free methods directly leverage the quantity of interest from the future but have to compose with a potentially weak scalar signal (an estimate of the return). In this paper we develop an approach for representation learning in RL that sits in between these two extremes: we propose to learn what to model in a way that can directly help value prediction. To this end we determine which features of the future trajectory provide useful information to predict the associated return. This provides us with tractable prediction targets that are directly relevant for a task, and can thus accelerate learning of the value function. The idea can be understood as reasoning, in hindsight, about which aspects of the future observations could help past value prediction. We show how this can help dramatically even in simple policy evaluation settings. We then test our approach at scale in challenging domains, including on 57 Atari 2600 games.

[45]  arXiv:2002.08338 [pdf, ps, other]
Title: Multiple Imputation with Denoising Autoencoder using Metamorphic Truth and Imputation Feedback
Authors: Haw-minn Lu (1), Giancarlo Perrone (1), José Unpingco (1) ((1) Gary and Mary West Health Institute)
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

Although data may be abundant, complete data is less so, due to missing columns or rows. This missingness undermines the performance of downstream data products that either omit incomplete cases or create derived completed data for subsequent processing. Appropriately managing missing data is required in order to fully exploit and correctly use data. We propose a Multiple Imputation model using Denoising Autoencoders to learn the internal representation of data. Furthermore, we use the novel mechanisms of Metamorphic Truth and Imputation Feedback to maintain statistical integrity of attributes and eliminate bias in the learning process. Our approach explores the effects of imputation on various missingness mechanisms and patterns of missing data, outperforming other methods in many standard test cases.

[46]  arXiv:2002.08339 [pdf, other]
Title: NeuroFabric: Identifying Ideal Topologies for Training A Priori Sparse Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Long training times of deep neural networks are a bottleneck in machine learning research. The major impediment to fast training is the quadratic growth of both memory and compute requirements of dense and convolutional layers with respect to their information bandwidth. Recently, training `a priori' sparse networks has been proposed as a method for allowing layers to retain high information bandwidth, while keeping memory and compute low. However, the choice of which sparse topology should be used in these networks is unclear. In this work, we provide a theoretical foundation for the choice of intra-layer topology. First, we derive a new sparse neural network initialization scheme that allows us to explore the space of very deep sparse networks. Next, we evaluate several topologies and show that seemingly similar topologies can often have a large difference in attainable accuracy. To explain these differences, we develop a data-free heuristic that can evaluate a topology independently from the dataset the network will be trained on. We then derive a set of requirements that make a good topology, and arrive at a single topology that satisfies all of them.

[47]  arXiv:2002.08345 [pdf, other]
Title: Schoenberg-Rao distances: Entropy-based and geometry-aware statistical Hilbert distances
Comments: 18 pages, 8 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Distances between probability distributions that take into account the geometry of their sample space, like the Wasserstein or the Maximum Mean Discrepancy (MMD) distances, have received a lot of attention in machine learning as they can, for instance, be used to compare probability distributions with disjoint supports. In this paper, we study a class of statistical Hilbert distances that we term the Schoenberg-Rao distances, a generalization of the MMD that allows one to consider a broader class of kernels, namely the conditionally negative semi-definite kernels. In particular, we introduce a principled way to construct such kernels and derive novel closed-form distances between mixtures of Gaussian distributions, among others. These distances, derived from the concave Rao's quadratic entropy, enjoy nice theoretical properties and possess interpretable hyperparameters which can be tuned for specific applications. Our method constitutes a practical alternative to Wasserstein distances and we illustrate its efficiency on a broad range of machine learning tasks such as density estimation, generative modeling and mixture simplification.

[48]  arXiv:2002.08347 [pdf, other]
Title: On Adaptive Attacks to Adversarial Example Defenses
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS---and chosen for illustrative and pedagogical purposes---can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result---showing that a defense was ineffective---this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.

Cross-lists for Thu, 20 Feb 20

[49]  arXiv:2002.07870 (cross-list from cs.RO) [pdf, other]
Title: Online Parameter Estimation for Safety-Critical Systems with Gaussian Processes
Comments: 7 pages, 5 figures, 1 table
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Parameter estimation is crucial for modeling, tracking, and control of complex dynamical systems. However, parameter uncertainties can compromise system performance under a controller relying on nominal parameter values. Typically, parameters are estimated using numerical regression approaches framed as inverse problems. However, they suffer from non-uniqueness due to existence of multiple local optima, reliance on gradients, numerous experimental data, or stability issues. Addressing these drawbacks, we present a Bayesian optimization framework based on Gaussian processes (GPs) for online parameter estimation. It uses an efficient search strategy over a response surface in the parameter space for finding the global optima with minimal function evaluations. The response surface is modeled as correlated surrogates using GPs on noisy data. The GP posterior predictive variance is exploited for smart adaptive sampling. This balances the exploration versus exploitation trade-off which is key in reaching the global optima under limited budget. We demonstrate our technique on an actuated planar pendulum and safety-critical quadrotor in simulation with changing parameters. We also benchmark our results against solvers using interior point method and sequential quadratic program. By reconfiguring the controller with new optimized parameters iteratively, we drastically improve trajectory tracking of the system versus the nominal case and other solvers.

[50]  arXiv:2002.07873 (cross-list from q-bio.QM) [pdf, other]
Title: A survey of statistical learning techniques as applied to inexpensive pediatric Obstructive Sleep Apnea data
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

Pediatric obstructive sleep apnea affects an estimated 1-5% of elementary-school aged children and can lead to other detrimental health problems. Swift diagnosis and treatment are critical to a child's growth and development, but the variability of symptoms and the complexity of the available data make this a challenge. We take a first step in streamlining the process by focusing on inexpensive data from questionnaires and craniofacial measurements. We apply correlation networks, the Mapper algorithm from topological data analysis, and singular value decomposition in a process of exploratory data analysis. We then apply a variety of supervised and unsupervised learning techniques from statistics, machine learning, and topology, ranging from support vector machines to Bayesian classifiers and manifold learning. Finally, we analyze the results of each of these methods and discuss the implications for a multi-data-sourced algorithm moving forward.

[51]  arXiv:2002.07874 (cross-list from q-bio.QM) [pdf, other]
Title: Ensemble Deep Learning on Large, Mixed-Site fMRI Datasets in Autism and Other Tasks
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

Deep learning models for MRI classification face two recurring problems: they are typically limited by low sample size, and are abstracted by their own complexity (the "black box problem"). In this paper, we train a convolutional neural network (CNN) with the largest multi-source, functional MRI (fMRI) connectomic dataset ever compiled, consisting of 43,858 datapoints. We apply this model to a cross-sectional comparison of autism (ASD) vs typically developing (TD) controls that has proved difficult to characterise with inferential statistics. To contextualise these findings, we additionally perform classifications of gender and task vs rest. Employing class-balancing to build a training set, we trained 3$\times$300 modified CNNs in an ensemble model to classify fMRI connectivity matrices with overall AUROCs of 0.6774, 0.7680, and 0.9222 for ASD vs TD, gender, and task vs rest, respectively. Additionally, we aim to address the black box problem in this context using two visualization methods. First, class activation maps show which functional connections of the brain our models focus on when performing classification. Second, by analyzing maximal activations of the hidden layers, we were also able to explore how the model organizes a large and mixed-centre dataset, finding that it dedicates specific areas of its hidden layers to processing different covariates of data (depending on the independent variable analyzed), and other areas to mix data from different sources. Our study finds that deep learning models that distinguish ASD from TD controls focus broadly on temporal and cerebellar connections, with a particularly high focus on the right caudate nucleus and paracentral sulcus.

[52]  arXiv:2002.07877 (cross-list from cs.IR) [pdf, other]
Title: CBIR using features derived by Deep Learning
Comments: 18 pages, 31 figures
Subjects: Information Retrieval (cs.IR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

In a Content Based Image Retrieval (CBIR) System, the task is to retrieve similar images from a large database given a query image. The usual procedure is to extract some useful features from the query image, and retrieve images which have a similar set of features. For this purpose, a suitable similarity measure is chosen, and images with high similarity scores are retrieved. Naturally the choice of these features plays a very important role in the success of this system, and high level features are required to reduce the semantic gap.
In this paper, we propose to use features derived from pre-trained network models from a deep-learning convolution network trained for a large image classification problem. This approach appears to produce vastly superior results for a variety of databases, and it outperforms many contemporary CBIR systems. We analyse the retrieval time of the method, and also propose a pre-clustering of the database based on the above-mentioned features which yields comparable results in a much shorter time in most of the cases.
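
The retrieval step described above can be sketched as follows. This is a minimal illustration, not the authors' system: it assumes the deep features have already been extracted into plain lists of floats, and it assumes cosine similarity as the similarity measure (the abstract only says that a suitable measure is chosen). The names `cosine_similarity` and `retrieve` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero feature vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def retrieve(query_features, database, top_k=3):
    """Return the ids of the top_k database images most similar to the query.

    database maps an image id to its precomputed feature vector.
    """
    scored = [
        (cosine_similarity(query_features, feats), image_id)
        for image_id, feats in database.items()
    ]
    # Highest similarity first; ties keep insertion order (stable sort).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:top_k]]
```

The pre-clustering mentioned in the abstract would presumably restrict `database` to the images of the nearest cluster before scoring; that step is not shown here.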

[53]  arXiv:2002.07884 (cross-list from stat.ML) [pdf, ps, other]
Title: Observational nonidentifiability, generalized likelihood and free energy
Comments: 25 pages, 1 figure
Subjects: Machine Learning (stat.ML); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)

We study the parameter estimation problem in mixture models with observational nonidentifiability: the full model (also containing hidden variables) is identifiable, but the marginal (observed) model is not. Hence global maxima of the marginal likelihood are (infinitely) degenerate and predictions of the marginal likelihood are not unique. We show how to generalize the marginal likelihood by introducing an effective temperature, and making it similar to the free energy. This generalization resolves the observational nonidentifiability, since its maximization leads to unique results that are better than a random selection of one degenerate maximum of the marginal likelihood or the averaging over many such maxima. The generalized likelihood inherits many features from the usual likelihood, e.g. it holds the conditionality principle, and its local maximum can be searched for via suitably modified expectation-maximization method. The maximization of the generalized likelihood relates to entropy optimization.

[54]  arXiv:2002.07897 (cross-list from eess.IV) [pdf, other]
Title: LocoGAN -- Locally Convolutional GAN
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In the paper we construct a fully convolutional GAN model: LocoGAN, whose latent space is given by noise-like images of possibly different resolutions. The learning is local, i.e. we process not the whole noise-like image, but the sub-images of a fixed size. As a consequence LocoGAN can produce images of arbitrary dimensions, e.g. for the LSUN bedroom data set. Another advantage of our approach comes from the fact that we use the position channels, which allows the generation of fully periodic (e.g. cylindrical panoramic images) or almost periodic "infinitely long" images (e.g. wall-papers).

[55]  arXiv:2002.07940 (cross-list from astro-ph.CO) [pdf, other]
Title: A unified framework for 21cm tomography sample generation and parameter inference with Progressively Growing GANs
Comments: 15 pages, 8+1 figures, accepted by MNRAS
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)

Creating a database of 21cm brightness temperature signals from the Epoch of Reionisation (EoR) for an array of reionisation histories is a complex and computationally expensive task, given the range of astrophysical processes involved and the possibly high-dimensional parameter space that is to be probed. We utilise a specific type of neural network, a Progressively Growing Generative Adversarial Network (PGGAN), to produce realistic tomography images of the 21cm brightness temperature during the EoR, covering a continuous three-dimensional parameter space that models varying X-ray emissivity, Lyman band emissivity, and ratio between hard and soft X-rays. The GPU-trained network generates new samples at a resolution of $\sim 3'$ in a second (on a laptop CPU), and the resulting global 21cm signal, power spectrum, and pixel distribution function agree well with those of the training data, taken from the 21SSD catalogue \citep{Semelin2017}. Finally, we showcase how a trained PGGAN can be leveraged for the converse task of inferring parameters from 21cm tomography samples via Approximate Bayesian Computation.

[56]  arXiv:2002.07964 (cross-list from stat.AP) [pdf]
Title: Tourism Demand Forecasting with Tourist Attention: An Ensemble Deep Learning Approach
Subjects: Applications (stat.AP); Machine Learning (cs.LG); Econometrics (econ.EM)

The large amount of tourism-related data presents a series of challenges for tourism demand forecasting, including data deficiencies, multicollinearity and long calculation time. A Bagging-based multivariate ensemble deep learning model, integrating Stacked Autoencoders and KELM (B-SAKE), is proposed to address these challenges in this study. We forecast tourist arrivals in Beijing from four countries using historical data on tourist arrivals in Beijing, economic indicators and tourist online behavior variables. The results from the cases of four origin countries suggest that our proposed B-SAKE model outperforms benchmark models in horizontal accuracy, directional accuracy and statistical significance. Both Bagging and Stacked Autoencoder can improve the forecasting performance of the models. Moreover, the forecasting performance of the models is evaluated with consistent results by means of the multi-step-ahead forecasting scheme.

[57]  arXiv:2002.08014 (cross-list from stat.ML) [pdf, other]
Title: Communication-Efficient Distributed SVD via Local Power Iterations
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

We study the distributed computing of the truncated singular value decomposition (SVD). We develop an algorithm that we call \texttt{LocalPower} for improving the communication efficiency. Specifically, we uniformly partition the dataset among $m$ nodes and alternate between multiple (precisely $p$) local power iterations and one global aggregation. We theoretically show that under certain assumptions, \texttt{LocalPower} lowers the required number of communications by a factor of $p$ to reach a certain accuracy. We also show that the strategy of periodically decaying $p$ helps improve the performance of \texttt{LocalPower}. We conduct experiments to demonstrate the effectiveness of \texttt{LocalPower}.
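
The alternation between $p$ local power iterations and one global aggregation can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' \texttt{LocalPower}: it recovers only the top right singular vector (rank 1), it aggregates by plain averaging followed by normalization, and it simulates the nodes sequentially in one process. The function names are hypothetical.

```python
import math
import random


def gram_matvec(rows, z):
    """Compute (A^T A) z for a local data block A given as a list of rows."""
    out = [0.0] * len(z)
    for row in rows:
        dot = sum(a * b for a, b in zip(row, z))
        for j, a in enumerate(row):
            out[j] += a * dot
    return out


def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def local_power_top1(blocks, p, rounds, seed=0):
    """Rank-1 sketch: p local power iterations per node, then one averaging step.

    blocks is the dataset partitioned by rows among the (simulated) nodes.
    """
    rng = random.Random(seed)
    dim = len(blocks[0][0])
    z = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for _ in range(rounds):
        local_vectors = []
        for rows in blocks:
            zi = z
            for _ in range(p):  # local work, no communication
                zi = normalize(gram_matvec(rows, zi))
            local_vectors.append(zi)
        # One communication round: average the local iterates (assumed rule).
        averaged = [sum(col) / len(local_vectors) for col in zip(*local_vectors)]
        z = normalize(averaged)
    return z
```

With `p=1` and equally informative blocks this behaves like ordinary distributed power iteration; larger `p` trades fewer communication rounds for local drift toward each node's own leading direction, which is the trade-off the abstract's decaying-$p$ strategy addresses.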

[58]  arXiv:2002.08021 (cross-list from stat.AP) [pdf]
Title: Seasonal and Trend Forecasting of Tourist Arrivals: An Adaptive Multiscale Ensemble Learning Approach
Subjects: Applications (stat.AP); Machine Learning (cs.LG); Econometrics (econ.EM)

The accurate seasonal and trend forecasting of tourist arrivals is a very challenging task. Despite the importance of seasonal and trend forecasting of tourist arrivals, limited research has paid attention to it previously. In this study, a new adaptive multiscale ensemble (AME) learning approach incorporating variational mode decomposition (VMD) and least square support vector regression (LSSVR) is developed for short-, medium-, and long-term seasonal and trend forecasting of tourist arrivals. In the formulation of our developed AME learning approach, the original tourist arrivals series are first decomposed into the trend, seasonal and remainder volatility components. Then, the ARIMA is used to forecast the trend component, the SARIMA is used to forecast the seasonal component with a 12-month cycle, while the LSSVR is used to forecast the remainder volatility components. Finally, the forecasting results of the three components are aggregated to generate an ensemble forecasting of tourist arrivals by the LSSVR based nonlinear ensemble approach. Furthermore, a direct strategy is used to implement multi-step-ahead forecasting. Using two accuracy measures and the Diebold-Mariano test, the empirical results demonstrate that our proposed AME learning approach can achieve higher level and directional forecasting accuracy compared with other benchmarks used in this study, indicating that our proposed approach is a promising model for forecasting tourist arrivals with high seasonality and volatility.

[59]  arXiv:2002.08024 (cross-list from cs.CL) [pdf, other]
Title: Non-Autoregressive Dialog State Tracking
Comments: Accepted at ICLR 2020
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recent efforts in Dialogue State Tracking (DST) for task-oriented dialogues have progressed toward open-vocabulary or generation-based approaches where the models can generate slot value candidates from the dialogue history itself. These approaches have shown good performance gain, especially in complicated dialogue domains with dynamic slot values. However, they fall short in two aspects: (1) they do not allow models to explicitly learn signals across domains and slots to detect potential dependencies among (domain, slot) pairs; and (2) existing models follow auto-regressive approaches which incur high time cost when the dialogue evolves over multiple domains and multiple turns. In this paper, we propose a novel framework of Non-Autoregressive Dialog State Tracking (NADST) which can factor in potential dependencies among domains and slots to optimize the models towards better prediction of dialogue states as a complete set rather than separate slots. In particular, the non-autoregressive nature of our method not only enables decoding in parallel to significantly reduce the latency of DST for real-time dialogue response generation, but also detects dependencies among slots at token level in addition to slot and domain level. Our empirical results show that our model achieves the state-of-the-art joint accuracy across all domains on the MultiWOZ 2.1 corpus, and the latency of our model is an order of magnitude lower than the previous state of the art as the dialogue history extends over time.

[60]  arXiv:2002.08025 (cross-list from cs.CR) [pdf, other]
Title: Influence Function based Data Poisoning Attacks to Top-N Recommender Systems
Comments: Accepted by WWW 2020; This is technical report version
Subjects: Cryptography and Security (cs.CR); Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)

A recommender system is an essential component of web services to engage users. Popular recommender systems model user preferences and item properties using a large amount of crowdsourced user-item interaction data, e.g., rating scores; then the top-$N$ items that best match a user's preference are recommended to the user. In this work, we show that an attacker can launch a data poisoning attack on a recommender system to make recommendations as the attacker desires via injecting fake users with carefully crafted user-item interaction data. Specifically, an attacker can trick a recommender system into recommending a target item to as many normal users as possible. We focus on matrix factorization based recommender systems because they have been widely deployed in industry. Given the number of fake users the attacker can inject, we formulate the crafting of rating scores for the fake users as an optimization problem. However, this optimization problem is challenging to solve as it is a non-convex integer programming problem. To address the challenge, we develop several techniques to approximately solve the optimization problem. For instance, we leverage influence functions to select a subset of normal users who are influential to the recommendations and solve our formulated optimization problem based on these influential users. Our results show that our attacks are effective and outperform existing methods.

[61]  arXiv:2002.08027 (cross-list from cs.CR) [pdf, other]
Title: Toward Low-Cost and Stable Blockchain Networks
Comments: Accepted by IEEE ICC 2020
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)

Envisioned to be the future of distributed systems, blockchain networks have received increasing attention from both industry and academic research in recent years. However, the blockchain mining process consumes vast amounts of energy, and studies have shown that the amount of energy consumed in Bitcoin mining is almost the same as the electricity used in Ireland. To address the high mining energy cost problem of blockchain networks, in this paper, we propose a blockchain mining resources allocation algorithm to reduce the mining cost in PoW-based (proof-of-work-based) blockchain networks. We first provide a systematic study on a general blockchain queueing model. In our queueing model, transactions arrive randomly at the queue and are served in a batch manner with unknown probability distribution, agnostic to any priority mechanism. Then, we leverage Lyapunov optimization techniques to propose a dynamic mining resources allocation algorithm (DMRA), which is parameterized by a tuning parameter $K>0$. We show that our algorithm achieves a performance-delay tradeoff of $[O(1/K), O(K)]$. The simulation results also demonstrate the effectiveness of DMRA in reducing the mining cost.

[62]  arXiv:2002.08114 (cross-list from cs.DM) [pdf, other]
Title: BB_Evac: Fast Location-Sensitive Behavior-Based Building Evacuation
Subjects: Discrete Mathematics (cs.DM); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Past work on evacuation planning assumes that evacuees will follow instructions -- however, there is ample evidence that this is not the case. While some people will follow instructions, others will follow their own desires. In this paper, we present a formal definition of a behavior-based evacuation problem (BBEP) in which a human behavior model is taken into account when planning an evacuation. We show that a specific form of constraints can be used to express such behaviors. We show that BBEPs can be solved exactly via an integer program called BB_IP, and inexactly by a much faster algorithm that we call BB_Evac. We conducted a detailed experimental evaluation of both algorithms applied to buildings (though in principle the algorithms can be applied to any graphs) and show that the latter is an order of magnitude faster than BB_IP while producing results that are almost as good on one real-world building graph as well as on several synthetically generated graphs.

[63]  arXiv:2002.08126 (cross-list from cs.CL) [pdf, ps, other]
Title: Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Recently, language identity information has been utilized to improve the performance of end-to-end code-switching (CS) speech recognition. However, previous works use an additional language identification (LID) model as an auxiliary module, which makes the system complex. In this work, we propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem. We use the language identities to bias the model to predict the CS points. This encourages the model to learn the language identity information directly from transcription, and no additional LID model is needed. We evaluate the approach on a Mandarin-English CS corpus SEAME. Compared to our RNN-T baseline, the proposed method can achieve 16.2% and 12.9% relative error reduction on two test sets, respectively.

[64]  arXiv:2002.08129 (cross-list from stat.ML) [pdf, other]
Title: Bayesian Experimental Design for Implicit Models by Mutual Information Neural Estimation
Comments: Conference submission
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME)

Implicit stochastic models, where the data-generation distribution is intractable but sampling is possible, are ubiquitous in the natural sciences. The models typically have free parameters that need to be inferred from data collected in scientific experiments. A fundamental question is how to design the experiments so that the collected data are most useful. The field of Bayesian experimental design advocates that, ideally, we should choose designs that maximise the mutual information (MI) between the data and the parameters. For implicit models, however, this approach is severely hampered by the high computational cost of computing posteriors and maximising MI, in particular when we have more than a handful of design variables to optimise. In this paper, we propose a new approach to Bayesian experimental design for implicit models that leverages recent advances in neural MI estimation to deal with these issues. We show that training a neural network to maximise a lower bound on MI allows us to jointly determine the optimal design and the posterior. Simulation studies illustrate that this gracefully extends Bayesian experimental design for implicit models to higher design dimensions.

[65]  arXiv:2002.08158 (cross-list from eess.IV) [pdf, other]
Title: Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding
Comments: 8 pages + detailed supplement with additional full resolution reconstructed images
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Deep Bayesian latent variable models have enabled new approaches to both model and data compression. Here, we propose a new algorithm for compressing latent representations in deep probabilistic models, such as variational autoencoders, in post-processing. The approach thus separates model design and training from the compression task. Our algorithm generalizes arithmetic coding to the continuous domain, using adaptive discretization accuracy that exploits estimates of posterior uncertainty. A consequence of the "plug and play" nature of our approach is that various rate-distortion trade-offs can be achieved with a single trained model, eliminating the need to train multiple models for different bit rates. Our experimental results demonstrate the importance of taking into account posterior uncertainties, and show that image compression with the proposed algorithm outperforms JPEG over a wide range of bit rates using only a single machine learning model. Further experiments on Bayesian neural word embeddings demonstrate the versatility of the proposed method.

[66]  arXiv:2002.08159 (cross-list from stat.ML) [pdf, other]
Title: Learning Fair Scoring Functions: Fairness Definitions, Algorithms and Generalization Bounds for Bipartite Ranking
Comments: 27 pages, 11 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Many applications of artificial intelligence, ranging from credit lending to the design of medical diagnosis support tools through recidivism prediction, involve scoring individuals using a learned function of their attributes. These predictive risk scores are used to rank a set of people, and/or take individual decisions about them based on whether the score exceeds a certain threshold that may depend on the context in which the decision is taken. The level of delegation granted to such systems will heavily depend on how questions of fairness can be answered. While this concern has received a lot of attention in the classification setup, the design of relevant fairness constraints for the problem of learning scoring functions has not been much investigated. In this paper, we propose a flexible approach to group fairness for the scoring problem with binary labeled data, a standard learning task referred to as bipartite ranking. We argue that the functional nature of the ROC curve, the gold standard measuring ranking performance in this context, leads to several possible ways of formulating fairness constraints. We introduce general classes of fairness conditions in bipartite ranking and establish generalization bounds for scoring rules learned under such constraints. Beyond the theoretical formulation and results, we design practical learning algorithms and illustrate our approach with numerical experiments.

[67]  arXiv:2002.08235 (cross-list from cs.CE) [pdf, other]
Title: Physics-informed Neural Networks for Solving Nonlinear Diffusivity and Biot's equations
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Numerical Analysis (math.NA)

This paper presents the potential of applying physics-informed neural networks for solving nonlinear multiphysics problems, which are essential to many fields such as biomedical engineering, earthquake prediction, and underground energy harvesting. Specifically, we investigate how to extend the methodology of physics-informed neural networks to solve both the forward and inverse problems in relation to the nonlinear diffusivity and Biot's equations. We explore the accuracy of the physics-informed neural networks with different training example sizes and choices of hyperparameters. The impacts of the stochastic variations between various training realizations are also investigated. In the inverse case, we also study the effects of noisy measurements. Furthermore, we address the challenge of selecting the hyperparameters of the inverse model and illustrate how this challenge is linked to the hyperparameters selection performed for the forward one.

[68]  arXiv:2002.08240 (cross-list from quant-ph) [pdf, ps, other]
Title: Quantum statistical query learning
Comments: 24 Pages
Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Machine Learning (cs.LG)

We propose a learning model called the quantum statistical query (QSQ) learning model, which extends the SQ learning model introduced by Kearns to the quantum setting. Our model can also be seen as a restriction of the quantum PAC learning model: here, the learner does not have direct access to quantum examples, but can only obtain estimates of measurement statistics on them. Theoretically, this model provides a simple yet expressive setting to explore the power of quantum examples in machine learning. From a practical perspective, since simpler operations are required, learning algorithms in the QSQ model are more feasible for implementation on near-term quantum devices. We prove a number of results about the QSQ learning model. We first show that parity functions, (log n)-juntas and polynomial-sized DNF formulas are efficiently learnable in the QSQ model, in contrast to the classical setting where these problems are provably hard. This implies that many of the advantages of quantum PAC learning can be realized even in the more restricted quantum SQ learning model. It is well-known that weak statistical query dimension, denoted by WSQDIM(C), characterizes the complexity of learning a concept class C in the classical SQ model. We show that log(WSQDIM(C)) is a lower bound on the complexity of QSQ learning, and furthermore it is tight for certain concept classes C. Additionally, we show that this quantity provides strong lower bounds for the small-bias quantum communication model under product distributions. Finally, we introduce the notion of private quantum PAC learning, in which a quantum PAC learner is required to be differentially private. We show that learnability in the QSQ model implies learnability in the quantum private PAC model. Additionally, we show that in the private PAC learning setting, the classical and quantum sample complexities are equal, up to constant factors.

[69]  arXiv:2002.08246 (cross-list from math.OC) [pdf, other]
Title: A Unified Convergence Analysis for Shuffling-Type Gradient Methods
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we provide a unified convergence analysis for a class of shuffling-type gradient methods for solving a well-known finite-sum minimization problem commonly used in machine learning. This algorithm covers various variants such as randomized reshuffling, single shuffling, and cyclic/incremental gradient schemes. We consider two different settings: strongly convex and non-convex problems. Our main contribution consists of new non-asymptotic and asymptotic convergence rates for a general class of shuffling-type gradient methods to solve both non-convex and strongly convex problems. While our rate in the non-convex problem is new (i.e. not known yet under standard assumptions), the rate on the strongly convex case matches (up to a constant) the best-known results. However, unlike existing works in this direction, we only use standard assumptions such as smoothness and strong convexity. Finally, we empirically illustrate the effect of learning rates via a non-convex logistic regression and neural network examples.
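
The three variants named in the abstract differ only in how the component order is chosen in each epoch, which the following sketch makes explicit. It is a minimal illustration for a scalar parameter with a constant learning rate, both of which are simplifying assumptions; the function name `shuffling_gradient` and the scheme labels are hypothetical.

```python
import random


def shuffling_gradient(component_grads, w0, lr, epochs,
                       scheme="random_reshuffling", seed=0):
    """Minimize (1/n) * sum_i f_i(w) by one pass over all components per epoch.

    component_grads is a list of functions, each returning the gradient of one f_i.
    """
    schemes = ("random_reshuffling", "single_shuffling", "incremental")
    if scheme not in schemes:
        raise ValueError(f"scheme must be one of {schemes}")
    rng = random.Random(seed)
    order = list(range(len(component_grads)))
    if scheme == "single_shuffling":
        rng.shuffle(order)  # one permutation, reused in every epoch
    w = w0
    for _ in range(epochs):
        if scheme == "random_reshuffling":
            rng.shuffle(order)  # fresh permutation in every epoch
        # "incremental" keeps the fixed cyclic order 0, 1, ..., n-1
        for i in order:
            w = w - lr * component_grads[i](w)
    return w
```

For example, with components $f_i(w) = \tfrac{1}{2}(w - a_i)^2$ the minimizer is the mean of the $a_i$, and all three schemes approach it up to a bias that shrinks with the learning rate.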

[70]  arXiv:2002.08249 (cross-list from eess.AS) [pdf, other]
Title: Workshop Report: Detection and Classification in Marine Bioacoustics with Deep Learning
Comments: 13 pages, 1 figure, 1 table
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

On 21-22 November 2019, about 30 researchers gathered in Victoria, BC, Canada, for the workshop "Detection and Classification in Marine Bioacoustics with Deep Learning" organized by MERIDIAN and hosted by Ocean Networks Canada. The workshop was attended by marine biologists, data scientists, and computer scientists coming from both Canadian coasts and the US and representing a wide spectrum of research organizations including universities, government (Fisheries and Oceans Canada, National Oceanic and Atmospheric Administration), industry (JASCO Applied Sciences, Google, Axiom Data Science), and not-for-profits (Orcasound, OrcaLab). Consisting of a mix of oral presentations, open discussion sessions, and hands-on tutorials, the workshop program offered a rare opportunity for specialists from distinctly different domains to engage in conversation about deep learning and its promising potential for the development of detection and classification algorithms in underwater acoustics. In this workshop report, we summarize key points from the presentations and discussion sessions.

[71]  arXiv:2002.08253 (cross-list from stat.ML) [pdf, ps, other]
Title: Distance-Based Regularisation of Deep Networks for Fine-Tuning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We investigate approaches to regularisation during fine-tuning of deep neural networks. First we provide a neural network generalisation bound based on Rademacher complexity that uses the distance the weights have moved from their initial values. This bound has no direct dependence on the number of weights and compares favourably to other bounds when applied to convolutional networks. Our bound is highly relevant for fine-tuning, because providing a network with a good initialisation based on transfer learning means that learning can modify the weights less, and hence achieve tighter generalisation. Inspired by this, we develop a simple yet effective fine-tuning algorithm that constrains the hypothesis class to a small sphere centred on the initial pre-trained weights, thus obtaining provably better generalisation performance than conventional transfer learning. Empirical evaluation shows that our algorithm works well, corroborating our theoretical results. It outperforms both state of the art fine-tuning competitors, and penalty-based alternatives that we show do not directly constrain the radius of the search space.
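
One simple way to keep fine-tuned weights inside a ball centred on the pre-trained weights is to project back onto that ball after each update. The sketch below illustrates the idea under stated assumptions: it uses the Euclidean norm over a flat weight vector and a projection step, neither of which is confirmed by the abstract as the authors' exact choice. The name `project_to_ball` is hypothetical.

```python
import math


def project_to_ball(weights, init_weights, radius):
    """Project weights onto the Euclidean ball of given radius around init_weights."""
    diff = [w - w0 for w, w0 in zip(weights, init_weights)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= radius:
        return list(weights)  # already inside the constraint set
    scale = radius / dist
    # Move along the straight line from the initial weights to the current ones.
    return [w0 + scale * d for w0, d in zip(init_weights, diff)]
```

In a training loop this would be called after every optimizer step, so the constraint on the distance from initialisation holds exactly, in contrast to a penalty term that only discourages large distances.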

[72]  arXiv:2002.08260 (cross-list from stat.ML) [pdf, other]
Title: Learning Bounds for Moment-Based Domain Adaptation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Domain adaptation algorithms are designed to minimize the misclassification risk of a discriminative model for a target domain with little training data by adapting a model from a source domain with a large amount of training data. Standard approaches measure the adaptation discrepancy based on distance measures between the empirical probability distributions in the source and target domain. In this setting, we address the problem of deriving learning bounds under practice-oriented general conditions on the underlying probability distributions. As a result, we obtain learning bounds for domain adaptation based on finitely many moments and smoothness conditions.

[73]  arXiv:2002.08267 (cross-list from cs.CL) [pdf]
Title: Multilogue-Net: A Context Aware RNN for Multi-modal Emotion Detection and Sentiment Analysis in Conversation
Comments: 10 pages, 4 figures, 6 tables
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Sentiment Analysis and Emotion Detection in conversation are key in a number of real-world applications, with different applications leveraging different kinds of data to achieve reasonably accurate predictions. Multimodal Emotion Detection and Sentiment Analysis can be particularly useful, as applications can use specific subsets of the available modalities, depending on their available data, to produce relevant predictions. Current systems dealing with multimodal functionality fail to leverage and capture the context of the conversation through all modalities, the current speaker and listener(s) in the conversation, and the relevance and relationship between the available modalities through an adequate fusion mechanism. In this paper, we propose a recurrent neural network architecture that attempts to address all the mentioned drawbacks, and keeps track of the context of the conversation, interlocutor states, and the emotions conveyed by the speakers in the conversation. Our proposed model outperforms the state of the art on two benchmark datasets on a variety of accuracy and regression metrics. Our model implementation is public and can be found at github.com/amanshenoy/multilogue-net

[74]  arXiv:2002.08276 (cross-list from stat.ML) [pdf, other]
Title: Partial Gromov-Wasserstein with Applications on Positive-Unlabeled Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The Optimal Transport (OT) framework allows defining similarity between probability distributions and provides metrics such as the Wasserstein and Gromov-Wasserstein discrepancies. The classical OT problem seeks a transportation map that preserves the total mass, requiring the mass of the source and target distributions to be the same. This may be too restrictive in certain applications such as color or shape matching, since the distributions may have arbitrary masses or only a fraction of the total mass may have to be transported. Several algorithms have been devised for computing unbalanced Wasserstein metrics, but when it comes to the Gromov-Wasserstein problem, no partial formulation is available yet. This precludes working with distributions that do not lie in the same metric space or cases where invariance to rotation or translation is needed. In this paper, we address the partial Gromov-Wasserstein problem and propose an algorithm to solve it. We showcase the new formulation in a positive-unlabeled (PU) learning application. To the best of our knowledge, this is the first application of optimal transport in this context, and we first highlight that partial Wasserstein-based metrics prove effective in usual PU learning settings. We then demonstrate that partial Gromov-Wasserstein metrics are effective in scenarios where point clouds come from different domains or have different features.
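
For reference, the mass constraints contrasted above can be written in a standard discrete form. The notation here is an assumption and is not taken from the abstract: $p$ and $q$ are nonnegative weight vectors of the source and target, $T$ is the transport plan, and $s$ is the amount of mass to transport.

```latex
% Classical (balanced) OT: all mass is transported, so p and q must have equal total mass.
\Pi(p, q) = \{\, T \ge 0 \;:\; T\mathbf{1} = p,\; T^{\top}\mathbf{1} = q \,\}

% Partial OT: only a mass s is transported, with 0 \le s \le \min(\|p\|_1, \|q\|_1).
\Pi^{s}(p, q) = \{\, T \ge 0 \;:\; T\mathbf{1} \le p,\; T^{\top}\mathbf{1} \le q,\; \mathbf{1}^{\top} T \mathbf{1} = s \,\}
```

The equalities of the balanced case become inequalities, so some source or target mass may stay unmatched; the cost that is minimised over this set (Wasserstein or Gromov-Wasserstein) is a separate choice.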

[75]  arXiv:2002.08277 (cross-list from cs.CV) [pdf, other]
Title: When Radiology Report Generation Meets Knowledge Graph
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Automatic radiology report generation has been an attractive research problem in computer-aided diagnosis in recent years, with the aim of alleviating the workload of doctors. Deep learning techniques for natural image captioning have been successfully adapted to generating radiology reports. However, radiology image reporting differs from the natural image captioning task in two aspects: 1) the accuracy of positive disease keyword mentions is critical in radiology image reporting, in contrast to the equal importance of every single word in a natural image caption; 2) the evaluation of reporting quality should focus more on matching the disease keywords and their associated attributes than on counting the occurrences of N-grams. Based on these concerns, we propose to utilize a pre-constructed graph embedding module (modeled with a graph convolutional neural network) on multiple disease findings to assist the generation of reports. The incorporation of a knowledge graph allows for dedicated feature learning for each disease finding and for modeling the relationships between them. In addition, we propose a new evaluation metric for radiology image reporting with the assistance of the same graph. Experimental results on a publicly accessible dataset (IU-RR) of chest radiographs demonstrate the superior performance of the methods integrated with the proposed graph embedding module compared with previous approaches, using both the conventional evaluation metrics commonly adopted for image captioning and our proposed ones.

[76]  arXiv:2002.08295 (cross-list from cs.DC) [pdf, other]
Title: MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Software Engineering (cs.SE); Machine Learning (stat.ML)

Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that researchers are hard-pressed to analyze and study them. The complicated procedures for evaluating innovations, along with the lack of standard and efficient ways of specifying and provisioning ML/DL evaluation, are a major "pain point" for the community. This paper proposes MLModelScope, an open-source, framework/hardware agnostic, extensible and customizable design that enables repeatable, fair, and scalable model evaluation and benchmarking.
We implement the distributed design with support for all major frameworks and hardware, and equip it with web, command-line, and library interfaces. To demonstrate MLModelScope's capabilities, we perform parallel evaluation and show how subtle changes to the model evaluation pipeline affect accuracy and how HW/SW stack choices affect performance.

[77]  arXiv:2002.08301 (cross-list from eess.IV) [pdf, ps, other]
Title: Multi-wavelet residual dense convolutional neural network for image denoising
Comments: 9 pages, 9 figures
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Networks with a large receptive field (RF) have shown advanced fitting ability in recent years. In this work, we utilize the short-term residual learning method to improve the performance and robustness of networks for image denoising tasks. Here, we choose a multi-wavelet convolutional neural network (MWCNN), one of the state-of-the-art networks with a large RF, as the backbone, and insert residual dense blocks (RDBs) in each of its layers. We call this scheme the multi-wavelet residual dense convolutional neural network (MWRDCNN). Compared with other RDB-based networks, it can extract more features of the object from adjacent layers, preserve the large RF, and boost computing efficiency. This approach also makes it possible to absorb the advantages of multiple architectures in a single network without conflicts. The performance of the proposed method is demonstrated in extensive experiments and compared with existing techniques.

[78]  arXiv:2002.08313 (cross-list from cs.CR) [pdf, other]
Title: NNoculation: Broad Spectrum and Targeted Treatment of Backdoored DNNs
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

This paper proposes a novel two-stage defense (NNoculation) against backdoored neural networks (BadNets) that, unlike existing defenses, makes minimal assumptions on the shape, size and location of backdoor triggers and BadNet's functioning. In the pre-deployment stage, NNoculation retrains the network using "broad-spectrum" random perturbations of inputs drawn from a clean validation set to partially reduce the adversarial impact of a backdoor. In the post-deployment stage, NNoculation detects and quarantines backdoored test inputs by recording disagreements between the original and pre-deployment patched networks. A CycleGAN is then trained to learn transformations between clean validation inputs and quarantined inputs; i.e., it learns to add triggers to clean validation images. This transformed set of backdoored validation images along with their correct labels is used to further retrain the BadNet, yielding our final defense. NNoculation outperforms state-of-the-art defenses NeuralCleanse and Artificial Brain Simulation (ABS) that we show are ineffective when their restrictive assumptions are circumvented by the attacker.
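
The post-deployment detection step described above, which flags inputs on which the original and patched networks disagree, can be sketched as follows. This is only an illustration under stated assumptions: both networks are assumed to have already produced one predicted class label per input, and the function name `quarantine_disagreements` is hypothetical rather than taken from the paper.

```python
import numpy as np


def quarantine_disagreements(original_preds, patched_preds):
    """Return indices of inputs whose predicted labels differ between the two networks.

    Assumption: each argument holds one predicted class label per test input.
    """
    original_preds = np.asarray(original_preds)
    patched_preds = np.asarray(patched_preds)
    if original_preds.shape != patched_preds.shape:
        raise ValueError("prediction arrays must have the same shape")
    # A disagreement suggests the input may carry a trigger that the
    # patched network no longer responds to.
    return np.flatnonzero(original_preds != patched_preds)
```

For example, `quarantine_disagreements([0, 1, 2, 1], [0, 1, 0, 1])` flags only the input at index 2. The quarantined inputs would then serve as the second domain for the CycleGAN mentioned in the abstract.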

[79]  arXiv:2002.08314 (cross-list from stat.ML) [pdf, other]
Title: Non-Aligned Distribution Distance using Metric Measure Embedding and Optimal Transport
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We propose a novel approach for comparing distributions whose supports do not necessarily lie on the same metric space. Unlike the Gromov-Wasserstein (GW) distance, which compares pairwise distances of elements from each distribution, we consider a method that embeds the metric measure spaces in a common Euclidean space and computes an optimal transport (OT) on the embedded distributions. This leads to what we call a sub-embedding robust Wasserstein (SERW). Under some conditions, SERW is a distance that considers an OT distance of the (low-distorted) embedded distributions using a common metric. In addition to this novel proposal, which generalizes several recent OT works, our contributions rest on several theoretical analyses: i) we characterize the embedding spaces to define the SERW distance for distribution alignment; ii) we prove that SERW has almost the same properties as the GW distance, and we give a cost relation between GW and SERW. The paper also provides numerical experiments illustrating how SERW behaves on real-world matching problems.

[80]  arXiv:2002.08320 (cross-list from cs.CR) [html]
Title: Proceedings of the Artificial Intelligence for Cyber Security (AICS) Workshop 2020
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

The workshop will focus on the application of artificial intelligence to problems in cyber security. The emphasis of AICS 2020 will be on human-machine teaming within the context of cyber security problems, and the workshop will specifically explore collaboration between human operators and AI technologies. The workshop will address applicable areas of AI, such as machine learning, game theory, natural language processing, knowledge representation, automated and assistive reasoning, and human-machine interactions. Further, cyber security application areas with a particular emphasis on the characterization and deployment of human-machine teaming will be the focus.

[81]  arXiv:2002.08326 (cross-list from cs.DC) [pdf, other]
Title: Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration
Comments: Accepted by DAC2020
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

The research interest in specialized hardware accelerators for deep neural networks (DNN) spiked recently owing to their superior performance and efficiency. However, today's DNN accelerators primarily focus on accelerating specific "kernels" such as convolution and matrix multiplication, which are vital but only part of an end-to-end DNN-enabled application. Meaningful speedups over the entire application often require supporting computations that are, while massively parallel, ill-suited to DNN accelerators. Integrating a general-purpose processor such as a CPU or a GPU incurs significant data movement overhead and leads to resource under-utilization on the DNN accelerators.
We propose Simultaneous Multi-mode Architecture (SMA), a novel architecture design and execution model that offers general-purpose programmability on DNN accelerators in order to accelerate end-to-end applications. The key to SMA is the temporal integration of the systolic execution model with the GPU-like SIMD execution model. The SMA exploits the common components shared between the systolic-array accelerator and the GPU, and provides lightweight reconfiguration capability to switch between the two modes in-situ. The SMA achieves up to 63% performance improvement while consuming 23% less energy than the baseline Volta architecture with TensorCore.

[82]  arXiv:2002.08327 (cross-list from cs.CR) [pdf, ps, other]
Title: Fawkes: Protecting Personal Privacy against Unauthorized Deep Learning Models
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Today's proliferation of powerful facial recognition models poses a real threat to personal privacy. As Clearview.ai demonstrated, anyone can canvass the Internet for data and train highly accurate facial recognition models of us without our knowledge. We need tools to protect ourselves from unauthorized facial recognition systems and their numerous potential misuses. Unfortunately, work in related areas is limited in practicality and effectiveness. In this paper, we propose Fawkes, a system that allows individuals to inoculate themselves against unauthorized facial recognition models. Fawkes achieves this by helping users add imperceptible pixel-level changes (we call them "cloaks") to their own photos before publishing them online. When collected by a third-party "tracker" and used to train facial recognition models, these "cloaked" images produce functional models that consistently misidentify the user. We show experimentally that Fawkes provides 95+% protection against user recognition regardless of how trackers train their models. Even when clean, uncloaked images are "leaked" to the tracker and used for training, Fawkes can still maintain an 80+% protection success rate. In fact, we perform real experiments against today's state-of-the-art facial recognition services and achieve 100% success. Finally, we show that Fawkes is robust against a variety of countermeasures that try to detect or disrupt cloaks.

[83]  arXiv:2002.08333 (cross-list from cs.RO) [pdf, other]
Title: Towards Intelligent Pick and Place Assembly of Individualized Products Using Reinforcement Learning
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Individualized manufacturing is becoming an important approach as a means to fulfill increasingly diverse and specific consumer requirements and expectations. While there are various solutions to the implementation of the manufacturing process, such as additive manufacturing, the subsequent automated assembly remains a challenging task. As an approach to this problem, we aim to teach a collaborative robot to successfully perform pick and place tasks by implementing reinforcement learning. For the assembly of an individualized product in a constantly changing manufacturing environment, the simulated geometric and dynamic parameters will be varied. Using reinforcement learning algorithms capable of meta-learning, the tasks will first be trained in simulation. They will then be performed in a real-world environment where new factors are introduced that were not simulated in training to confirm the robustness of the algorithms. The robot will gain its input data from tactile sensors, area scan cameras, and 3D cameras used to generate heightmaps of the environment and the objects. The selection of machine learning algorithms and hardware components as well as further research questions to realize the outlined production scenario are the results of the presented work.

[84]  arXiv:2002.08335 (cross-list from stat.ML) [pdf, other]
Title: Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We introduce a new regularization method for Artificial Neural Networks (ANNs) based on Kernel Flows (KFs). KFs were introduced as a method for kernel selection in regression/kriging based on the minimization of the loss of accuracy incurred by halving the number of interpolation points in random batches of the dataset. Writing $f_\theta(x) = \big(f^{(n)}_{\theta_n}\circ f^{(n-1)}_{\theta_{n-1}} \circ \dots \circ f^{(1)}_{\theta_1}\big)(x)$ for the functional representation of the compositional structure of the ANN, the inner-layer outputs $h^{(i)}(x) = \big(f^{(i)}_{\theta_i}\circ f^{(i-1)}_{\theta_{i-1}} \circ \dots \circ f^{(1)}_{\theta_1}\big)(x)$ define a hierarchy of feature maps and kernels $k^{(i)}(x,x')=\exp(- \gamma_i \|h^{(i)}(x)-h^{(i)}(x')\|_2^2)$. When combined with a batch of the dataset, these kernels produce KF losses $e_2^{(i)}$ (the $L^2$ regression error incurred by using a random half of the batch to predict the other half) depending on the parameters of the inner layers $\theta_1,\ldots,\theta_i$ (and $\gamma_i$). The proposed method simply consists in aggregating a subset of these KF losses with a classical output loss. We test the proposed method on CNNs and WRNs without altering the structure or the output classifier, and report reduced test errors, decreased generalization gaps, and increased robustness to distribution shift without a significant increase in computational complexity. We suspect that these results might be explained by the fact that, while conventional training only employs a linear functional (a generalized moment) of the empirical distribution defined by the dataset and can be prone to trapping in the Neural Tangent Kernel regime (under over-parameterization), the proposed loss function (defined as a nonlinear functional of the empirical distribution) effectively trains the underlying kernel defined by the CNN beyond regressing the data with that kernel.
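
The kernel $k^{(i)}$ and the half-batch loss described above can be sketched in NumPy. This is a rough illustration and not the authors' implementation: the abstract does not give the exact formula for $e_2^{(i)}$, so the sketch assumes kernel interpolation with a small ridge term (`ridge`, added for numerical stability) and a mean squared error on the held-out half. The function names are hypothetical, and `h` stands for the inner-layer features $h^{(i)}(x)$ of one batch.

```python
import numpy as np


def gaussian_kernel(h_a, h_b, gamma):
    """k(x, x') = exp(-gamma * ||h(x) - h(x')||_2^2) for two sets of feature rows."""
    sq_dists = (
        np.sum(h_a ** 2, axis=1)[:, None]
        + np.sum(h_b ** 2, axis=1)[None, :]
        - 2.0 * h_a @ h_b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kf_l2_loss(h, y, gamma, rng, ridge=1e-8):
    """Error of predicting one random half of the batch from the other half.

    Assumptions: kernel interpolation with a small ridge term, and a mean
    squared error; the paper's exact normalisation may differ.
    """
    n = h.shape[0]
    perm = rng.permutation(n)
    half = n // 2
    fit_idx, held_idx = perm[:half], perm[half:]
    k_ff = gaussian_kernel(h[fit_idx], h[fit_idx], gamma)
    k_hf = gaussian_kernel(h[held_idx], h[fit_idx], gamma)
    # Fit interpolation coefficients on the first half only.
    coef = np.linalg.solve(k_ff + ridge * np.eye(half), y[fit_idx])
    pred = k_hf @ coef
    return float(np.mean((pred - y[held_idx]) ** 2))
```

In training, a loss of this kind would be computed on the features of selected inner layers and added to the usual output loss, so that gradients reach $\theta_1,\ldots,\theta_i$ through `h`; an automatic-differentiation framework would be needed for that step, which this NumPy sketch omits.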

Replacements for Thu, 20 Feb 20

[85]  arXiv:1804.07090 (replaced) [pdf, other]
Title: Robustness via Deep Low-Rank Representations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[86]  arXiv:1808.10648 (replaced) [pdf, other]
Title: Adaptation and Robust Learning of Probabilistic Movement Primitives
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)
[87]  arXiv:1905.11027 (replaced) [pdf, other]
Title: Lightlike Neuromanifolds, Occam's Razor and Deep Learning
Authors: Ke Sun, Frank Nielsen
Comments: Under review in ICML 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[88]  arXiv:1905.11926 (replaced) [pdf, other]
Title: Network Deconvolution
Comments: ICLR 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[89]  arXiv:1905.12265 (replaced) [pdf, other]
Title: Strategies for Pre-training Graph Neural Networks
Comments: Accepted as a spotlight to ICLR 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[90]  arXiv:1905.12726 (replaced) [pdf, other]
Title: Prioritized Sequence Experience Replay
Comments: 18 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[91]  arXiv:1906.05374 (replaced) [pdf, other]
Title: Meta-Learning via Learned Loss
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
[92]  arXiv:1908.05569 (replaced) [pdf, other]
Title: Isotropic Maximization Loss and Entropic Score: Fast, Accurate, Scalable, Unexposed, Turnkey, and Native Neural Networks Out-of-Distribution Detection
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[93]  arXiv:1908.06869 (replaced) [pdf, other]
Title: XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Performance (cs.PF); Machine Learning (stat.ML)
[94]  arXiv:1909.04823 (replaced) [pdf, other]
Title: Distributed Equivalent Substitution Training for Large-Scale Recommender Systems
Comments: 10 pages
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
[95]  arXiv:1909.11957 (replaced) [pdf, other]
Title: Drawing early-bird tickets: Towards more efficient training of deep networks
Comments: Accepted as ICLR2020 Spotlight
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[96]  arXiv:1910.09191 (replaced) [pdf, other]
Title: Regularization Matters in Policy Optimization
Comments: More analytic experiments and evaluation metrics added on last version. Code link: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[97]  arXiv:1910.10196 (replaced) [pdf, other]
Title: No-regret Non-convex Online Meta-Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[98]  arXiv:1910.11858 (replaced) [pdf, other]
Title: BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[99]  arXiv:1910.12027 (replaced) [pdf, other]
Title: Consistency Regularization for Generative Adversarial Networks
Comments: ICLR2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[100]  arXiv:1910.13406 (replaced) [pdf, other]
Title: Generalization of Reinforcement Learners with Working and Episodic Memory
Comments: NeurIPS 2019. Equal contribution of first 4 authors
Journal-ref: 33rd Conference on Neural Information Processing Systems (Neurips 2019)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[101]  arXiv:1911.05076 (replaced) [pdf, other]
Title: Constant Curvature Graph Convolutional Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[102]  arXiv:1911.06922 (replaced) [pdf, other]
Title: Benanza: Automatic $μ$Benchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Machine Learning (stat.ML)
[103]  arXiv:1911.09032 (replaced) [pdf, other]
Title: Outside the Box: Abstraction-Based Monitoring of Neural Networks
Comments: accepted at ECAI 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Machine Learning (stat.ML)
[104]  arXiv:1912.05833 (replaced) [pdf, other]
Title: Speech-driven facial animation using polynomial fusion of features
Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[105]  arXiv:1912.06638 (replaced) [pdf, other]
Title: WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
Comments: Added Figure, minor edits for clarity
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[106]  arXiv:1912.09818 (replaced) [pdf, other]
Title: When Explanations Lie: Why Many Modified BP Attributions Fail
Comments: 18 pages, 10 figures. Preprint
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[107]  arXiv:1912.09855 (replaced) [pdf, other]
Title: Explainability and Adversarial Robustness for RNNs
Comments: Accepted at IEEE BigDataService 2020
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI); Machine Learning (stat.ML)
[108]  arXiv:2001.00012 (replaced) [pdf]
Title: Differentially Private M-band Wavelet-Based Mechanisms in Machine Learning Environments
Comments: Part-Time Research Assistant/Helper: Tony Lee; 49 pages, 20 figures, 1 table, to be published by International Press of Boston
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[109]  arXiv:2001.01796 (replaced) [pdf, other]
Title: Fair Active Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[110]  arXiv:2001.02407 (replaced) [pdf, other]
Title: SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition
Comments: Accepted in ICLR 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[111]  arXiv:2001.08456 (replaced) [pdf, other]
Title: Ada-LISTA: Learned Solvers Adaptive to Varying Models
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[112]  arXiv:2002.02561 (replaced) [pdf, other]
Title: Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[113]  arXiv:2002.02705 (replaced) [pdf, other]
Title: Trust Your Model: Iterative Label Improvement and Robust Training by Confidence Based Filtering and Dataset Partitioning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[114]  arXiv:2002.02950 (replaced) [pdf, ps, other]
Title: Logistic Regression Regret: What's the Catch?
Authors: Gil I. Shamir
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[115]  arXiv:2002.03461 (replaced) [pdf, other]
Title: Relation Embedding for Personalised POI Recommendation
Comments: 12 pages, 3 figures, Accepted in the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2020)
Subjects: Machine Learning (cs.LG); Databases (cs.DB); Information Retrieval (cs.IR); Machine Learning (stat.ML)
[116]  arXiv:2002.03847 (replaced) [pdf, other]
Title: Making Logic Learnable With Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Machine Learning (stat.ML)
[117]  arXiv:2002.05227 (replaced) [pdf, other]
Title: Variational Autoencoders with Riemannian Brownian Motion Priors
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[118]  arXiv:2002.05706 (replaced) [pdf, other]
Title: Sequential Cooperative Bayesian Inference
Comments: 28 pages, 22 figures, submitted to ICML 2020
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
[119]  arXiv:2002.07684 (replaced) [pdf, ps, other]
Title: A Lagrangian Approach to Information Propagation in Graph Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[120]  arXiv:2002.07766 (replaced) [pdf, other]
Title: Learning Bijective Feature Maps for Linear ICA
Comments: 8 pages
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[121]  arXiv:1811.03862 (replaced) [pdf, other]
Title: Targeting Solutions in Bayesian Multi-Objective Optimization: Sequential and Batch Versions
Journal-ref: Annals of Mathematics and Artificial Intelligence, volume 88, pages 187-212 (2020)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
[122]  arXiv:1811.06026 (replaced) [pdf, ps, other]
Title: Incentivizing Exploration with Selective Data Disclosure
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
[123]  arXiv:1812.03894 (replaced) [pdf, other]
Title: Model-Based Learning of Turbulent Flows using a Mobile Robot
Comments: 21 pages, 26 figures
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Machine Learning (stat.ML)
[124]  arXiv:1812.04103 (replaced) [pdf, other]
Title: Non-local U-Net for Biomedical Image Segmentation
Comments: In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), 2019
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
[125]  arXiv:1812.09747 (replaced) [pdf, other]
Title: Let Me Not Lie: Learning MultiNomial Logit
Comments: 33 pages, 12 tables, 6 figures, +10 p. Appendix
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[126]  arXiv:1901.00409 (replaced) [pdf, other]
Title: Neural Clustering Processes
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[127]  arXiv:1901.10787 (replaced) [pdf, other]
Title: Tensorized Embedding Layers for Efficient Model Compression
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[128]  arXiv:1902.04495 (replaced) [pdf, other]
Title: The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy
Comments: 36 pages, 4 figures
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
[129]  arXiv:1904.06744 (replaced) [pdf, ps, other]
Title: A Personalized Preference Learning Framework for Caching in Mobile Networks
Comments: 21 pages, 10 figures, 1 table, to appear in the IEEE Transactions on Mobile Computing
Subjects: Networking and Internet Architecture (cs.NI); Information Retrieval (cs.IR); Information Theory (cs.IT); Machine Learning (cs.LG); Multimedia (cs.MM)
[130]  arXiv:1905.00919 (replaced) [pdf, other]
Title: Mimic Learning to Generate a Shareable Network Intrusion Detection Model
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[131]  arXiv:1905.04753 (replaced) [pdf, other]
Title: Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints
Comments: ICLR 2020. Project page with code is at this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[132]  arXiv:1905.09952 (replaced) [pdf, other]
Title: Fast Algorithms for Computational Optimal Transport and Wasserstein Barycenter
Comments: 18 pages, 35 figures
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
[133]  arXiv:1906.06627 (replaced) [pdf, other]
Title: Representation Quality Explains Adversarial Attacks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[134]  arXiv:1906.09412 (replaced) [pdf, other]
Title: Multi-task Learning for Aggregated Data using Gaussian Processes
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[135]  arXiv:1906.11667 (replaced) [pdf, other]
Title: Evolving Robust Neural Architectures to Defend from Adversarial Attacks
Subjects: Neural and Evolutionary Computing (cs.NE); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[136]  arXiv:1907.12160 (replaced) [pdf, ps, other]
Title: Adaptive spline fitting with particle swarm optimization
Comments: Expanded literature survey; performance comparison with WaveShrink and smoothing spline; new figures and a table added
Subjects: Computation (stat.CO); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Methodology (stat.ME)
[137]  arXiv:1909.11764 (replaced) [pdf, ps, other]
Title: FreeLB: Enhanced Adversarial Training for Natural Language Understanding
Comments: ICLR 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[138]  arXiv:1910.01226 (replaced) [pdf, ps, other]
Title: Piracy Resistant Watermarks for Deep Neural Networks
Comments: 13 pages
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[139]  arXiv:1910.02757 (replaced) [pdf, other]
Title: Stochastic Bandits with Delay-Dependent Payoffs
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[140]  arXiv:1910.04462 (replaced) [pdf, other]
Title: Fast Tree Variants of Gromov-Wasserstein
Comments: A major revision: (1) improve the complexity of the efficient computation of FlowTGW, (2) add more discussions on tree-metric sampling by clustering-based, (3) add more experiments on larger datasets, and investigate empirical relation for variants of GW, (4) add more details, complexity analysis, and discussions for FlowTGW and DepthTGW, and (5) add some reviews and further discussions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[141]  arXiv:1910.05505 (replaced) [pdf, other]
Title: Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers
Comments: 26 pages, proof of the fact that the flow always converges to a critical point (Theorem 10) significantly simplified, numerical section updated
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[142]  arXiv:1910.09126 (replaced) [pdf, other]
Title: Communication-Efficient Local Decentralized SGD Methods
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
[143]  arXiv:1910.14375 (replaced) [pdf, other]
Title: A comparative study of estimating articulatory movements from phoneme sequences and acoustic features
Comments: 5 pages, 5 figures, accepted in ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[144]  arXiv:1911.05146 (replaced) [pdf, other]
Title: HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow
Comments: 18 pages, 10 figures, Accepted, to be presented at ISC '20
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Performance (cs.PF)
[145]  arXiv:1911.07509 (replaced) [pdf, other]
Title: AI-based Pilgrim Detection using Convolutional Neural Networks
Comments: Accepted in ATSIP'2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[146]  arXiv:1911.07676 (replaced) [pdf, ps, other]
Title: Learning with Good Feature Representations in Bandits and in RL with a Generative Model
Comments: 13 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[147]  arXiv:1912.02906 (replaced) [pdf, other]
Title: Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems
Comments: Added experimental results
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[148]  arXiv:1912.06366 (replaced) [pdf, ps, other]
Title: Provably Efficient Reinforcement Learning with Aggregated States
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
[149]  arXiv:2001.02004 (replaced) [pdf, other]
Title: CNN 101: Interactive Visual Learning for Convolutional Neural Networks
Comments: CHI'20 Late-Breaking Work (April 25-30, 2020), 7 pages, 3 figures
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[150]  arXiv:2001.11897 (replaced) [pdf, other]
Title: Learning Unitaries by Gradient Descent
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Mathematical Physics (math-ph)
[151]  arXiv:2002.02534 (replaced) [pdf, other]
Title: Fast inference of Boosted Decision Trees in FPGAs for particle physics
Subjects: Computational Physics (physics.comp-ph); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex)
[152]  arXiv:2002.04700 (replaced) [pdf]
Title: A Single RGB Camera Based Gait Analysis with a Mobile Tele-Robot for Healthcare
Authors: Ziyang Wang
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[153]  arXiv:2002.04836 (replaced) [src]
Title: Analysis Of Multi Field Of View Cnn And Attention Cnn On H&E Stained Whole-slide Images On Hepatocellular Carcinoma
Comments: This paper has been withdrawn by the authors because it needs heavy revision
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[154]  arXiv:2002.05145 (replaced) [pdf, other]
Title: Weighted Empirical Risk Minimization: Sample Selection Bias Correction based on Importance Sampling
Comments: 20 pages, 7 tables and figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[155]  arXiv:2002.06177 (replaced) [pdf]
Title: The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence
Authors: Gary Marcus
Comments: 5 figures
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[156]  arXiv:2002.06707 (replaced) [pdf, other]
Title: Stochastic Normalizing Flows
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Data Analysis, Statistics and Probability (physics.data-an)
[157]  arXiv:2002.07215 (replaced) [pdf, other]
Title: STANNIS: Low-Power Acceleration of Deep Neural Network Training Using Computational Storage
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
