
Machine Learning

New submissions

[ total of 166 entries: 1-166 ]

New submissions for Thu, 29 Oct 20

[1]  arXiv:2010.14535 [pdf, other]
Title: Neural Architecture Search of SPD Manifold Networks
Comments: Info: 19 pages, 11 Figures, and 9 Tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

In this paper, we propose a new neural architecture search (NAS) problem of Symmetric Positive Definite (SPD) manifold networks. Unlike the conventional NAS problem, our problem requires searching for a unique computational cell called the SPD cell. This SPD cell serves as a basic building block of SPD neural architectures. An efficient solution to our problem is important to minimize the extraneous manual effort in SPD neural architecture design. To accomplish this goal, we first introduce a geometrically rich and diverse SPD neural architecture search space for efficient SPD cell design. Further, we model our new NAS problem using the supernet strategy, which models the architecture search problem as a one-shot training process of a single supernet. Based on the supernet modeling, we exploit a differentiable NAS algorithm on our relaxed continuous search space for SPD neural architecture search. Statistical evaluation of our method on drone, action, and emotion recognition tasks mostly yields better results than state-of-the-art SPD networks and NAS algorithms. Empirical results show that our algorithm excels in discovering better SPD network designs and provides models that are more than 3 times lighter than those found by state-of-the-art NAS algorithms.
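The continuous relaxation behind differentiable NAS mixes the outputs of candidate operations using softmax weights over architecture parameters $\alpha$. On the SPD manifold a plain weighted sum is usually replaced by a weighted mean that stays SPD. The sketch below uses the log-Euclidean weighted mean as a simple stand-in (an assumption; the paper's own aggregation may differ). `spd_log`, `spd_exp`, `mixed_spd_op`, and the candidate operations are all hypothetical names.

```python
import numpy as np

def spd_log(s: np.ndarray) -> np.ndarray:
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(s)
    return (vecs * np.log(vals)) @ vecs.T

def spd_exp(s: np.ndarray) -> np.ndarray:
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(s)
    return (vecs * np.exp(vals)) @ vecs.T

def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max())
    return z / z.sum()

def mixed_spd_op(x, candidate_ops, alpha):
    """DARTS-style mixed operation on SPD inputs: a softmax(alpha)-weighted
    log-Euclidean mean of the candidate operations' SPD outputs."""
    weights = softmax(np.asarray(alpha, dtype=float))
    weighted_logs = [w * spd_log(op(x)) for w, op in zip(weights, candidate_ops)]
    return spd_exp(sum(weighted_logs))

x = np.array([[2.0, 0.5], [0.5, 1.0]])
# Hypothetical candidate operations, each mapping SPD matrices to SPD matrices.
ops = [lambda s: s, lambda s: s @ s, lambda s: np.eye(2)]
y = mixed_spd_op(x, ops, alpha=[0.0, 1.0, -1.0])
```

Averaging in the log domain guarantees the mixed output is again symmetric positive definite, which a Euclidean weighted sum of operation outputs would not in general preserve for every operation family.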

[2]  arXiv:2010.14543 [pdf, other]
Title: Unsupervised Domain Adaptation for Visual Navigation
Comments: Deep Reinforcement Learning Workshop at NeurIPS 2020. Camera Ready Version
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Advances in visual navigation methods have led to intelligent embodied navigation agents capable of learning meaningful representations from raw RGB images and performing a wide variety of tasks involving structural and semantic reasoning. However, most learning-based navigation policies are trained and tested in simulation environments. In order for these policies to be practically useful, they need to be transferred to the real world. In this paper, we propose an unsupervised domain adaptation method for visual navigation. Our method translates the images in the target domain to the source domain such that the translation is consistent with the representations learned by the navigation policy. The proposed method outperforms several baselines across two different navigation tasks in simulation. We further show that our method can be used to transfer the navigation policies learned in simulation to the real world.

[3]  arXiv:2010.14563 [pdf, ps, other]
Title: Adversarial Dueling Bandits
Comments: 26 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce the problem of regret minimization in Adversarial Dueling Bandits. As in classic Dueling Bandits, the learner has to repeatedly choose a pair of items and observe only a relative binary `win-loss' feedback for this pair, but here this feedback is generated from an arbitrary preference matrix, possibly chosen adversarially. Our main result is an algorithm whose $T$-round regret compared to the \emph{Borda-winner} from a set of $K$ items is $\tilde{O}(K^{1/3}T^{2/3})$, as well as a matching $\Omega(K^{1/3}T^{2/3})$ lower bound. We also prove a similar high probability regret bound. We further consider a simpler \emph{fixed-gap} adversarial setup, which bridges between two extreme preference feedback models for dueling bandits: stationary preferences and an arbitrary sequence of preferences. For the fixed-gap adversarial setup we give an $\smash{ \tilde{O}((K/\Delta^2)\log{T}) }$ regret algorithm, where $\Delta$ is the gap in Borda scores between the best item and all other items, and show a lower bound of $\Omega(K/\Delta^2)$ indicating that our dependence on the main problem parameters $K$ and $\Delta$ is tight (up to logarithmic factors).
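In the dueling-bandit literature, the Borda score of item $i$ is commonly defined as the probability that $i$ beats an opponent drawn uniformly from the other $K-1$ items. A minimal sketch, assuming a preference matrix with `P[i][j]` $= \Pr(i \text{ beats } j)$ (the matrix values and function names are hypothetical):

```python
def borda_scores(P):
    """Borda score of each item: average win probability against the other
    K - 1 items, where P[i][j] = Pr(item i beats item j)."""
    k = len(P)
    return [sum(P[i][j] for j in range(k) if j != i) / (k - 1) for i in range(k)]

def borda_winner(P):
    """Index of the item with the highest Borda score."""
    scores = borda_scores(P)
    return max(range(len(scores)), key=scores.__getitem__)

# Preferences form a cycle (0 beats 1, 1 beats 2, 2 beats 0), so there is
# no Condorcet winner, but the Borda winner is still well defined.
P = [
    [0.5, 0.9, 0.2],
    [0.1, 0.5, 0.6],
    [0.8, 0.4, 0.5],
]
scores = borda_scores(P)
print(borda_winner(P))  # 2
```

In the fixed-gap setting, $\Delta$ would be the difference between the top Borda score and every other item's score.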

[4]  arXiv:2010.14592 [pdf, other]
Title: Shapley Flow: A Graph-based Approach to Interpreting Model Predictions
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Many existing approaches for estimating feature importance are problematic because they ignore or hide dependencies among features. A causal graph, which encodes the relationships among input variables, can aid in assigning feature importance. However, current approaches that assign credit to nodes in the causal graph fail to explain the entire graph. In light of these limitations, we propose Shapley Flow, a novel approach to interpreting machine learning models. It considers the entire causal graph, and assigns credit to \textit{edges} instead of treating nodes as the fundamental unit of credit assignment. Shapley Flow is the unique solution to a generalization of the Shapley value axioms to directed acyclic graphs. We demonstrate the benefit of using Shapley Flow to reason about the impact of a model's input on its output. In addition to maintaining insights from existing approaches, Shapley Flow extends the flat, set-based, view prevalent in game theory based explanation methods to a deeper, \textit{graph-based}, view. This graph-based view enables users to understand the flow of importance through a system, and reason about potential interventions.
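Shapley Flow generalizes the Shapley value axioms from sets of features to edges of a DAG. For reference, the classical set-based Shapley value that it generalizes can be computed exactly for a small game by averaging each player's marginal contribution over all orderings. The sketch below is that classical baseline, not Shapley Flow itself; the toy value function `v` is hypothetical.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values: the average marginal contribution of each player
    over all orderings. v maps a frozenset of players to a real payoff."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}

# Toy game: payoff is 1 only when both "a" and "b" are present; "c" is a dummy.
v = lambda s: 1.0 if {"a", "b"} <= s else 0.0
vals = shapley_values(["a", "b", "c"], v)
print(vals)  # {'a': 0.5, 'b': 0.5, 'c': 0.0}
```

Enumerating orderings is exponential in the number of players, so practical explainers sample orderings or exploit model structure instead.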

[5]  arXiv:2010.14603 [pdf, other]
Title: Learning to be Safe: Deep RL with a Safety Critic
Comments: In submission, 16 pages (including appendix)
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

Safety is an essential component for deploying reinforcement learning (RL) algorithms in real-world scenarios, and is critical during the learning process itself. A natural first approach toward safe RL is to manually specify constraints on the policy's behavior. However, just as learning has enabled progress in large-scale development of AI systems, learning safety specifications may also be necessary to ensure safety in messy open-world environments where manual safety specifications cannot scale. Akin to how humans learn incrementally starting in child-safe environments, we propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors when learning new, modified tasks. We empirically study this form of safety-constrained transfer learning in three challenging domains: simulated navigation, quadruped locomotion, and dexterous in-hand manipulation. In comparison to standard deep RL techniques and prior approaches to safe RL, we find that our method enables learning new tasks, and learning in new environments, with both substantially fewer safety incidents, such as falling or dropping an object, and faster, more stable learning. This suggests a path forward not only for safer RL systems, but also for more effective RL systems.

[6]  arXiv:2010.14641 [pdf, other]
Title: Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Learning complex behaviors through interaction requires coordinated long-term planning. Random exploration and novelty search lack task-centric guidance and waste effort on non-informative interactions. Instead, decision making should target samples with the potential to optimize performance far into the future, while only reducing uncertainty where conducive to this objective. This paper presents latent optimistic value exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards. We combine finite horizon rollouts from a latent model with value function estimates to predict infinite horizon returns and recover associated uncertainty through ensembling. Policy training then proceeds on an upper confidence bound (UCB) objective to identify and select the interactions most promising to improve long-term performance. We apply LOVE to visual control tasks in continuous state-action spaces and demonstrate improved sample complexity on a selection of benchmarking tasks.
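The optimistic objective described above can be sketched as the ensemble mean of predicted returns plus a multiple of their spread. Each ensemble member's return here is a finite-horizon discounted rollout plus a bootstrapped terminal value estimate, mirroring the combination of latent rollouts and value estimates in the abstract. The rollout numbers, `beta`, and function names are illustrative assumptions, not the paper's implementation.

```python
from statistics import mean, pstdev

def bootstrapped_return(rewards, terminal_value, gamma):
    """H-step discounted return plus the discounted terminal value estimate."""
    ret = sum(gamma**t * r for t, r in enumerate(rewards))
    return ret + gamma ** len(rewards) * terminal_value

def ucb_objective(ensemble_rollouts, gamma=0.99, beta=1.0):
    """Optimistic value estimate: ensemble mean + beta * ensemble std."""
    returns = [bootstrapped_return(r, v, gamma) for r, v in ensemble_rollouts]
    return mean(returns) + beta * pstdev(returns)

# Each member: (imagined rewards over the horizon, terminal value estimate).
ensemble = [([1.0, 0.0, 1.0], 5.0), ([0.5, 0.5, 0.5], 4.0), ([1.0, 1.0, 0.0], 6.0)]
optimistic = ucb_objective(ensemble, gamma=0.9, beta=0.5)
```

Setting `beta = 0` recovers the plain ensemble-mean estimate; larger values favor interactions where the ensemble disagrees about long-term return.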

[7]  arXiv:2010.14657 [pdf, ps, other]
Title: Temporal Difference Learning as Gradient Splitting
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Temporal difference learning with linear function approximation is a popular method to obtain a low-dimensional approximation of the value function of a policy in a Markov Decision Process. We give a new interpretation of this method in terms of a splitting of the gradient of an appropriately chosen function. As a consequence of this interpretation, convergence proofs for gradient descent can be applied almost verbatim to temporal difference learning. Beyond giving a new, fuller explanation of why temporal difference learning works, our interpretation also yields improved convergence times. We consider the setting with $1/\sqrt{T}$ step-size, where previous comparable finite-time convergence time bounds for temporal difference learning had the multiplicative factor $1/(1-\gamma)$ in front of the bound, with $\gamma$ being the discount factor. We show that a minor variation on TD learning which estimates the mean of the value function separately has a convergence time where $1/(1-\gamma)$ only multiplies an asymptotically negligible term.
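For reference, the standard linear TD(0) update that the paper reinterprets is $\theta \leftarrow \theta + \alpha\,(r + \gamma\,\phi(s')^\top\theta - \phi(s)^\top\theta)\,\phi(s)$. The sketch below runs it on a hypothetical two-state deterministic chain with one-hot features (where it reduces to tabular TD) and a constant step size, rather than the $1/\sqrt{T}$ schedule analyzed in the paper.

```python
def value(phi, theta):
    """Linear value estimate phi(s)^T theta."""
    return sum(p * t for p, t in zip(phi, theta))

def td0_linear(transitions, features, n_features, gamma, step_size, sweeps):
    """Linear TD(0) over a fixed list of (state, reward, next_state) transitions."""
    theta = [0.0] * n_features
    for _ in range(sweeps):
        for s, r, s_next in transitions:
            phi, phi_next = features[s], features[s_next]
            td_error = r + gamma * value(phi_next, theta) - value(phi, theta)
            theta = [t + step_size * td_error * p for t, p in zip(theta, phi)]
    return theta

# Chain: state 0 -> 1 with reward 1, state 1 -> 0 with reward 0, gamma = 0.5.
# True values solve V0 = 1 + 0.5*V1 and V1 = 0.5*V0, i.e. V0 = 4/3, V1 = 2/3.
transitions = [(0, 1.0, 1), (1, 0.0, 0)]
features = {0: [1.0, 0.0], 1: [0.0, 1.0]}
theta = td0_linear(transitions, features, 2, gamma=0.5, step_size=0.1, sweeps=2000)
```

Note that the update is not the gradient of any fixed loss in $\theta$, because the target $r + \gamma\phi(s')^\top\theta$ also depends on $\theta$; this is the gap the gradient-splitting view addresses.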

[8]  arXiv:2010.14658 [pdf, ps, other]
Title: Faster Differentially Private Samplers via Rényi Divergence Analysis of Discretized Langevin MCMC
Comments: To appear in NeurIPS 2020
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Probability (math.PR)

Various differentially private algorithms instantiate the exponential mechanism, and require sampling from the distribution $\exp(-f)$ for a suitable function $f$. When the domain of the distribution is high-dimensional, this sampling can be computationally challenging. Using heuristic sampling schemes such as Gibbs sampling does not necessarily lead to provable privacy. When $f$ is convex, techniques from log-concave sampling lead to polynomial-time algorithms, albeit with large polynomials. Langevin dynamics-based algorithms offer much faster alternatives under some distance measures such as statistical distance. In this work, we establish rapid convergence for these algorithms under distance measures more suitable for differential privacy. For smooth, strongly-convex $f$, we give the first results proving convergence in R\'enyi divergence. This gives us fast differentially private algorithms for such $f$. Our techniques are simple and generic, and also apply to underdamped Langevin dynamics.
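Langevin-based samplers target $\exp(-f)$ by discretizing Langevin dynamics. A minimal sketch of the unadjusted (overdamped) Langevin algorithm, $x \leftarrow x - \eta\nabla f(x) + \sqrt{2\eta}\,\xi$ with $\xi \sim \mathcal{N}(0, 1)$, for the hypothetical strongly convex $f(x) = x^2/2$ whose target is the standard normal. This illustrates only the sampler; it does not implement or certify any privacy guarantee. Because of discretization bias, the chain's stationary variance here is $1/(1 - \eta/2)$ rather than exactly 1.

```python
import math
import random

def ula_samples(grad_f, x0, step, n_steps, burn_in, seed=0):
    """Unadjusted Langevin algorithm:
    x <- x - step * grad_f(x) + sqrt(2 * step) * N(0, 1)."""
    rng = random.Random(seed)
    x, samples = x0, []
    noise_scale = math.sqrt(2.0 * step)
    for k in range(n_steps):
        x = x - step * grad_f(x) + noise_scale * rng.gauss(0.0, 1.0)
        if k >= burn_in:
            samples.append(x)
    return samples

step = 0.05
samples = ula_samples(grad_f=lambda x: x, x0=3.0, step=step, n_steps=200_000, burn_in=2_000)
m = sum(samples) / len(samples)
var = sum((s - m) ** 2 for s in samples) / len(samples)
print(m, var)  # mean near 0, variance near 1 / (1 - step / 2)
```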

[9]  arXiv:2010.14664 [pdf, other]
Title: System Identification via Meta-Learning in Linear Time-Varying Environments
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

System identification is a fundamental problem in reinforcement learning, control theory and signal processing, and the non-asymptotic analysis of the corresponding sample complexity is challenging and elusive, even for linear time-varying (LTV) systems. To tackle this challenge, we develop an episodic block model for the LTV system where the model parameters remain constant within each block but change from block to block. Based on the observation that the model parameters across different blocks are related, we treat each episodic block as a learning task and then run meta-learning over many blocks for system identification, using two steps, namely offline meta-learning and online adaptation. We carry out a comprehensive non-asymptotic analysis of the performance of meta-learning based system identification. To deal with the technical challenges rooted in the sample correlation and small sample sizes in each block, we devise a new two-scale martingale small-ball approach for offline meta-learning, for arbitrary model correlation structure across blocks. We then quantify the finite time error of online adaptation by leveraging recent advances in linear stochastic approximation with correlated samples.

[10]  arXiv:2010.14670 [pdf, ps, other]
Title: Online Learning with Primary and Secondary Losses
Authors: Avrim Blum, Han Shao
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the problem of online learning with primary and secondary losses. For example, a recruiter deciding which job applicants to hire might weigh false positives and false negatives equally (the primary loss), but the applicants might weigh false negatives much more heavily (the secondary loss). We consider the following question: Can we combine "expert advice" to achieve low regret with respect to the primary loss, while at the same time performing {\em not much worse than the worst expert} with respect to the secondary loss? Unfortunately, we show that this goal is unachievable without any bounded variance assumption on the secondary loss. More generally, we consider the goal of minimizing the regret with respect to the primary loss and bounding the secondary loss by a linear threshold. On the positive side, we show that running any switching-limited algorithm can achieve this goal if all experts satisfy the assumption that the secondary loss does not exceed the linear threshold by $o(T)$ for any time interval. If not all experts satisfy this assumption, our algorithms can achieve this goal given access to some external oracles which determine when to deactivate and reactivate experts.
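A standard way to combine expert advice with low regret on a single (primary) loss is the Hedge, or multiplicative-weights, algorithm. The sketch below shows only that textbook baseline: it ignores the secondary loss entirely and is not the paper's switching-limited algorithm. The learning rate `eta` and the toy loss sequence are assumptions.

```python
import math

def hedge(loss_rounds, eta):
    """Hedge over experts; loss_rounds[t][i] in [0, 1] is expert i's loss at round t.
    Returns the learner's cumulative expected loss and the final distribution."""
    n = len(loss_rounds[0])
    weights = [1.0] * n
    total = 0.0
    for losses in loss_rounds:
        z = sum(weights)
        probs = [w / z for w in weights]
        total += sum(p * l for p, l in zip(probs, losses))
        # Exponentially down-weight experts in proportion to their loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    z = sum(weights)
    return total, [w / z for w in weights]

rounds = [[0.0, 1.0, 0.5]] * 200  # expert 0 is always best
loss, final = hedge(rounds, eta=0.1)
```

For losses in $[0, 1]$, Hedge guarantees a cumulative expected loss of at most $(\eta L^* + \ln n)/(1 - e^{-\eta})$, where $L^*$ is the best expert's loss; the paper's question is what can additionally be guaranteed for a second loss.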

[11]  arXiv:2010.14672 [pdf, other]
Title: Why Does MAML Outperform ERM? An Optimization Perspective
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

Model-Agnostic Meta-Learning (MAML) has demonstrated widespread success in training models that can quickly adapt to new tasks via one or few stochastic gradient descent steps. However, the MAML objective is significantly more difficult to optimize compared to standard Empirical Risk Minimization (ERM), and little is understood about how much MAML improves over ERM in terms of the fast adaptability of their solutions in various scenarios. We analytically address this issue in a linear regression setting consisting of a mixture of easy and hard tasks, where hardness is determined by the number of gradient steps required to solve the task. Specifically, we prove that for $\Omega(d_{\text{eff}})$ labelled test samples (for gradient-based fine-tuning) where $d_{\text{eff}}$ is the effective dimension of the problem, in order for MAML to achieve substantial gain over ERM, the optimal solutions of the hard tasks must be closely packed together with the center far from the center of the easy task optimal solutions. We show that these insights also apply in a low-dimensional feature space when both MAML and ERM learn a representation of the tasks, which reduces the effective problem dimension. Further, our few-shot image classification experiments suggest that our results generalize beyond linear regression.
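A one-dimensional toy version (an illustration under simplifying assumptions, not the paper's linear-regression setting) shows how one-step MAML reweights tasks. Take quadratic task losses $L_i(w) = \tfrac12 a_i (w - c_i)^2$, where small curvature $a_i$ makes a task "hard" in the sense of needing many gradient steps. One inner step with rate $\alpha$ maps $w - c_i$ to $(1 - \alpha a_i)(w - c_i)$, so the MAML objective is $\sum_i \tfrac12 a_i (1 - \alpha a_i)^2 (w - c_i)^2$. It is minimized by a weighted average of the $c_i$ with weights $a_i(1 - \alpha a_i)^2$, whereas ERM uses weights $a_i$.

```python
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def erm_solution(a, c):
    """Minimizer of sum_i 0.5 * a_i * (w - c_i)^2."""
    return weighted_mean(c, a)

def maml_solution(a, c, alpha):
    """Minimizer of sum_i 0.5 * a_i * (1 - alpha * a_i)^2 * (w - c_i)^2,
    i.e. each task's loss after one inner gradient step of size alpha."""
    return weighted_mean(c, [ai * (1 - alpha * ai) ** 2 for ai in a])

a = [1.0, 0.1]   # easy task (high curvature), hard task (low curvature)
c = [0.0, 10.0]  # task optima
alpha = 0.5
w_erm, w_maml = erm_solution(a, c), maml_solution(a, c, alpha)
print(w_erm, w_maml)  # MAML's initialization sits much closer to the hard task's optimum
```

The easy task recovers quickly after one inner step from any nearby initialization, so MAML shifts weight toward the hard task, which is the kind of effect the paper quantifies.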

[12]  arXiv:2010.14680 [pdf, other]
Title: Learning to Represent Action Values as a Hypergraph on the Action Vertices
Comments: 9 pages, 10 figures, 3 tables
Subjects: Machine Learning (cs.LG)

Action-value estimation is a critical component of many reinforcement learning (RL) methods whereby sample complexity relies heavily on how fast a good estimator for action value can be learned. By viewing this problem through the lens of representation learning, good representations of both state and action can facilitate action-value estimation. While advances in deep learning have seamlessly driven progress in learning state representations, given the specificity of the notion of agency to RL, little attention has been paid to learning action representations. We conjecture that leveraging the combinatorial structure of multi-dimensional action spaces is a key ingredient for learning good representations of action. To test this, we set forth the action hypergraph networks framework---a class of functions for learning action representations with a relational inductive bias. Using this framework we realise an agent class based on a combination with deep Q-networks, which we dub hypergraph Q-networks. We show the effectiveness of our approach on a myriad of domains: illustrative prediction problems under minimal confounding effects, Atari 2600 games, and physical control benchmarks.

[13]  arXiv:2010.14687 [pdf, other]
Title: MILR: Mathematically Induced Layer Recovery for Plaintext Space Error Correction of CNNs
Comments: 12 pages
Subjects: Machine Learning (cs.LG)

The increased use of Convolutional Neural Networks (CNNs) in mission-critical systems has increased the need for robust and resilient networks in the face of both naturally occurring faults and security attacks. A lack of robustness and resiliency can lead to unreliable inference results. Current methods that address CNN robustness require hardware modification, network modification, or network duplication. This paper proposes MILR, a software-based CNN error detection and error correction system that enables self-healing of the network from single- and multi-bit errors. The self-healing capabilities are based on mathematical relationships between the inputs, outputs, and parameters (weights) of a layer; exploiting these relationships allows the recovery of erroneous parameters (weights) throughout a layer and the network. MILR is suitable for plaintext-space error correction (PSEC) given its ability to correct whole-weight and even whole-layer errors in CNNs.
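One relationship of this kind can be sketched for a dense layer $Y = XW$: if a stored set of inputs $X$ is square and invertible, a corrupted weight matrix can be detected by recomputing the outputs and recovered by solving a linear system against the recorded outputs $Y$. This illustrates the general idea only, under the assumption of a hypothetical stored checkpoint of $X$ and $Y$; it is not MILR's actual recovery procedure for convolutional layers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 4, 3
W = rng.normal(size=(n_in, n_out))   # true layer weights
X = rng.normal(size=(n_in, n_in))    # stored inputs (square, almost surely invertible)
Y = X @ W                            # stored outputs of the dense layer

W_corrupted = W.copy()
W_corrupted[1, 2] += 1e3             # simulate a corrupted weight (e.g. a flipped high-order bit)

# Detection: recompute the outputs and compare them with the stored ones.
faulty = not np.allclose(X @ W_corrupted, Y)
# Recovery: solve X @ W_hat = Y for the weights.
W_hat = np.linalg.solve(X, Y)
```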

[14]  arXiv:2010.14689 [pdf, other]
Title: Expressive yet Tractable Bayesian Deep Learning via Subnetwork Inference
Comments: 15 pages, extended version with supplementary material
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The Bayesian paradigm has the potential to solve some of the core issues in modern deep learning, such as poor calibration, data inefficiency, and catastrophic forgetting. However, scaling Bayesian inference to the high-dimensional parameter spaces of deep neural networks requires restrictive approximations. In this paper, we propose performing inference over only a small subset of the model parameters while keeping all others as point estimates. This enables us to use expressive posterior approximations that would otherwise be intractable for the full model. In particular, we develop a practical and scalable Bayesian deep learning method that first trains a point estimate, and then infers a full covariance Gaussian posterior approximation over a subnetwork. We propose a subnetwork selection procedure which aims to optimally preserve posterior uncertainty. We empirically demonstrate the effectiveness of our approach compared to point-estimated networks and methods that use less expressive posterior approximations over the full network.

[15]  arXiv:2010.14700 [pdf, other]
Title: Sparse Symmetric Tensor Regression for Functional Connectivity Analysis
Authors: Da Xu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Tensor regression models, such as CP regression and Tucker regression, have many successful applications in neuroimaging analysis where the covariates are of ultrahigh dimensionality and possess complex spatial structures. The high-dimensional covariate arrays, also known as tensors, can be approximated by low-rank structures and incorporated into generalized linear models. The resulting tensor regression achieves a significant reduction in dimensionality while remaining efficient in estimation and prediction. Brain functional connectivity is an essential measure of brain activity and has shown significant association with neurological disorders such as Alzheimer's disease. The symmetric nature of functional connectivity is a property that has not been explored in previous tensor regression models. In this work, we propose a sparse symmetric tensor regression that further reduces the number of free parameters and achieves superior performance over symmetrized and ordinary CP regression under a variety of simulation settings. We apply the proposed method to a study of Alzheimer's disease (AD) and normal ageing from the Berkeley Aging Cohort Study (BACS) and detect two regions of interest that have been identified as important to AD.
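When the coefficient matrix is written as a rank-$R$ symmetric CP decomposition $B = \sum_r \lambda_r \beta_r \beta_r^\top$, the linear predictor $\langle B, X\rangle$ for a connectivity matrix $X$ becomes $\sum_r \lambda_r \beta_r^\top X \beta_r$. This needs only $R(p + 1)$ parameters instead of $p^2$. The sketch below computes that predictor; the names and toy values are hypothetical, and the sparsity penalty and GLM link from the paper are omitted.

```python
def quad_form(beta, X):
    """beta^T X beta for a square matrix X given as nested lists."""
    p = len(beta)
    return sum(beta[i] * X[i][j] * beta[j] for i in range(p) for j in range(p))

def symmetric_cp_predict(lambdas, betas, X, intercept=0.0):
    """Linear predictor <B, X> with B = sum_r lambda_r * beta_r beta_r^T."""
    return intercept + sum(l * quad_form(b, X) for l, b in zip(lambdas, betas))

lambdas = [2.0, -1.0]
betas = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
X = [[1.0, 2.0, 3.0], [2.0, 4.0, 5.0], [3.0, 5.0, 6.0]]  # symmetric toy "connectivity"
eta = symmetric_cp_predict(lambdas, betas, X)
print(eta)  # 22.0
```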

[16]  arXiv:2010.14701 [pdf, other]
Title: Scaling Laws for Autoregressive Generative Modeling
Comments: 20+15 pages, 30 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image$\leftrightarrow$text models, and mathematical problem solving. In all cases autoregressive Transformers smoothly improve in performance as model size and compute budgets increase, following a power-law plus constant scaling law. The optimal model size also depends on the compute budget through a power-law, with exponents that are nearly universal across all data domains.
The cross-entropy loss has an information theoretic interpretation as $S(\mathrm{True}) + D_{\mathrm{KL}}(\mathrm{True}\,\|\,\mathrm{Model})$, and the empirical scaling laws suggest a prediction for both the true data distribution's entropy and the KL divergence between the true and model distributions. With this interpretation, billion-parameter Transformers are nearly perfect models of the YFCC100M image distribution downsampled to an $8\times 8$ resolution, and we can forecast the model size needed to achieve any given reducible loss (i.e., $D_{\mathrm{KL}}$) in nats/image for other resolutions.
We find a number of additional scaling laws in specific domains: (a) we identify a scaling relation for the mutual information between captions and images in multimodal models, and show how to answer the question "Is a picture worth a thousand words?"; (b) in the case of mathematical problem solving, we identify scaling laws for model performance when extrapolating beyond the training distribution; (c) we finetune generative image models for ImageNet classification and find smooth scaling of the classification loss and error rate, even as the generative loss levels off. Taken together, these results strengthen the case that scaling laws have important implications for neural network performance, including on downstream tasks.
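A "power-law plus constant" law in model size $N$ can be written $L(N) = L_\infty + (N_0/N)^{\alpha}$, where $L_\infty$ plays the role of the irreducible term $S(\mathrm{True})$ and $(N_0/N)^{\alpha}$ the reducible $D_{\mathrm{KL}}$ term. The sketch below evaluates this form and, assuming $L_\infty$ is already known, recovers $\alpha$ and $N_0$ by least squares in log-log space. The parameter values are synthetic, not the paper's fits, and fitting $L_\infty$ jointly would require a nonlinear fit.

```python
import math

def scaling_law(n, l_inf, n0, alpha):
    """Power law plus constant: irreducible loss + (n0 / n) ** alpha."""
    return l_inf + (n0 / n) ** alpha

def fit_reducible_power_law(ns, losses, l_inf):
    """Fit alpha and n0 by least squares on
    log(L - l_inf) = alpha * log(n0) - alpha * log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(l - l_inf) for l in losses]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    alpha = -slope
    intercept = y_bar - slope * x_bar
    return alpha, math.exp(intercept / alpha)

ns = [1e6, 1e7, 1e8, 1e9]
losses = [scaling_law(n, l_inf=2.0, n0=1e4, alpha=0.3) for n in ns]  # synthetic data
alpha_hat, n0_hat = fit_reducible_power_law(ns, losses, l_inf=2.0)
```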

[17]  arXiv:2010.14753 [pdf, ps, other]
Title: A short note on the decision tree based neural turing machine
Authors: Yingshi Chen
Comments: 5 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:2010.02921
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Turing machines and decision trees have developed independently for a long time. With the recent development of differentiable models, there is an intersection between them. The neural Turing machine (NTM) opened the door to memory networks. It uses a differentiable attention mechanism to read from and write to an external memory bank. Differentiable forests bring differentiable properties to classical decision trees. In this short note, we show the deep connection between these two models: a differentiable forest is a special case of an NTM. A differentiable forest is actually a decision tree based neural Turing machine. Based on this deep connection, we propose a response augmented differential forest (RaDF). The controller of RaDF is a differentiable forest, and the external memory of RaDF consists of response vectors that are read and written by leaf nodes.
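The claimed correspondence can be illustrated with a soft (differentiable) decision tree. Sigmoid gates at internal nodes route an input to every leaf with some probability, and the output is the probability-weighted sum of leaf response vectors. That has the same form as an attention-weighted read from an external memory whose rows are the response vectors. The depth-2 tree, gate parameters, and response values below are hypothetical, and this is not the RaDF implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def leaf_probabilities(x, gates):
    """Soft routing in a complete binary tree stored breadth-first.
    gates[k] = (weights, bias) for internal node k; len(gates) must be 2**depth - 1.
    Returns one routing probability per leaf."""
    probs = [1.0]  # probability of reaching each node at the current depth
    node = 0
    while node < len(gates):
        next_probs = []
        for p in probs:
            w, b = gates[node]
            go_left = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            next_probs += [p * go_left, p * (1.0 - go_left)]
            node += 1
        probs = next_probs
    return probs

def memory_read(x, gates, responses):
    """Attention-style read: leaf probabilities weight the leaf response vectors."""
    attn = leaf_probabilities(x, gates)
    dim = len(responses[0])
    return [sum(a * r[d] for a, r in zip(attn, responses)) for d in range(dim)]

gates = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.5), ([1.0, -1.0], 0.0)]
responses = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # one "memory row" per leaf
x = [0.2, -0.4]
out = memory_read(x, gates, responses)
```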

[18]  arXiv:2010.14761 [pdf, other]
Title: Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures
Comments: 18 pages, 4 figures. arXiv admin note: text overlap with arXiv:2006.07897
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistics Theory (math.ST)

We analyze the connection between minimizers with good generalizing properties and high local entropy regions of a threshold-linear classifier in Gaussian mixtures with the mean squared error loss function. We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters. We explore analytically the error-counting loss landscape in the vicinity of a Bayes-optimal solution, and show that the closer we get to such configurations, the higher the local entropy, implying that the Bayes-optimal solution lies inside a wide flat region. We also consider the algorithmically relevant case of targeting wide flat minima of the (differentiable) mean squared error loss. Our analytical and numerical results show not only that in the balanced case the dependence on the norm of the weights is mild, but also that in the unbalanced case performance can be improved.

[19]  arXiv:2010.14763 [pdf, other]
Title: Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes
Comments: arXiv admin note: substantial text overlap with arXiv:2007.09208
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads in parallel access a common repository containing training data, perform SGD iterations and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets, and we wish to move SGD computations to the local compute nodes where the local data resides. The results of these local SGD computations are aggregated by a central "aggregator" which mimics Hogwild!. We show how local compute nodes can start with small mini-batch sizes that increase to larger ones in order to reduce communication cost (rounds of interaction with the aggregator). We provide a tight, novel, and non-trivial convergence analysis for strongly convex problems which does not use the bounded gradient assumption made in many existing publications. The tightness is a consequence of our proofs of lower and upper bounds on the convergence rate, which differ by only a constant factor. We show experimental results for plain convex and non-convex problems for biased and unbiased local data sets.
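To see why growing mini-batches reduce communication, count the aggregator rounds needed to process a fixed number of local samples when round $i$ uses $s_i = s_0 + a\,i$ samples, compared with a constant size $s_0$. The schedule parameters below are illustrative assumptions, not the values analyzed in the paper.

```python
def rounds_needed(total_samples, s0, a):
    """Communication rounds needed when round i processes s0 + a * i samples."""
    done, rounds = 0, 0
    while done < total_samples:
        done += s0 + a * rounds
        rounds += 1
    return rounds

constant = rounds_needed(100_000, s0=10, a=0)
linear = rounds_needed(100_000, s0=10, a=1)
print(constant, linear)  # 10000 438
```

With a linear schedule, $r$ rounds process roughly $a r^2 / 2$ samples, so the number of rounds grows like the square root of the data processed instead of linearly.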

[20]  arXiv:2010.14765 [pdf, other]
Title: Deep Networks from the Principle of Rate Reduction
Comments: arXiv admin note: text overlap with arXiv:1611.05431 by other authors
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)

This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction of learned features naturally leads to a multi-layer deep network, one iteration per layer. The layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer in a forward propagation fashion by emulating the gradient scheme. All components of this "white box" network have precise optimization, statistical, and geometric interpretations. This principled framework also reveals and justifies the role of multi-channel lifting and sparse coding in the early stages of deep networks. Moreover, all linear operators of the so-derived network naturally become multi-channel convolutions when we enforce classification to be rigorously shift-invariant. The derivation also indicates that such a convolutional network is significantly more efficient to construct and learn in the spectral domain. Our preliminary simulations and experiments indicate that the so-constructed deep network can already learn a good discriminative representation even without any back propagation training.
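A common form of the rate reduction objective (as in the MCR$^2$ formulation) is $\Delta R = R(Z) - \sum_j \frac{m_j}{m} R(Z_j)$ with coding rate $R(Z) = \tfrac12 \log\det\big(I + \tfrac{d}{m\epsilon^2} Z Z^\top\big)$ for $d \times m$ features $Z$, where $Z_j$ holds the $m_j$ columns of class $j$ and uses $m_j$ in place of $m$. The sketch below computes this objective with NumPy; $\epsilon$ and the toy features are assumptions, and the paper's layer-by-layer gradient-ascent construction is not reproduced.

```python
import numpy as np

def coding_rate(Z, eps):
    """R(Z) = 0.5 * logdet(I + d / (m * eps^2) * Z Z^T) for d x m features Z."""
    d, m = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (m * eps**2)) * Z @ Z.T)
    return 0.5 * logdet

def rate_reduction(Z, labels, eps=0.5):
    """Delta R = R(Z) - sum_j (m_j / m) * R(Z_j), with Z_j the columns of class j."""
    m = Z.shape[1]
    compressed = 0.0
    for j in np.unique(labels):
        Zj = Z[:, labels == j]
        compressed += (Zj.shape[1] / m) * coding_rate(Zj, eps)
    return coding_rate(Z, eps) - compressed

labels = np.array([0, 0, 1, 1])
Z_orth = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])  # classes on orthogonal axes
Z_same = np.array([[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]])  # classes collapsed together
rr_orth, rr_same = rate_reduction(Z_orth, labels), rate_reduction(Z_same, labels)
```

Features whose classes span orthogonal subspaces get a positive rate reduction (here $\log(5/3)$), while features where all classes collapse onto one direction get none.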

[21]  arXiv:2010.14766 [pdf, other]
Title: A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation
Comments: arXiv admin note: substantial text overlap with arXiv:1811.12359
Journal-ref: Journal of Machine Learning Research 2020, Volume 21, Number 209
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The idea behind the \emph{unsupervised} learning of \emph{disentangled} representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train over $14000$ models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on eight data sets. We observe that while the different methods successfully enforce properties "encouraged" by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, different evaluation metrics do not always agree on what should be considered "disentangled" and exhibit systematic differences in the estimation. Finally, increased disentanglement does not seem to necessarily lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.

[22]  arXiv:2010.14771 [pdf, other]
Title: Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient
Comments: arXiv admin note: substantial text overlap with arXiv:2001.02435
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Off-policy Reinforcement Learning (RL) holds the promise of better data efficiency as it allows sample reuse and potentially enables safe interaction with the environment. Current off-policy policy gradient methods suffer from either high bias or high variance, often delivering unreliable estimates. The price of inefficiency becomes evident in real-world scenarios such as interaction-driven robot learning, where the success of RL has been rather limited, and a very high sample cost hinders straightforward application. In this paper, we propose a nonparametric Bellman equation, which can be solved in closed form. The solution is differentiable w.r.t. the policy parameters and gives access to an estimate of the policy gradient. In this way, we avoid the high variance of importance sampling approaches and the high bias of semi-gradient methods. We empirically analyze the quality of our gradient estimate against state-of-the-art methods, and show that it outperforms the baselines in terms of sample efficiency on classical control tasks.
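A nonparametric Bellman equation of this general kind can be illustrated with kernel smoothing. Replace the unknown transition model by a row-normalized kernel matrix $P$ that maps each sample's next state onto the sampled states, then solve $v = r + \gamma P v$ in closed form as $v = (I - \gamma P)^{-1} r$. This is a simplified illustration under those assumptions (Gaussian kernel, 1-D states, fixed policy, hypothetical data), not the paper's estimator or its policy-gradient computation.

```python
import numpy as np

def kernel_bellman_values(states, rewards, next_states, gamma=0.9, bandwidth=0.5):
    """Solve v = r + gamma * P v, where P[i, k] is a normalized Gaussian kernel
    between next_states[i] and states[k] (Nadaraya-Watson smoothing)."""
    diff = next_states[:, None] - states[None, :]
    K = np.exp(-0.5 * (diff / bandwidth) ** 2)
    P = K / K.sum(axis=1, keepdims=True)  # each row sums to 1
    n = len(states)
    v = np.linalg.solve(np.eye(n) - gamma * P, rewards)
    return v, P

rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=20)
next_states = np.clip(states + 0.1, -1, 1)  # hypothetical drift under a fixed policy
rewards = -states**2
v, P = kernel_bellman_values(states, rewards, next_states)
```

Because $P$ is row-stochastic and $\gamma < 1$, the matrix $I - \gamma P$ is always invertible, and $v$ is a smooth function of the kernel entries, which is what makes gradients with respect to quantities that shape $P$ available.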

[23]  arXiv:2010.14773 [pdf, ps, other]
Title: Graph embedding using multi-layer adjacent point merging model
Subjects: Machine Learning (cs.LG)

For graph classification tasks, many traditional kernel methods focus on measuring the similarity between graphs. These methods have achieved great success in resolving graph isomorphism problems. However, in some classification problems, the graph class depends not only on the topological similarity of the whole graph, but also on constituent subgraph patterns. To this end, we propose a novel graph embedding method using a multi-layer adjacent point merging model. This embedding method allows us to extract different subgraph patterns from training data. We then present a flexible loss function for feature selection which enhances the robustness of our method for different classification problems. Finally, numerical evaluations demonstrate that our proposed method outperforms many state-of-the-art methods.

[24]  arXiv:2010.14774 [pdf, other]
Title: Structural Causal Model with Expert Augmented Knowledge to Estimate the Effect of Oxygen Therapy on Mortality in the ICU
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent advances in causal inference techniques, more specifically in the theory of structural causal models, provide the framework for identifying causal effects from observational data in cases where the causal graph is identifiable, i.e., the data-generating mechanism can be recovered from the joint distribution. However, no studies have yet demonstrated this concept with a clinical example. We present a complete framework for estimating the causal effect from observational data by augmenting the model with expert knowledge during the model development phase, and we demonstrate it with a practical clinical application. Our clinical application addresses a timely and important research question, i.e., the effect of oxygen therapy intervention in the intensive care unit (ICU); the result of this project is useful in a variety of disease conditions, including severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) patients in the ICU. We used data from the MIMIC III database, a standard database in the machine learning community that contains 58,976 admissions from an ICU in Boston, MA, to estimate the effect of oxygen therapy on mortality. We also identified the covariate-specific effects of oxygen therapy from the model for more personalized intervention.

[25]  arXiv:2010.14778 [pdf, other]
Title: DNA: Differentiable Network-Accelerator Co-Search
Subjects: Machine Learning (cs.LG)

Powerful yet complex deep neural networks (DNNs) have fueled a booming demand for efficient DNN solutions to bring DNN-powered intelligence into numerous applications. Jointly optimizing the networks and their accelerators is promising for achieving optimal performance. However, the great potential of such solutions has yet to be unleashed due to the challenge of simultaneously exploring the vast, entangled, yet different design spaces of the networks and their accelerators. To this end, we propose DNA, a Differentiable Network-Accelerator co-search framework for automatically searching for matched networks and accelerators to maximize both the task accuracy and acceleration efficiency. Specifically, DNA integrates two enablers: (1) a generic design space for DNN accelerators that is applicable to both FPGA- and ASIC-based DNN accelerators and compatible with DNN frameworks such as PyTorch to enable algorithmic exploration for more efficient DNNs and their accelerators; and (2) a joint DNN network and accelerator co-search algorithm that enables simultaneously searching for optimal DNN structures and their accelerators' micro-architectures and mapping methods to maximize both the task accuracy and acceleration efficiency. Experiments and ablation studies based on FPGA measurements and ASIC synthesis show that the matched networks and accelerators generated by DNA consistently outperform state-of-the-art (SOTA) DNNs and DNN accelerators (e.g., 3.04x better FPS with a 5.46% higher accuracy on ImageNet), while requiring notably reduced search time (up to 1234.3x) over SOTA co-exploration methods, when evaluated over ten SOTA baselines on three datasets. All code will be released upon acceptance.

[26]  arXiv:2010.14785 [pdf, other]
Title: Designing Interpretable Approximations to Deep Reinforcement Learning with Soft Decision Trees
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In an ever-expanding set of research and application areas, deep neural networks (DNNs) set the bar for algorithm performance. However, depending upon additional constraints such as processing power and execution time limits, or requirements such as verifiable safety guarantees, it may not be feasible to actually use such high-performing DNNs in practice. Many techniques have been developed in recent years to compress or distill complex DNNs into smaller, faster or more understandable models and controllers. This work seeks to provide a quantitative framework with metrics to systematically evaluate the outcome of such conversion processes, and to identify reduced models that not only preserve a desired performance level, but also, for example, succinctly explain the latent knowledge represented by a DNN. We illustrate the effectiveness of the proposed approach on the evaluation of decision tree variants in the context of benchmark reinforcement learning tasks.

[27]  arXiv:2010.14816 [pdf, other]
Title: Higher Order Linear Transformer
Authors: Jean Mercat
Subjects: Machine Learning (cs.LG)

Following up on the linear transformer part of the article by Katharopoulos et al., which takes this idea from Shen et al., the trick that gives the attention mechanism linear complexity is reused and extended to a second-order approximation of the softmax normalization.
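As a rough illustration of the idea (our own toy, not code from the article), the sketch below replaces exp(q·k) in softmax attention with its second-order Taylor expansion 1 + q·k + (q·k)²/2, which is positive for every real q·k, and factorizes it through an explicit feature map so that keys and values are summarized once. It assumes non-causal attention without the usual 1/√d scaling; all function names are hypothetical, and the quadratic reference is included only to check the factorization.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def second_order_features(x: Vector) -> Vector:
    """Feature map phi with phi(q) . phi(k) == 1 + q.k + (q.k)**2 / 2."""
    outer = [xi * xj / math.sqrt(2.0) for xi in x for xj in x]
    return [1.0] + list(x) + outer


def linear_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Cost linear in sequence length: keys and values are summarized once."""
    d_v = len(values[0])
    phi_keys = [second_order_features(k) for k in keys]
    m = len(phi_keys[0])
    # s[a][b] = sum_j phi(k_j)[a] * v_j[b];  z[a] = sum_j phi(k_j)[a]
    s = [[0.0] * d_v for _ in range(m)]
    z = [0.0] * m
    for pk, v in zip(phi_keys, values):
        for a in range(m):
            z[a] += pk[a]
            for b in range(d_v):
                s[a][b] += pk[a] * v[b]
    outputs = []
    for q in queries:
        pq = second_order_features(q)
        denom = dot(pq, z)  # > 0 because 1 + t + t**2 / 2 > 0 for every real t
        outputs.append([sum(pq[a] * s[a][b] for a in range(m)) / denom for b in range(d_v)])
    return outputs


def quadratic_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Reference O(n^2) computation with the same truncated-exponential weights."""
    outputs = []
    for q in queries:
        weights = [1.0 + dot(q, k) + dot(q, k) ** 2 / 2.0 for k in keys]
        total = sum(weights)
        outputs.append([sum(w * v[b] for w, v in zip(weights, values)) / total
                        for b in range(len(values[0]))])
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)

    def rand_matrix(rows: int, cols: int) -> List[Vector]:
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    q, k, v = rand_matrix(5, 3), rand_matrix(7, 3), rand_matrix(7, 2)
    fast, slow = linear_attention(q, k, v), quadratic_attention(q, k, v)
    print("max abs difference:", max(abs(a - b) for ra, rb in zip(fast, slow) for a, b in zip(ra, rb)))
```

The trade-off is visible in the feature map: its length is 1 + d + d², so cost grows linearly in sequence length but quadratically in head dimension.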

[28]  arXiv:2010.14831 [pdf, other]
Title: Deep Manifold Computing and Visualization
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)

The ability to preserve the local geometry of highly nonlinear manifolds in high-dimensional spaces and properly unfold them into lower-dimensional hyperplanes is the key to the success of manifold computing, nonlinear dimensionality reduction (NLDR), and visualization. This paper proposes a novel method, called elastic locally isometric smoothness (ELIS), to empower deep neural networks with such an ability. ELIS requires that a desired metric between points be preserved across layers in order to preserve local geometry; such a smoothness constraint effectively regularizes vector-based transformations to become well-behaved local metric-preserving homeomorphisms. Moreover, ELIS requires that the smoothness be imposed in a way that leaves sufficient flexibility for tackling complicated nonlinearity and non-Euclideanity; this is achieved layer-wise via nonlinearity in both the similarity and activation functions. The ELIS method incorporates a class of suitable nonlinear similarity functions into a two-way divergence loss and uses hyperparameter continuation in finding optimal solutions. Extensive experiments, comparisons, and ablation studies demonstrate that ELIS can deliver results not only superior to UMAP and t-SNE for visualization but also better than other leading counterparts of manifold and autoencoder learning for NLDR and manifold data generation.

[29]  arXiv:2010.14864 [pdf, other]
Title: Tree-structured Ising models can be learned efficiently
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (stat.ML)

We provide the first polynomial-sample and polynomial-time algorithm for learning tree-structured Ising models. In particular, we show that $n$-variable tree-structured Ising models can be learned computationally efficiently to within total variation distance~$\epsilon$ from an optimal $O(n \log n/\epsilon^2)$ samples, where $O(\cdot)$ hides an absolute constant that does not depend on the model being learned -- neither its tree nor the magnitude of its edge strengths, on which we place no assumptions. Our guarantees hold, in fact, for the celebrated Chow-Liu [1968] algorithm, using the plug-in estimator for mutual information. While this (or any other) algorithm may fail to identify the structure of the underlying model correctly from a finite sample, we show that it will still learn a tree-structured model that is close to the true one in TV distance, a guarantee called "proper learning."
Prior to our work there were no known sample- and time-efficient algorithms for learning (properly or non-properly) arbitrary tree-structured graphical models. In particular, our guarantees cannot be derived from known results for the Chow-Liu algorithm and the ensuing literature on learning graphical models, including a recent renaissance of algorithms on this learning challenge, which only yield asymptotic consistency results, or sample-inefficient and/or time-inefficient algorithms, unless further assumptions are placed on the graphical model, such as bounds on the "strengths" of the model's edges. While we establish guarantees for a widely known and simple algorithm, the analysis that this algorithm succeeds is quite complex, requiring a hierarchical classification of the edges into layers with different reconstruction guarantees, depending on their strength, combined with delicate uses of the subadditivity of the squared Hellinger distance over graphical models to control the error accumulation.
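For readers unfamiliar with the algorithm these guarantees refer to, here is a minimal sketch of the classical Chow-Liu structure step: estimate pairwise mutual information with the plug-in (empirical-frequency) estimator and keep a maximum-weight spanning tree. It recovers only the tree structure (not edge strengths), assumes ±1 spins, and the toy chain generator and all function names are our own illustrative assumptions, not code from the paper.

```python
import math
import random
from collections import Counter
from itertools import combinations


def plug_in_mutual_information(samples, i, j):
    """Empirical (plug-in) mutual information between columns i and j, in nats."""
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    marg_i = Counter(s[i] for s in samples)
    marg_j = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((marg_i[a] / n) * (marg_j[b] / n)))
    return mi


def chow_liu_tree(samples, num_vars):
    """Maximum-weight spanning tree on plug-in mutual information (Kruskal)."""
    scored = sorted(
        ((plug_in_mutual_information(samples, i, j), i, j)
         for i, j in combinations(range(num_vars), 2)),
        reverse=True,
    )
    parent = list(range(num_vars))  # union-find forest

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = []
    for _, i, j in scored:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding (i, j) does not create a cycle
            parent[root_i] = root_j
            edges.append((i, j))
    return edges


def sample_chain(n, flip, rng):
    """Toy +/-1 chain x0 -> x1 -> x2 where each link flips the sign with prob. `flip`."""
    data = []
    for _ in range(n):
        x0 = rng.choice((-1, 1))
        x1 = x0 if rng.random() >= flip else -x0
        x2 = x1 if rng.random() >= flip else -x1
        data.append((x0, x1, x2))
    return data


if __name__ == "__main__":
    print(chow_liu_tree(sample_chain(5000, 0.1, random.Random(0)), 3))
```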

[30]  arXiv:2010.14876 [pdf, other]
Title: Fighting Copycat Agents in Behavioral Cloning from Observation Histories
Comments: Published at NeurIPS 2020. 9 pages (excluding references and appendices)
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

Imitation learning trains policies to map from input observations to the actions that an expert would choose. In this setting, distribution shift frequently exacerbates the effect of misattributing expert actions to nuisance correlates among the observed variables. We observe that a common instance of this causal confusion occurs in partially observed settings when expert actions are strongly correlated over time: the imitator learns to cheat by predicting the expert's previous action, rather than the next action. To combat this "copycat problem", we propose an adversarial approach to learn a feature representation that removes excess information about the previous expert action nuisance correlate, while retaining the information necessary to predict the next action. In our experiments, our approach improves performance significantly across a variety of partially observed imitation learning tasks.

[31]  arXiv:2010.14878 [pdf, other]
Title: An Optimal Control Approach to Learning in SIDARTHE Epidemic model
Comments: 11 pages, 7 figures, submitted to TNNLS
Subjects: Machine Learning (cs.LG); Physics and Society (physics.soc-ph)

The COVID-19 outbreak has stimulated interest in novel epidemiological models that predict the course of the epidemic so as to help plan effective control strategies. In particular, in order to properly interpret the available data, it has become clear that one must go beyond most classic epidemiological models and consider models that, like the recently proposed SIDARTHE, offer a richer description of the stages of infection. The problem of learning the parameters of these models is of crucial importance, especially when they are assumed to be time-variant, which further enriches their effectiveness. In this paper we propose a general approach for learning time-variant parameters of dynamic compartmental models from epidemic data. We formulate the problem in terms of a functional risk that depends on the learning variables through the solutions of a dynamic system. The resulting variational problem is then solved by using a gradient flow on a suitable, regularized functional. We forecast the epidemic evolution in Italy and France. Results indicate that the model provides reliable and challenging predictions over all available data, and they highlight the fundamental role of the chosen strategy for the time-variant parameters.

[32]  arXiv:2010.14900 [pdf, other]
Title: Dynamic Bayesian Approach for decision-making in Ego-Things
Comments: IEEE 5th World Forum on Internet of Things, Limerick, Ireland
Subjects: Machine Learning (cs.LG)

This paper presents a novel approach to detecting abnormalities in dynamic systems based on multisensory data and feature selection. The proposed method produces multiple inference models by considering several features of the observed data. This work facilitates identifying the most precise features for predicting future instances and detecting abnormalities. Growing neural gas (GNG) is employed for clustering multisensory data into a set of nodes that provide a semantic interpretation of the data and define local linear models for prediction purposes. Our method uses a Markov Jump particle filter (MJPF) for state estimation and abnormality detection. The proposed method can be used for selecting the optimal set of features to be shared in networking operations such that state prediction, decision-making, and abnormality detection processes are favored. This work is evaluated on a real dataset recorded from a moving vehicle performing some tasks in a controlled environment.

[33]  arXiv:2010.14907 [pdf, other]
Title: Online feature selection for rapid, low-overhead learning in networked systems
Authors: Xiaoxuan Wang (1), Forough Shahab Samani (1 and 2), Rolf Stadler (1 and 2) ((1) KTH Royal Institute of Technology, Sweden (2) RISE Research Institutes of Sweden)
Comments: A short version of this paper has been published at IFIP/IEEE 16th International Conference on Network and Service Management, 2-6 November 2020
Subjects: Machine Learning (cs.LG)

Data-driven functions for operation and management often require measurements collected through monitoring for model training and prediction. The number of data sources can be very large, which requires significant communication and computing overhead to continuously extract and collect this data, as well as to train and update the machine-learning models. We present an online algorithm, called OSFS, that selects a small feature set from a large number of available data sources, which allows for rapid, low-overhead, and effective learning and prediction. OSFS is instantiated with a feature ranking algorithm and applies the concept of a stable feature set, which we introduce in the paper. We perform an extensive experimental evaluation of our method on data from an in-house testbed. We find that OSFS requires several hundred measurements to reduce the number of data sources by two orders of magnitude, from which models are trained with acceptable prediction accuracy. While our method is heuristic and can be improved in many ways, the results clearly suggest that many learning tasks do not require a lengthy monitoring phase and expensive offline training.

[34]  arXiv:2010.14908 [pdf, other]
Title: Collective Awareness for Abnormality Detection in Connected Autonomous Vehicles
Comments: IEEE Internet of Things Journal
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Signal Processing (eess.SP)

The advancements in connected and autonomous vehicles demand tools that give agents the capability to be aware of and predict their own states and context dynamics. This article presents a novel approach to developing an initial level of collective awareness in a network of intelligent agents. A specific collective self-awareness functionality is considered, namely, agent-centered detection of abnormal situations present in the environment around any agent in the network. Moreover, the agent should be capable of analyzing how such abnormalities can influence the future actions of each agent. Data-driven dynamic Bayesian network (DBN) models learned from time series of sensory data recorded during the realization of tasks (agent network experiences) are used here for abnormality detection and prediction. A set of DBNs, each related to an agent, allows each agent in the network to synchronously become aware of possible abnormalities occurring when the available models are used on a new instance of the task for which the DBNs have been learned. A growing neural gas (GNG) algorithm is used to learn the node variables and conditional probabilities linking nodes in the DBN models; a Markov jump particle filter (MJPF) is employed for state estimation and abnormality detection in each agent, using the learned DBNs as filter parameters. Performance metrics are discussed to assess the algorithms' reliability and accuracy. The impact of the communication channel used by the network to share the data sensed in a distributed way by each agent is also evaluated. The IEEE 802.11p protocol standard has been considered for communication among agents. Real datasets acquired by autonomous vehicles performing different tasks in a controlled environment are also used.

[35]  arXiv:2010.14927 [pdf, other]
Title: Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

We consider ReLU networks with random weights, in which the dimension decreases at each layer. We show that for most such networks, most examples $x$ admit an adversarial perturbation at a Euclidean distance of $O\left(\frac{\|x\|}{\sqrt{d}}\right)$, where $d$ is the input dimension. Moreover, this perturbation can be found via gradient flow, as well as gradient descent with sufficiently small steps. This result can be seen as an explanation for the abundance of adversarial examples, and for the fact that they are found via gradient descent.

[36]  arXiv:2010.14945 [pdf, other]
Title: Graph Contrastive Learning with Adaptive Augmentation
Comments: Work in progress; 11 pages, 3 figures, 5 tables. arXiv admin note: substantial text overlap with arXiv:2006.04131
Subjects: Machine Learning (cs.LG)

Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and maximize the agreement of representations in the two views. Despite the rapid development of graph CL methods, the design of graph augmentation schemes, a crucial component in CL, remains rarely explored. We argue that the data augmentation schemes should preserve intrinsic structural and attribute information of graphs, which will force the model to learn representations that are insensitive to perturbation on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant node features, to force the model to recognize underlying semantic information. We perform extensive node classification experiments on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art methods and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation.
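One way centrality-aware edge dropping could look in code: the sketch below assigns each edge a drop probability that decreases with a log-scaled degree centrality, so edges touching well-connected nodes survive more often. The centrality definition, the normalization, the `base_rate` and `cap` parameters and the function names are our assumptions for illustration and should not be read as the paper's exact scheme; the feature-corruption half of the method is not shown.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def edge_drop_probabilities(edges: List[Edge], base_rate: float = 0.3, cap: float = 0.7) -> Dict[Edge, float]:
    """Drop probability per undirected edge, smaller for more central edges (assumed scheme)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Edge centrality: log of the mean degree of the two endpoints (>= 0 since degrees >= 1).
    scores = {e: math.log((degree[e[0]] + degree[e[1]]) / 2.0) for e in edges}
    s_max = max(scores.values())
    s_mean = sum(scores.values()) / len(scores)
    spread = s_max - s_mean
    if spread <= 1e-12:  # all edges (nearly) equally central: fall back to uniform dropping
        return {e: min(base_rate, cap) for e in edges}
    # The most central edge gets probability 0; nothing exceeds `cap`.
    return {e: min((s_max - s) / spread * base_rate, cap) for e, s in scores.items()}


def drop_edges(edges: List[Edge], probs: Dict[Edge, float], rng: random.Random) -> List[Edge]:
    """Sample one augmented view by independently dropping each edge."""
    return [e for e in edges if rng.random() >= probs[e]]


if __name__ == "__main__":
    graph = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5)]  # hub 0 plus a short tail 4-5
    probs = edge_drop_probabilities(graph)
    print(probs)
    print(drop_edges(graph, probs, random.Random(0)))
```

Calling `drop_edges` twice with different random states yields the two correlated views a contrastive objective would compare.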

[37]  arXiv:2010.14946 [pdf, other]
Title: Smart Anomaly Detection in Sensor Systems
Subjects: Machine Learning (cs.LG); General Literature (cs.GL)

Anomaly detection is concerned with identifying data patterns that deviate remarkably from the expected behaviour. This is an important research problem, due to its broad set of application domains, from data analysis to e-health, cybersecurity, predictive maintenance, fault prevention, and industrial automation. Herein, we review state-of-the-art methods that may be employed to detect anomalies in the specific area of sensor systems, which poses hard challenges in terms of information fusion, data volumes, data speed, and network/energy efficiency, to mention only the most pressing ones. In this context, anomaly detection is a particularly hard problem, given the need to find computing-energy accuracy trade-offs in a constrained environment. We taxonomize methods ranging from conventional techniques (statistical methods, time-series analysis, signal processing, etc.) to data-driven techniques (supervised learning, reinforcement learning, deep learning, etc.). We also look at the impact that different architectural environments (Cloud, Fog, Edge) can have on the sensor ecosystem. The review points to the most promising intelligent-sensing methods, and pinpoints a set of interesting open issues and challenges.

[38]  arXiv:2010.14957 [pdf, other]
Title: Dimensionality Reduction and Anomaly Detection for CPPS Data using Autoencoder
Comments: Copyright IEEE 2019
Journal-ref: 2019 IEEE International Conference on Industrial Technology (ICIT)
Subjects: Machine Learning (cs.LG)

Unsupervised anomaly detection (AD) is a major topic in the field of Cyber-Physical Production Systems (CPPSs). A closely related concern is dimensionality reduction (DR), which is 1) often used as a preprocessing step in an AD solution and 2) a form of AD in itself, if a measure of observation conformity to the learned data manifold is provided.
We argue that the two aspects can be complementary in a CPPS anomaly detection solution. In this work, we focus on the nonlinear autoencoder (AE) as a DR/AD approach. The contributions of this work are: 1) we examine the suitability of AE reconstruction error as an AD decision criterion for CPPS data; 2) we analyze its relation to a potential second-phase AD approach in the AE latent space; and 3) we evaluate the performance of the approach on three real-world datasets. Moreover, the approach outperforms state-of-the-art techniques while remaining relatively simple and straightforward to apply.
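To show how reconstruction error becomes an AD decision rule, here is a minimal sketch in which the nonlinear AE is replaced, purely for illustration, by a fixed linear 1-D encoder/decoder (projection onto a unit direction). The quantile-based threshold on training errors is our assumption; the paper's actual AE and threshold choice are not given in the abstract, and all names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def encode(x: Vector, direction: Vector) -> float:
    """1-D latent code: projection of x onto a unit-length direction."""
    return sum(a * b for a, b in zip(x, direction))


def decode(code: float, direction: Vector) -> List[float]:
    """Map the latent code back to input space."""
    return [code * b for b in direction]


def reconstruction_error(x: Vector, direction: Vector) -> float:
    """Euclidean distance between x and its reconstruction."""
    x_hat = decode(encode(x, direction), direction)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))


def fit_threshold(train: Sequence[Vector], direction: Vector, quantile: float = 0.99) -> float:
    """Empirical quantile of reconstruction errors on (assumed normal) training data."""
    errors = sorted(reconstruction_error(x, direction) for x in train)
    index = min(len(errors) - 1, int(quantile * len(errors)))
    return errors[index]


def is_anomaly(x: Vector, direction: Vector, threshold: float) -> bool:
    """Flag x when the model cannot reconstruct it as well as typical training data."""
    return reconstruction_error(x, direction) > threshold


if __name__ == "__main__":
    unit = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # stands in for a trained encoder/decoder
    normal = [(t, t + 0.01 * (-1) ** t) for t in range(-10, 11)]  # data near the line y = x
    tau = fit_threshold(normal, unit)
    print(is_anomaly((3.0, 3.0), unit, tau), is_anomaly((5.0, -5.0), unit, tau))
```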

[39]  arXiv:2010.14978 [pdf, ps, other]
Title: Game-Theoretic Interactions of Different Orders
Subjects: Machine Learning (cs.LG)

In this study, we define interaction components of different orders between two input variables based on game theory. We further prove that interaction components of different orders satisfy several desirable properties.

[40]  arXiv:2010.14986 [pdf, other]
Title: Evaluating Robustness of Predictive Uncertainty Estimation: Are Dirichlet-based Models Reliable?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Robustness to adversarial perturbations and accurate uncertainty estimation are crucial for reliable application of deep learning in real-world settings. Dirichlet-based uncertainty (DBU) models are a family of models that predict the parameters of a Dirichlet distribution (instead of a categorical one) and promise to signal when not to trust their predictions. Untrustworthy predictions are obtained on unknown or ambiguous samples and marked with a high uncertainty by the models. In this work, we show that DBU models with standard training are not robust w.r.t. three important tasks in the field of uncertainty estimation. In particular, we evaluate how useful the uncertainty estimates are (1) to indicate correctly classified samples and (2) to detect adversarial examples that try to fool classification. We further evaluate the reliability of DBU models on the task of (3) distinguishing between in-distribution (ID) and out-of-distribution (OOD) data. To this end, we present the first study of certifiable robustness for DBU models. Furthermore, we propose novel uncertainty attacks that fool models into assigning high confidence to OOD data and low confidence to ID data, respectively. Based on our results, we explore the first approaches to make DBU models more robust. We use adversarial training procedures based on label attacks, uncertainty attacks, or random noise and demonstrate how they affect robustness of DBU models on ID data and OOD data.
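As a sketch of how DBU outputs are commonly read (a convention we assume here, not a detail taken from this abstract): the predicted class probabilities are the Dirichlet mean α/α₀, the largest of them serves as a confidence score, and the precision α₀ = Σα acts as a total-evidence score, so a small α₀ signals high uncertainty. The function name and example parameter vectors are hypothetical.

```python
from typing import List, Sequence, Tuple


def dirichlet_summary(alpha: Sequence[float]) -> Tuple[List[float], float, float]:
    """Return (mean class probabilities, max mean probability, precision alpha_0)."""
    if any(a <= 0 for a in alpha):
        raise ValueError("Dirichlet parameters must be positive")
    alpha0 = sum(alpha)
    mean = [a / alpha0 for a in alpha]  # expected categorical distribution
    return mean, max(mean), alpha0


if __name__ == "__main__":
    examples = [
        ("confident", [20.0, 1.0, 1.0]),  # much evidence, concentrated on one class
        ("ambiguous", [5.0, 5.0, 1.0]),   # much evidence, split between two classes
        ("unknown", [1.0, 1.0, 1.0]),     # little evidence at all (OOD-like)
    ]
    for name, alpha in examples:
        mean, conf, alpha0 = dirichlet_summary(alpha)
        print(name, [round(m, 3) for m in mean], round(conf, 3), alpha0)
```

The "ambiguous" and "unknown" cases illustrate why a DBU model can distinguish two kinds of uncertainty that a single softmax vector would conflate.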

[41]  arXiv:2010.15003 [pdf, other]
Title: Estimating Product Relations in Neural Networks
Authors: Bhaavan Goel
Comments: 5 pages, 8 figures
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The universal approximation theorem states that a shallow neural network can approximate any continuous function. The input to neurons at each layer is a weighted sum of previous-layer neurons, to which an activation is then applied. These activation functions perform very well when the output is a linear combination of the input data. When trying to learn a function that involves a product of input data, neural networks tend to overfit the data to approximate the function. In this paper we use properties of logarithmic functions to propose a pair of custom activation functions that can translate products into linear expressions and learn using backpropagation. We also try to generalize this approach to some complex arithmetic functions and test the accuracy on a distribution disjoint from the training set.
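The identity behind turning products into sums is x₁·x₂ = exp(log x₁ + log x₂) for positive inputs. Below is a minimal sketch of a single hypothetical "log-in, exp-out" unit showing that an ordinary weighted sum placed between a log activation and an exp activation can represent products, powers, and ratios. It assumes strictly positive inputs and is not the paper's pair of activation functions, which the abstract does not specify.

```python
import math
from typing import Sequence


def log_linear_unit(inputs: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """exp(bias + sum_i w_i * log x_i) == exp(bias) * prod_i x_i ** w_i for x_i > 0."""
    if any(x <= 0 for x in inputs):
        raise ValueError("this sketch assumes strictly positive inputs")
    # log activation -> ordinary weighted sum -> exp activation
    return math.exp(bias + sum(w * math.log(x) for w, x in zip(weights, inputs)))


if __name__ == "__main__":
    print(log_linear_unit([2.0, 3.0], [1.0, 1.0]))   # product x1 * x2
    print(log_linear_unit([4.0], [0.5]))             # square root of x1
    print(log_linear_unit([2.0, 5.0], [1.0, -1.0]))  # ratio x1 / x2
```

Because the pre-exp quantity is linear in the weights, gradients with respect to `weights` stay simple, which is what makes such a unit trainable with standard backpropagation.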

[42]  arXiv:2010.15010 [pdf, other]
Title: Geometric Scattering Attention Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Geometric scattering has recently gained recognition in graph representation learning, and recent work has shown that integrating scattering features in graph convolution networks (GCNs) can alleviate the typical oversmoothing of features in node representation learning. However, scattering methods often rely on handcrafted design, requiring careful selection of frequency bands via a cascade of wavelet transforms, as well as an effective weight sharing scheme to combine low- and band-pass information. Here, we introduce a new attention-based architecture to produce adaptive task-driven node representations by implicitly learning node-wise weights for combining multiple scattering and GCN channels in the network. We show that the resulting geometric scattering attention network (GSAN) outperforms previous networks in semi-supervised node classification, while also enabling a spectral study of extracted information by examining node-wise attention weights.

[43]  arXiv:2010.15011 [pdf, other]
Title: Predicting Classification Accuracy when Adding New Unobserved Classes
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Multiclass classifiers are often designed and evaluated only on a sample from the classes on which they will eventually be applied. Hence, their final accuracy remains unknown. In this work we study how a classifier's performance over the initial class sample can be used to extrapolate its expected accuracy on a larger, unobserved set of classes. For this, we define a measure of separation between correct and incorrect classes that is independent of the number of classes: the reversed ROC (rROC), which is obtained by replacing the roles of classes and data-points in the common ROC. We show that the classification accuracy is a function of the rROC in multiclass classifiers, for which the learned representation of data from the initial class sample remains unchanged when new classes are added. Using these results we formulate a robust neural-network-based algorithm, CleaneX, which learns to estimate the accuracy of such classifiers on arbitrarily large sets of classes. Our method achieves remarkably better predictions than current state-of-the-art methods on both simulations and real datasets of object detection, face recognition, and brain decoding.

[44]  arXiv:2010.15020 [pdf, other]
Title: Provably Efficient Online Agnostic Learning in Markov Games
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study online agnostic learning, a problem that arises in episodic multi-agent reinforcement learning where the actions of the opponents are unobservable. We show that in this challenging setting, achieving sublinear regret against the best response in hindsight is statistically hard. We then consider a weaker notion of regret, and present an algorithm that achieves after $K$ episodes a sublinear $\tilde{\mathcal{O}}(K^{3/4})$ regret. To our knowledge, this is the first sublinear regret bound in the online agnostic setting. Importantly, our regret bound is independent of the size of the opponents' action spaces. As a result, even when the opponents' actions are fully observable, our regret bound improves upon existing analysis (e.g., Xie et al., 2020) by an exponential factor in the number of opponents.

[45]  arXiv:2010.15028 [pdf, other]
Title: DeepRite: Deep Recurrent Inverse TreatmEnt Weighting for Adjusting Time-varying Confounding in Modern Longitudinal Observational Data
Subjects: Machine Learning (cs.LG)

Counterfactual prediction is about predicting the outcome of an unobserved situation from the data. For example, given that a patient is on drug A, what would the outcome be if she switched to drug B? Most existing works focus on modeling counterfactual outcomes based on static data. However, many applications have time-varying confounding effects, such as multiple treatments over time. How can such time-varying effects be modeled from longitudinal observational data? How can complex high-dimensional dependencies in the data be modeled? To address these challenges, we propose Deep Recurrent Inverse TreatmEnt weighting (DeepRite), which incorporates recurrent neural networks into two-phase adjustments for the existence of time-varying confounding in modern longitudinal data. In phase I (cohort reweighting), we fit one network that emits time-dependent inverse probabilities of treatment and use them to generate a pseudo-balanced cohort. In phase II (outcome progression), we input the adjusted data to the subsequent predictive network for making counterfactual predictions. We evaluate DeepRite on both synthetic data and real data collected from sepsis patients in intensive care units. DeepRite is shown to recover the ground truth from synthetic data, and to estimate unbiased treatment effects from real data that can be better aligned with the standard guidelines for management of sepsis, thanks to its ability to create balanced cohorts.
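To make the phase I idea concrete, here is a minimal sketch of time-varying inverse probability of treatment weighting. It assumes binary treatments and per-step propensities P(A_t = 1 | history) that are simply passed in as numbers (in DeepRite they would be emitted by a recurrent network). The optional stabilized numerator is a standard marginal-structural-model variant we add for illustration; all function names are hypothetical.

```python
from typing import Optional, Sequence


def treatment_probability(p_treated: float, treated: int) -> float:
    """P(A_t = a_t | history) for a binary treatment, given P(A_t = 1 | history)."""
    return p_treated if treated else 1.0 - p_treated


def inverse_treatment_weight(
    propensities: Sequence[float],
    treatments: Sequence[int],
    stabilizers: Optional[Sequence[float]] = None,
) -> float:
    """Product over time steps of numerator / P(A_t = a_t | history).

    The numerator is 1 (unstabilized) or, if given, P(A_t = a_t) from a simpler
    model without time-varying confounders (stabilized weights).
    """
    weight = 1.0
    for t, (p_t, a_t) in enumerate(zip(propensities, treatments)):
        numerator = 1.0 if stabilizers is None else treatment_probability(stabilizers[t], a_t)
        weight *= numerator / treatment_probability(p_t, a_t)
    return weight


def weighted_mean_outcome(weights: Sequence[float], outcomes: Sequence[float]) -> float:
    """Outcome mean in the reweighted (pseudo-balanced) cohort."""
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


if __name__ == "__main__":
    # Treated at t=0 (propensity 0.5), untreated at t=1 despite propensity 0.8.
    print(inverse_treatment_weight([0.5, 0.8], [1, 0]))
    print(inverse_treatment_weight([0.5, 0.8], [1, 0], stabilizers=[0.5, 0.5]))
```

In the usage example the unstabilized weight is 1/0.5 × 1/0.2 = 10, so a treatment path that was unlikely given the patient's history counts more in the pseudo-cohort.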

[46]  arXiv:2010.15031 [pdf, ps, other]
Title: On Learning Continuous Pairwise Markov Random Fields
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We consider learning a sparse pairwise Markov Random Field (MRF) with continuous-valued variables from i.i.d. samples. We adapt the algorithm of Vuffray et al. (2019) to this setting and provide finite-sample analysis revealing sample complexity scaling logarithmically with the number of variables, as in the discrete and Gaussian settings. Our approach is applicable to a large class of pairwise MRFs with continuous variables and also has desirable asymptotic properties, including consistency and normality under mild conditions. Further, we establish that the population version of the optimization criterion employed in Vuffray et al. (2019) can be interpreted as local maximum likelihood estimation (MLE). As part of our analysis, we introduce a robust variation of sparse linear regression à la Lasso, which may be of interest in its own right.

[47]  arXiv:2010.15054 [pdf, other]
Title: Attribution Preservation in Network Compression for Reliable Network Interpretation
Comments: NeurIPS 2020. Code: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Neural networks embedded in safety-sensitive applications such as self-driving cars and wearable health monitors rely on two important techniques: input attribution for hindsight analysis and network compression to reduce their size for edge computing. In this paper, we show that these seemingly unrelated techniques conflict with each other as network compression deforms the produced attributions, which could lead to dire consequences for mission-critical applications. This phenomenon arises because conventional network compression methods only preserve the predictions of the network while ignoring the quality of the attributions. To combat the attribution inconsistency problem, we present a framework that can preserve the attributions while compressing a network. By employing the Weighted Collapsed Attribution Matching regularizer, we match the attribution maps of the network being compressed to its pre-compression former self. We demonstrate the effectiveness of our algorithm both quantitatively and qualitatively on diverse compression methods.

[48]  arXiv:2010.15056 [pdf, other]
Title: Self-awareness in Intelligent Vehicles: Experience Based Abnormality Detection
Comments: Robot 2019: Fourth Iberian Robotics Conference
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

The evolution of Intelligent Transportation Systems in recent times necessitates the development of self-driving agents with self-awareness. This paper aims to introduce a novel method to detect abnormalities based on internal cross-correlation parameters of the vehicle. Before the adoption of Machine Learning, abnormality detection was manually programmed by checking every variable and creating huge nested conditions that are very difficult to track. Nowadays, it is possible to train a Dynamic Bayesian Network (DBN) model to automatically evaluate and detect when the vehicle is potentially misbehaving. In this paper, different scenarios have been set up in order to train and test a switching DBN for a Perimeter Monitoring Task, using semantic segmentation for the DBN model and the Hellinger distance metric for abnormality measurements.
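The Hellinger distance used above for abnormality measurement can be illustrated with a minimal sketch, assuming the DBN's predicted state distribution and the observed one are both available as discrete probability vectors over the same states. The `is_abnormal` helper and its threshold are hypothetical, not taken from the paper.

```python
import math
from typing import Sequence


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance H(p, q) = sqrt(1 - sum_i sqrt(p_i * q_i)), in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same states")
    # Bhattacharyya coefficient: 1 for identical distributions, 0 for disjoint ones.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp to guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


def is_abnormal(predicted: Sequence[float], observed: Sequence[float],
                threshold: float = 0.3) -> bool:
    """Hypothetical rule: flag an abnormality when the distance exceeds a threshold."""
    return hellinger(predicted, observed) > threshold
```

In practice the threshold would have to be calibrated on normal driving data; 0.3 is only a placeholder.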

[49]  arXiv:2010.15088 [pdf, other]
Title: Finite-Time Analysis of Decentralized Stochastic Approximation with Applications in Multi-Agent and Multi-Task Learning
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Stochastic approximation, a data-driven approach for finding the fixed point of an unknown operator, provides a unified framework for treating many problems in stochastic optimization and reinforcement learning. Motivated by a growing interest in multi-agent and multi-task learning, we consider in this paper a decentralized variant of stochastic approximation. A network of agents, each with their own unknown operator and data observations, cooperatively finds the fixed point of the aggregate operator. The agents work by running a local stochastic approximation algorithm using noisy samples from their operators while averaging their iterates with their neighbors' on a decentralized communication graph. Our main contribution provides a finite-time analysis of this decentralized stochastic approximation algorithm and characterizes the impacts of the underlying communication topology between agents. Our model for the data observed at each agent is that it is sampled from a Markov process; this lack of independence makes the iterates biased and (potentially) unbounded. Under mild assumptions on the Markov processes, we show that the convergence rate of the proposed methods is essentially the same as if the samples were independent, differing only by a log factor that represents the mixing time of the Markov process. We also present applications of the proposed method on a number of interesting learning problems in multi-agent systems, including a decentralized variant of Q-learning for solving multi-task reinforcement learning.
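The update described above (local stochastic approximation steps combined with neighbor averaging) can be sketched as follows. This is a minimal illustration under several assumptions not taken from the paper: each agent's operator is linear, F_i(x) = b_i - A_i x; the communication graph is a ring with uniform 1/3 weights; the step size decays as k^(-0.6); and the observation noise is i.i.d. Gaussian rather than Markovian.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Doubly stochastic weights for a ring: each agent averages itself and two neighbors."""
    if n < 3:
        raise ValueError("a ring needs at least 3 agents")
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def decentralized_sa(A: np.ndarray, b: np.ndarray, W: np.ndarray, steps: int = 5000,
                     step0: float = 0.5, noise: float = 0.1, seed: int = 0) -> np.ndarray:
    """Run x_i <- sum_j W_ij x_j + alpha_k * (noisy F_i(x_i)) for every agent i.

    A has shape (n, d, d) and b has shape (n, d); row i of the result is agent i's iterate.
    """
    rng = np.random.default_rng(seed)
    n, d = b.shape
    x = np.zeros((n, d))
    for k in range(steps):
        alpha = step0 / (k + 1) ** 0.6  # assumed diminishing step size
        # Noisy local operator values F_i(x_i) = b_i - A_i x_i + noise (i.i.d. here).
        F = b - np.einsum("ijk,ik->ij", A, x) + noise * rng.standard_normal((n, d))
        # Average with neighbors, then take a local stochastic approximation step.
        x = W @ x + alpha * F
    return x
```

Under these assumptions the fixed point of the aggregate operator Σ_i F_i solves (Σ_i A_i) x* = Σ_i b_i, so every agent's iterate can be compared against `np.linalg.solve(A.sum(axis=0), b.sum(axis=0))`.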

[50]  arXiv:2010.15100 [pdf, other]
Title: Evaluating Model Robustness to Dataset Shift
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

As the use of machine learning in safety-critical domains becomes widespread, the importance of evaluating the safety of these models has increased. An important aspect of this is evaluating how robust a model is to changes in setting or population, which typically requires applying the model to multiple, independent datasets. Since the cost of collecting such datasets is often prohibitive, in this paper, we propose a framework for evaluating this type of robustness using a single, fixed evaluation dataset. We use the original evaluation data to define an uncertainty set of possible evaluation distributions and estimate the algorithm's performance on the "worst-case" distribution within this set. Specifically, we consider distribution shifts defined by conditional distributions, allowing some distributions to shift while keeping other portions of the data distribution fixed. This results in finer-grained control over the considered shifts and more plausible worst-case distributions than previous approaches based on covariate shifts. To address the challenges associated with estimation in complex, high-dimensional distributions, we derive a "debiased" estimator which maintains $\sqrt{N}$-consistency even when machine learning methods with slower convergence rates are used to estimate the nuisance parameters. In experiments on a real medical risk prediction task, we show that this estimator can be used to evaluate robustness and accounts for realistic shifts that cannot be expressed as covariate shift. The proposed framework provides a means for practitioners to proactively evaluate the safety of their models using a single validation dataset.

[51]  arXiv:2010.15110 [pdf, other]
Title: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
Comments: 19 pages, 19 figures, In Advances in Neural Information Processing Systems 34 (NeurIPS 2020)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics are well-approximated by a linear weight expansion of the network at initialization. Standard training, however, diverges from its linearization in ways that are poorly understood. We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK. We do so through a large-scale phenomenological analysis of training, synthesizing diverse measures characterizing loss landscape geometry and NTK dynamics. In multiple neural architectures and datasets, we find these diverse measures evolve in a highly correlated manner, revealing a universal picture of the deep learning process. In this picture, deep network training exhibits a highly chaotic rapid initial transient that within 2 to 3 epochs determines the final linearly connected basin of low loss containing the end point of training. During this chaotic transient, the NTK changes rapidly, learning useful features from the training data that enable it to outperform the standard initial NTK by a factor of 3 in less than 3 to 4 epochs. After this rapid chaotic transient, the NTK changes at constant velocity, and its performance matches that of full network training in 15% to 45% of training time. Overall, our analysis reveals a striking correlation between a diverse set of metrics over training time, governed by a rapid chaotic to stable transition in the first few epochs, that together poses challenges and opportunities for the development of more accurate theories of deep learning.
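To make the objects above concrete, here is a minimal NumPy sketch of the empirical NTK, Θ(x, x') = ⟨∇_θ f(x; θ), ∇_θ f(x'; θ)⟩, for a one-hidden-layer network f(x) = vᵀ tanh(Wx)/√m. The architecture and helper names are illustrative assumptions; the paper studies much larger networks. Evaluating `empirical_ntk` at two training checkpoints and comparing the results with `kernel_distance` is one simple way to watch the kernel change over time.

```python
import numpy as np


def init_params(d: int, m: int, seed: int = 0) -> dict:
    """Standard-normal weights; the 1/sqrt(m) factor lives in the forward pass (NTK scaling)."""
    rng = np.random.default_rng(seed)
    return {"W": rng.standard_normal((m, d)), "v": rng.standard_normal(m)}


def forward(params: dict, x: np.ndarray) -> float:
    m = params["v"].shape[0]
    return float(params["v"] @ np.tanh(params["W"] @ x) / np.sqrt(m))


def param_grad(params: dict, x: np.ndarray) -> np.ndarray:
    """Gradient of f(x) with respect to all parameters, flattened as [vec(W), v]."""
    W, v = params["W"], params["v"]
    m = v.shape[0]
    h = np.tanh(W @ x)
    grad_v = h / np.sqrt(m)
    grad_W = np.outer(v * (1.0 - h ** 2), x) / np.sqrt(m)  # d tanh(z)/dz = 1 - tanh(z)^2
    return np.concatenate([grad_W.ravel(), grad_v])


def empirical_ntk(params: dict, X: np.ndarray) -> np.ndarray:
    """Gram matrix Theta[i, j] = <grad f(x_i), grad f(x_j)> at the current parameters."""
    J = np.stack([param_grad(params, x) for x in X])  # Jacobian, one row per input
    return J @ J.T


def kernel_distance(K1: np.ndarray, K2: np.ndarray) -> float:
    """1 - cosine similarity between two kernels (0 when they agree up to scale)."""
    return float(1.0 - np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2)))
```

The same Jacobian rows also define the linearized model f(x; θ₀) + ∇_θ f(x; θ₀)·(θ - θ₀) that the abstract contrasts with standard training.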

[52]  arXiv:2010.15114 [pdf, other]
Title: The geometry of integration in text classification RNNs
Comments: 9+19 pages, 30 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

Despite the widespread application of recurrent neural networks (RNNs) across a variety of tasks, a unified understanding of how RNNs solve these tasks remains elusive. In particular, it is unclear what dynamical patterns arise in trained RNNs, and how those patterns depend on the training dataset or task. This work addresses these questions in the context of a specific natural language processing task: text classification. Using tools from dynamical systems analysis, we study recurrent networks trained on a battery of both natural and synthetic text classification tasks. We find the dynamics of these trained RNNs to be both interpretable and low-dimensional. Specifically, across architectures and datasets, RNNs accumulate evidence for each class as they process the text, using a low-dimensional attractor manifold as the underlying mechanism. Moreover, the dimensionality and geometry of the attractor manifold are determined by the structure of the training dataset; in particular, we describe how simple word-count statistics computed on the training dataset can be used to predict these properties. Our observations span multiple architectures and datasets, reflecting a common mechanism RNNs employ to perform text classification. To the degree that integration of evidence towards a decision is a common computational primitive, this work lays the foundation for using dynamical systems techniques to study the inner workings of RNNs.

[53]  arXiv:2010.15116 [pdf, other]
Title: On Graph Neural Networks versus Graph-Augmented MLPs
Subjects: Machine Learning (cs.LG); Combinatorics (math.CO); Machine Learning (stat.ML)

From the perspective of expressive power, this work compares multi-layer Graph Neural Networks (GNNs) with a simplified alternative that we call Graph-Augmented Multi-Layer Perceptrons (GA-MLPs), which first augment node features with certain multi-hop operators on the graph and then apply an MLP in a node-wise fashion. From the perspective of graph isomorphism testing, we show both theoretically and numerically that GA-MLPs with suitable operators can distinguish almost all non-isomorphic graphs, just like the Weisfeiler-Lehman (WL) test. However, by viewing them as node-level functions and examining the equivalence classes they induce on rooted graphs, we prove a separation in expressive power between GA-MLPs and GNNs that grows exponentially in depth. In particular, unlike GNNs, GA-MLPs are unable to count the number of attributed walks. We also demonstrate via community detection experiments that GA-MLPs can be limited by their choice of operator family, as compared to GNNs with higher flexibility in learning.
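As a minimal sketch of the GA-MLP idea, assuming the operator family is powers of the symmetrically normalized adjacency matrix with self-loops (one common choice; the paper treats operator families more generally), node features are augmented once and then passed through the same MLP at every node. The weight matrices passed to `node_mlp` are placeholders.

```python
import numpy as np


def normalized_adjacency(A: np.ndarray) -> np.ndarray:
    """D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def ga_mlp_features(A: np.ndarray, X: np.ndarray, K: int) -> np.ndarray:
    """Augment node features once: [X, S X, S^2 X, ..., S^K X] with S the normalized adjacency."""
    S = normalized_adjacency(A)
    blocks, cur = [X], X
    for _ in range(K):
        cur = S @ cur  # one more hop of propagation
        blocks.append(cur)
    return np.concatenate(blocks, axis=1)


def node_mlp(Z: np.ndarray, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    """The same two-layer ReLU MLP applied to each node's row independently."""
    return np.maximum(Z @ W1, 0.0) @ W2
```

Unlike a GNN, nothing in `node_mlp` looks at the graph: all graph information must already be present in the fixed, precomputed operator outputs, which is the source of the expressivity gap the abstract describes.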

Cross-lists for Thu, 29 Oct 20

[54]  arXiv:2010.13891 (cross-list from q-fin.CP) [pdf]
Title: Stock Price Prediction Using CNN and LSTM-Based Deep Learning Models
Comments: The paper consists of 7 pages, 10 figures, and 5 tables. This is the accepted version of our paper in the IEEE International Conference on Decision Aid Sciences and Applications (DASA'20), November 8-9, 2020, Bahrain
Subjects: Computational Finance (q-fin.CP); Machine Learning (cs.LG)

Designing robust and accurate predictive models for stock price prediction has been an active area of research for a long time. While supporters of the efficient market hypothesis claim that it is impossible to forecast stock prices accurately, many researchers believe otherwise. Several works in the literature have demonstrated that, if properly designed and optimized, predictive models can very accurately and reliably predict future values of stock prices. This paper presents a suite of deep-learning-based models for stock price prediction. We use the historical records of the NIFTY 50 index listed on the National Stock Exchange of India, during the period from December 29, 2008 to July 31, 2020, for training and testing the models. Our proposition includes two regression models built on convolutional neural networks and three long short-term memory (LSTM) network-based predictive models. To forecast the open values of the NIFTY 50 index records, we adopted a multi-step prediction technique with walk-forward validation. In this approach, the open values of the NIFTY 50 index are predicted on a time horizon of one week, and once a week is over, the actual index values are included in the training set before the model is trained again, and the forecasts for the next week are made. We present detailed results on the forecasting accuracies for all our proposed models. The results show that while all the models are very accurate in forecasting the NIFTY 50 open values, the univariate encoder-decoder convolutional LSTM with the previous two weeks' data as the input is the most accurate model. On the other hand, a univariate CNN model with the previous week's data as the input is found to be the fastest model in terms of its execution speed.
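The walk-forward scheme described above can be sketched in a few lines. This is an illustration, not the authors' code: `fit_predict` stands in for retraining one of the CNN or LSTM models on all data seen so far and forecasting the next `horizon` steps (one week in the paper), and the naive last-value forecaster is a hypothetical placeholder.

```python
from typing import Callable, List, Sequence

# fit_predict(history, h) retrains on `history` and returns h forecasts.
FitPredict = Callable[[List[float], int], List[float]]


def walk_forward(series: Sequence[float], initial_train: int, horizon: int,
                 fit_predict: FitPredict) -> List[float]:
    """Forecast `horizon` steps, reveal the true values, retrain, and repeat."""
    forecasts: List[float] = []
    history = list(series[:initial_train])
    t = initial_train
    while t < len(series):
        h = min(horizon, len(series) - t)  # the last window may be shorter
        forecasts.extend(fit_predict(list(history), h))
        history.extend(series[t:t + h])  # actual values join the training set only now
        t += h
    return forecasts


def naive_last_value(history: List[float], h: int) -> List[float]:
    """Hypothetical placeholder model: repeat the last observed value h times."""
    return [history[-1]] * h
```

Because the true values are appended only after each forecast is made, the model never sees the period it is predicting, which is what makes walk-forward validation a fair test for time series.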

[55]  arXiv:2010.14557 (cross-list from cs.CL) [pdf, other]
Title: DGST: a Dual-Generator Network for Text Style Transfer
Comments: Accepted by EMNLP 2020, camera ready version
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

We propose DGST, a novel and simple Dual-Generator network architecture for text Style Transfer. Our model employs two generators only, and does not rely on any discriminators or parallel corpus for training. Both quantitative and qualitative experiments on the Yelp and IMDb datasets show that our model gives competitive performance compared to several strong baselines with more complicated architecture designs.

[56]  arXiv:2010.14570 (cross-list from cs.IR) [pdf, other]
Title: Addressing Purchase-Impression Gap through a Sequential Re-ranker
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Large-scale eCommerce platforms such as eBay carry a wide variety of inventory and provide several buying choices to online shoppers. It is critical for eCommerce search engines to showcase in the top results the variety and selection of inventory available, specifically in the context of the various buying intents that may be associated with a search query. Search rankers are most commonly powered by learning-to-rank models which learn the preference between items during training. However, they score items independently of other items at runtime. Although the items placed at the top of the results by such scoring functions may be independently optimal, they can be sub-optimal as a set. This may lead to a mismatch between the ideal distribution of items in the top results and what is actually impressed. In this paper, we present methods to address the purchase-impression gap observed in top search results on eCommerce sites. We establish the ideal distribution of items based on historic shopping patterns. We then present a sequential reranker that methodically reranks top search results produced by a conventional pointwise scoring ranker. The reranker produces a reordered list by sequentially selecting candidates, trading off between their independent relevance and their potential to address the purchase-impression gap by utilizing specially constructed features that capture the impression distribution of items already added to a reranked list. The sequential reranker enables addressing the purchase-impression gap with respect to multiple item aspects. An early version of the reranker showed promising lifts in conversion and engagement metrics at eBay. Based on experiments on randomly sampled validation datasets, we observe that the reranking methodology presented produces around a 10% reduction in the purchase-impression gap on average for the top 20 results, while making improvements to conversion metrics.
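A minimal sketch of the greedy, sequential selection described above, under simplifying assumptions: each item has a single aspect (for example, a buying format), the ideal distribution is given as target shares per aspect, and the trade-off is a hypothetical linear score controlled by `lam`. The paper instead feeds specially constructed impression-distribution features to its reranker.

```python
from collections import Counter
from typing import Dict, List, Tuple

Item = Tuple[str, float, str]  # (item_id, relevance score, aspect value)


def sequential_rerank(items: List[Item], target: Dict[str, float], k: int,
                      lam: float = 0.5) -> List[str]:
    """Greedily fill k slots, trading off relevance against the gap to `target` shares."""
    remaining = list(items)
    chosen: List[Item] = []
    counts: Counter = Counter()
    while remaining and len(chosen) < k:
        n = len(chosen)

        def score(item: Item) -> float:
            _, relevance, aspect = item
            share = counts[aspect] / n if n else 0.0
            # Positive when this aspect is under-represented so far, negative when over.
            gap = target.get(aspect, 0.0) - share
            return (1.0 - lam) * relevance + lam * gap

        best = max(remaining, key=score)
        remaining.remove(best)
        chosen.append(best)
        counts[best[2]] += 1
    return [item_id for item_id, _, _ in chosen]
```

With `lam=0` this reduces to ordering by pointwise relevance; increasing `lam` lets an under-represented aspect move up even when its relevance score is lower.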

[57]  arXiv:2010.14571 (cross-list from cs.CL) [pdf, other]
Title: Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus
Comments: Accepted to COLING 2020. 9 pages with 8 page abstract
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Large text corpora are increasingly important for a wide variety of Natural Language Processing (NLP) tasks, and automatic language identification (LangID) is a core technology needed to collect such datasets in a multilingual context. LangID is largely treated as solved in the literature, with models reported that achieve over 90% average F1 on as many as 1,366 languages. We train LangID models on up to 1,629 languages with comparable quality on held-out test sets, but find that human-judged LangID accuracy for web-crawl text corpora created using these models is only around 5% for many lower-resource languages, suggesting a need for more robust evaluation. Further analysis revealed a variety of error modes, arising from domain mismatch, class imbalance, language similarity, and insufficiently expressive models. We propose two classes of techniques to mitigate these errors: wordlist-based tunable-precision filters (for which we release curated lists in about 500 languages) and transformer-based semi-supervised LangID models, which increase median dataset precision from 5.5% to 71.2%. These techniques enable us to create an initial data set covering 100K or more relatively clean sentences in each of 500+ languages, paving the way towards a 1,000-language web text corpus.
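One way a wordlist-based, tunable-precision filter could work is sketched below; the token pattern, the in-wordlist fraction, and the threshold are assumptions for illustration, not the paper's exact criterion. Raising the threshold keeps fewer sentences but makes each kept sentence more likely to be in the target language.

```python
import re
from typing import Iterable, List, Set


def wordlist_fraction(sentence: str, wordlist: Set[str]) -> float:
    """Fraction of word tokens (lowercased) that appear in the language's wordlist."""
    tokens = re.findall(r"\w+", sentence.lower())
    if not tokens:
        return 0.0
    return sum(token in wordlist for token in tokens) / len(tokens)


def filter_sentences(sentences: Iterable[str], wordlist: Set[str],
                     threshold: float) -> List[str]:
    """Keep sentences whose in-wordlist fraction reaches `threshold` (higher = more precise)."""
    return [s for s in sentences if wordlist_fraction(s, wordlist) >= threshold]
```

Because Python's `\w` is Unicode-aware for `str` patterns, the same tokenizer works for many scripts, although languages written without spaces would need a different tokenizer.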

[58]  arXiv:2010.14575 (cross-list from cs.RO) [pdf]
Title: Learning Time Reduction Using Warm Start Methods for a Reinforcement Learning Based Supervisory Control in Hybrid Electric Vehicle Applications
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)

Reinforcement Learning (RL) is widely utilized in the field of robotics, and as such, it is gradually being implemented in Hybrid Electric Vehicle (HEV) supervisory control. Even though RL exhibits excellent performance in terms of fuel consumption minimization in simulation, the large number of learning iterations requires a long learning time, making it hard to apply in real-world vehicles. In addition, the fuel consumption of initial learning phases is much worse than that of baseline controls. This study aims to reduce the learning iterations of Q-learning in HEV applications and improve fuel consumption in initial learning phases by utilizing warm start methods. Unlike previous studies, which initiated Q-learning with zero or random Q values, this study initiates Q-learning with different supervisory controls (i.e., Equivalent Consumption Minimization Strategy control and heuristic control), and a detailed analysis is given. The results show that the proposed warm start Q-learning requires 68.8% fewer iterations than cold start Q-learning. The trained Q-learning is validated in two different driving cycles, and the results show 10-16% MPG improvement when compared to Equivalent Consumption Minimization Strategy control. Furthermore, real-time feasibility is analyzed, and guidance for vehicle implementation is provided. The results of this study can be used to facilitate the deployment of RL in vehicle supervisory control applications.
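The difference between cold and warm starts can be illustrated on a toy problem. In this sketch (all assumptions, not the paper's HEV setup) a six-state chain rewards reaching the right end, a cold start uses an all-zero Q-table, and a warm start gives the action chosen by an existing baseline controller a small initial bonus so that the greedy policy begins by imitating that controller. `q_learning` returns the number of episodes before the greedy policy becomes optimal.

```python
import numpy as np

N_STATES, N_ACTIONS = 6, 2  # toy chain: action 1 moves right, action 0 moves left


def step(s: int, a: int):
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def greedy_is_optimal(Q: np.ndarray) -> bool:
    return all(Q[s, 1] > Q[s, 0] for s in range(N_STATES - 1))


def q_learning(Q: np.ndarray, episodes: int = 500, alpha: float = 0.5,
               gamma: float = 0.9, eps: float = 0.1, seed: int = 0):
    """Tabular Q-learning; returns episodes until the greedy policy is optimal (or None)."""
    rng = np.random.default_rng(seed)
    for ep in range(episodes):
        if greedy_is_optimal(Q):
            return ep
        s = 0
        for _ in range(200):  # cap the episode length
            if rng.random() < eps:
                a = int(rng.integers(N_ACTIONS))
            else:
                # Break ties at random so an all-zero table still explores.
                a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
            s_next, r, done = step(s, a)
            target = r if done else r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
            if done:
                break
    return None


def warm_start_table(baseline_action, bonus: float = 0.1) -> np.ndarray:
    """Hypothetical warm start: give the baseline controller's action a small head start."""
    Q = np.zeros((N_STATES, N_ACTIONS))
    for s in range(N_STATES):
        Q[s, baseline_action(s)] = bonus
    return Q
```

Here the baseline happens to be optimal, so the warm start needs no episodes at all; with an imperfect baseline, the bonus only biases early exploration and Q-learning still has to correct it.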

[59]  arXiv:2010.14585 (cross-list from eess.SP) [pdf, other]
Title: Nonlinear State-Space Generalizations of Graph Convolutional Neural Networks
Comments: Submitted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Graph convolutional neural networks (GCNNs) learn compositional representations from network data by nesting linear graph convolutions into nonlinearities. In this work, we approach GCNNs from a state-space perspective, revealing that the graph convolutional module is a minimalistic linear state-space model, in which the state update matrix is the graph shift operator. We show this state update may be problematic because it is nonparametric, and depending on the graph spectrum it may explode or vanish. Therefore, the GCNN has to trade its degrees of freedom between extracting features from data and handling these instabilities. To improve this trade-off, we propose a novel family of nodal aggregation rules that aggregate node features within a layer in a nonlinear, parametric state-space fashion. We develop two architectures within this family, inspired by recursive ideas, with and without nodal gating mechanisms. The proposed solutions generalize the GCNN and provide an additional handle to control the state update and learn from the data. Numerical results on source localization and authorship attribution show the superiority of the nonlinear state-space generalization models over the baseline GCNN.
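The state-space reading of a graph convolution can be checked numerically. In this minimal sketch (an illustration of the linear GCNN view only, not of the proposed nonlinear aggregation rules), the graph shift operator S acts as the state-update matrix, z_{k+1} = S z_k with z_0 = x, and the filter output is y = Σ_k h_k z_k, which equals Σ_k h_k S^k x. Because S is fixed rather than learned, the state norms grow or shrink with the spectrum of S.

```python
import numpy as np
from typing import List, Sequence


def graph_filter_state_space(S: np.ndarray, x: np.ndarray, h: Sequence[float]) -> np.ndarray:
    """Linear graph convolution y = sum_k h[k] S^k x, computed as a state-space recursion."""
    z = x.copy()  # state z_0 = x
    y = h[0] * z
    for hk in h[1:]:
        z = S @ z  # nonparametric state update: the graph shift operator itself
        y = y + hk * z
    return y


def state_norms(S: np.ndarray, x: np.ndarray, K: int) -> List[float]:
    """Norms ||z_0||, ..., ||z_K|| of the unparameterized states z_k = S^k x."""
    norms = [float(np.linalg.norm(x))]
    z = x
    for _ in range(K):
        z = S @ z
        norms.append(float(np.linalg.norm(z)))
    return norms
```

For the adjacency matrix of a triangle (spectral radius 2), the state norms double at every step for an all-ones signal, which is the kind of explosion the filter taps `h` would otherwise have to compensate for.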

[60]  arXiv:2010.14602 (cross-list from cs.SD) [pdf, ps, other]
Title: CopyPaste: An Augmentation Method for Speech Emotion Recognition
Comments: Under ICASSP2021 peer-review
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Data augmentation is a widely used strategy for training robust machine learning models. It partially alleviates the problem of limited data for tasks like speech emotion recognition (SER), where collecting data is expensive and challenging. This study proposes CopyPaste, a perceptually motivated novel augmentation procedure for SER. Assuming that the presence of emotions other than neutral dictates a speaker's overall perceived emotion in a recording, the concatenation of an emotional (emotion E) and a neutral utterance can still be labeled with emotion E. We hypothesize that SER performance can be improved using these concatenated utterances in model training. To verify this, three CopyPaste schemes are tested on two deep learning models: one trained independently and another using transfer learning from an x-vector model, a speaker recognition model. We observed that all three CopyPaste schemes improve SER performance on all three datasets considered: MSP-Podcast, Crema-D, and IEMOCAP. Additionally, CopyPaste performs better than noise augmentation, and using them together improves the SER performance further. Our experiments on noisy test sets suggest that CopyPaste is effective even in noisy test conditions.
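The core CopyPaste operation is simple enough to sketch directly. This minimal version assumes raw waveforms stored as 1-D NumPy arrays at a common sampling rate and picks the order of the two segments at random; the paper evaluates three specific CopyPaste schemes, which this sketch does not reproduce.

```python
from typing import Tuple

import numpy as np


def copypaste(emotional: np.ndarray, neutral: np.ndarray, label: str,
              rng: np.random.Generator) -> Tuple[np.ndarray, str]:
    """Concatenate an emotional and a neutral utterance and keep the emotional label.

    Relies on the paper's assumption that a non-neutral emotion dominates the
    perceived emotion of the whole recording.
    """
    if label == "neutral":
        raise ValueError("the first utterance must carry a non-neutral emotion")
    parts = [emotional, neutral]
    if rng.random() < 0.5:  # assumed: random order of the two segments
        parts.reverse()
    return np.concatenate(parts), label
```

Each call produces one extra training example per (emotional, neutral) pair, so the augmented set can grow far beyond the number of original emotional utterances.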

[61]  arXiv:2010.14605 (cross-list from cs.NI) [pdf, other]
Title: Beyond Accuracy: Cost-Aware Data Representation Exploration for Network Traffic Model Performance
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)

In this paper, we explore how different representations of network traffic affect the performance of machine learning models for a range of network management tasks, including application performance diagnosis and attack detection. We study the relationship between the systems-level costs of different representations of network traffic and the ultimate target performance metric -- e.g., accuracy -- of the models trained from these representations. We demonstrate the benefit of exploring a range of representations of network traffic and present Network Microscope, a proof-of-concept reference implementation that both monitors network traffic at high speed and transforms the traffic in real time to produce a variety of representations for input to machine learning models. Systems like Network Microscope can ultimately help network operators better explore the design space of data representation for learning, balancing systems costs related to feature extraction and model training against resulting model performance.

[62]  arXiv:2010.14611 (cross-list from cs.NE) [pdf, other]
Title: Hybrid Backpropagation Parallel Reservoir Networks
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

In many real-world applications, fully-differentiable RNNs such as LSTMs and GRUs have been widely deployed to solve time series learning tasks. These networks train via Backpropagation Through Time, which can work well in practice but involves a biologically unrealistic unrolling of the network in time for gradient updates, is computationally expensive, and can be hard to tune. A second paradigm, Reservoir Computing, keeps the recurrent weight matrix fixed and random. Here, we propose a novel hybrid network, which we call Hybrid Backpropagation Parallel Echo State Network (HBP-ESN), that combines the effectiveness of learning random temporal features of reservoirs with the readout power of a deep neural network with batch normalization. We demonstrate that our new network outperforms LSTMs and GRUs, including multi-layer "deep" versions of these networks, on two complex real-world multi-dimensional time series datasets: gesture recognition using skeleton keypoints from ChaLearn, and the DEAP dataset for emotion recognition from EEG measurements. We also show that the inclusion of a novel meta-ring structure, which we call HBP-ESN M-Ring, achieves similar performance to one large reservoir while decreasing the memory required by an order of magnitude. We thus offer this new hybrid reservoir deep learning paradigm as a new alternative direction for RNN learning of temporal or sequential data.
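For readers unfamiliar with reservoir computing, here is a minimal echo state network: the input and recurrent weights are random and fixed, and only a linear readout is fitted, here by ridge regression. This is the classic baseline the paper builds on, not HBP-ESN itself, which replaces the linear readout with a deep network trained by backpropagation with batch normalization. Rescaling the recurrent matrix to a spectral radius below 1 is a common heuristic, assumed here.

```python
import numpy as np


def make_reservoir(n_in: int, n_res: int, spectral_radius: float = 0.9, seed: int = 0):
    """Random, fixed input and recurrent weights; these are never trained."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # heuristic rescaling
    return W_in, W


def run_reservoir(W_in: np.ndarray, W: np.ndarray, U: np.ndarray) -> np.ndarray:
    """Collect states x_t = tanh(W_in u_t + W x_{t-1}) for an input sequence U of shape (T, n_in)."""
    x = np.zeros(W.shape[0])
    states = []
    for u in U:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x)
    return np.array(states)


def train_readout(states: np.ndarray, Y: np.ndarray, ridge: float = 1e-4) -> np.ndarray:
    """Fit only the linear readout W_out (predictions are states @ W_out) by ridge regression."""
    n = states.shape[1]
    return np.linalg.solve(states.T @ states + ridge * np.eye(n), states.T @ Y)
```

A typical use discards an initial washout period so that the arbitrary zero initial state has faded before the readout is fitted.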

[63]  arXiv:2010.14615 (cross-list from cs.NE) [pdf, ps, other]
Title: Discrete-time signatures and randomness in reservoir computing
Comments: 14 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)

A new explanation of the geometric nature of the reservoir computing phenomenon is presented. Reservoir computing is understood in the literature as the possibility of approximating input/output systems with randomly chosen recurrent neural systems and a trained linear readout layer. Light is shed on this phenomenon by constructing what are called strongly universal reservoir systems as random projections of a family of state-space systems that generate Volterra series expansions. This procedure yields a state-affine reservoir system with randomly generated coefficients in a dimension that is logarithmically reduced with respect to the original system. This reservoir system is able to approximate any element in the class of fading memory filters simply by training a different linear readout for each filter. Explicit expressions for the probability distributions needed in the generation of the projected reservoir system are stated, and bounds for the committed approximation error are provided.

[64]  arXiv:2010.14616 (cross-list from cs.NE) [pdf, other]
Title: Lineage Evolution Reinforcement Learning
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

We propose a general agent population learning system, and on this basis, we propose a lineage evolution reinforcement learning algorithm. Lineage evolution reinforcement learning is a kind of derivative algorithm that accords with the general agent population learning system. We take the agents in DQN and its related variants as the basic agents in the population, and add the selection, mutation and crossover modules of genetic algorithms to the reinforcement learning algorithm. In the process of agent evolution, we draw on the characteristics of natural genetic behavior, add a lineage factor to ensure that an agent's potential performance is retained, and consider both current performance and lineage value when evaluating an agent. Without changing the parameters of the original reinforcement learning algorithm, lineage evolution reinforcement learning can optimize different reinforcement learning algorithms. Our experiments show that the idea of evolution with lineage improves the performance of the original reinforcement learning algorithm in some Atari 2600 games.

[65]  arXiv:2010.14620 (cross-list from cs.SI) [pdf, other]
Title: Correlation Robust Influence Maximization
Journal-ref: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Optimization and Control (math.OC)

We propose a distributionally robust model for the influence maximization problem. Unlike the classic independent cascade model (Kempe et al., 2003), this model's diffusion process is adversarially adapted to the choice of seed set. Hence, instead of optimizing under the assumption that all influence relationships in the network are independent, we seek a seed set whose expected influence under the worst correlation, i.e. the "worst-case, expected influence", is maximized. We show that this worst-case influence can be efficiently computed, and though the optimization is NP-hard, a ($1 - 1/e$) approximation guarantee holds. We also analyze the structure of the adversary's choice of diffusion process, and contrast it with established models. Beyond the key computational advantages, we also highlight the extent to which the independence assumption may cost optimality, and provide insights from numerical experiments comparing the adversarial and independent cascade model.
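The ($1 - 1/e$) guarantee mentioned above is the one the classic greedy algorithm achieves when maximizing a monotone submodular set function under a cardinality constraint. The sketch below shows that greedy loop with the objective passed in as a function; plugging in the paper's worst-case expected influence oracle, rather than the toy one-hop coverage objective defined here, is an assumption about how the pieces fit together.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Sequence


def greedy_seed_set(nodes: Sequence[Hashable], k: int,
                    influence: Callable[[FrozenSet[Hashable]], float]) -> FrozenSet[Hashable]:
    """Add, k times, the node with the largest marginal gain in `influence`."""
    seeds: FrozenSet[Hashable] = frozenset()
    for _ in range(min(k, len(nodes))):
        base = influence(seeds)
        best = max((v for v in nodes if v not in seeds),
                   key=lambda v: influence(seeds | {v}) - base)
        seeds = seeds | {best}
    return seeds


def one_hop_coverage(graph: Dict[Hashable, List[Hashable]]):
    """Toy monotone submodular objective: number of seeds plus their out-neighbors."""
    def influence(seeds: FrozenSet[Hashable]) -> float:
        covered = set(seeds)
        for s in seeds:
            covered.update(graph.get(s, []))
        return float(len(covered))
    return influence
```

The loop evaluates the objective O(k·|V|) times, so its cost is dominated by how expensive one influence evaluation is; the abstract's claim that the worst-case influence can be computed efficiently is what makes this kind of loop practical.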

[66]  arXiv:2010.14640 (cross-list from cs.DL) [pdf, other]
Title: Improving Text Relationship Modeling with Artificial Data
Comments: 9 pages, 3 figures
Subjects: Digital Libraries (cs.DL); Machine Learning (cs.LG)

Data augmentation uses artificially-created examples to support supervised machine learning, adding robustness to the resulting models and helping to account for limited availability of labelled data. We apply and evaluate a synthetic data approach to relationship classification in digital libraries, generating artificial books with relationships that are common in digital libraries but not easily inferred from existing metadata. We find that for classification on whole-part relationships between books, synthetic data improves a deep neural network classifier by 91%. Further, we consider the ability of synthetic data to learn a useful new text relationship class from fully artificial training data.

[67]  arXiv:2010.14649 (cross-list from cs.CL) [pdf]
Title: Learning Contextualised Cross-lingual Word Embeddings for Extremely Low-Resource Languages Using Parallel Corpora
Comments: 9 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We propose a new approach for learning contextualised cross-lingual word embeddings based only on a small parallel corpus (e.g. a few hundred sentence pairs). Our method obtains word embeddings via an LSTM-based encoder-decoder model that performs bidirectional translation and reconstruction of the input sentence. Through sharing model parameters among different languages, our model jointly trains the word embeddings in a common multilingual space. We also propose a simple method to combine word and subword embeddings to make use of orthographic similarities across different languages. We base our experiments on real-world data from endangered languages, namely Yongning Na, Shipibo-Konibo and Griko. Our experiments on bilingual lexicon induction and word alignment tasks show that our model outperforms existing methods by a large margin for most language pairs. These results demonstrate that, contrary to common belief, an encoder-decoder translation model is beneficial for learning cross-lingual representations, even in extremely low-resource scenarios.

[68]  arXiv:2010.14660 (cross-list from cs.CL) [pdf, other]
Title: DualTKB: A Dual Learning Bridge between Text and Knowledge Base
Comments: Equal Contributions of Authors Pierre L. Dognin, Igor Melnyk, and Inkit Padhi. Accepted at EMNLP'20
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

In this work, we present a dual learning approach for unsupervised text-to-path and path-to-text transfers in Commonsense Knowledge Bases (KBs). We investigate the impact of weak supervision by creating a weakly supervised dataset and show that even a slight amount of supervision can significantly improve the model performance and enable better-quality transfers. We examine different model architectures and evaluation metrics, proposing a novel Commonsense KB completion metric tailored for generative models. Extensive experimental results show that the proposed method compares very favorably to the existing baselines. This approach is a viable step towards a more advanced system for automatic KB construction/expansion and the reverse operation of KB conversion to coherent textual descriptions.

[69]  arXiv:2010.14694 (cross-list from econ.EM) [pdf, other]
Title: Deep Learning for Individual Heterogeneity
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

We propose a methodology for effectively modeling individual heterogeneity using deep learning while still retaining the interpretability and economic discipline of classical models. We pair a transparent, interpretable modeling structure with rich data environments and machine learning methods to estimate heterogeneous parameters based on potentially high dimensional or complex observable characteristics. Our framework is widely applicable, covering numerous settings of economic interest. We recover, as special cases, well-known examples such as average treatment effects and parametric components of partially linear models. However, we also seamlessly deliver new results for diverse examples such as price elasticities, willingness-to-pay, and surplus measures in choice models, average marginal and partial effects of continuous treatment variables, fractional outcome models, count data, heterogeneous production function components, and more. Deep neural networks are well-suited to structured modeling of heterogeneity: we show how the network architecture can be designed to match the global structure of the economic model, giving novel methodology for deep learning as well as, more formally, improved rates of convergence. Our results on deep learning have consequences for other structured modeling environments and applications, such as for additive models. Our inference results are based on an influence function we derive, which we show to be flexible enough to encompass all settings with a single, unified calculation, removing any requirement for case-by-case derivations. The usefulness of the methodology in economics is shown in two empirical applications: the response of 401(k) participation rates to firm matching and the impact of prices on subscription choices for an online service. Extensions to instrumental variables and multinomial choices are shown.

[70]  arXiv:2010.14709 (cross-list from cs.SD) [pdf, other]
Title: Melody-Conditioned Lyrics Generation with SeqGANs
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

Automatic lyrics generation has received attention from both music and AI communities for years. Early rule-based approaches have, due to increases in computational power and evolution in data-driven models, mostly been replaced with deep-learning-based systems. Many existing approaches, however, either rely heavily on prior knowledge in music and lyrics writing or oversimplify the task by largely discarding melodic information and its relationship with the text. We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN), which generates a line of lyrics given the corresponding melody as the input. Furthermore, we investigate the performance of the generator with an additional input condition: the theme or overarching topic of the lyrics to be generated. We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.

[71]  arXiv:2010.14712 (cross-list from cs.RO) [pdf, other]
Title: Socially-Compatible Behavior Design of Autonomous Vehicles with Verification on Real Human Data
Comments: 9 pages, 10 figures, submitted to IEEE Robotics and Automation Letters (RA-L) and 2021 IEEE International Conference on Robotics and Automation (ICRA 2021)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

As more and more autonomous vehicles (AVs) are deployed on public roads, designing socially compatible behaviors for them is of critical importance. Based on observations, AVs need to predict the future behaviors of other traffic participants and be aware of the uncertainties associated with such predictions, so that safe, efficient, and human-like motions can be generated. In this paper, we propose an integrated prediction and planning framework that allows AVs to infer the characteristics of other road users online and to generate behaviors that optimize not only their own rewards, but also their courtesy to others and their confidence in the consequences in the presence of uncertainties. Based on the definitions of courtesy and confidence, we explore the influence of these factors on the behaviors of AVs in interactive driving scenarios. Moreover, we evaluate the proposed algorithm on naturalistic human driving data by comparing the generated behavior with the ground truth. Results show that the online inference can significantly improve the human-likeness of the generated behaviors. Furthermore, we find that human drivers show great courtesy to others, even to those without the right-of-way.

[72]  arXiv:2010.14713 (cross-list from cs.CV) [pdf, other]
Title: CompRess: Self-Supervised Learning by Compressing Representations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Self-supervised learning aims to learn good representations from unlabeled data. Recent works have shown that larger models benefit more from self-supervised learning than smaller models. As a result, the gap between supervised and self-supervised learning has been greatly reduced for larger models. In this work, instead of designing a new pseudo task for self-supervised learning, we develop a model compression method to compress an already learned, deep self-supervised model (teacher) to a smaller one (student). We train the student model so that it mimics the relative similarity between the data points in the teacher's embedding space. For AlexNet, our method outperforms all previous methods including the fully supervised model on ImageNet linear evaluation (59.0% compared to 56.5%) and on nearest neighbor evaluation (50.7% compared to 41.4%). To the best of our knowledge, this is the first time a self-supervised AlexNet has outperformed its supervised counterpart on ImageNet classification. Our code is available here: https://github.com/UMBCvision/CompRess
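
As a rough illustration of "mimicking the relative similarity between data points", the sketch below turns each query's cosine similarities to a shared set of anchor points into a softmax distribution, once in the teacher's space and once in the student's, and measures their KL divergence. The anchor set, the temperature tau, and all function names are assumptions for illustration, not the paper's exact loss.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Normalize each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def similarity_distribution(queries, anchors, tau=0.04):
    # Softmax over cosine similarities of each query to every anchor (hypothetical temperature tau).
    logits = l2_normalize(queries) @ l2_normalize(anchors).T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def relative_similarity_loss(t_q, t_anchors, s_q, s_anchors, tau=0.04):
    # KL(teacher || student) between per-query similarity distributions, averaged over queries.
    p_t = similarity_distribution(t_q, t_anchors, tau)
    p_s = similarity_distribution(s_q, s_anchors, tau)
    return float(np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=1)))

rng = np.random.default_rng(0)
# Teacher and student may have different embedding sizes; only the anchors must correspond.
teacher_q, teacher_a = rng.normal(size=(8, 128)), rng.normal(size=(32, 128))
student_q, student_a = rng.normal(size=(8, 16)), rng.normal(size=(32, 16))
loss = relative_similarity_loss(teacher_q, teacher_a, student_q, student_a)
```

Because the loss compares distributions over the same anchors, the student's embedding dimension does not need to match the teacher's.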

[73]  arXiv:2010.14731 (cross-list from cs.CV) [pdf, other]
Title: MultiMix: Sparingly Supervised, Extreme Multitask Learning From Medical Images
Comments: 5 pages, 3 figures, 2 tables; Submitted to ISBI 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Semi-supervised learning, which learns from limited quantities of labeled data, has been investigated as an alternative to fully supervised learning. Maximizing knowledge gains from copious unlabeled data benefits semi-supervised learning settings. Moreover, learning multiple tasks within the same model further improves model generalizability. We propose a novel multitask learning model, namely MultiMix, which jointly learns disease classification and anatomical segmentation in a sparingly supervised manner, while preserving explainability through bridge saliency between the two tasks. Our extensive experiments with varied quantities of labeled data in the training sets justify the effectiveness of our multitasking model for the classification of pneumonia and segmentation of lungs from chest X-ray images. Moreover, both in-domain and cross-domain evaluations across the tasks further showcase the potential of our model to adapt to challenging generalization scenarios.

[74]  arXiv:2010.14734 (cross-list from stat.CO) [pdf, other]
Title: Generalized eigen, singular value, and partial least squares decompositions: The GSVD package
Authors: Derek Beaton (1) ((1) Rotman Research Institute, Baycrest Health Sciences)
Comments: 38 pages, 9 figures, 3 tables
Subjects: Computation (stat.CO); Machine Learning (cs.LG); Methodology (stat.ME)

The generalized singular value decomposition (GSVD, a.k.a. "SVD triplet", "duality diagram" approach) provides a unified strategy and basis to perform nearly all of the most common multivariate analyses (e.g., principal components, correspondence analysis, multidimensional scaling, canonical correlation, partial least squares). Though the GSVD is ubiquitous, powerful, and flexible, it has very few implementations. Here I introduce the GSVD package for R. The general goal of GSVD is to provide a small set of accessible functions to perform the GSVD and two other related decompositions (generalized eigenvalue decomposition, generalized partial least squares-singular value decomposition). Furthermore, GSVD helps provide a more unified conceptual approach and nomenclature to many techniques. I first introduce the concept of the GSVD, followed by a formal definition of the generalized decompositions. Next I provide some key decisions made during development, and then a number of examples of how to use GSVD to implement various statistical techniques. These examples also illustrate one of the goals of GSVD: how others can (or should) build analysis packages that depend on GSVD. Finally, I discuss the possible future of GSVD.
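
The core GSVD computation can be sketched in a few lines of NumPy, assuming positive diagonal row weights (masses) M and column weights W, the common case in correspondence analysis. The function gsvd below is hypothetical and is not the R package's API: it factors X = P diag(d) Q^T with P^T M P = I and Q^T W Q = I by taking a plain SVD of M^(1/2) X W^(1/2).

```python
import numpy as np

def gsvd(X, row_weights, col_weights):
    # Hypothetical helper: generalized SVD under positive diagonal row/column constraints.
    m_sqrt = np.sqrt(row_weights)
    w_sqrt = np.sqrt(col_weights)
    # Plain SVD of the weighted matrix M^{1/2} X W^{1/2}.
    U, d, Vt = np.linalg.svd(m_sqrt[:, None] * X * w_sqrt[None, :], full_matrices=False)
    # Map singular vectors back: P = M^{-1/2} U and Q = W^{-1/2} V.
    P = U / m_sqrt[:, None]
    Q = Vt.T / w_sqrt[:, None]
    return P, d, Q

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
m = rng.uniform(0.5, 2.0, size=6)  # row masses
w = rng.uniform(0.5, 2.0, size=4)  # column weights
P, d, Q = gsvd(X, m, w)
```

With all weights equal to 1 this reduces to the ordinary SVD; choosing the weights is what turns the same routine into PCA, correspondence analysis, and related methods.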

[75]  arXiv:2010.14784 (cross-list from cs.CL) [pdf]
Title: A Chinese Text Classification Method With Low Hardware Requirement Based on Improved Model Concatenation
Authors: Yuanhao Zhuo
Comments: 5 pages, 2 figures, 5 tables
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

In order to improve the accuracy of Chinese text classification models with low hardware requirements, this paper designs an improved concatenation-based model that concatenates 5 different sub-models, including TextCNN, LSTM, and Bi-LSTM. Compared with the existing ensemble learning method on a text classification task, this model's accuracy is 2% higher. Meanwhile, the hardware requirements of this model are much lower than those of the BERT-based model.

[76]  arXiv:2010.14793 (cross-list from cs.CV) [pdf, other]
Title: Class-Agnostic Segmentation Loss and Its Application to Salient Object Detection and Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In this paper we present a novel loss function, called class-agnostic segmentation (CAS) loss. With CAS loss, the class descriptors are learned during training of the network. We do not need to define the label of a class a priori; rather, the CAS loss clusters regions with similar appearance together in a weakly-supervised manner. Furthermore, we show that the CAS loss function is sparse, bounded, and robust to class imbalance. We apply our CAS loss function with fully-convolutional ResNet101 and DeepLab-v3 architectures to the binary segmentation problem of salient object detection. We investigate the performance against the state-of-the-art methods in two settings, low- and high-fidelity training data, on seven salient object detection datasets. For low-fidelity training data (incorrect class labels), class-agnostic segmentation loss outperforms the state-of-the-art methods on salient object detection datasets by staggering margins of around 50%. For high-fidelity training data (correct class labels), class-agnostic segmentation models perform as well as the state-of-the-art approaches while beating them on most datasets. In order to show the utility of the loss function across different domains, we also test on a general segmentation dataset, where class-agnostic segmentation loss outperforms cross-entropy based loss by huge margins on both region and edge metrics.

[77]  arXiv:2010.14810 (cross-list from cs.CV) [pdf, other]
Title: Cycle-Contrast for Self-Supervised Video Representation Learning
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

We present Cycle-Contrastive Learning (CCL), a novel self-supervised method for learning video representations. Motivated by the natural belonging and inclusion relation between a video and its frames, CCL is designed to find correspondences across frames and videos by considering the contrastive representations in their respective domains. This differs from recent approaches that merely learn correspondences across frames or clips. In our method, the frame and video representations are learned from a single network based on an R3D architecture, with a shared non-linear transformation for embedding both frame and video features before the cycle-contrastive loss. We demonstrate that the video representation learned by CCL can be transferred well to downstream tasks of video understanding, outperforming previous methods in nearest neighbour retrieval and action recognition tasks on UCF101, HMDB51 and MMAct.

[78]  arXiv:2010.14824 (cross-list from cs.CG) [pdf]
Title: Explainable Artificial Intelligence for Manufacturing Cost Estimation and Machining Feature Visualization
Subjects: Computational Geometry (cs.CG); Machine Learning (cs.LG)

Studies on manufacturing cost prediction based on deep learning have begun in recent years, but the cost prediction rationale cannot be explained because the models are still used as a black box. This study aims to propose a manufacturing cost prediction process for 3D computer-aided design (CAD) models using explainable artificial intelligence. The proposed process can visualize the machining features of the 3D CAD model that are influencing the increase in manufacturing costs. The proposed process consists of (1) data collection and pre-processing, (2) 3D deep learning architecture exploration, and (3) visualization to explain the prediction results. The proposed deep learning model predicts the manufacturing cost of computer numerical control (CNC) machined parts with high accuracy. In particular, using 3D gradient-weighted class activation mapping shows that the proposed model not only can detect the CNC machining features but also can differentiate the machining difficulty for the same feature. Using the proposed process, we can provide design guidance to engineering designers for reducing manufacturing costs during the conceptual design phase. We can also provide real-time quotations and redesign proposals to online manufacturing platform customers.
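
3D gradient-weighted class activation mapping follows the usual Grad-CAM recipe, with the gradient averaging taken over three spatial axes instead of two. A minimal NumPy sketch, assuming the feature maps of a chosen 3D convolutional layer (shape channels x D x H x W) and the gradients of the predicted output with respect to them have already been computed by the training framework; the function name is hypothetical.

```python
import numpy as np

def grad_cam_3d(activations, gradients):
    # activations, gradients: arrays of shape (K, D, H, W) for K feature channels.
    # Channel importance weights: global average of the gradients over the voxel grid.
    weights = gradients.mean(axis=(1, 2, 3))
    # Weighted sum of feature maps, then ReLU to keep regions that increase the output.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    # Normalize to [0, 1] for visualization (guard against an all-zero map).
    peak = cam.max()
    return cam / peak if peak > 0 else cam

rng = np.random.default_rng(3)
A = rng.random((4, 8, 8, 8))           # stand-in feature maps
G = rng.normal(size=(4, 8, 8, 8))      # stand-in gradients
heatmap = grad_cam_3d(A, G)            # (8, 8, 8) voxel heatmap
```

The resulting voxel heatmap is usually upsampled to the CAD model's voxel resolution before being overlaid on the geometry.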

[79]  arXiv:2010.14860 (cross-list from stat.ML) [pdf, other]
Title: The Evidence Lower Bound of Variational Autoencoders Converges to a Sum of Three Entropies
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The central objective function of a variational autoencoder (VAE) is its variational lower bound. Here we show that for standard VAEs the variational bound is at convergence equal to the sum of three entropies: the (negative) entropy of the latent distribution, the expected (negative) entropy of the observable distribution, and the average entropy of the variational distributions. Our derived analytical results are exact and apply to small as well as complex neural networks for decoder and encoder. Furthermore, they apply to finite and infinitely many data points and at any stationary point (including local and global maxima). As a consequence, we show that the variance parameters of encoder and decoder play the key role in determining the values of variational bounds at convergence. Furthermore, the obtained results can allow for closed-form analytical expressions at convergence, which may be unexpected as neither variational bounds of VAEs nor log-likelihoods of VAEs are closed-form during learning. As our main contribution, we provide the proofs for convergence of standard VAEs to sums of entropies. Furthermore, we numerically verify our analytical results and discuss some potential applications. The obtained equality to entropy sums provides novel information on those points in parameter space that variational learning converges to. As such, we believe they can contribute significantly to our understanding of established as well as novel VAE approaches.
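
For the common setting of a standard-normal prior, diagonal Gaussian encoders, and a Gaussian decoder with one shared variance, each of the three entropies has a closed form, so the stated value at convergence can be evaluated directly as (average encoder entropy) - (prior entropy) - (expected decoder entropy). The sketch below assumes that setting; variable names are hypothetical, and it only evaluates the entropy sum, it does not check convergence.

```python
import math

def gaussian_entropy(variances):
    # Differential entropy of a diagonal Gaussian: 0.5 * sum(log(2*pi*e*var)).
    return 0.5 * sum(math.log(2 * math.pi * math.e * v) for v in variances)

def elbo_entropy_sum(encoder_variances, latent_dim, decoder_variance, data_dim):
    # encoder_variances: one list of per-dimension variances of q(z|x_n) per data point.
    avg_encoder_entropy = sum(gaussian_entropy(v) for v in encoder_variances) / len(encoder_variances)
    prior_entropy = gaussian_entropy([1.0] * latent_dim)  # standard-normal prior p(z)
    # With a shared decoder variance, H[p(x|z)] is the same for every z,
    # so its expectation under q(z|x_n) is just this constant.
    decoder_entropy = gaussian_entropy([decoder_variance] * data_dim)
    return avg_encoder_entropy - prior_entropy - decoder_entropy

value = elbo_entropy_sum([[0.1, 0.2], [0.15, 0.1]], latent_dim=2, decoder_variance=0.05, data_dim=10)
```

This makes the abstract's point about variances visible: every term depends only on encoder and decoder variance parameters, not on the means.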

[80]  arXiv:2010.14863 (cross-list from cond-mat.dis-nn) [pdf, other]
Title: High-dimensional inference: a statistical mechanics perspective
Authors: Jean Barbier
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Machine Learning (cs.LG)

Statistical inference is the science of drawing conclusions about some system from data. In modern signal processing and machine learning, inference is done in very high dimension: very many unknown characteristics about the system have to be deduced from a lot of high-dimensional noisy data. This "high-dimensional regime" is reminiscent of statistical mechanics, which aims at describing the macroscopic behavior of a complex system based on the knowledge of its microscopic interactions. It is by now clear that there are many connections between inference and statistical physics. This article aims at emphasizing some of the deep links connecting these seemingly separate disciplines through the description of paradigmatic models of high-dimensional inference in the language of statistical mechanics. This article has been published in the issue on artificial intelligence of Ithaca, an Italian popularization-of-science journal. The selected topics and references are highly biased and not intended to be exhaustive in any way. Its purpose is to serve as an introduction to statistical mechanics of inference through a very specific angle that corresponds to my own tastes and limited knowledge.

[81]  arXiv:2010.14877 (cross-list from stat.ML) [pdf, other]
Title: Hierarchical Gaussian Processes with Wasserstein-2 Kernels
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We investigate the usefulness of Wasserstein-2 kernels in the context of hierarchical Gaussian Processes. Our starting point is the observation that stacking Gaussian Processes severely diminishes the model's ability to detect outliers, and that, combined with non-zero mean functions, this further extrapolates low variance to regions with low training data density. We therefore posit that directly taking the variance into account when computing Wasserstein-2 kernels is key to maintaining outlier status as we progress through the hierarchy. We propose two new models operating in Wasserstein space which can be seen as equivalents to Deep Kernel Learning and Deep GPs. Through extensive experiments, we show improved performance on large scale datasets and improved out-of-distribution detection on both toy and real data.
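
Because layers in a hierarchical GP pass Gaussian marginals (mean and variance) rather than points, a Wasserstein-2 kernel can compare inputs through both moments. For diagonal Gaussians the squared W2 distance has the closed form ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2, where sigma are standard deviations. The sketch below plugs it into an RBF-style kernel; the exact kernel form and the lengthscale ell are assumptions here, not necessarily the paper's parameterization.

```python
import numpy as np

def w2_squared_diag(mu1, sigma1, mu2, sigma2):
    # Squared Wasserstein-2 distance between N(mu1, diag(sigma1^2)) and N(mu2, diag(sigma2^2)).
    return np.sum((mu1 - mu2) ** 2) + np.sum((sigma1 - sigma2) ** 2)

def w2_rbf_kernel(means, stds, ell=1.0):
    # Gram matrix over a set of diagonal Gaussians given by the rows of `means` and `stds`.
    n = means.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.exp(-w2_squared_diag(means[i], stds[i], means[j], stds[j]) / (2 * ell ** 2))
    return K

# Two inputs with the same mean but very different uncertainty are no longer identical.
K = w2_rbf_kernel(np.array([[0.0], [0.0]]), np.array([[0.1], [2.0]]))
```

A plain RBF kernel on the means alone would give K[0, 1] = 1 here, which is exactly how an uncertain (outlying) input loses its outlier status.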

[82]  arXiv:2010.14881 (cross-list from eess.IV) [pdf]
Title: Medical Deep Learning -- A systematic Meta-Review
Comments: 46 pages, 4 tables, 150 references
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Deep learning has had a remarkable impact on different scientific disciplines in recent years. This has been demonstrated in numerous tasks where deep learning algorithms were able to outperform state-of-the-art methods, including in image processing and analysis. Moreover, deep learning delivers good results in tasks like autonomous driving, which could not have been performed automatically before. There are even applications where deep learning outperformed humans, like object recognition or games. Another field in which this development is showing huge potential is the medical domain. With the collection of large quantities of patient records and data, and a trend towards personalized treatments, there is a great need for automatic and reliable processing and analysis of this information. Patient data is not only collected in clinical centres, like hospitals, but also comes from general practitioners, healthcare smartphone apps or online websites, just to name a few. This trend has resulted in new, massive research efforts in recent years. In Q2/2020, the search engine PubMed already returns over 11,000 results for the search term 'deep learning', and around 90% of these publications are from the last three years. Hence, a complete overview of the field of 'medical deep learning' is almost impossible to obtain, and even a full overview of medical sub-fields is becoming increasingly difficult. Nevertheless, several review and survey articles about medical deep learning have been presented in recent years. They focused, in general, on specific medical scenarios, like the analysis of medical images containing specific pathologies. With these surveys as a foundation, the aim of this contribution is to provide a very first high-level, systematic meta-review of medical deep learning surveys.

[83]  arXiv:2010.14903 (cross-list from cs.CY) [pdf, other]
Title: A general method for estimating the prevalence of Influenza-Like-Symptoms with Wikipedia data
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Influenza is an acute respiratory seasonal disease that affects millions of people worldwide and causes thousands of deaths in Europe alone. Being able to estimate the impact of an illness on a given country quickly and reliably is essential to plan and organize effective countermeasures, which is now possible by leveraging unconventional data sources like web searches and visits. In this study, we show the feasibility of exploiting information about Wikipedia's page views of a selected group of articles and machine learning models to obtain accurate estimates of influenza-like illness incidence in four European countries: Italy, Germany, Belgium, and the Netherlands. We propose a novel language-agnostic method, based on two algorithms, Personalized PageRank and CycleRank, to automatically select the most relevant Wikipedia pages to be monitored without the need for expert supervision. We then show that our model reaches state-of-the-art results by comparing it with previous solutions.
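
Personalized PageRank, one of the two selection algorithms named above, ranks pages by a random walk that repeatedly restarts at a seed page, so pages closely linked to the seed (conceptually, the disease's own article) score highest. A minimal power-iteration sketch on a toy link graph; the graph, page names, damping factor 0.85, and sending dangling-page mass back to the seed are illustrative assumptions, and CycleRank is not shown.

```python
def personalized_pagerank(links, seed, alpha=0.85, iters=100):
    # links: dict mapping each page to the list of pages it links to.
    nodes = set(links) | {v for targets in links.values() for v in targets}
    score = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(iters):
        # Restart mass (1 - alpha) always returns to the seed page.
        new = {n: (1.0 - alpha if n == seed else 0.0) for n in nodes}
        for page in nodes:
            targets = links.get(page, [])
            if targets:
                share = alpha * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: send its walk mass back to the seed.
                new[seed] += alpha * score[page]
        score = new
    return score

graph = {  # hypothetical toy link graph
    "Influenza": ["Fever", "Cough", "Vaccine"],
    "Fever": ["Influenza"],
    "Cough": ["Influenza", "Common cold"],
    "Vaccine": ["Influenza"],
    "Common cold": ["Cough"],
    "Football": ["Italy"],
}
scores = personalized_pagerank(graph, "Influenza")
ranking = sorted(scores.items(), key=lambda kv: -kv[1])
```

Pages unreachable from the seed, such as "Football" here, get a score of zero, which is what lets the method discard unrelated articles without expert supervision.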

[84]  arXiv:2010.14921 (cross-list from cs.OH) [pdf]
Title: Comparison Analysis of Tree Based and Ensembled Regression Algorithms for Traffic Accident Severity Prediction
Subjects: Other Computer Science (cs.OH); Machine Learning (cs.LG)

The rapid increase in traffic volume on urban roads over time has changed the traffic scenario globally. It has also increased the ratio of road accidents that can be severe and, in the worst case, fatal. To improve traffic safety and its management on urban roads, there is a need to predict the severity level of accidents. Various machine learning models are being used for accident prediction. In this study, tree-based ensemble models (Random Forest, AdaBoost, Extra Tree, and Gradient Boosting) and an ensemble of two statistical models (Logistic Regression and Stochastic Gradient Descent) as voting classifiers are compared for the prediction of road accident severity. Significant features that are strongly correlated with accident severity are identified by Random Forest. The analysis identified Random Forest as the best-performing model, with 0.974 accuracy, 0.954 precision, 0.930 recall, and 0.942 F-score using the 20 most significant features, compared with the other techniques for classifying road accident severity.
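
The reported F-score is consistent with the reported precision and recall, since the F-score (F1) is their harmonic mean:

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.954, 0.930), 3))  # 0.942, matching the reported F-score
```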

[85]  arXiv:2010.14925 (cross-list from cs.CV) [pdf, other]
Title: MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis
Comments: Code and dataset are available at https://medmnist.github.io/
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We present MedMNIST, a collection of 10 pre-processed medical open datasets. MedMNIST is standardized to perform classification tasks on lightweight 28x28 images, which requires no background knowledge. Covering the primary data modalities in medical image analysis, it is diverse in data scale (from 100 to 100,000) and tasks (binary/multi-class, ordinal regression and multi-label). MedMNIST could be used for educational purposes, rapid prototyping, multi-modal machine learning or AutoML in medical image analysis. Moreover, MedMNIST Classification Decathlon is designed to benchmark AutoML algorithms on all 10 datasets; we have compared several baseline methods, including open-source or commercial AutoML tools. The datasets, evaluation code and baseline methods for MedMNIST are publicly available at https://medmnist.github.io/.

[86]  arXiv:2010.14928 (cross-list from stat.ML) [pdf, other]
Title: Particle gradient descent model for point process generation
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Probability (math.PR)

This paper introduces a generative model for planar point processes in a square window, built upon a single realization of a stationary, ergodic point process observed in this window. Inspired by recent advances in gradient descent methods for maximum entropy models, we propose a method to generate similar point patterns by jointly moving particles of an initial Poisson configuration towards a target counting measure. The target measure is generated via a deterministic gradient descent algorithm, so as to match a set of statistics of the given, observed realization. Our statistics are estimators of the multi-scale wavelet phase harmonic covariance, recently proposed in image modeling. They allow one to capture geometric structures through multi-scale interactions between wavelet coefficients. Both our statistics and the gradient descent algorithm scale better with the number of observed points than the classical k-nearest neighbour distances previously used in generative models for point processes, based on rejection sampling or simulated annealing. The overall quality of our model is evaluated on point processes with various geometric structures through spectral and topological data analysis.
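
The starting point of the method, an initial Poisson configuration in the square window, is straightforward to sample: draw a Poisson number of points with mean equal to intensity times window area, then place them independently and uniformly. A minimal NumPy sketch; matching the intensity to the observed pattern's point count is an assumption, and neither the wavelet phase harmonic statistics nor the particle updates are shown.

```python
import numpy as np

def poisson_configuration(intensity, side, rng):
    # Homogeneous Poisson process on [0, side]^2: Poisson count, then i.i.d. uniform locations.
    n_points = rng.poisson(intensity * side ** 2)
    return rng.uniform(0.0, side, size=(n_points, 2))

rng = np.random.default_rng(42)
side = 1.0
observed = rng.uniform(0.0, side, size=(500, 2))  # stand-in for the observed realization
# Hypothetical choice: match the intensity to the observed number of points per unit area.
initial = poisson_configuration(len(observed) / side ** 2, side, rng)
```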

[87]  arXiv:2010.14933 (cross-list from eess.IV) [pdf, other]
Title: Generative Tomography Reconstruction
Comments: Accepted as a poster for the NeurIPS 2020 Workshop on Deep Learning and Inverse Problems
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Numerical Analysis (math.NA)

We propose an end-to-end differentiable architecture for tomography reconstruction that directly maps a noisy sinogram into a denoised reconstruction. Compared to existing approaches, our end-to-end architecture produces more accurate reconstructions while using fewer parameters and less time. We also propose a generative model that, given a noisy sinogram, can sample realistic reconstructions. This generative model can be used as a prior inside an iterative process that, by taking into consideration the physical model, can reduce artifacts and errors in the reconstructions.

[88]  arXiv:2010.14943 (cross-list from eess.SP) [pdf, other]
Title: An Approach for GCI Fusion With Labeled Multitarget Densities
Comments: 12 pages, 6 figures, submitted to IEEE Transactions on Signal Processing
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

This paper addresses the Generalized Covariance Intersection (GCI) fusion method for labeled random finite sets. We propose a joint label space for the support of fused labeled random finite sets to represent the label association between different agents, avoiding the label consistency condition of the label-wise GCI fusion algorithm. Specifically, we devise the joint label space as the direct product of the label spaces of all agents. Then we apply the GCI fusion method to obtain the joint labeled multi-target density. The joint labeled RFS is then marginalized into a general labeled RFS, provided that each target is represented by a single Bernoulli component with a unique label. The joint labeled GCI (JL-GCI) for fusing LMB RFSs from different agents is demonstrated. We also propose a simplified JL-GCI method under the assumption that targets are well-separated in the scenario. The simulation results demonstrate the method's effectiveness under label inconsistency and its excellent performance in challenging tracking scenarios.
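
Two ingredients above can be sketched concretely. The joint label space is the direct product of the agents' label spaces. For single Gaussian densities, GCI (the normalized weighted geometric mean of the two densities, with weights omega and 1 - omega) reduces to the familiar covariance intersection rule. The NumPy sketch below assumes two agents, a fixed weight omega, and Gaussian densities; it is not the paper's full LMB fusion.

```python
import itertools
import numpy as np

def joint_label_space(*label_spaces):
    # Direct product of per-agent label spaces: each joint label pairs one label per agent.
    return list(itertools.product(*label_spaces))

def gci_gaussian(x1, P1, x2, P2, omega=0.5):
    # Geometric-mean (covariance intersection) fusion of N(x1, P1) and N(x2, P2).
    info1, info2 = np.linalg.inv(P1), np.linalg.inv(P2)
    P = np.linalg.inv(omega * info1 + (1 - omega) * info2)
    x = P @ (omega * info1 @ x1 + (1 - omega) * info2 @ x2)
    return x, P

labels = joint_label_space(["a1", "a2"], ["b1", "b2", "b3"])  # 6 candidate associations
x, P = gci_gaussian(np.array([0.0, 0.0]), np.eye(2), np.array([2.0, 0.0]), 4 * np.eye(2))
```

The product space grows multiplicatively with the number of agents and labels, which is one reason a simplified variant for well-separated targets is useful.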

[89]  arXiv:2010.14977 (cross-list from cs.CV) [pdf, other]
Title: Real-time Tropical Cyclone Intensity Estimation by Handling Temporally Heterogeneous Satellite Data
Comments: under review at AAAI 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Analyzing big geophysical observational data collected by multiple advanced sensors on various satellite platforms promotes our understanding of the geophysical system. For instance, convolutional neural networks (CNN) have achieved great success in estimating tropical cyclone (TC) intensity based on satellite data with fixed temporal frequency (e.g., 3 h). However, to achieve more timely (under 30 min) and accurate TC intensity estimates, a deep learning model is needed that can handle temporally heterogeneous satellite observations. Specifically, infrared (IR1) and water vapor (WV) images are available at intervals of under 15 minutes, while passive microwave rain rate (PMW) is available only about every 3 hours. Meanwhile, the visible (VIS) channel is severely affected by noise and sunlight intensity, making it difficult to use. Therefore, we propose a novel framework that combines a generative adversarial network (GAN) with a CNN. The model utilizes all data, including VIS and PMW information, during the training phase and uses only the high-frequency IR1 and WV data to provide intensity estimates during the prediction phase. Experimental results demonstrate that the hybrid GAN-CNN framework achieves precision comparable to the state-of-the-art models, while increasing the maximum estimation frequency from once every 3 hours to once every 15 minutes or less.

[90]  arXiv:2010.15040 (cross-list from stat.ML) [pdf, other]
Title: Training Generative Adversarial Networks by Solving Ordinary Differential Equations
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The instability of Generative Adversarial Network (GAN) training has frequently been attributed to gradient descent. Consequently, recent methods have aimed to tailor the models and training procedures to stabilise the discrete updates. In contrast, we study the continuous-time dynamics induced by GAN training. Both theory and toy experiments suggest that these dynamics are in fact surprisingly stable. From this perspective, we hypothesise that instabilities in training GANs arise from the integration error in discretising the continuous dynamics. We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training when combined with a regulariser that controls the integration error. Our approach represents a radical departure from previous methods, which typically use adaptive optimisation and stabilisation techniques that constrain the functional space (e.g. Spectral Normalisation). Evaluation on CIFAR-10 and ImageNet shows that our method outperforms several strong baselines, demonstrating its efficacy.
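
The claim that instability can come from discretisation rather than from the continuous dynamics is easy to see on the classic bilinear game min_x max_y x*y, a toy example chosen here and not one of the paper's experiments. Its continuous gradient dynamics (dx/dt = -y, dy/dt = x) rotate on a circle, yet simultaneous gradient steps (explicit Euler) spiral outward, while a classical fourth-order Runge-Kutta step stays very close to the circle:

```python
import math

def field(x, y):
    # Simultaneous gradient dynamics for f(x, y) = x * y: descent in x, ascent in y.
    return -y, x

def euler_step(x, y, h):
    dx, dy = field(x, y)
    return x + h * dx, y + h * dy

def rk4_step(x, y, h):
    k1 = field(x, y)
    k2 = field(x + h / 2 * k1[0], y + h / 2 * k1[1])
    k3 = field(x + h / 2 * k2[0], y + h / 2 * k2[1])
    k4 = field(x + h * k3[0], y + h * k3[1])
    return (x + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            y + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

def final_radius(step, h=0.1, n_steps=100):
    x, y = 1.0, 0.0  # start on the unit circle
    for _ in range(n_steps):
        x, y = step(x, y, h)
    return math.hypot(x, y)

print(final_radius(euler_step))  # grows by sqrt(1 + h^2) per step, about 1.64 after 100 steps
print(final_radius(rk4_step))    # stays within 1e-6 of 1.0
```

Both integrators follow the same vector field; only the integration error differs, which mirrors the paper's hypothesis in the simplest possible setting.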

[91]  arXiv:2010.15045 (cross-list from cs.NE) [pdf, other]
Title: A multi-agent model for growing spiking neural networks
Comments: 79 pages. Master's thesis
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Artificial Intelligence has looked into biological systems as a source of inspiration. Although there are many aspects of the brain yet to be discovered, neuroscience has found evidence that the connections between neurons continuously grow and reshape as a part of the learning process. This differs from the design of Artificial Neural Networks, which learn by adjusting the weights of the synapses between neurons while their topology stays unaltered over time.
This project has explored rules for growing the connections between the neurons in Spiking Neural Networks as a learning mechanism. These rules have been implemented on a multi-agent system to create simple logic functions, which establish a base for building up more complex systems and architectures. Results in a simulation environment showed that for a given set of parameters it is possible to reach topologies that reproduce the tested functions.
This project also opens the door to the use of techniques like genetic algorithms to obtain the best-suited values for the model parameters, and hence to create neural networks that can adapt to different functions.

[92]  arXiv:2010.15049 (cross-list from eess.AS) [pdf, other]
Title: Optimizing Short-Time Fourier Transform Parameters via Gradient Descent
Comments: Submitted for ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

The Short-Time Fourier Transform (STFT) has been a staple of signal processing, often being the first step for many audio tasks. A very familiar process when using the STFT is the search for the best STFT parameters, as they often have significant side effects if chosen poorly. These parameters are often defined in terms of an integer number of samples, which makes their optimization non-trivial. In this paper we show an approach that allows us to obtain a gradient for STFT parameters with respect to arbitrary cost functions, and thus enables gradient descent optimization of quantities like the STFT window length or the STFT hop size. We do so for parameter values that stay constant throughout an input, but also for cases where these parameters have to change dynamically over time to accommodate varying signal characteristics.

[93]  arXiv:2010.15058 (cross-list from cs.NE) [pdf, other]
Title: Measuring non-trivial compositionality in emergent communication
Comments: 4th Workshop on Emergent Communication, NeurIPS 2020
Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Machine Learning (cs.LG)

Compositionality is an important explanatory target in emergent communication and language evolution. The vast majority of computational models of communication account for the emergence of only a very basic form of compositionality: trivial compositionality. A compositional protocol is trivially compositional if the meaning of a complex signal (e.g. blue circle) boils down to the intersection of meanings of its constituents (e.g. the intersection of the set of blue objects and the set of circles). A protocol is non-trivially compositional (NTC) if the meaning of a complex signal (e.g. biggest apple) is a more complex function of the meanings of its constituents. In this paper, we review several metrics of compositionality used in emergent communication and experimentally show that most of them fail to detect NTC, i.e. they treat non-trivial compositionality as a failure of compositionality. The one exception is tree reconstruction error, a metric motivated by formal accounts of compositionality. These results emphasise important limitations of emergent communication research that could hamper progress on modelling the emergence of NTC.
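
The distinction between trivial and non-trivial compositionality can be made concrete with set semantics. In the toy world below (objects, colours, and sizes are invented for illustration), "blue circle" is exactly the intersection of its constituents' meanings, whereas "biggest apple" cannot be obtained by intersecting "biggest thing" with "apple":

```python
# Toy world: object name -> (kind, colour, size). Entirely hypothetical data.
world = {
    "o1": ("circle", "blue", 3),
    "o2": ("circle", "red", 5),
    "o3": ("square", "blue", 2),
    "o4": ("apple", "red", 4),
    "o5": ("apple", "green", 6),
    "o6": ("melon", "green", 9),
}

def kind(k):
    return {o for o, (kd, _, _) in world.items() if kd == k}

def colour(c):
    return {o for o, (_, col, _) in world.items() if col == c}

def biggest(objects):
    # Non-intersective modifier: its output depends on the whole set it is applied to.
    return {max(objects, key=lambda o: world[o][2])}

blue_circle = colour("blue") & kind("circle")               # trivial: plain intersection
biggest_apple = biggest(kind("apple"))                       # non-trivial: function of the noun's meaning
naive_biggest_apple = biggest(set(world)) & kind("apple")    # intersective reading fails
print(blue_circle, biggest_apple, naive_biggest_apple)       # {'o1'} {'o5'} set()
```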

[94]  arXiv:2010.15065 (cross-list from q-bio.BM) [pdf, other]
Title: Fixed-Length Protein Embeddings using Contextual Lenses
Subjects: Biomolecules (q-bio.BM); Computation and Language (cs.CL); Machine Learning (cs.LG)

The Basic Local Alignment Search Tool (BLAST) is currently the most popular method for searching databases of biological sequences. BLAST compares sequences via similarity defined by a weighted edit distance, which makes it computationally expensive. By contrast, a vector similarity approach can be accelerated substantially using modern hardware or hashing techniques. Such an approach requires fixed-length embeddings for biological sequences. There has been recent interest in learning fixed-length protein embeddings using deep learning models, under the hypothesis that the hidden layers of supervised or semi-supervised models could produce useful vector embeddings. We consider transformer (BERT) protein language models that are pretrained on the TrEMBL data set and learn fixed-length embeddings on top of them with contextual lenses. The embeddings are trained to predict the family a protein belongs to for sequences in the Pfam database. We show that for nearest-neighbor family classification, pretraining offers a noticeable boost in performance and that the corresponding learned embeddings are competitive with BLAST. Furthermore, we show that the raw transformer embeddings, obtained via static pooling, do not perform well on nearest-neighbor family classification, which suggests that learning embeddings in a supervised manner via contextual lenses may be a compute-efficient alternative to fine-tuning.
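
To illustrate the cost contrast described above, here is a sketch comparing a pairwise weighted edit distance (a dynamic program costing O(len(a) x len(b)) per pair) with nearest-neighbor search over fixed-length vectors, where one matrix-vector product scores the whole database. The amino-acid composition vector is a hypothetical stand-in for a learned embedding; it is not the paper's contextual-lens model, and this is not how BLAST itself is implemented.

```python
import numpy as np

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def weighted_edit_distance(a, b, sub_cost=1.0, gap_cost=1.0):
    # Classic DP; every database comparison pays the full quadratic cost.
    n, m = len(a), len(b)
    d = np.zeros((n + 1, m + 1))
    d[:, 0] = np.arange(n + 1) * gap_cost
    d[0, :] = np.arange(m + 1) * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            d[i, j] = min(d[i - 1, j - 1] + s, d[i - 1, j] + gap_cost, d[i, j - 1] + gap_cost)
    return float(d[n, m])

def composition_embedding(seq):
    # Hypothetical fixed-length embedding: L2-normalized amino-acid counts.
    v = np.array([seq.count(c) for c in AMINO], dtype=float)
    return v / np.linalg.norm(v)

def nearest_neighbors(query_vec, db_matrix):
    # Cosine similarity against every row at once; returns indices, best first.
    scores = db_matrix @ query_vec
    return np.argsort(-scores)
```

Usage: stack the database embeddings once with `np.stack([composition_embedding(s) for s in db])`, then each query is a single product and sort, which is the step that hardware acceleration or hashing can speed up further.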

[95]  arXiv:2010.15067 (cross-list from cs.CL) [pdf, ps, other]
Title: Graph-based Topic Extraction from Vector Embeddings of Text Documents: Application to a Corpus of News Articles
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Production of news content is growing at an astonishing rate. To help manage and monitor the sheer amount of text, there is an increasing need to develop efficient methods that can provide insights into emerging content areas and stratify unstructured corpora of text into 'topics' that stem intrinsically from content similarity. Here we present an unsupervised framework that brings together powerful vector embeddings from natural language processing with tools from multiscale graph partitioning that can reveal natural partitions at different resolutions without making a priori assumptions about the number of clusters in the corpus. We show the advantages of graph-based clustering through end-to-end comparisons with other popular clustering and topic modelling methods, and also evaluate different text vector embeddings, from classic Bag-of-Words to Doc2Vec to the recent transformer-based model BERT. This comparative work is showcased through an analysis of a corpus of US news coverage during the presidential election year of 2016.

[96]  arXiv:2010.15090 (cross-list from cs.CL) [pdf, other]
Title: Handling Class Imbalance in Low-Resource Dialogue Systems by Combining Few-Shot Classification and Interpolation
Comments: 5 pages, 4 figures, 3 tables
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Utterance classification performance in low-resource dialogue systems is constrained by an inevitably high degree of data imbalance in class labels. We present a new end-to-end pairwise learning framework that is designed specifically to tackle this phenomenon by inducing a few-shot classification capability in the utterance representations and augmenting data through an interpolation of utterance representations. Our approach is a general-purpose training methodology, agnostic to the neural architecture used for encoding utterances. We show significant improvements in macro-F1 score over standard cross-entropy training for three different neural architectures, on both a Virtual Patient dialogue dataset and a low-resource emulation of the Switchboard dialogue act classification dataset.
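
One way to read "augmenting data through an interpolation of utterance representations" is a mixup-style convex combination of encoded utterances. The sketch below assumes exactly that: the Beta-distributed mixing weight, the within-class pairing, and the helper `augment_minority` are illustrative assumptions, since the abstract specifies neither the interpolation scheme nor how it interacts with the pairwise few-shot objective.

```python
import numpy as np

def interpolate_pair(h_a, y_a, h_b, y_b, lam):
    """Mixup-style convex combination of two representations and their label vectors."""
    h = lam * h_a + (1.0 - lam) * h_b
    y = lam * y_a + (1.0 - lam) * y_b
    return h, y

def augment_minority(reps, labels, minority_class, n_new, alpha=0.4, rng=None):
    # Hypothetical policy: synthesize extra points for a rare class by interpolating
    # random pairs drawn from that class only, so the label stays (numerically) one-hot.
    rng = np.random.default_rng() if rng is None else rng
    idx = np.flatnonzero(labels.argmax(axis=1) == minority_class)
    new_h, new_y = [], []
    for _ in range(n_new):
        i, j = rng.choice(idx, size=2, replace=True)
        lam = rng.beta(alpha, alpha)
        h, y = interpolate_pair(reps[i], labels[i], reps[j], labels[j], lam)
        new_h.append(h)
        new_y.append(y)
    return np.stack(new_h), np.stack(new_y)
```

Because each synthetic point is a convex combination of two minority-class representations, it stays inside that class's region of the representation space, which is the usual rationale for interpolation-based oversampling.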

[97]  arXiv:2010.15111 (cross-list from q-fin.ST) [pdf, other]
Title: Evaluating data augmentation for financial time series classification
Subjects: Statistical Finance (q-fin.ST); Machine Learning (cs.LG)

Data augmentation methods in combination with deep neural networks have been used extensively and with great success for classification tasks in computer vision; however, their use in time series classification is still at an early stage. This is even more so in the field of financial prediction, where datasets tend to be small, noisy and non-stationary. In this paper we evaluate several augmentation methods applied to stock datasets using two state-of-the-art deep learning models. The results show that several augmentation methods significantly improve financial performance when used in combination with a trading strategy. For a relatively small dataset ($\approx30K$ samples), augmentation methods achieve up to $400\%$ improvement in risk-adjusted return performance; for a larger stock dataset ($\approx300K$ samples), results show up to $40\%$ improvement.
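
For readers unfamiliar with time-series augmentation, here is a sketch of three commonly used transforms: jittering (additive Gaussian noise), magnitude scaling (one random factor per series) and window slicing (crop, then stretch back to the original length). Whether these particular methods or parameter values are among those the paper evaluates is not stated in the abstract; the defaults below are assumptions.

```python
import numpy as np

def jitter(x, sigma=0.01, rng=None):
    # Add i.i.d. Gaussian noise to every time step.
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(0.0, sigma, size=x.shape)

def magnitude_scale(x, sigma=0.1, rng=None):
    # Multiply the whole series by a single random factor drawn around 1.
    rng = np.random.default_rng() if rng is None else rng
    return x * rng.normal(1.0, sigma)

def window_slice(x, ratio=0.9, rng=None):
    # Crop a random contiguous window covering `ratio` of the series,
    # then linearly interpolate it back to the original length.
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    w = int(np.ceil(ratio * n))
    start = rng.integers(0, n - w + 1)
    sliced = x[start:start + w]
    return np.interp(np.linspace(0, w - 1, n), np.arange(w), sliced)
```

Each function returns a series of the same length as its input, so augmented copies can be appended to the training set without changing the model's input shape.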

Replacements for Thu, 29 Oct 20

[98]  arXiv:1811.12823 (replaced) [pdf, other]
Title: Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (stat.ML)
[99]  arXiv:1903.11991 (replaced) [pdf, other]
Title: Parabolic Approximation Line Search for DNNs
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[100]  arXiv:1910.09089 (replaced) [pdf, other]
Title: Multi-player Multi-Armed Bandits with non-zero rewards on collisions for uncoordinated spectrum access
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
[101]  arXiv:2001.03985 (replaced) [pdf, other]
Title: Unbiased and Efficient Log-Likelihood Estimation with Inverse Binomial Sampling
Comments: Bas van Opheusden and Luigi Acerbi contributed equally to this work
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM); Computation (stat.CO); Methodology (stat.ME); Machine Learning (stat.ML)
[102]  arXiv:2003.02821 (replaced) [pdf, other]
Title: What went wrong and when? Instance-wise Feature Importance for Time-series Models
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[103]  arXiv:2003.03977 (replaced) [pdf, other]
Title: Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[104]  arXiv:2004.03083 (replaced) [pdf, other]
Title: Direct loss minimization algorithms for sparse Gaussian processes
Comments: 31 pages, 16 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[105]  arXiv:2006.01893 (replaced) [pdf, other]
Title: Unsupervised Discretization by Two-dimensional MDL-based Histogram
Comments: 30 pages, 9 figures, submitted to Machine Learning Journal
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[106]  arXiv:2006.07214 (replaced) [pdf, other]
Title: Sparse and Continuous Attention Mechanisms
Comments: Accepted for spotlight presentation at NeurIPS 2020
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[107]  arXiv:2006.07361 (replaced) [pdf, other]
Title: Gaussian Processes on Graphs via Spectral Kernel Learning
Comments: 13 pages, 5 Figures
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[108]  arXiv:2006.07507 (replaced) [pdf, other]
Title: Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[109]  arXiv:2006.07710 (replaced) [pdf, other]
Title: The Pitfalls of Simplicity Bias in Neural Networks
Comments: NeurIPS 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[110]  arXiv:2006.08149 (replaced) [pdf, other]
Title: GNNGuard: Defending Graph Neural Networks against Adversarial Attacks
Comments: Accepted by NeurIPS 2020. More info about GNNGuard: this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[111]  arXiv:2006.08877 (replaced) [pdf, other]
Title: Practical Quasi-Newton Methods for Training Deep Neural Networks
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[112]  arXiv:2006.10255 (replaced) [pdf, other]
Title: Calibrated Reliable Regression using Maximum Mean Discrepancy
Comments: Accepted to NeurIPS'2020. Full version with appendix
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[113]  arXiv:2006.12972 (replaced) [pdf, ps, other]
Title: Sparse Symplectically Integrated Neural Networks
Comments: Accepted as a conference paper to NeurIPS 2020. Main paper has 9 pages and 4 figures
Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)
[114]  arXiv:2007.00211 (replaced) [pdf, other]
Title: Ultrahyperbolic Representation Learning
Authors: Marc T. Law, Jos Stam
Comments: NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[115]  arXiv:2007.08199 (replaced) [pdf, other]
Title: Learning from Noisy Labels with Deep Neural Networks: A Survey
Comments: If your paper is highly related, but it is missing, please contact me: songhwanjun@kaist.ac.kr
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[116]  arXiv:2007.11838 (replaced) [pdf, other]
Title: PClean: Bayesian Data Cleaning at Scale with Domain-Specific Probabilistic Programming
Comments: Correct formatting error
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation (stat.CO); Machine Learning (stat.ML)
[117]  arXiv:2008.00645 (replaced) [pdf, other]
Title: Active Classification with Uncertainty Comparison Queries
Comments: Code and Dataset: this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[118]  arXiv:2008.00938 (replaced) [pdf, other]
Title: Implicit Regularization via Neural Feature Alignment
Comments: 27 pages including appendices. Submitted to AISTATS 2021. A preliminary version of this work has been presented at the NeurIPS 2019 Workshops on "Machine Learning with Guarantees" and "Science meets Engineering of Deep Learning"
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[119]  arXiv:2009.00707 (replaced) [pdf, ps, other]
Title: Pursuing a Prospective Perspective
Authors: Steven Kearnes
Subjects: Machine Learning (cs.LG)
[120]  arXiv:2009.12682 (replaced) [pdf, other]
Title: Decision-Aware Conditional GANs for Time Series Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[121]  arXiv:2010.05113 (replaced) [pdf]
Title: Contrastive Representation Learning: A Framework and Review
Comments: 28 pages, 9 figures, update with the accepted version in IEEE Access
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[122]  arXiv:2010.07154 (replaced) [pdf, other]
Title: Learning Deep Features in Instrumental Variable Regression
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[123]  arXiv:2010.09546 (replaced) [pdf, other]
Title: Model-based Policy Optimization with Unsupervised Model Adaptation
Comments: Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[124]  arXiv:2010.13764 (replaced) [pdf, other]
Title: Enforcing Interpretability and its Statistical Impacts: Trade-offs between Accuracy and Interpretability
Comments: 12 pages; minor edits
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[125]  arXiv:1902.04495 (replaced) [pdf, other]
Title: The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy
Comments: 33 pages, 4 figures
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
[126]  arXiv:1905.11814 (replaced) [pdf, other]
Title: Shredder: Learning Noise Distributions to Protect Inference Privacy
Comments: Presented in ASPLOS 2020
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[127]  arXiv:1906.01558 (replaced) [pdf, other]
Title: Disentangling neural mechanisms for perceptual grouping
Comments: Published in ICLR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[128]  arXiv:1906.02944 (replaced) [pdf, other]
Title: Learning Adaptive Classifiers Synthesis for Generalized Few-Shot Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[129]  arXiv:1908.11472 (replaced) [pdf, other]
Title: Kinematic Single Vehicle Trajectory Prediction Baselines and Applications with the NGSIM Dataset
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[130]  arXiv:1909.09541 (replaced) [pdf, other]
Title: A Transfer Learning Approach for Automated Segmentation of Prostate Whole Gland and Transition Zone in Diffusion Weighted MRI
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[131]  arXiv:2001.10280 (replaced) [pdf, other]
Title: Reservoir computing model of two-dimensional turbulent convection
Comments: 16 pages, 12 figures
Subjects: Fluid Dynamics (physics.flu-dyn); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
[132]  arXiv:2002.00291 (replaced) [pdf, ps, other]
Title: Oracle lower bounds for stochastic gradient sampling algorithms
Comments: 21 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[133]  arXiv:2002.03657 (replaced) [pdf, other]
Title: Semialgebraic Optimization for Lipschitz Constants of ReLU Networks
Comments: NeurIPS 2020
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[134]  arXiv:2002.05049 (replaced) [pdf, other]
Title: Detect and Correct Bias in Multi-Site Neuroimaging Datasets
Journal-ref: Medical Image Analysis, 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[135]  arXiv:2002.10107 (replaced) [pdf]
Title: Predicting Subjective Features of Questions of QA Websites using BERT
Comments: 5 pages, 4 figures, 2 tables
Journal-ref: 2020 6th International Conference on Web Research (ICWR), Tehran, Iran, 2020, pp. 240-244
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[136]  arXiv:2002.11843 (replaced) [pdf, ps, other]
Title: A Deep Unsupervised Feature Learning Spiking Neural Network with Binarized Classification Layers for EMNIST Classification using SpykeFlow
Comments: A section of this work is Submitted to IEEE TETCI 2020 Journal
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)
[137]  arXiv:2003.02395 (replaced) [pdf, other]
Title: A Simple Convergence Proof of Adam and Adagrad
Comments: 24 pages, 1 figure, preprint version
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[138]  arXiv:2003.03167 (replaced) [pdf, other]
Title: When Deep Learning Meets Data Alignment: A Review on Deep Registration Networks (DRNs)
Comments: Published in Applied Sciences
Journal-ref: Appl. Sci. 2020, 10(21), 7524
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[139]  arXiv:2003.08355 (replaced) [pdf, other]
Title: Dynamic Point Cloud Denoising via Manifold-to-Manifold Distance
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[140]  arXiv:2004.08046 (replaced) [pdf, other]
Title: Active Sentence Learning by Adversarial Uncertainty Sampling in Discrete Space
Comments: Accepted to EMNLP 2020 Findings
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[141]  arXiv:2005.00159 (replaced) [pdf, other]
Title: Why and when should you pool? Analyzing Pooling in Recurrent Architectures
Comments: Accepted to Findings of EMNLP 2020, to be presented at BlackBoxNLP. Updated Version
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[142]  arXiv:2005.12368 (replaced) [pdf, other]
Title: FT Speech: Danish Parliament Speech Corpus
Comments: Accepted at Interspeech 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143]  arXiv:2006.05168 (replaced) [pdf, other]
Title: Manifold structure in graph embeddings
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[144]  arXiv:2006.05645 (replaced) [pdf, other]
Title: Hypergraph Clustering for Finding Diverse and Experienced Groups
Comments: Added new experiments and refocused around diversity
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)
[145]  arXiv:2006.07397 (replaced) [pdf, other]
Title: The DeepFake Detection Challenge (DFDC) Dataset
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[146]  arXiv:2006.07506 (replaced) [pdf, other]
Title: Uncertainty Quantification for Inferring Hawkes Networks
Comments: 16 pages including appendix, 1 figure, accepted to NeurIPS 2020
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[147]  arXiv:2006.11132 (replaced) [pdf, other]
Title: Deep Transformation-Invariant Clustering
Comments: Accepted at NeurIPS 2020 (oral). Project webpage: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[148]  arXiv:2006.11313 (replaced) [pdf, other]
Title: Information theoretic limits of learning a sparse rule
Comments: 56 pages, 4 figures, accepted to the 34th Conference on Neural Information Processing Systems (NeurIPS 2020). Extended version that includes the supplementary material
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
[149]  arXiv:2006.12504 (replaced) [pdf, other]
Title: The GCE in a New Light: Disentangling the $\gamma$-ray Sky with Bayesian Graph Convolutional Neural Networks
Comments: 7+47 pages, 2+36 figures, accepted by Phys. Rev. Lett.
Subjects: High Energy Astrophysical Phenomena (astro-ph.HE); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); High Energy Physics - Phenomenology (hep-ph)
[150]  arXiv:2007.01722 (replaced) [pdf, ps, other]
Title: Learning Utilities and Equilibria in Non-Truthful Auctions
Authors: Hu Fu, Tao Lin
Comments: NeurIPS 2020
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Theoretical Economics (econ.TH)
[151]  arXiv:2007.03285 (replaced) [pdf, other]
Title: Stochastic Linear Bandits Robust to Adversarial Attacks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[152]  arXiv:2007.08926 (replaced) [pdf, ps, other]
Title: Smart Choices and the Selection Monad
Subjects: Logic in Computer Science (cs.LO); Machine Learning (cs.LG); Programming Languages (cs.PL)
[153]  arXiv:2007.09170 (replaced) [pdf, other]
Title: Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation
Comments: Extension of our IVA'19 paper. Submitted to the International Journal of Human-Computer Interaction. arXiv admin note: substantial text overlap with arXiv:1903.03369
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[154]  arXiv:2009.01325 (replaced) [pdf, other]
Title: Learning to summarize from human feedback
Comments: NeurIPS 2020 camera ready
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[155]  arXiv:2009.05859 (replaced) [pdf, other]
Title: Towards Automatic Manipulation of Intra-cardiac Echocardiography Catheter
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[156]  arXiv:2009.07964 (replaced) [pdf, other]
Title: Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis
Comments: EMNLP 2020, long paper
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
[157]  arXiv:2009.08267 (replaced) [pdf, other]
Title: Integration of AI and mechanistic modeling in generative adversarial networks for stochastic inverse problems
Comments: New appendix
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[158]  arXiv:2010.00770 (replaced) [pdf, other]
Title: XDA: Accurate, Robust Disassembly with Transfer Learning
Comments: To appear in 2021 Network and Distributed System Security Symposium (NDSS 2021)
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[159]  arXiv:2010.02847 (replaced) [pdf, other]
Title: Robustness and Reliability of Gender Bias Assessment in Word Embeddings: The Role of Base Pairs
Comments: Accepted at AACL-IJCNLP 2020
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[160]  arXiv:2010.09895 (replaced) [pdf, other]
Title: Multi-Window Data Augmentation Approach for Speech Emotion Recognition
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[161]  arXiv:2010.10569 (replaced) [pdf, other]
Title: Bayesian Algorithms for Decentralized Stochastic Bandits
Comments: Submitted to IEEE Journal on Selected Areas in Information Theory (JSAIT) issue on Sequential, Active, and Reinforcement Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[162]  arXiv:2010.11910 (replaced) [pdf, other]
Title: Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning
Comments: submitted to ICASSP 2021
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[163]  arXiv:2010.13118 (replaced) [pdf, other]
Title: Monocular Depth Estimation via Listwise Ranking using the Plackett-Luce Model
Comments: 9 pages of content, 11 pages in total, 1 figure, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[164]  arXiv:2010.13483 (replaced) [pdf, other]
Title: High Acceleration Reinforcement Learning for Real-World Juggling with Binary Rewards
Comments: Published at Conference on Robot Learning (CoRL) 2020
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Machine Learning (stat.ML)
[165]  arXiv:2010.13787 (replaced) [pdf, other]
Title: Hierarchical Inference With Bayesian Neural Networks: An Application to Strong Gravitational Lensing
Authors: Sebastian Wagner-Carena, Ji Won Park, Simon Birrer, Philip J. Marshall, Aaron Roodman, Risa H. Wechsler (for the LSST Dark Energy Science Collaboration)
Comments: Code available at this https URL
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
[166]  arXiv:2010.13887 (replaced) [pdf, other]
Title: LightSeq: A High Performance Inference Library for Sequence Processing and Generation
Comments: 6 pages, 8 figures
Subjects: Mathematical Software (cs.MS); Machine Learning (cs.LG)