Machine Learning
New submissions
New submissions for Thu, 29 Oct 20
 [1] arXiv:2010.14535 [pdf, other]

Title: Neural Architecture Search of SPD Manifold Networks
Authors: Rhea Sanjay Sukthanker, Zhiwu Huang, Suryansh Kumar, Erik Goron Endsjo, Yan Wu, Luc Van Gool
Comments: Info: 19 pages, 11 Figures, and 9 Tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
In this paper, we propose a new neural architecture search (NAS) problem of Symmetric Positive Definite (SPD) manifold networks. Unlike the conventional NAS problem, our problem requires searching for a unique computational cell called the SPD cell. This SPD cell serves as a basic building block of SPD neural architectures. An efficient solution to our problem is important to minimize the extraneous manual effort in the SPD neural architecture design. To accomplish this goal, we first introduce a geometrically rich and diverse SPD neural architecture search space for an efficient SPD cell design. Further, we model our new NAS problem using the supernet strategy, which models the architecture search problem as a one-shot training process of a single supernet. Based on the supernet modeling, we exploit a differentiable NAS algorithm on our relaxed continuous search space for SPD neural architecture search. Statistical evaluation of our method on drone, action, and emotion recognition tasks mostly provides better results than the state-of-the-art SPD networks and NAS algorithms. Empirical results show that our algorithm excels in discovering better SPD network designs and in providing models that are more than 3 times lighter than those searched by state-of-the-art NAS algorithms.
 [2] arXiv:2010.14543 [pdf, other]

Title: Unsupervised Domain Adaptation for Visual Navigation
Authors: Shangda Li, Devendra Singh Chaplot, Yao-Hung Hubert Tsai, Yue Wu, Louis-Philippe Morency, Ruslan Salakhutdinov
Comments: Deep Reinforcement Learning Workshop at NeurIPS 2020. Camera Ready Version
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Advances in visual navigation methods have led to intelligent embodied navigation agents capable of learning meaningful representations from raw RGB images and performing a wide variety of tasks involving structural and semantic reasoning. However, most learning-based navigation policies are trained and tested in simulation environments. In order for these policies to be practically useful, they need to be transferred to the real world. In this paper, we propose an unsupervised domain adaptation method for visual navigation. Our method translates the images in the target domain to the source domain such that the translation is consistent with the representations learned by the navigation policy. The proposed method outperforms several baselines across two different navigation tasks in simulation. We further show that our method can be used to transfer the navigation policies learned in simulation to the real world.
 [3] arXiv:2010.14563 [pdf, ps, other]

Title: Adversarial Dueling Bandits
Comments: 26 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We introduce the problem of regret minimization in Adversarial Dueling Bandits. As in classic Dueling Bandits, the learner has to repeatedly choose a pair of items and observe only a relative binary `win-loss' feedback for this pair, but here this feedback is generated from an arbitrary preference matrix, possibly chosen adversarially. Our main result is an algorithm whose $T$-round regret compared to the \emph{Borda-winner} from a set of $K$ items is $\tilde{O}(K^{1/3}T^{2/3})$, as well as a matching $\Omega(K^{1/3}T^{2/3})$ lower bound. We also prove a similar high probability regret bound. We further consider a simpler \emph{fixed-gap} adversarial setup, which bridges between two extreme preference feedback models for dueling bandits: stationary preferences and an arbitrary sequence of preferences. For the fixed-gap adversarial setup we give an $\smash{ \tilde{O}((K/\Delta^2)\log{T}) }$ regret algorithm, where $\Delta$ is the gap in Borda scores between the best item and all other items, and show a lower bound of $\Omega(K/\Delta^2)$ indicating that our dependence on the main problem parameters $K$ and $\Delta$ is tight (up to logarithmic factors).
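As a rough illustration of the Borda-winner notion used above, the sketch below computes Borda scores from a single fixed preference matrix. It assumes the common definition in which the Borda score of item $i$ is the average probability that $i$ beats a uniformly chosen other item; the function names `borda_scores` and `borda_winner` and the example matrix are hypothetical and not taken from the paper, and the adversarial setting (a changing sequence of matrices) is not modeled here.

```python
def borda_scores(P):
    """Borda score of each item under preference matrix P.

    P[i][j] is assumed to be the probability that item i beats item j.
    The score of item i is its average win probability against the
    other K - 1 items (an assumed, commonly used definition).
    """
    K = len(P)
    return [
        sum(P[i][j] for j in range(K) if j != i) / (K - 1)
        for i in range(K)
    ]


def borda_winner(P):
    """Index of the item with the highest Borda score."""
    scores = borda_scores(P)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical 3-item preference matrix, for illustration only.
P_example = [
    [0.5, 0.6, 0.8],
    [0.4, 0.5, 0.9],
    [0.2, 0.1, 0.5],
]
winner = borda_winner(P_example)
```

Under this definition the gap $\Delta$ would be the difference between the largest Borda score and the next largest one.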
 [4] arXiv:2010.14592 [pdf, other]

Title: Shapley Flow: A Graph-based Approach to Interpreting Model Predictions
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Many existing approaches for estimating feature importance are problematic because they ignore or hide dependencies among features. A causal graph, which encodes the relationships among input variables, can aid in assigning feature importance. However, current approaches that assign credit to nodes in the causal graph fail to explain the entire graph. In light of these limitations, we propose Shapley Flow, a novel approach to interpreting machine learning models. It considers the entire causal graph, and assigns credit to \textit{edges} instead of treating nodes as the fundamental unit of credit assignment. Shapley Flow is the unique solution to a generalization of the Shapley value axioms to directed acyclic graphs. We demonstrate the benefit of using Shapley Flow to reason about the impact of a model's input on its output. In addition to maintaining insights from existing approaches, Shapley Flow extends the flat, set-based view prevalent in game-theory-based explanation methods to a deeper, \textit{graph-based} view. This graph-based view enables users to understand the flow of importance through a system, and reason about potential interventions.
 [5] arXiv:2010.14603 [pdf, other]

Title: Learning to be Safe: Deep RL with a Safety Critic
Comments: In submission, 16 pages (including appendix)
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
Safety is an essential component for deploying reinforcement learning (RL) algorithms in real-world scenarios, and is critical during the learning process itself. A natural first approach toward safe RL is to manually specify constraints on the policy's behavior. However, just as learning has enabled progress in large-scale development of AI systems, learning safety specifications may also be necessary to ensure safety in messy open-world environments where manual safety specifications cannot scale. Akin to how humans learn incrementally starting in child-safe environments, we propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors when learning new, modified tasks. We empirically study this form of safety-constrained transfer learning in three challenging domains: simulated navigation, quadruped locomotion, and dexterous in-hand manipulation. In comparison to standard deep RL techniques and prior approaches to safe RL, we find that our method enables the learning of new tasks in new environments with both substantially fewer safety incidents, such as falling or dropping an object, and faster, more stable learning. This suggests a path forward not only for safer RL systems, but also for more effective RL systems.
 [6] arXiv:2010.14641 [pdf, other]

Title: Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Learning complex behaviors through interaction requires coordinated long-term planning. Random exploration and novelty search lack task-centric guidance and waste effort on non-informative interactions. Instead, decision making should target samples with the potential to optimize performance far into the future, while only reducing uncertainty where conducive to this objective. This paper presents latent optimistic value exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards. We combine finite horizon rollouts from a latent model with value function estimates to predict infinite horizon returns and recover associated uncertainty through ensembling. Policy training then proceeds on an upper confidence bound (UCB) objective to identify and select the interactions most promising to improve long-term performance. We apply LOVE to visual control tasks in continuous state-action spaces and demonstrate improved sample complexity on a selection of benchmarking tasks.
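To make the idea of an upper confidence bound over an ensemble concrete, here is a minimal sketch. It assumes the UCB score is the ensemble mean plus a multiple of the ensemble standard deviation, which is a common choice and not necessarily the exact objective used in LOVE; `ucb_score`, `select_action`, the weight `beta`, and the example return estimates are all hypothetical.

```python
from statistics import mean, pstdev


def ucb_score(ensemble_returns, beta=1.0):
    """Optimistic score: ensemble mean plus beta times ensemble spread.

    ensemble_returns holds one predicted return per ensemble member.
    beta (an assumed hyperparameter) trades off exploitation against
    optimism under uncertainty.
    """
    return mean(ensemble_returns) + beta * pstdev(ensemble_returns)


def select_action(candidates, beta=1.0):
    """Pick the candidate whose ensemble predictions have the highest UCB."""
    return max(candidates, key=lambda name: ucb_score(candidates[name], beta))


# Hypothetical return estimates from a 4-member ensemble.
candidates = {
    "well_known": [1.1, 1.1, 1.1, 1.1],  # members agree: low uncertainty
    "uncertain": [0.0, 2.0, 0.0, 2.0],   # members disagree: high uncertainty
}
greedy_choice = select_action(candidates, beta=0.0)
optimistic_choice = select_action(candidates, beta=1.0)
```

With `beta=0.0` the rule reduces to picking the highest mean, while a positive `beta` favors candidates on which the ensemble disagrees.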
 [7] arXiv:2010.14657 [pdf, ps, other]

Title: Temporal Difference Learning as Gradient Splitting
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Temporal difference learning with linear function approximation is a popular method to obtain a low-dimensional approximation of the value function of a policy in a Markov Decision Process. We give a new interpretation of this method in terms of a splitting of the gradient of an appropriately chosen function. As a consequence of this interpretation, convergence proofs for gradient descent can be applied almost verbatim to temporal difference learning. Beyond giving a new, fuller explanation of why temporal difference works, our interpretation also yields improved convergence times. We consider the setting with $1/\sqrt{T}$ step-size, where previous comparable finite-time convergence time bounds for temporal difference learning had the multiplicative factor $1/(1-\gamma)$ in front of the bound, with $\gamma$ being the discount factor. We show that a minor variation on TD learning which estimates the mean of the value function separately has a convergence time where $1/(1-\gamma)$ only multiplies an asymptotically negligible term.
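For readers unfamiliar with the method being analyzed, the following is a minimal sketch of standard TD(0) with linear function approximation and a constant $1/\sqrt{T}$ step size. It is the textbook update, not the mean-estimating variant proposed in the paper; the function names and the two-state example chain are hypothetical.

```python
def td0_linear(samples, dim, gamma):
    """Standard TD(0) with linear value approximation V(s) = theta . phi(s).

    samples is a list of (phi_s, reward, phi_next) tuples collected while
    following a fixed policy. The step size is the constant 1/sqrt(T),
    where T is the number of samples.
    """
    T = len(samples)
    alpha = 1.0 / T ** 0.5
    theta = [0.0] * dim
    for phi, reward, phi_next in samples:
        v = sum(w * x for w, x in zip(theta, phi))
        v_next = sum(w * x for w, x in zip(theta, phi_next))
        td_error = reward + gamma * v_next - v
        # Move theta along the feature vector, scaled by the TD error.
        theta = [w + alpha * td_error * x for w, x in zip(theta, phi)]
    return theta


def two_state_chain(T):
    """Hypothetical deterministic chain 0 -> 1 -> 0 with reward 1 on leaving state 0."""
    one_hot = [[1.0, 0.0], [0.0, 1.0]]
    samples, s = [], 0
    for _ in range(T):
        s_next = 1 - s
        reward = 1.0 if s == 0 else 0.0
        samples.append((one_hot[s], reward, one_hot[s_next]))
        s = s_next
    return samples


theta = td0_linear(two_state_chain(10_000), dim=2, gamma=0.5)
```

For this chain the Bellman equations give $V(0) = 1/(1-\gamma^2) = 4/3$ and $V(1) = \gamma V(0) = 2/3$ at $\gamma = 0.5$, so `theta` can be compared against those values.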
 [8] arXiv:2010.14658 [pdf, ps, other]

Title: Faster Differentially Private Samplers via Rényi Divergence Analysis of Discretized Langevin MCMC
Comments: To appear in NeurIPS 2020
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Probability (math.PR)
Various differentially private algorithms instantiate the exponential mechanism, and require sampling from the distribution $\exp(f)$ for a suitable function $f$. When the domain of the distribution is high-dimensional, this sampling can be computationally challenging. Using heuristic sampling schemes such as Gibbs sampling does not necessarily lead to provable privacy. When $f$ is convex, techniques from log-concave sampling lead to polynomial-time algorithms, albeit with large polynomials. Langevin dynamics-based algorithms offer much faster alternatives under some distance measures such as statistical distance. In this work, we establish rapid convergence for these algorithms under distance measures more suitable for differential privacy. For smooth, strongly-convex $f$, we give the first results proving convergence in R\'enyi divergence. This gives us fast differentially private algorithms for such $f$. Our techniques are simple and generic and apply also to underdamped Langevin dynamics.
 [9] arXiv:2010.14664 [pdf, other]

Title: System Identification via Meta-Learning in Linear Time-Varying Environments
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
System identification is a fundamental problem in reinforcement learning, control theory and signal processing, and the non-asymptotic analysis of the corresponding sample complexity is challenging and elusive, even for linear time-varying (LTV) systems. To tackle this challenge, we develop an episodic block model for the LTV system where the model parameters remain constant within each block but change from block to block. Based on the observation that the model parameters across different blocks are related, we treat each episodic block as a learning task and then run meta-learning over many blocks for system identification, using two steps, namely offline meta-learning and online adaptation. We carry out a comprehensive non-asymptotic analysis of the performance of meta-learning-based system identification. To deal with the technical challenges rooted in the sample correlation and small sample sizes in each block, we devise a new two-scale martingale small-ball approach for offline meta-learning, for arbitrary model correlation structure across blocks. We then quantify the finite time error of online adaptation by leveraging recent advances in linear stochastic approximation with correlated samples.
 [10] arXiv:2010.14670 [pdf, ps, other]

Title: Online Learning with Primary and Secondary Losses
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study the problem of online learning with primary and secondary losses. For example, a recruiter deciding which job applicants to hire might weigh false positives and false negatives equally (the primary loss), but the applicants might weigh false negatives much more heavily (the secondary loss). We consider the following question: Can we combine "expert advice" to achieve low regret with respect to the primary loss, while at the same time performing {\em not much worse than the worst expert} with respect to the secondary loss? Unfortunately, we show that this goal is unachievable without any bounded variance assumption on the secondary loss. More generally, we consider the goal of minimizing the regret with respect to the primary loss and bounding the secondary loss by a linear threshold. On the positive side, we show that running any switching-limited algorithm can achieve this goal if all experts satisfy the assumption that the secondary loss does not exceed the linear threshold by $o(T)$ for any time interval. If not all experts satisfy this assumption, our algorithms can achieve this goal given access to some external oracles which determine when to deactivate and reactivate experts.
 [11] arXiv:2010.14672 [pdf, other]

Title: Why Does MAML Outperform ERM? An Optimization Perspective
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Model-Agnostic Meta-Learning (MAML) has demonstrated widespread success in training models that can quickly adapt to new tasks via one or few stochastic gradient descent steps. However, the MAML objective is significantly more difficult to optimize compared to standard Empirical Risk Minimization (ERM), and little is understood about how much MAML improves over ERM in terms of the fast adaptability of their solutions in various scenarios. We analytically address this issue in a linear regression setting consisting of a mixture of easy and hard tasks, where hardness is determined by the number of gradient steps required to solve the task. Specifically, we prove that for $\Omega(d_{\text{eff}})$ labelled test samples (for gradient-based fine-tuning) where $d_{\text{eff}}$ is the effective dimension of the problem, in order for MAML to achieve substantial gain over ERM, the optimal solutions of the hard tasks must be closely packed together with the center far from the center of the easy task optimal solutions. We show that these insights also apply in a low-dimensional feature space when both MAML and ERM learn a representation of the tasks, which reduces the effective problem dimension. Further, our few-shot image classification experiments suggest that our results generalize beyond linear regression.
 [12] arXiv:2010.14680 [pdf, other]

Title: Learning to Represent Action Values as a Hypergraph on the Action Vertices
Comments: 9 pages, 10 figures, 3 tables
Subjects: Machine Learning (cs.LG)
Action-value estimation is a critical component of many reinforcement learning (RL) methods whereby sample complexity relies heavily on how fast a good estimator for action value can be learned. By viewing this problem through the lens of representation learning, good representations of both state and action can facilitate action-value estimation. While advances in deep learning have seamlessly driven progress in learning state representations, given the specificity of the notion of agency to RL, little attention has been paid to learning action representations. We conjecture that leveraging the combinatorial structure of multi-dimensional action spaces is a key ingredient for learning good representations of action. To test this, we set forth the action hypergraph networks framework, a class of functions for learning action representations with a relational inductive bias. Using this framework we realise an agent class based on a combination with deep Q-networks, which we dub hypergraph Q-networks. We show the effectiveness of our approach on a myriad of domains: illustrative prediction problems under minimal confounding effects, Atari 2600 games, and physical control benchmarks.
 [13] arXiv:2010.14687 [pdf, other]

Title: MILR: Mathematically Induced Layer Recovery for Plaintext Space Error Correction of CNNs
Comments: 12 pages
Subjects: Machine Learning (cs.LG)
The increased use of Convolutional Neural Networks (CNNs) in mission-critical systems has increased the need for robust and resilient networks in the face of both naturally occurring faults and security attacks. The lack of robustness and resiliency can lead to unreliable inference results. Current methods that address CNN robustness require hardware modification, network modification, or network duplication. This paper proposes MILR, a software-based CNN error detection and error correction system that enables self-healing of the network from single-bit and multi-bit errors. The self-healing capabilities are based on mathematical relationships between the inputs, outputs, and parameters (weights) of a layer; exploiting these relationships allows the recovery of erroneous parameters (weights) throughout a layer and the network. MILR is suitable for plaintext-space error correction (PSEC) given its ability to correct whole-weight and even whole-layer errors in CNNs.
 [14] arXiv:2010.14689 [pdf, other]

Title: Expressive yet Tractable Bayesian Deep Learning via Subnetwork Inference
Authors: Erik Daxberger, Eric Nalisnick, James Urquhart Allingham, Javier Antorán, José Miguel Hernández-Lobato
Comments: 15 pages, extended version with supplementary material
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The Bayesian paradigm has the potential to solve some of the core issues in modern deep learning, such as poor calibration, data inefficiency, and catastrophic forgetting. However, scaling Bayesian inference to the high-dimensional parameter spaces of deep neural networks requires restrictive approximations. In this paper, we propose performing inference over only a small subset of the model parameters while keeping all others as point estimates. This enables us to use expressive posterior approximations that would otherwise be intractable for the full model. In particular, we develop a practical and scalable Bayesian deep learning method that first trains a point estimate, and then infers a full-covariance Gaussian posterior approximation over a subnetwork. We propose a subnetwork selection procedure which aims to optimally preserve posterior uncertainty. We empirically demonstrate the effectiveness of our approach compared to point-estimated networks and methods that use less expressive posterior approximations over the full network.
 [15] arXiv:2010.14700 [pdf, other]

Title: Sparse Symmetric Tensor Regression for Functional Connectivity Analysis
Authors: Da Xu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Tensor regression models, such as CP regression and Tucker regression, have many successful applications in neuroimaging analysis where the covariates are of ultra-high dimensionality and possess complex spatial structures. The high-dimensional covariate arrays, also known as tensors, can be approximated by low-rank structures and fit into the generalized linear models. The resulting tensor regression achieves a significant reduction in dimensionality while remaining efficient in estimation and prediction. Brain functional connectivity is an essential measure of brain activity and has shown significant association with neurological disorders such as Alzheimer's disease. The symmetric nature of functional connectivity is a property that has not been explored in previous tensor regression models. In this work, we propose a sparse symmetric tensor regression that further reduces the number of free parameters and achieves superior performance over symmetrized and ordinary CP regression, under a variety of simulation settings. We apply the proposed method to a study of Alzheimer's disease (AD) and normal ageing from the Berkeley Aging Cohort Study (BACS) and detect two regions of interest that have been identified as important to AD.
 [16] arXiv:2010.14701 [pdf, other]

Title: Scaling Laws for Autoregressive Generative Modeling
Authors: Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish
Comments: 20+15 pages, 30 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image$\leftrightarrow$text models, and mathematical problem solving. In all cases autoregressive Transformers smoothly improve in performance as model size and compute budgets increase, following a power-law plus constant scaling law. The optimal model size also depends on the compute budget through a power-law, with exponents that are nearly universal across all data domains.
The cross-entropy loss has an information-theoretic interpretation as $S(\mathrm{True}) + D_{\mathrm{KL}}(\mathrm{True}\,\|\,\mathrm{Model})$, and the empirical scaling laws suggest a prediction for both the true data distribution's entropy and the KL divergence between the true and model distributions. With this interpretation, billion-parameter Transformers are nearly perfect models of the YFCC100M image distribution downsampled to an $8\times 8$ resolution, and we can forecast the model size needed to achieve any given reducible loss (i.e., $D_{\mathrm{KL}}$) in nats/image for other resolutions.
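The decomposition and the "power-law plus constant" form mentioned above can be written out as follows. This is a sketch under assumptions: the symbols $L_\infty$, $x_0$, and $\alpha_x$ are introduced here for illustration, with $x$ standing for a scale variable such as model size or compute, and the identification of the constant with the entropy is the reading suggested by the abstract rather than a proven statement.

```latex
% Cross-entropy loss split into an irreducible and a reducible part
L = S(\mathrm{True}) + D_{\mathrm{KL}}(\mathrm{True}\,\|\,\mathrm{Model})

% Assumed "power-law plus constant" form in a scale variable x
L(x) = L_{\infty} + \left(\frac{x_0}{x}\right)^{\alpha_x}

% Suggested identification of the two terms
L_{\infty} \approx S(\mathrm{True}), \qquad
\left(\frac{x_0}{x}\right)^{\alpha_x} \approx D_{\mathrm{KL}}(\mathrm{True}\,\|\,\mathrm{Model})
```

Under this reading, the constant is the loss that no model can remove, and the power-law term is the reducible loss that shrinks as $x$ grows.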
We find a number of additional scaling laws in specific domains: (a) we identify a scaling relation for the mutual information between captions and images in multimodal models, and show how to answer the question "Is a picture worth a thousand words?"; (b) in the case of mathematical problem solving, we identify scaling laws for model performance when extrapolating beyond the training distribution; (c) we fine-tune generative image models for ImageNet classification and find smooth scaling of the classification loss and error rate, even as the generative loss levels off. Taken together, these results strengthen the case that scaling laws have important implications for neural network performance, including on downstream tasks.
 [17] arXiv:2010.14753 [pdf, ps, other]

Title: A short note on the decision tree based neural turing machine
Authors: Yingshi Chen
Comments: 5 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:2010.02921
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Turing machines and decision trees have developed independently for a long time. With the recent development of differentiable models, there is an intersection between them. The neural Turing machine (NTM) opens the door for the memory network. It uses a differentiable attention mechanism to read from and write to an external memory bank. The differentiable forest brings differentiable properties to the classical decision tree. In this short note, we show the deep connection between these two models: the differentiable forest is a special case of the NTM, that is, a decision-tree-based neural Turing machine. Based on this deep connection, we propose a response augmented differential forest (RaDF). The controller of RaDF is a differentiable forest, and the external memory of RaDF consists of response vectors, which are read and written by leaf nodes.
 [18] arXiv:2010.14761 [pdf, other]

Title: Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures
Comments: 18 pages, 4 figures. arXiv admin note: text overlap with arXiv:2006.07897
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistics Theory (math.ST)
We analyze the connection between minimizers with good generalizing properties and high local entropy regions of a threshold-linear classifier in Gaussian mixtures with the mean squared error loss function. We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters. We explore analytically the error-counting loss landscape in the vicinity of a Bayes-optimal solution, and show that the closer we get to such configurations, the higher the local entropy, implying that the Bayes-optimal solution lies inside a wide flat region. We also consider the algorithmically relevant case of targeting wide flat minima of the (differentiable) mean squared error loss. Our analytical and numerical results show not only that in the balanced case the dependence on the norm of the weights is mild, but also, in the unbalanced case, that the performance can be improved.
 [19] arXiv:2010.14763 [pdf, other]

Title: Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes
Authors: Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen, Lam M. Nguyen, Quoc Tran-Dinh, Phuong Ha Nguyen
Comments: arXiv admin note: substantial text overlap with arXiv:2007.09208
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads in parallel access a common repository containing training data, perform SGD iterations and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets, and we wish to move SGD computations to local compute nodes where local data resides. The results of these local SGD computations are aggregated by a central "aggregator" which mimics Hogwild!. We show how local compute nodes can start by choosing small mini-batch sizes which increase to larger ones in order to reduce communication cost (rounds of interaction with the aggregator). We give a tight and novel non-trivial convergence analysis for strongly convex problems which does not use the bounded gradient assumption as seen in many existing publications. The tightness is a consequence of our proofs for lower and upper bounds of the convergence rate, which show a constant factor difference. We show experimental results for plain convex and non-convex problems for biased and unbiased local data sets.
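The communication saving from growing mini-batch sizes can be illustrated with a small counting sketch. It assumes one communication round with the aggregator per mini-batch and a size schedule of the form `base + slope * round_index`; this schedule, the function name, and the numbers are hypothetical and are not the schedule analyzed in the paper.

```python
def linear_batch_schedule(total_iters, base, slope):
    """Split total_iters SGD iterations into rounds of linearly growing size.

    Round i (starting at 0) performs base + slope * i local iterations,
    truncated so the sizes sum exactly to total_iters. Each round is
    assumed to cost one communication with the aggregator.
    """
    if base <= 0:
        raise ValueError("base must be positive")
    sizes, done, i = [], 0, 0
    while done < total_iters:
        size = min(base + slope * i, total_iters - done)
        sizes.append(size)
        done += size
        i += 1
    return sizes


# Same total work, constant versus linearly increasing mini-batch sizes.
constant_rounds = linear_batch_schedule(100, base=10, slope=0)
increasing_rounds = linear_batch_schedule(100, base=10, slope=10)
```

With a constant size the number of rounds grows linearly in the total number of iterations, whereas with a linearly increasing size it grows roughly like the square root of that total.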
 [20] arXiv:2010.14765 [pdf, other]

Title: Deep Networks from the Principle of Rate Reduction
Comments: arXiv admin note: text overlap with arXiv:1611.05431 by other authors
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)
This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction of learned features naturally leads to a multi-layer deep network, one iteration per layer. The layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer in a forward propagation fashion by emulating the gradient scheme. All components of this "white box" network have precise optimization, statistical, and geometric interpretation. This principled framework also reveals and justifies the role of multi-channel lifting and sparse coding in the early stages of deep networks. Moreover, all linear operators of the so-derived network naturally become multi-channel convolutions when we enforce classification to be rigorously shift-invariant. The derivation also indicates that such a convolutional network is significantly more efficient to construct and learn in the spectral domain. Our preliminary simulations and experiments indicate that the so-constructed deep network can already learn a good discriminative representation even without any back propagation training.
 [21] arXiv:2010.14766 [pdf, other]

Title: A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation
Authors: Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem
Comments: arXiv admin note: substantial text overlap with arXiv:1811.12359
Journal-ref: Journal of Machine Learning Research 2020, Volume 21, Number 209
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The idea behind the \emph{unsupervised} learning of \emph{disentangled} representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train over $14000$ models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on eight data sets. We observe that while the different methods successfully enforce properties "encouraged" by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, different evaluation metrics do not always agree on what should be considered "disentangled" and exhibit systematic differences in the estimation. Finally, increased disentanglement does not seem to necessarily lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.
 [22] arXiv:2010.14771 [pdf, other]

Title: Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient
Comments: arXiv admin note: substantial text overlap with arXiv:2001.02435
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Off-policy Reinforcement Learning (RL) holds the promise of better data efficiency as it allows sample reuse and potentially enables safe interaction with the environment. Current off-policy policy gradient methods suffer from either high bias or high variance, often delivering unreliable estimates. The price of inefficiency becomes evident in real-world scenarios such as interaction-driven robot learning, where the success of RL has been rather limited, and a very high sample cost hinders straightforward application. In this paper, we propose a nonparametric Bellman equation, which can be solved in closed form. The solution is differentiable w.r.t. the policy parameters and gives access to an estimation of the policy gradient. In this way, we avoid the high variance of importance sampling approaches, and the high bias of semi-gradient methods. We empirically analyze the quality of our gradient estimate against state-of-the-art methods, and show that it outperforms the baselines in terms of sample efficiency on classical control tasks.
 [23] arXiv:2010.14773 [pdf, ps, other]

Title: Graph embedding using multilayer adjacent point merging modelSubjects: Machine Learning (cs.LG)
For graph classification tasks, many traditional kernel methods focus on measuring the similarity between graphs. These methods have achieved great success in resolving graph isomorphism problems. However, in some classification problems, the graph class depends not only on the topological similarity of the whole graph, but also on constituent subgraph patterns. To this end, we propose a novel graph embedding method using a multilayer adjacent point merging model. This embedding method allows us to extract different subgraph patterns from training data. Then we present a flexible loss function for feature selection which enhances the robustness of our method for different classification problems. Finally, numerical evaluations demonstrate that our proposed method outperforms many state-of-the-art methods.
 [24] arXiv:2010.14774 [pdf, other]

Title: Structural Causal Model with Expert Augmented Knowledge to Estimate the Effect of Oxygen Therapy on Mortality in the ICUSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Recent advances in causal inference techniques, more specifically, in the theory of structural causal models, provide the framework for identification of causal effects from observational data in the cases where the causal graph is identifiable, i.e., the data generating mechanism can be recovered from the joint distribution. However, no such studies have been done to demonstrate this concept with a clinical example. We present a complete framework to estimate the causal effect from observational data by augmenting expert knowledge in the model development phase and with a practical clinical application. Our clinical application entails a timely and important research question, i.e., the effect of oxygen therapy intervention in the intensive care unit (ICU); the result of this project is useful in a variety of disease conditions, including severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) patients in the ICU. We used data from the MIMIC III database, a standard database in the machine learning community that contains 58,976 admissions from an ICU in Boston, MA, for estimating the oxygen therapy effect on mortality. We also identified the covariate-specific effect of oxygen therapy from the model for more personalized intervention.
 [25] arXiv:2010.14778 [pdf, other]

Title: DNA: Differentiable Network-Accelerator Co-SearchAuthors: Yongan Zhang, Yonggan Fu, Weiwen Jiang, Chaojian Li, Haoran You, Meng Li, Vikas Chandra, Yingyan LinSubjects: Machine Learning (cs.LG)
Powerful yet complex deep neural networks (DNNs) have fueled a booming demand for efficient DNN solutions to bring DNN-powered intelligence into numerous applications. Jointly optimizing the networks and their accelerators is promising in providing optimal performance. However, the great potential of such solutions has yet to be unleashed due to the challenge of simultaneously exploring the vast and entangled, yet different design spaces of the networks and their accelerators. To this end, we propose DNA, a Differentiable Network-Accelerator co-search framework for automatically searching for matched networks and accelerators to maximize both the task accuracy and acceleration efficiency. Specifically, DNA integrates two enablers: (1) a generic design space for DNN accelerators that is applicable to both FPGA- and ASIC-based DNN accelerators and compatible with DNN frameworks such as PyTorch to enable algorithmic exploration for more efficient DNNs and their accelerators; and (2) a joint DNN network and accelerator co-search algorithm that enables simultaneously searching for optimal DNN structures and their accelerators' microarchitectures and mapping methods to maximize both the task accuracy and acceleration efficiency. Experiments and ablation studies based on FPGA measurements and ASIC synthesis show that the matched networks and accelerators generated by DNA consistently outperform state-of-the-art (SOTA) DNNs and DNN accelerators (e.g., 3.04x better FPS with a 5.46% higher accuracy on ImageNet), while requiring notably reduced search time (up to 1234.3x) over SOTA co-exploration methods, when evaluated over ten SOTA baselines on three datasets. All codes will be released upon acceptance.
 [26] arXiv:2010.14785 [pdf, other]

Title: Designing Interpretable Approximations to Deep Reinforcement Learning with Soft Decision TreesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
In an ever-expanding set of research and application areas, deep neural networks (DNNs) set the bar for algorithm performance. However, depending upon additional constraints such as processing power and execution time limits, or requirements such as verifiable safety guarantees, it may not be feasible to actually use such high-performing DNNs in practice. Many techniques have been developed in recent years to compress or distill complex DNNs into smaller, faster or more understandable models and controllers. This work seeks to provide a quantitative framework with metrics to systematically evaluate the outcome of such conversion processes, and identify reduced models that not only preserve a desired performance level, but also, for example, succinctly explain the latent knowledge represented by a DNN. We illustrate the effectiveness of the proposed approach on the evaluation of decision tree variants in the context of benchmark reinforcement learning tasks.
 [27] arXiv:2010.14816 [pdf, other]

Title: Higher Order Linear TransformerAuthors: Jean MercatSubjects: Machine Learning (cs.LG)
Following up on the linear transformer part of the article by Katharopoulos et al., which takes this idea from Shen et al., the trick that produces a linear complexity for the attention mechanism is reused and extended to a second-order approximation of the softmax normalization.
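One way to read "second-order approximation" is sketched below. This is an illustrative assumption, not necessarily the exact formulation of the paper: the softmax kernel exp(q·k) is replaced by its second-order Taylor expansion 1 + q·k + (q·k)^2/2, which factorizes through an explicit feature map so that keys and values can be summarized once and reused for every query. The function names are hypothetical.

```python
import numpy as np

def phi(x):
    """Feature map with phi(q) @ phi(k) == 1 + q@k + (q@k)**2 / 2."""
    n, d = x.shape
    ones = np.ones((n, 1))
    outer = np.einsum("ni,nj->nij", x, x).reshape(n, d * d) / np.sqrt(2.0)
    return np.concatenate([ones, x, outer], axis=1)

def linear_attention(Q, K, V):
    """Attention with the second-order kernel, linear in the number of keys."""
    fq, fk = phi(Q), phi(K)
    kv = fk.T @ V        # (D, d_v): key/value summary, computed once
    z = fk.sum(axis=0)   # (D,): statistics for the normalizer
    return (fq @ kv) / (fq @ z)[:, None]

def quadratic_reference(Q, K, V):
    """Same kernel evaluated pairwise, quadratic cost, for comparison."""
    s = Q @ K.T
    w = 1.0 + s + 0.5 * s ** 2
    return (w @ V) / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 4))
K = rng.normal(size=(7, 4))
V = rng.normal(size=(7, 3))
out_linear = linear_attention(Q, K, V)
out_reference = quadratic_reference(Q, K, V)
```

The kernel 1 + s + s^2/2 equals ((1 + s)^2 + 1)/2, so it is strictly positive and the normalizer never vanishes. The price is a feature dimension that grows as 1 + d + d^2.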
 [28] arXiv:2010.14831 [pdf, other]

Title: Deep Manifold Computing and VisualizationSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
The ability to preserve local geometry of highly nonlinear manifolds in high dimensional spaces and properly unfold them into lower dimensional hyperplanes is the key to the success of manifold computing, nonlinear dimensionality reduction (NLDR) and visualization. This paper proposes a novel method, called elastic locally isometric smoothness (ELIS), to empower deep neural networks with such an ability. ELIS requires that a desired metric between points should be preserved across layers in order to preserve local geometry; such a smoothness constraint effectively regularizes vector-based transformations to become well-behaved local metric-preserving homeomorphisms. Moreover, ELIS requires that the smoothness should be imposed in a way to render sufficient flexibility for tackling complicated nonlinearity and non-Euclideanity; this is achieved layerwisely via nonlinearity in both the similarity and activation functions. The ELIS method incorporates a class of suitable nonlinear similarity functions into a two-way divergence loss and uses hyperparameter continuation in finding optimal solutions. Extensive experiments, comparisons, and ablation study demonstrate that ELIS can deliver results not only superior to UMAP and t-SNE for NLDR and visualization but also better than other leading counterparts of manifold and autoencoder learning for NLDR and manifold data generation.
 [29] arXiv:2010.14864 [pdf, other]

Title: Tree-structured Ising models can be learned efficientlySubjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (stat.ML)
We provide the first polynomial-sample and polynomial-time algorithm for learning tree-structured Ising models. In particular, we show that $n$-variable tree-structured Ising models can be learned computationally-efficiently to within total variation distance~$\epsilon$ from an optimal $O(n \log n/\epsilon^2)$ samples, where $O(.)$ hides an absolute constant which does not depend on the model being learned, neither its tree nor the magnitude of its edge strengths, on which we place no assumptions. Our guarantees hold, in fact, for the celebrated Chow-Liu [1968] algorithm, using the plug-in estimator for mutual information. While this (or any other) algorithm may fail to identify the structure of the underlying model correctly from a finite sample, we show that it will still learn a tree-structured model that is close to the true one in TV distance, a guarantee called "proper learning."
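The Chow-Liu procedure with the plug-in estimator can be sketched as follows. This is a minimal illustration for +/-1 valued variables: it estimates pairwise mutual information from empirical frequencies and returns a maximum-weight spanning tree via Kruskal's algorithm. The function names and the synthetic three-variable chain are assumptions made for the example; fitting the edge parameters of the learned tree is omitted.

```python
import numpy as np
from itertools import combinations

def plugin_mutual_information(x, y):
    """Plug-in (empirical-frequency) mutual information of two +/-1 samples."""
    mi = 0.0
    for a in (-1, 1):
        for b in (-1, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(samples):
    """Edges of a maximum-weight spanning tree under plug-in mutual information."""
    n_vars = samples.shape[1]
    weights = sorted(
        ((plugin_mutual_information(samples[:, i], samples[:, j]), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True,
    )
    parent = list(range(n_vars))  # union-find structure for Kruskal's algorithm

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = []
    for _, i, j in weights:
        ri, rj = find(i), find(j)
        if ri != rj:              # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return sorted(edges)

# Synthetic chain X0 - X1 - X2: each variable copies its neighbor, flipped w.p. 0.1.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.choice([-1, 1], size=n)
x1 = x0 * np.where(rng.random(n) < 0.1, -1, 1)
x2 = x1 * np.where(rng.random(n) < 0.1, -1, 1)
edges = chow_liu_tree(np.stack([x0, x1, x2], axis=1))
```

In this easy example the edge strengths are well separated, so the chain is recovered; the abstract's point is that even when the structure is not recovered, the learned tree model remains close in total variation distance.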
Prior to our work there were no known sample- and time-efficient algorithms for learning (properly or non-properly) arbitrary tree-structured graphical models. In particular, our guarantees cannot be derived from known results for the Chow-Liu algorithm and the ensuing literature on learning graphical models, including a recent renaissance of algorithms on this learning challenge, which only yield asymptotic consistency results, or sample-inefficient and/or time-inefficient algorithms, unless further assumptions are placed on the graphical model, such as bounds on the "strengths" of the model's edges. While we establish guarantees for a widely known and simple algorithm, the analysis that this algorithm succeeds is quite complex, requiring a hierarchical classification of the edges into layers with different reconstruction guarantees, depending on their strength, combined with delicate uses of the subadditivity of the squared Hellinger distance over graphical models to control the error accumulation.
 [30] arXiv:2010.14876 [pdf, other]

Title: Fighting Copycat Agents in Behavioral Cloning from Observation HistoriesComments: Published at NeurIPS 2020 9 pages(exclude reference and appendices)Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
Imitation learning trains policies to map from input observations to the actions that an expert would choose. In this setting, distribution shift frequently exacerbates the effect of misattributing expert actions to nuisance correlates among the observed variables. We observe that a common instance of this causal confusion occurs in partially observed settings when expert actions are strongly correlated over time: the imitator learns to cheat by predicting the expert's previous action, rather than the next action. To combat this "copycat problem", we propose an adversarial approach to learn a feature representation that removes excess information about the previous expert action nuisance correlate, while retaining the information necessary to predict the next action. In our experiments, our approach improves performance significantly across a variety of partially observed imitation learning tasks.
 [31] arXiv:2010.14878 [pdf, other]

Title: An Optimal Control Approach to Learning in SIDARTHE Epidemic modelAuthors: Andrea Zugarini, Enrico Meloni, Alessandro Betti, Andrea Panizza, Marco Corneli, Marco GoriComments: 11 pages, 7 figures, submitted at TNNLSSubjects: Machine Learning (cs.LG); Physics and Society (physics.soc-ph)
The COVID-19 outbreak has stimulated the interest in the proposal of novel epidemiological models to predict the course of the epidemic so as to help plan effective control strategies. In particular, in order to properly interpret the available data, it has become clear that one must go beyond most classic epidemiological models and consider models that, like the recently proposed SIDARTHE, offer a richer description of the stages of infection. The problem of learning the parameters of these models is of crucial importance especially when assuming that they are time-variant, which further enriches their effectiveness. In this paper we propose a general approach for learning time-variant parameters of dynamic compartmental models from epidemic data. We formulate the problem in terms of a functional risk that depends on the learning variables through the solutions of a dynamic system. The resulting variational problem is then solved by using a gradient flow on a suitable, regularized functional. We forecast the epidemic evolution in Italy and France. Results indicate that the model provides reliable and challenging predictions over all available data as well as the fundamental role of the chosen strategy on the time-variant parameters.
 [32] arXiv:2010.14900 [pdf, other]

Title: Dynamic Bayesian Approach for decision-making in Ego-ThingsAuthors: Divya Kanapram, Damian Campo, Mohamad Baydoun, Lucio Marcenaro, Eliane L. Bodanese, Carlo Regazzoni, Mario MarcheseComments: IEEE 5th World Forum on Internet of Things at Limerick, IrelandSubjects: Machine Learning (cs.LG)
This paper presents a novel approach to detect abnormalities in dynamic systems based on multisensory data and feature selection. The proposed method produces multiple inference models by considering several features of the observed data. This work facilitates the identification of the most precise features for predicting future instances and detecting abnormalities. Growing neural gas (GNG) is employed for clustering multisensory data into a set of nodes that provide a semantic interpretation of data and define local linear models for prediction purposes. Our method uses a Markov Jump particle filter (MJPF) for state estimation and abnormality detection. The proposed method can be used for selecting the optimal set of features to be shared in networking operations such that state prediction, decision-making, and abnormality detection processes are favored. This work is evaluated by using a real dataset consisting of a moving vehicle performing some tasks in a controlled environment.
 [33] arXiv:2010.14907 [pdf, other]

Title: Online feature selection for rapid, low-overhead learning in networked systemsAuthors: Xiaoxuan Wang (1), Forough Shahab Samani (1 and 2), Rolf Stadler (1 and 2) ((1) KTH Royal Institute of Technology, Sweden (2) RISE Research Institutes of Sweden)Comments: A short version of this paper has been published at IFIP/IEEE 16th International Conference on Network and Service Management, 26 November 2020Subjects: Machine Learning (cs.LG)
Data-driven functions for operation and management often require measurements collected through monitoring for model training and prediction. The number of data sources can be very large, which requires a significant communication and computing overhead to continuously extract and collect this data, as well as to train and update the machine-learning models. We present an online algorithm, called OSFS, that selects a small feature set from a large number of available data sources, which allows for rapid, low-overhead, and effective learning and prediction. OSFS is instantiated with a feature ranking algorithm and applies the concept of a stable feature set, which we introduce in the paper. We perform extensive, experimental evaluation of our method on data from an in-house testbed. We find that OSFS requires several hundred measurements to reduce the number of data sources by two orders of magnitude, from which models are trained with acceptable prediction accuracy. While our method is heuristic and can be improved in many ways, the results clearly suggest that many learning tasks do not require a lengthy monitoring phase and expensive offline training.
 [34] arXiv:2010.14908 [pdf, other]

Title: Collective Awareness for Abnormality Detection in Connected Autonomous VehiclesAuthors: Divya Thekke Kanapram, Fabio Patrone, Pablo Marin-Plaza, Mario Marchese, Eliane L. Bodanese, Lucio Marcenaro, David Martín Gómez, Carlo RegazzoniComments: IEEE Internet of Things JournalSubjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Signal Processing (eess.SP)
The advancements in connected and autonomous vehicles demand the availability of tools providing the agents with the capability to be aware of and predict their own states and context dynamics. This article presents a novel approach to develop an initial level of collective awareness in a network of intelligent agents. A specific collective self-awareness functionality is considered, namely, agent-centered detection of abnormal situations present in the environment around any agent in the network. Moreover, the agent should be capable of analyzing how such abnormalities can influence the future actions of each agent. Data-driven dynamic Bayesian network (DBN) models learned from time series of sensory data recorded during the realization of tasks (agent network experiences) are here used for abnormality detection and prediction. A set of DBNs, each related to an agent, is used to allow each agent in the network to synchronously become aware of possible abnormalities occurring when available models are used on a new instance of the task for which DBNs have been learned. A growing neural gas (GNG) algorithm is used to learn the node variables and conditional probabilities linking nodes in the DBN models; a Markov jump particle filter (MJPF) is employed for state estimation and abnormality detection in each agent using learned DBNs as filter parameters. Performance metrics are discussed to assess the algorithm's reliability and accuracy. The impact of the communication channel used by the network to share the data sensed in a distributed way by each agent is also evaluated. The IEEE 802.11p protocol standard has been considered for communication among agents. Real data sets acquired by autonomous vehicles performing different tasks in a controlled environment are also used.
 [35] arXiv:2010.14927 [pdf, other]

Title: Most ReLU Networks Suffer from $\ell^2$ Adversarial PerturbationsSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
We consider ReLU networks with random weights, in which the dimension decreases at each layer. We show that for most such networks, most examples $x$ admit an adversarial perturbation at a Euclidean distance of $O\left(\frac{\|x\|}{\sqrt{d}}\right)$, where $d$ is the input dimension. Moreover, this perturbation can be found via gradient flow, as well as gradient descent with sufficiently small steps. This result can be seen as an explanation for the abundance of adversarial examples, and for the fact that they are found via gradient descent.
 [36] arXiv:2010.14945 [pdf, other]

Title: Graph Contrastive Learning with Adaptive AugmentationComments: Work in progress; 11 pages, 3 figures, 5 tables. arXiv admin note: substantial text overlap with arXiv:2006.04131Subjects: Machine Learning (cs.LG)
Recently, contrastive learning (CL) has emerged as a successful method for unsupervised graph representation learning. Most graph CL methods first perform stochastic augmentation on the input graph to obtain two graph views and maximize the agreement of representations in the two views. Despite the prosperous development of graph CL methods, the design of graph augmentation schemes, a crucial component in CL, remains rarely explored. We argue that the data augmentation schemes should preserve intrinsic structural and attribute information of graphs, which will force the model to learn representations that are insensitive to perturbation on unimportant nodes and edges. However, most existing methods adopt uniform data augmentation schemes, like uniformly dropping edges and uniformly shuffling features, leading to suboptimal performance. In this paper, we propose a novel graph contrastive representation learning method with adaptive augmentation that incorporates various priors for topological and semantic aspects of the graph. Specifically, on the topology level, we design augmentation schemes based on node centrality measures to highlight important connective structures. On the node attribute level, we corrupt node features by adding more noise to unimportant node features, to enforce the model to recognize underlying semantic information. We perform extensive experiments of node classification on a variety of real-world datasets. Experimental results demonstrate that our proposed method consistently outperforms existing state-of-the-art methods and even surpasses some supervised counterparts, which validates the effectiveness of the proposed contrastive framework with adaptive augmentation.
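The topology-level idea, dropping unimportant edges more often than important ones, can be sketched as below. The specific choices here (degree as the centrality measure, mean log-degree of the endpoints as the edge score, the linear normalization, and the parameters p_base and p_max) are illustrative assumptions and may differ from the scheme used in the paper.

```python
import numpy as np

def edge_drop_probabilities(edges, n_nodes, p_base=0.3, p_max=0.7):
    """Give edges between high-degree nodes a lower probability of removal."""
    edges = np.asarray(edges)
    degree = np.bincount(edges.ravel(), minlength=n_nodes)
    # Edge centrality: mean log-degree of the two endpoints (illustrative choice).
    score = np.log(degree[edges]).mean(axis=1)
    s_max, s_mean = score.max(), score.mean()
    if s_max == s_mean:                       # all edges are equally central
        return np.full(len(edges), p_base)
    probs = (s_max - score) / (s_max - s_mean) * p_base
    return np.minimum(probs, p_max)           # cap to avoid destroying the graph

def augment(edges, n_nodes, rng):
    """Sample one augmented view by dropping edges independently."""
    probs = edge_drop_probabilities(edges, n_nodes)
    keep = rng.random(len(edges)) >= probs
    return [e for e, k in zip(edges, keep) if k]

# Hypothetical toy graph: a small star around node 0 with a tail 3 - 4.
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
probs = edge_drop_probabilities(edges, n_nodes=5)
view = augment(edges, n_nodes=5, rng=np.random.default_rng(0))
```

Two calls to augment with different random states give the two graph views whose representations a contrastive objective would then pull together.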
 [37] arXiv:2010.14946 [pdf, other]

Title: Smart Anomaly Detection in Sensor SystemsSubjects: Machine Learning (cs.LG); General Literature (cs.GL)
Anomaly detection is concerned with identifying data patterns that deviate remarkably from the expected behaviour. This is an important research problem, due to its broad set of application domains, from data analysis to e-health, cybersecurity, predictive maintenance, fault prevention, and industrial automation. Herein, we review state-of-the-art methods that may be employed to detect anomalies in the specific area of sensor systems, which poses hard challenges in terms of information fusion, data volumes, data speed, and network/energy efficiency, to mention but the most pressing ones. In this context, anomaly detection is a particularly hard problem, given the need to find computing-energy-accuracy trade-offs in a constrained environment. We taxonomize methods ranging from conventional techniques (statistical methods, time-series analysis, signal processing, etc.) to data-driven techniques (supervised learning, reinforcement learning, deep learning, etc.). We also look at the impact that different architectural environments (Cloud, Fog, Edge) can have on the sensors ecosystem. The review points to the most promising intelligent-sensing methods, and pinpoints a set of interesting open issues and challenges.
 [38] arXiv:2010.14957 [pdf, other]

Title: Dimensionality Reduction and Anomaly Detection for CPPS Data using AutoencoderComments: Copyright IEEE 2019Journal-ref: 2019 IEEE International Conference on Industrial Technology (ICIT)Subjects: Machine Learning (cs.LG)
Unsupervised anomaly detection (AD) is a major topic in the field of Cyber-Physical Production Systems (CPPSs). A closely related concern is dimensionality reduction (DR), which is: 1) often used as a preprocessing step in an AD solution; 2) a sort of AD, if a measure of observation conformity to the learned data manifold is provided.
We argue that the two aspects can be complementary in a CPPS anomaly detection solution. In this work, we focus on the nonlinear autoencoder (AE) as a DR/AD approach. The contribution of this work is threefold: 1) we examine the suitability of AE reconstruction error as an AD decision criterion in CPPS data; 2) we analyze its relation to a potential second-phase AD approach in the AE latent space; 3) we evaluate the performance of the approach on three real-world datasets. Moreover, the approach outperforms state-of-the-art techniques while remaining relatively simple and straightforward to apply.
 [39] arXiv:2010.14978 [pdf, ps, other]

Title: Game-Theoretic Interactions of Different OrdersSubjects: Machine Learning (cs.LG)
In this study, we define interaction components of different orders between two input variables based on game theory. We further prove that interaction components of different orders satisfy several desirable properties.
 [40] arXiv:2010.14986 [pdf, other]

Title: Evaluating Robustness of Predictive Uncertainty Estimation: Are Dirichlet-based Models Reliable?Authors: Anna-Kathrin Kopetzki, Bertrand Charpentier, Daniel Zügner, Sandhya Giri, Stephan GünnemannSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Robustness to adversarial perturbations and accurate uncertainty estimation are crucial for reliable application of deep learning in real world settings. Dirichlet-based uncertainty (DBU) models are a family of models that predict the parameters of a Dirichlet distribution (instead of a categorical one) and promise to signal when not to trust their predictions. Untrustworthy predictions are obtained on unknown or ambiguous samples and marked with a high uncertainty by the models. In this work, we show that DBU models with standard training are not robust w.r.t. three important tasks in the field of uncertainty estimation. In particular, we evaluate how useful the uncertainty estimates are to (1) indicate correctly classified samples, and (2) to detect adversarial examples that try to fool classification. We further evaluate the reliability of DBU models on the task of (3) distinguishing between in-distribution (ID) and out-of-distribution (OOD) data. To this end, we present the first study of certifiable robustness for DBU models. Furthermore, we propose novel uncertainty attacks that fool models into assigning high confidence to OOD data and low confidence to ID data, respectively. Based on our results, we explore the first approaches to make DBU models more robust. We use adversarial training procedures based on label attacks, uncertainty attacks, or random noise and demonstrate how they affect robustness of DBU models on ID data and OOD data.
 [41] arXiv:2010.15003 [pdf, other]

Title: Estimating Product Relations in Neural NetworksAuthors: Bhaavan GoelComments: 5 pages, 8 figuresSubjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The universal approximation theorem suggests that a shallow neural network can approximate any function. The input to neurons at each layer is a weighted sum of previous layer neurons, to which an activation is then applied. These activation functions perform very well when the output is a linear combination of input data. When trying to learn a function which involves products of input data, neural networks tend to overfit the data to approximate the function. In this paper we use properties of logarithmic functions to propose a pair of custom activation functions which can translate products into linear expressions and learn using backpropagation. We try to generalize this approach to some complex arithmetic functions and test the accuracy on a distribution disjoint from the training set.
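The logarithm trick mentioned above rests on the identity x1^w1 * x2^w2 = exp(w1 log x1 + w2 log x2): between a log activation and an exp activation, a product becomes an ordinary weighted sum. The sketch below illustrates this for strictly positive inputs. The function name is hypothetical, and the weights are fitted with least squares in log space rather than with backpropagation, purely to keep the example short.

```python
import numpy as np

def product_unit(x, w):
    """Compute prod_i x_i ** w_i as exp(log(x) @ w); requires x > 0."""
    return np.exp(np.log(x) @ w)

# With unit weights the layer reproduces the plain product x1 * x2.
x = np.array([[2.0, 3.0], [0.5, 8.0]])
y = product_unit(x, np.array([1.0, 1.0]))

# Learning the exponents of target = x1 * x2**2 is a linear problem in log space.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(100, 2))
target = X[:, 0] * X[:, 1] ** 2
w_fit, *_ = np.linalg.lstsq(np.log(X), np.log(target), rcond=None)

# Evaluate well outside the training range to illustrate extrapolation.
X_far = np.array([[10.0, 20.0]])
pred_far = product_unit(X_far, w_fit)
```

Because the fitted exponents describe the function exactly, the prediction holds far outside the training range, which is the kind of test on a disjoint distribution the abstract alludes to. Handling zero or negative inputs needs extra care that this sketch does not attempt.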
 [42] arXiv:2010.15010 [pdf, other]

Title: Geometric Scattering Attention NetworksSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Geometric scattering has recently gained recognition in graph representation learning, and recent work has shown that integrating scattering features in graph convolution networks (GCNs) can alleviate the typical oversmoothing of features in node representation learning. However, scattering methods often rely on handcrafted design, requiring careful selection of frequency bands via a cascade of wavelet transforms, as well as an effective weight sharing scheme to combine low- and band-pass information. Here, we introduce a new attention-based architecture to produce adaptive task-driven node representations by implicitly learning node-wise weights for combining multiple scattering and GCN channels in the network. We show the resulting geometric scattering attention network (GSAN) outperforms previous networks in semi-supervised node classification, while also enabling a spectral study of extracted information by examining node-wise attention weights.
 [43] arXiv:2010.15011 [pdf, other]

Title: Predicting Classification Accuracy when Adding New Unobserved ClassesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Multiclass classifiers are often designed and evaluated only on a sample from the classes on which they will eventually be applied. Hence, their final accuracy remains unknown. In this work we study how a classifier's performance over the initial class sample can be used to extrapolate its expected accuracy on a larger, unobserved set of classes. For this, we define a measure of separation between correct and incorrect classes that is independent of the number of classes: the reversed ROC (rROC), which is obtained by replacing the roles of classes and data points in the common ROC. We show that the classification accuracy is a function of the rROC in multiclass classifiers, for which the learned representation of data from the initial class sample remains unchanged when new classes are added. Using these results we formulate a robust neural-network-based algorithm, CleaneX, which learns to estimate the accuracy of such classifiers on arbitrarily large sets of classes. Our method achieves remarkably better predictions than current state-of-the-art methods on both simulations and real datasets of object detection, face recognition, and brain decoding.
 [44] arXiv:2010.15020 [pdf, other]

Title: Provably Efficient Online Agnostic Learning in Markov GamesSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study online agnostic learning, a problem that arises in episodic multi-agent reinforcement learning where the actions of the opponents are unobservable. We show that in this challenging setting, achieving sublinear regret against the best response in hindsight is statistically hard. We then consider a weaker notion of regret, and present an algorithm that achieves after $K$ episodes a sublinear $\tilde{\mathcal{O}}(K^{3/4})$ regret. This is the first sublinear regret bound (to our knowledge) in the online agnostic setting. Importantly, our regret bound is independent of the size of the opponents' action spaces. As a result, even when the opponents' actions are fully observable, our regret bound improves upon existing analysis (e.g., Xie et al., 2020) by an exponential factor in the number of opponents.
 [45] arXiv:2010.15028 [pdf, other]

Title: DeepRite: Deep Recurrent Inverse TreatmEnt Weighting for Adjusting Time-varying Confounding in Modern Longitudinal Observational DataSubjects: Machine Learning (cs.LG)
Counterfactual prediction is about predicting the outcome of an unobserved situation from data. For example, given that a patient is on drug A, what would the outcome be if she switched to drug B? Most existing works focus on modeling counterfactual outcomes based on static data. However, many applications have time-varying confounding effects such as multiple treatments over time. How to model such time-varying effects from longitudinal observational data? How to model complex high-dimensional dependency in the data? To address these challenges, we propose Deep Recurrent Inverse TreatmEnt weighting (DeepRite) by incorporating recurrent neural networks into two-phase adjustments for the existence of time-varying confounding in modern longitudinal data. In phase I, cohort reweighting, we fit one network for emitting time-dependent inverse probabilities of treatment and use them to generate a pseudo-balanced cohort. In phase II, outcome progression, we input the adjusted data to the subsequent predictive network for making counterfactual predictions. We evaluate DeepRite on both synthetic data and real data collected from sepsis patients in intensive care units. DeepRite is shown to recover the ground truth from synthetic data, and estimate unbiased treatment effects from real data that can be better aligned with the standard guidelines for management of sepsis thanks to its ability to create balanced cohorts.
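The reweighting idea behind phase I can be illustrated in its simplest static form. The sketch below is not DeepRite: it uses a single time step, one binary confounder, and treatment probabilities that are known by construction instead of being emitted by a recurrent network. All variable names and the synthetic data-generating process are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x = rng.binomial(1, 0.5, size=n)             # binary confounder
propensity = np.where(x == 1, 0.8, 0.2)      # P(A = 1 | X), known by construction
a = rng.binomial(1, propensity)              # observed treatment
y = 2.0 * a + 3.0 * x + rng.normal(size=n)   # outcome; the true effect of A is 2

# Naive comparison of treated and untreated groups is confounded by X.
naive = y[a == 1].mean() - y[a == 0].mean()

# Inverse probability of treatment weights: 1 / P(A = observed value | X).
w = np.where(a == 1, 1.0 / propensity, 1.0 / (1.0 - propensity))

# Weighted group means form a pseudo-balanced cohort in which X no longer
# predicts treatment, so their difference estimates the treatment effect.
iptw = (np.average(y[a == 1], weights=w[a == 1])
        - np.average(y[a == 0], weights=w[a == 0]))
```

In this construction the naive difference is biased upward because treated units more often have X = 1, while the weighted difference is close to the true effect of 2. In the longitudinal setting the weights would instead be products of time-dependent inverse probabilities along each patient trajectory.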
 [46] arXiv:2010.15031 [pdf, ps, other]

Title: On Learning Continuous Pairwise Markov Random FieldsSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We consider learning a sparse pairwise Markov Random Field (MRF) with continuous-valued variables from i.i.d. samples. We adapt the algorithm of Vuffray et al. (2019) to this setting and provide finite-sample analysis revealing sample complexity scaling logarithmically with the number of variables, as in the discrete and Gaussian settings. Our approach is applicable to a large class of pairwise MRFs with continuous variables and also has desirable asymptotic properties, including consistency and normality under mild conditions. Further, we establish that the population version of the optimization criterion employed in Vuffray et al. (2019) can be interpreted as local maximum likelihood estimation (MLE). As part of our analysis, we introduce a robust variation of sparse linear regression à la Lasso, which may be of interest in its own right.
 [47] arXiv:2010.15054 [pdf, other]

Title: Attribution Preservation in Network Compression for Reliable Network Interpretation
Comments: NeurIPS 2020. Code: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Neural networks embedded in safety-sensitive applications such as self-driving cars and wearable health monitors rely on two important techniques: input attribution for hindsight analysis and network compression to reduce their size for edge computing. In this paper, we show that these seemingly unrelated techniques conflict with each other, as network compression deforms the produced attributions, which could lead to dire consequences for mission-critical applications. This phenomenon arises because conventional network compression methods only preserve the predictions of the network while ignoring the quality of the attributions. To combat the attribution inconsistency problem, we present a framework that can preserve the attributions while compressing a network. By employing the Weighted Collapsed Attribution Matching regularizer, we match the attribution maps of the network being compressed to its pre-compression former self. We demonstrate the effectiveness of our algorithm both quantitatively and qualitatively on diverse compression methods.
 [48] arXiv:2010.15056 [pdf, other]

Title: Self-awareness in Intelligent Vehicles: Experience Based Abnormality Detection
Authors: Divya Kanapram, Pablo Marin-Plaza, Lucio Marcenaro, David Martin, Arturo de la Escalera, Carlo Regazzoni
Comments: Robot 2019: Fourth Iberian Robotics Conference
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
The evolution of Intelligent Transportation Systems in recent times necessitates the development of self-driving agents with self-awareness. This paper introduces a novel method to detect abnormalities based on internal cross-correlation parameters of the vehicle. Before the adoption of machine learning, abnormality detection was programmed manually by checking every variable and creating huge nested conditions that are very difficult to track. Nowadays, it is possible to train a Dynamic Bayesian Network (DBN) model to automatically evaluate and detect when the vehicle is potentially misbehaving. In this paper, different scenarios are set up to train and test a switching DBN for a Perimeter Monitoring Task, using semantic segmentation for the DBN model and the Hellinger distance metric for abnormality measurements.
 [49] arXiv:2010.15088 [pdf, other]

Title: Finite-Time Analysis of Decentralized Stochastic Approximation with Applications in Multi-Agent and Multi-Task Learning
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Stochastic approximation, a data-driven approach for finding the fixed point of an unknown operator, provides a unified framework for treating many problems in stochastic optimization and reinforcement learning. Motivated by a growing interest in multi-agent and multi-task learning, we consider in this paper a decentralized variant of stochastic approximation. A network of agents, each with its own unknown operator and data observations, cooperatively finds the fixed point of the aggregate operator. The agents work by running a local stochastic approximation algorithm using noisy samples from their operators while averaging their iterates with their neighbors' on a decentralized communication graph. Our main contribution provides a finite-time analysis of this decentralized stochastic approximation algorithm and characterizes the impacts of the underlying communication topology between agents. Our model for the data observed at each agent is that it is sampled from a Markov process; this lack of independence makes the iterates biased and (potentially) unbounded. Under mild assumptions on the Markov processes, we show that the convergence rate of the proposed methods is essentially the same as if the samples were independent, differing only by a log factor that represents the mixing time of the Markov process. We also present applications of the proposed method on a number of interesting learning problems in multi-agent systems, including a decentralized variant of Q-learning for solving multi-task reinforcement learning.
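The local-update-plus-neighbor-averaging scheme described above can be sketched on a toy problem. This is an illustration under strong assumptions, not the paper's algorithm or setting: each agent's operator is the hypothetical scalar map `target_i - x`, the noise is independent Gaussian rather than Markovian, the step size is constant, and `decentralized_sa` is an invented name.

```python
import random


def decentralized_sa(targets, mixing, steps=2000, step_size=0.05,
                     noise_std=0.1, seed=0):
    """Toy decentralized stochastic approximation.

    targets[i] defines agent i's operator G_i(x) = targets[i] - x, so the
    aggregate operator has its fixed point at the mean of the targets.
    mixing is a doubly stochastic matrix describing the communication graph.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    for _ in range(steps):
        # Consensus step: average iterates with neighbors.
        mixed = [sum(mixing[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local step: move along a noisy sample of the agent's own operator.
        x = [
            mixed[i] + step_size * ((targets[i] - x[i]) + rng.gauss(0.0, noise_std))
            for i in range(n)
        ]
    return x
```

With a constant step size the agents agree only approximately: their average tracks the aggregate fixed point, while each individual iterate keeps a bias of the order of the step size toward its own target.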
 [50] arXiv:2010.15100 [pdf, other]

Title: Evaluating Model Robustness to Dataset Shift
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
As the use of machine learning in safety-critical domains becomes widespread, the importance of evaluating the safety of these models has increased. An important aspect of this is evaluating how robust a model is to changes in setting or population, which typically requires applying the model to multiple, independent datasets. Since the cost of collecting such datasets is often prohibitive, in this paper, we propose a framework for evaluating this type of robustness using a single, fixed evaluation dataset. We use the original evaluation data to define an uncertainty set of possible evaluation distributions and estimate the algorithm's performance on the "worst-case" distribution within this set. Specifically, we consider distribution shifts defined by conditional distributions, allowing some distributions to shift while keeping other portions of the data distribution fixed. This results in finer-grained control over the considered shifts and more plausible worst-case distributions than previous approaches based on covariate shifts. To address the challenges associated with estimation in complex, high-dimensional distributions, we derive a "debiased" estimator which maintains $\sqrt{N}$-consistency even when machine learning methods with slower convergence rates are used to estimate the nuisance parameters. In experiments on a real medical risk prediction task, we show that this estimator can be used to evaluate robustness and accounts for realistic shifts that cannot be expressed as covariate shift. The proposed framework provides a means for practitioners to proactively evaluate the safety of their models using a single validation dataset.
 [51] arXiv:2010.15110 [pdf, other]

Title: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
Authors: Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli
Comments: 19 pages, 19 figures, In Advances in Neural Information Processing Systems 34 (NeurIPS 2020)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics are well approximated by a linear weight expansion of the network at initialization. Standard training, however, diverges from its linearization in ways that are poorly understood. We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK. We do so through a large-scale phenomenological analysis of training, synthesizing diverse measures characterizing loss landscape geometry and NTK dynamics. In multiple neural architectures and datasets, we find these diverse measures evolve in a highly correlated manner, revealing a universal picture of the deep learning process. In this picture, deep network training exhibits a highly chaotic rapid initial transient that within 2 to 3 epochs determines the final linearly connected basin of low loss containing the end point of training. During this chaotic transient, the NTK changes rapidly, learning useful features from the training data that enable it to outperform the standard initial NTK by a factor of 3 in less than 3 to 4 epochs. After this rapid chaotic transient, the NTK changes at constant velocity, and its performance matches that of full network training in 15% to 45% of training time. Overall, our analysis reveals a striking correlation between a diverse set of metrics over training time, governed by a rapid chaotic-to-stable transition in the first few epochs, that together poses challenges and opportunities for the development of more accurate theories of deep learning.
 [52] arXiv:2010.15114 [pdf, other]

Title: The geometry of integration in text classification RNNs
Authors: Kyle Aitken, Vinay V. Ramasesh, Ankush Garg, Yuan Cao, David Sussillo, Niru Maheswaranathan
Comments: 9+19 pages, 30 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Despite the widespread application of recurrent neural networks (RNNs) across a variety of tasks, a unified understanding of how RNNs solve these tasks remains elusive. In particular, it is unclear what dynamical patterns arise in trained RNNs, and how those patterns depend on the training dataset or task. This work addresses these questions in the context of a specific natural language processing task: text classification. Using tools from dynamical systems analysis, we study recurrent networks trained on a battery of both natural and synthetic text classification tasks. We find the dynamics of these trained RNNs to be both interpretable and low-dimensional. Specifically, across architectures and datasets, RNNs accumulate evidence for each class as they process the text, using a low-dimensional attractor manifold as the underlying mechanism. Moreover, the dimensionality and geometry of the attractor manifold are determined by the structure of the training dataset; in particular, we describe how simple word-count statistics computed on the training dataset can be used to predict these properties. Our observations span multiple architectures and datasets, reflecting a common mechanism RNNs employ to perform text classification. To the degree that integration of evidence towards a decision is a common computational primitive, this work lays the foundation for using dynamical systems techniques to study the inner workings of RNNs.
 [53] arXiv:2010.15116 [pdf, other]

Title: On Graph Neural Networks versus Graph-Augmented MLPs
Subjects: Machine Learning (cs.LG); Combinatorics (math.CO); Machine Learning (stat.ML)
From the perspective of expressive power, this work compares multi-layer Graph Neural Networks (GNNs) with a simplified alternative that we call Graph-Augmented Multi-Layer Perceptrons (GA-MLPs), which first augment node features with certain multi-hop operators on the graph and then apply an MLP in a node-wise fashion. From the perspective of graph isomorphism testing, we show both theoretically and numerically that GA-MLPs with suitable operators can distinguish almost all non-isomorphic graphs, just like the Weisfeiler-Lehman (WL) test. However, by viewing them as node-level functions and examining the equivalence classes they induce on rooted graphs, we prove a separation in expressive power between GA-MLPs and GNNs that grows exponentially in depth. In particular, unlike GNNs, GA-MLPs are unable to count the number of attributed walks. We also demonstrate via community detection experiments that GA-MLPs can be limited by their choice of operator family, as compared to GNNs with higher flexibility in learning.
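The feature augmentation step can be sketched as follows. This is only an illustration: the abstract does not say which multi-hop operators are used, so powers of the adjacency matrix are assumed here, the node-wise MLP is omitted, and the helper names `matmul` and `augment_features` are hypothetical.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def augment_features(adj, feats, hops):
    """Concatenate X, AX, A^2 X, ..., A^hops X for every node.

    adj is the adjacency matrix A and feats the node feature matrix X.
    Assumption: adjacency powers stand in for the operator family.
    """
    blocks = [feats]
    current = feats
    for _ in range(hops):
        current = matmul(adj, current)  # one more hop of propagation
        blocks.append(current)
    # Each node's augmented vector is the concatenation of its rows.
    return [
        [value for block in blocks for value in block[i]]
        for i in range(len(feats))
    ]
```

After this precomputation each node's augmented row would be fed to the same MLP independently, which is why no message passing is needed at training time.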
Cross-lists for Thu, 29 Oct 20
 [54] arXiv:2010.13891 (cross-list from q-fin.CP) [pdf]

Title: Stock Price Prediction Using CNN and LSTM-Based Deep Learning Models
Comments: The paper consists of 7 pages, 10 figures, and 5 tables. This is the accepted version of our paper in the IEEE International Conference on Decision Aid Sciences and Applications (DASA'20), November 8-9, 2020, Bahrain
Subjects: Computational Finance (q-fin.CP); Machine Learning (cs.LG)
Designing robust and accurate predictive models for stock price prediction has been an active area of research for a long time. While supporters of the efficient market hypothesis claim that it is impossible to forecast stock prices accurately, many researchers believe otherwise. There exist propositions in the literature that have demonstrated that, if properly designed and optimized, predictive models can very accurately and reliably predict future values of stock prices. This paper presents a suite of deep learning-based models for stock price prediction. We use the historical records of the NIFTY 50 index listed in the National Stock Exchange of India, during the period from December 29, 2008 to July 31, 2020, for training and testing the models. Our proposition includes two regression models built on convolutional neural networks (CNNs) and three long short-term memory (LSTM) network-based predictive models. To forecast the open values of the NIFTY 50 index records, we adopted a multi-step prediction technique with walk-forward validation. In this approach, the open values of the NIFTY 50 index are predicted on a time horizon of one week, and once a week is over, the actual index values are included in the training set before the model is trained again, and the forecasts for the next week are made. We present detailed results on the forecasting accuracies for all our proposed models. The results show that while all the models are very accurate in forecasting the NIFTY 50 open values, the univariate encoder-decoder convolutional LSTM with the previous two weeks' data as the input is the most accurate model. On the other hand, a univariate CNN model with the previous one week's data as the input is found to be the fastest model in terms of its execution speed.
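The walk-forward validation loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the forecaster is a hypothetical naive "repeat the last value" model standing in for the CNN and LSTM models, and `naive_fit`, `walk_forward`, and `rmse` are invented names.

```python
import math


def naive_fit(history):
    """Placeholder model: forecast the last observed value for every step."""
    last = history[-1]
    return lambda horizon: [last] * horizon


def walk_forward(series, initial_train, horizon, fit=naive_fit):
    """Forecast one window at a time, then fold the actuals into training."""
    train = list(series[:initial_train])
    forecasts, actuals = [], []
    start = initial_train
    while start + horizon <= len(series):
        model = fit(train)                  # retrain on all data seen so far
        forecasts.extend(model(horizon))    # multi-step forecast for the window
        window = series[start:start + horizon]
        actuals.extend(window)
        train.extend(window)                # actual values join the training set
        start += horizon
    return forecasts, actuals


def rmse(forecasts, actuals):
    """Root mean squared error over all forecast steps."""
    return math.sqrt(
        sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)
    )
```

With daily index values, a one-week horizon would correspond to `horizon=5` trading days; any callable that takes the training history and returns a forecasting function can be passed as `fit`.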
 [55] arXiv:2010.14557 (cross-list from cs.CL) [pdf, other]

Title: DGST: a Dual-Generator Network for Text Style Transfer
Comments: Accepted by EMNLP 2020, camera ready version
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
We propose DGST, a novel and simple Dual-Generator network architecture for text Style Transfer. Our model employs two generators only, and does not rely on any discriminators or parallel corpus for training. Both quantitative and qualitative experiments on the Yelp and IMDb datasets show that our model gives competitive performance compared to several strong baselines with more complicated architecture designs.
 [56] arXiv:2010.14570 (cross-list from cs.IR) [pdf, other]

Title: Addressing Purchase-Impression Gap through a Sequential Reranker
Authors: Shubhangi Tandon, Saratchandra Indrakanti, Amit Jaiswal, Svetlana Strunjas, Manojkumar Rangasamy Kannadasan
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Large-scale eCommerce platforms such as eBay carry a wide variety of inventory and provide several buying choices to online shoppers. It is critical for eCommerce search engines to showcase in the top results the variety and selection of inventory available, specifically in the context of the various buying intents that may be associated with a search query. Search rankers are most commonly powered by learning-to-rank models which learn the preference between items during training. However, they score items independently of other items at runtime. Although the items placed at the top of the results by such scoring functions may be independently optimal, they can be suboptimal as a set. This may lead to a mismatch between the ideal distribution of items in the top results and what is actually impressed. In this paper, we present methods to address the purchase-impression gap observed in top search results on eCommerce sites. We establish the ideal distribution of items based on historic shopping patterns. We then present a sequential reranker that methodically reranks top search results produced by a conventional pointwise scoring ranker. The reranker produces a reordered list by sequentially selecting candidates, trading off their independent relevance against their potential to address the purchase-impression gap, using specially constructed features that capture the impression distribution of items already added to the reranked list. The sequential reranker makes it possible to address the purchase-impression gap with respect to multiple item aspects. An early version of the reranker showed promising lifts in conversion and engagement metrics at eBay. Based on experiments on randomly sampled validation datasets, we observe that the reranking methodology presented produces around a 10% reduction in the purchase-impression gap on average for the top 20 results, while making improvements to conversion metrics.
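The sequential selection idea can be sketched with a simple greedy loop. This is an illustration only: the paper's reranker uses a learned model over specially constructed features, whereas the score below is a hypothetical hand-written rule (relevance plus a weighted gap term), and the item tuples, `target_share`, and `gap_weight` are assumptions.

```python
def gap_aware_score(item, counts, n_selected, target_share, gap_weight):
    """Relevance plus a bonus for aspects under-represented so far."""
    _, relevance, aspect = item
    share = counts.get(aspect, 0) / n_selected if n_selected else 0.0
    return relevance + gap_weight * (target_share.get(aspect, 0.0) - share)


def sequential_rerank(items, target_share, k, gap_weight=1.0):
    """Greedily build a top-k list from (item_id, relevance, aspect) tuples.

    target_share maps each aspect to its ideal share of the top results.
    """
    remaining = list(items)
    selected, counts = [], {}
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda it: gap_aware_score(
                it, counts, len(selected), target_share, gap_weight
            ),
        )
        remaining.remove(best)
        selected.append(best)
        counts[best[2]] = counts.get(best[2], 0) + 1  # update impression counts
    return [item_id for item_id, _, _ in selected]
```

Setting `gap_weight=0` reduces the loop to ordinary relevance ordering, which makes the effect of the gap term easy to compare on the same candidate list.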
 [57] arXiv:2010.14571 (cross-list from cs.CL) [pdf, other]

Title: Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus
Comments: Accepted to COLING 2020. 9 pages with 8 page abstract
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Large text corpora are increasingly important for a wide variety of Natural Language Processing (NLP) tasks, and automatic language identification (LangID) is a core technology needed to collect such datasets in a multilingual context. LangID is largely treated as solved in the literature, with models reported that achieve over 90% average F1 on as many as 1,366 languages. We train LangID models on up to 1,629 languages with comparable quality on held-out test sets, but find that human-judged LangID accuracy for web-crawl text corpora created using these models is only around 5% for many lower-resource languages, suggesting a need for more robust evaluation. Further analysis revealed a variety of error modes, arising from domain mismatch, class imbalance, language similarity, and insufficiently expressive models. We propose two classes of techniques to mitigate these errors: wordlist-based tunable-precision filters (for which we release curated lists in about 500 languages) and transformer-based semi-supervised LangID models, which increase median dataset precision from 5.5% to 71.2%. These techniques enable us to create an initial data set covering 100K or more relatively clean sentences in each of 500+ languages, paving the way towards a 1,000-language web text corpus.
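A wordlist-based filter with a tunable threshold might look like the sketch below. It is an assumption about the general shape of such a filter, not the released tooling: whitespace tokenization, the `min_fraction` knob, and the function name `passes_wordlist_filter` are all hypothetical.

```python
def passes_wordlist_filter(sentence, wordlist, min_fraction=0.2):
    """Keep a sentence if enough of its tokens appear in the language's wordlist.

    Raising min_fraction trades recall for precision.
    Assumption: tokens are lowercased and split on whitespace.
    """
    tokens = sentence.lower().split()
    if not tokens:
        return False
    hits = sum(1 for token in tokens if token in wordlist)
    return hits / len(tokens) >= min_fraction
```

A sentence labeled with some language by the classifier would be kept only if it also passes that language's filter, so the threshold acts as the precision knob.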
 [58] arXiv:2010.14575 (cross-list from cs.RO) [pdf]

Title: Learning Time Reduction Using Warm Start Methods for a Reinforcement Learning Based Supervisory Control in Hybrid Electric Vehicle Applications
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
Reinforcement Learning (RL) is widely utilized in the field of robotics, and as such, it is gradually being implemented in Hybrid Electric Vehicle (HEV) supervisory control. Even though RL exhibits excellent performance in terms of fuel consumption minimization in simulation, the large number of learning iterations requires a long learning time, making it hardly applicable in real-world vehicles. In addition, the fuel consumption in the initial learning phases is much worse than that of baseline controls. This study aims to reduce the learning iterations of Q-learning in HEV applications and improve fuel consumption in the initial learning phases by utilizing warm start methods. Unlike previous studies, which initiated Q-learning with zero or random Q values, this study initiates Q-learning with different supervisory controls (i.e., Equivalent Consumption Minimization Strategy control and heuristic control), and a detailed analysis is given. The results show that the proposed warm start Q-learning requires 68.8% fewer iterations than cold start Q-learning. The trained Q-learning is validated in two different driving cycles, and the results show a 10-16% MPG improvement when compared to Equivalent Consumption Minimization Strategy control. Furthermore, real-time feasibility is analyzed, and guidance for vehicle implementation is provided. The results of this study can be used to facilitate the deployment of RL in vehicle supervisory control applications.
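The difference between a cold and a warm start comes down to how the Q table is initialized before the usual update rule runs. The sketch below assumes a tabular setting and a hypothetical `heuristic_policy` callable standing in for an existing supervisory controller; the `bonus` value, state and action names, and function names are illustrative, not taken from the paper.

```python
def warm_start_q_table(states, actions, heuristic_policy, bonus=1.0):
    """Initialize Q so the action an existing controller picks starts ahead.

    A cold start would set every entry to zero (or a random value) instead.
    """
    return {
        s: {a: (bonus if a == heuristic_policy(s) else 0.0) for a in actions}
        for s in states
    }


def q_update(q, state, action, reward, next_state, lr=0.1, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q value."""
    target = reward + gamma * max(q[next_state].values())
    q[state][action] += lr * (target - q[state][action])
    return q[state][action]
```

Because the greedy action under the initial table already matches the baseline controller, early behavior should resemble the baseline rather than a random policy.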
 [59] arXiv:2010.14585 (cross-list from eess.SP) [pdf, other]

Title: Nonlinear State-Space Generalizations of Graph Convolutional Neural Networks
Comments: Submitted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Graph convolutional neural networks (GCNNs) learn compositional representations from network data by nesting linear graph convolutions into nonlinearities. In this work, we approach GCNNs from a state-space perspective, revealing that the graph convolutional module is a minimalistic linear state-space model in which the state update matrix is the graph shift operator. We show this state update may be problematic because it is nonparametric, and depending on the graph spectrum it may explode or vanish. Therefore, the GCNN has to trade its degrees of freedom between extracting features from data and handling these instabilities. To improve this tradeoff, we propose a novel family of nodal aggregation rules that aggregate node features within a layer in a nonlinear, parametric state-space fashion. We develop two architectures within this family inspired by the recursive ideas with and without nodal gating mechanisms. The proposed solutions generalize the GCNN and provide an additional handle to control the state update and learn from the data. Numerical results on source localization and authorship attribution show the superiority of the nonlinear state-space generalization models over the baseline GCNN.
 [60] arXiv:2010.14602 (cross-list from cs.SD) [pdf, ps, other]

Title: CopyPaste: An Augmentation Method for Speech Emotion Recognition
Comments: Under ICASSP2021 peer-review
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Data augmentation is a widely used strategy for training robust machine learning models. It partially alleviates the problem of limited data for tasks like speech emotion recognition (SER), where collecting data is expensive and challenging. This study proposes CopyPaste, a perceptually motivated novel augmentation procedure for SER. Assuming that the presence of emotions other than neutral dictates a speaker's overall perceived emotion in a recording, the concatenation of an emotional (emotion E) and a neutral utterance can still be labeled with emotion E. We hypothesize that SER performance can be improved using these concatenated utterances in model training. To verify this, three CopyPaste schemes are tested on two deep learning models: one trained independently and another using transfer learning from an x-vector model, a speaker recognition model. We observed that all three CopyPaste schemes improve SER performance on all three datasets considered: MSP-Podcast, Crema-D, and IEMOCAP. Additionally, CopyPaste performs better than noise augmentation, and using the two together improves SER performance further. Our experiments on noisy test sets suggested that CopyPaste is effective even in noisy test conditions.
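The labeling assumption stated above (an emotional utterance joined with a neutral one keeps the emotional label) can be sketched as follows. This covers only that one idea and does not reproduce the three schemes evaluated in the paper; waveforms are represented as plain lists of samples, and the `"neutral"` label string, random ordering, and function name `copy_paste` are assumptions.

```python
import random


def copy_paste(utterances, n_new, seed=0):
    """Create augmented samples by joining an emotional and a neutral utterance.

    utterances is a list of (waveform, label) pairs, where waveform is a list
    of audio samples. Each new sample keeps the emotional utterance's label.
    """
    rng = random.Random(seed)
    emotional = [(wave, label) for wave, label in utterances if label != "neutral"]
    neutral = [wave for wave, label in utterances if label == "neutral"]
    if not emotional or not neutral:
        return []
    augmented = []
    for _ in range(n_new):
        emo_wave, label = rng.choice(emotional)
        neu_wave = rng.choice(neutral)
        pair = [emo_wave, neu_wave]
        rng.shuffle(pair)  # the neutral part may come first or last
        augmented.append((pair[0] + pair[1], label))
    return augmented
```

The returned pairs would simply be appended to the training set alongside the original utterances.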
 [61] arXiv:2010.14605 (cross-list from cs.NI) [pdf, other]

Title: Beyond Accuracy: Cost-Aware Data Representation Exploration for Network Traffic Model Performance
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
In this paper, we explore how different representations of network traffic affect the performance of machine learning models for a range of network management tasks, including application performance diagnosis and attack detection. We study the relationship between the systems-level costs of different representations of network traffic and the ultimate target performance metric (e.g., accuracy) of the models trained from these representations. We demonstrate the benefit of exploring a range of representations of network traffic and present Network Microscope, a proof-of-concept reference implementation that both monitors network traffic at high speed and transforms the traffic in real time to produce a variety of representations for input to machine learning models. Systems like Network Microscope can ultimately help network operators better explore the design space of data representation for learning, balancing systems costs related to feature extraction and model training against resulting model performance.
 [62] arXiv:2010.14611 (cross-list from cs.NE) [pdf, other]

Title: Hybrid Backpropagation Parallel Reservoir Networks
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
In many real-world applications, fully differentiable RNNs such as LSTMs and GRUs have been widely deployed to solve time series learning tasks. These networks train via Backpropagation Through Time, which can work well in practice but involves a biologically unrealistic unrolling of the network in time for gradient updates, is computationally expensive, and can be hard to tune. A second paradigm, Reservoir Computing, keeps the recurrent weight matrix fixed and random. Here, we propose a novel hybrid network, which we call the Hybrid Backpropagation Parallel Echo State Network (HBP-ESN), that combines the effectiveness of learning random temporal features of reservoirs with the readout power of a deep neural network with batch normalization. We demonstrate that our new network outperforms LSTMs and GRUs, including multi-layer "deep" versions of these networks, on two complex real-world multi-dimensional time series datasets: gesture recognition using skeleton keypoints from ChaLearn, and the DEAP dataset for emotion recognition from EEG measurements. We also show that the inclusion of a novel meta-ring structure, which we call HBP-ESN M-Ring, achieves similar performance to one large reservoir while decreasing the memory required by an order of magnitude. We thus offer this hybrid reservoir deep learning paradigm as a new alternative direction for RNN learning of temporal or sequential data.
 [63] arXiv:2010.14615 (cross-list from cs.NE) [pdf, ps, other]

Title: Discrete-time signatures and randomness in reservoir computing
Comments: 14 pages
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
A new explanation of the geometric nature of the reservoir computing phenomenon is presented. Reservoir computing is understood in the literature as the possibility of approximating input/output systems with randomly chosen recurrent neural systems and a trained linear readout layer. Light is shed on this phenomenon by constructing what are called strongly universal reservoir systems as random projections of a family of state-space systems that generate Volterra series expansions. This procedure yields a state-affine reservoir system with randomly generated coefficients in a dimension that is logarithmically reduced with respect to the original system. This reservoir system is able to approximate any element in the fading memory filters class just by training a different linear readout for each different filter. Explicit expressions for the probability distributions needed in the generation of the projected reservoir system are stated, and bounds for the committed approximation error are provided.
 [64] arXiv:2010.14616 (cross-list from cs.NE) [pdf, other]

Title: Lineage Evolution Reinforcement Learning
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
We propose a general agent population learning system and, on this basis, a lineage evolution reinforcement learning algorithm. Lineage evolution reinforcement learning is a derivative algorithm that conforms to the general agent population learning system. We take the agents in DQN and its related variants as the basic agents in the population, and add the selection, mutation, and crossover modules of the genetic algorithm to the reinforcement learning algorithm. In the process of agent evolution, we draw on the characteristics of natural genetic behavior, adding a lineage factor to retain the potential performance of an agent, and consider both current performance and lineage value when evaluating an agent. Without changing the parameters of the original reinforcement learning algorithm, lineage evolution reinforcement learning can optimize different reinforcement learning algorithms. Our experiments show that the idea of evolution with lineage improves the performance of the original reinforcement learning algorithm in some Atari 2600 games.
 [65] arXiv:2010.14620 (cross-list from cs.SI) [pdf, other]

Title: Correlation Robust Influence Maximization
Journal-ref: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Optimization and Control (math.OC)
We propose a distributionally robust model for the influence maximization problem. Unlike the classic independent cascade model (Kempe et al., 2003), this model's diffusion process is adversarially adapted to the choice of seed set. Hence, instead of optimizing under the assumption that all influence relationships in the network are independent, we seek a seed set whose expected influence under the worst correlation, i.e. the "worst-case, expected influence", is maximized. We show that this worst-case influence can be efficiently computed, and though the optimization is NP-hard, a $(1 - 1/e)$ approximation guarantee holds. We also analyze the structure of the adversary's choice of diffusion process, and contrast it with established models. Beyond the key computational advantages, we also highlight the extent to which the independence assumption may cost optimality, and provide insights from numerical experiments comparing the adversarial and independent cascade models.
 [66] arXiv:2010.14640 (cross-list from cs.DL) [pdf, other]

Title: Improving Text Relationship Modeling with Artificial Data
Comments: 9 pages, 3 figures
Subjects: Digital Libraries (cs.DL); Machine Learning (cs.LG)
Data augmentation uses artificially created examples to support supervised machine learning, adding robustness to the resulting models and helping to account for limited availability of labelled data. We apply and evaluate a synthetic data approach to relationship classification in digital libraries, generating artificial books with relationships that are common in digital libraries but not easily inferred from existing metadata. We find that for classification on whole-part relationships between books, synthetic data improves a deep neural network classifier by 91%. Further, we consider the ability of synthetic data to learn a useful new text relationship class from fully artificial training data.
 [67] arXiv:2010.14649 (cross-list from cs.CL) [pdf]

Title: Learning Contextualised Cross-lingual Word Embeddings for Extremely Low-Resource Languages Using Parallel Corpora
Comments: 9 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
We propose a new approach for learning contextualised cross-lingual word embeddings based only on a small parallel corpus (e.g. a few hundred sentence pairs). Our method obtains word embeddings via an LSTM-based encoder-decoder model that performs bidirectional translation and reconstruction of the input sentence. Through sharing model parameters among different languages, our model jointly trains the word embeddings in a common multilingual space. We also propose a simple method to combine word and subword embeddings to make use of orthographic similarities across different languages. We base our experiments on real-world data from endangered languages, namely Yongning Na, Shipibo-Konibo and Griko. Our experiments on bilingual lexicon induction and word alignment tasks show that our model outperforms existing methods by a large margin for most language pairs. These results demonstrate that, contrary to common belief, an encoder-decoder translation model is beneficial for learning cross-lingual representations, even in extremely low-resource scenarios.
 [68] arXiv:2010.14660 (cross-list from cs.CL) [pdf, other]

Title: DualTKB: A Dual Learning Bridge between Text and Knowledge Base
Comments: Equal Contributions of Authors Pierre L. Dognin, Igor Melnyk, and Inkit Padhi. Accepted at EMNLP'20
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
In this work, we present a dual learning approach for unsupervised text-to-path and path-to-text transfers in Commonsense Knowledge Bases (KBs). We investigate the impact of weak supervision by creating a weakly supervised dataset and show that even a slight amount of supervision can significantly improve the model performance and enable better-quality transfers. We examine different model architectures and evaluation metrics, proposing a novel Commonsense KB completion metric tailored for generative models. Extensive experimental results show that the proposed method compares very favorably to the existing baselines. This approach is a viable step towards a more advanced system for automatic KB construction/expansion and the reverse operation of KB conversion to coherent textual descriptions.
 [69] arXiv:2010.14694 (cross-list from econ.EM) [pdf, other]

Title: Deep Learning for Individual Heterogeneity
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
We propose a methodology for effectively modeling individual heterogeneity using deep learning while still retaining the interpretability and economic discipline of classical models. We pair a transparent, interpretable modeling structure with rich data environments and machine learning methods to estimate heterogeneous parameters based on potentially high-dimensional or complex observable characteristics. Our framework is widely applicable, covering numerous settings of economic interest. We recover, as special cases, well-known examples such as average treatment effects and parametric components of partially linear models. However, we also seamlessly deliver new results for diverse examples such as price elasticities, willingness-to-pay, and surplus measures in choice models, average marginal and partial effects of continuous treatment variables, fractional outcome models, count data, heterogeneous production function components, and more. Deep neural networks are well-suited to structured modeling of heterogeneity: we show how the network architecture can be designed to match the global structure of the economic model, giving novel methodology for deep learning as well as, more formally, improved rates of convergence. Our results on deep learning have consequences for other structured modeling environments and applications, such as for additive models. Our inference results are based on an influence function we derive, which we show to be flexible enough to encompass all settings with a single, unified calculation, removing any requirement for case-by-case derivations. The usefulness of the methodology in economics is shown in two empirical applications: the response of 401(k) participation rates to firm matching and the impact of prices on subscription choices for an online service. Extensions to instrumental variables and multinomial choices are shown.
 [70] arXiv:2010.14709 (cross-list from cs.SD) [pdf, other]

Title: Melody-Conditioned Lyrics Generation with SeqGANs
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Automatic lyrics generation has received attention from both music and AI communities for years. Early rule-based approaches have, due to increases in computational power and evolution in data-driven models, mostly been replaced with deep-learning-based systems. Many existing approaches, however, either rely heavily on prior knowledge in music and lyrics writing or oversimplify the task by largely discarding melodic information and its relationship with the text. We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN), which generates a line of lyrics given the corresponding melody as the input. Furthermore, we investigate the performance of the generator with an additional input condition: the theme or overarching topic of the lyrics to be generated. We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
 [71] arXiv:2010.14712 (cross-list from cs.RO) [pdf, other]

Title: Socially-Compatible Behavior Design of Autonomous Vehicles with Verification on Real Human Data
Comments: 9 pages, 10 figures, submitted to IEEE Robotics and Automation Letters (RAL) and 2021 IEEE International Conference on Robotics and Automation (ICRA 2021)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
As more and more autonomous vehicles (AVs) are being deployed on public roads, designing socially compatible behaviors for them is of critical importance. Based on observations, AVs need to predict the future behaviors of other traffic participants and be aware of the uncertainties associated with such predictions so that safe, efficient, and human-like motions can be generated. In this paper, we propose an integrated prediction and planning framework that allows the AVs to infer online the characteristics of other road users and generate behaviors optimizing not only their own rewards, but also their courtesy to others, as well as their confidence in the consequences in the presence of uncertainties. Based on the definitions of courtesy and confidence, we explore the influences of such factors on the behaviors of AVs in interactive driving scenarios. Moreover, we evaluate the proposed algorithm on naturalistic human driving data by comparing the generated behavior with the ground truth. Results show that the online inference can significantly improve the human-likeness of the generated behaviors. Furthermore, we find that human drivers show great courtesy to others, even to those without right-of-way.
 [72] arXiv:2010.14713 (cross-list from cs.CV) [pdf, other]

Title: CompRess: Self-Supervised Learning by Compressing Representations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Self-supervised learning aims to learn good representations with unlabeled data. Recent works have shown that larger models benefit more from self-supervised learning than smaller models. As a result, the gap between supervised and self-supervised learning has been greatly reduced for larger models. In this work, instead of designing a new pseudo task for self-supervised learning, we develop a model compression method to compress an already learned, deep self-supervised model (teacher) to a smaller one (student). We train the student model so that it mimics the relative similarity between the data points in the teacher's embedding space. For AlexNet, our method outperforms all previous methods including the fully supervised model on ImageNet linear evaluation (59.0% compared to 56.5%) and on nearest neighbor evaluation (50.7% compared to 41.4%). To the best of our knowledge, this is the first time a self-supervised AlexNet has outperformed a supervised one on ImageNet classification. Our code is available here: https://github.com/UMBCvision/CompRess
 [73] arXiv:2010.14731 (cross-list from cs.CV) [pdf, other]

Title: MultiMix: Sparingly Supervised, Extreme Multitask Learning From Medical Images
Comments: 5 pages, 3 figures, 2 tables; Submitted to ISBI 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Semi-supervised learning via learning from limited quantities of labeled data has been investigated as an alternative to supervised counterparts. Maximizing knowledge gains from copious unlabeled data benefits semi-supervised learning settings. Moreover, learning multiple tasks within the same model further improves model generalizability. We propose a novel multitask learning model, namely MultiMix, which jointly learns disease classification and anatomical segmentation in a sparingly supervised manner, while preserving explainability through bridge saliency between the two tasks. Our extensive experimentation with varied quantities of labeled data in the training sets justifies the effectiveness of our multitasking model for the classification of pneumonia and segmentation of lungs from chest X-ray images. Moreover, both in-domain and cross-domain evaluations across the tasks further showcase the potential of our model to adapt to challenging generalization scenarios.
 [74] arXiv:2010.14734 (cross-list from stat.CO) [pdf, other]

Title: Generalized eigen, singular value, and partial least squares decompositions: The GSVD package
Authors: Derek Beaton (1) ((1) Rotman Research Institute, Baycrest Health Sciences)
Comments: 38 pages, 9 figures, 3 tables
Subjects: Computation (stat.CO); Machine Learning (cs.LG); Methodology (stat.ME)
The generalized singular value decomposition (GSVD, a.k.a. "SVD triplet", "duality diagram" approach) provides a unified strategy and basis to perform nearly all of the most common multivariate analyses (e.g., principal components, correspondence analysis, multidimensional scaling, canonical correlation, partial least squares). Though the GSVD is ubiquitous, powerful, and flexible, it has very few implementations. Here I introduce the GSVD package for R. The general goal of GSVD is to provide a small set of accessible functions to perform the GSVD and two other related decompositions (generalized eigenvalue decomposition, generalized partial least squares-singular value decomposition). Furthermore, GSVD helps provide a more unified conceptual approach and nomenclature to many techniques. I first introduce the concept of the GSVD, followed by a formal definition of the generalized decompositions. Next I provide some key decisions made during development, and then a number of examples of how to use GSVD to implement various statistical techniques. These examples also illustrate one of the goals of GSVD: how others can (or should) build analysis packages that depend on GSVD. Finally, I discuss the possible future of GSVD.
 [75] arXiv:2010.14784 (cross-list from cs.CL) [pdf]

Title: A Chinese Text Classification Method With Low Hardware Requirement Based on Improved Model Concatenation
Authors: Yuanhao Zhuo
Comments: 5 pages, 2 figures, 5 tables
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
To improve the accuracy of Chinese text classification models with low hardware requirements, this paper designs an improved concatenation-based model, which is a concatenation of 5 different sub-models, including TextCNN, LSTM, and Bi-LSTM. Compared with the existing ensemble learning method, this model's accuracy on a text classification task is 2% higher. Meanwhile, the hardware requirements of this model are much lower than those of the BERT-based model.
 [76] arXiv:2010.14793 (cross-list from cs.CV) [pdf, other]

Title: Class-Agnostic Segmentation Loss and Its Application to Salient Object Detection and Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In this paper we present a novel loss function, called class-agnostic segmentation (CAS) loss. With CAS loss the class descriptors are learned during training of the network. We do not need to define the label of a class a priori; rather, the CAS loss clusters regions with similar appearance together in a weakly supervised manner. Furthermore, we show that the CAS loss function is sparse, bounded, and robust to class imbalance. We apply our CAS loss function with fully convolutional ResNet101 and DeepLab-v3 architectures to the binary segmentation problem of salient object detection. We investigate the performance against the state-of-the-art methods in two settings of low- and high-fidelity training data on seven salient object detection datasets. For low-fidelity training data (incorrect class labels), class-agnostic segmentation loss outperforms the state-of-the-art methods on salient object detection datasets by staggering margins of around 50%. For high-fidelity training data (correct class labels), class-agnostic segmentation models perform as well as the state-of-the-art approaches while beating the state-of-the-art methods on most datasets. In order to show the utility of the loss function across different domains, we also test on a general segmentation dataset, where class-agnostic segmentation loss outperforms cross-entropy-based loss by huge margins on both region and edge metrics.
 [77] arXiv:2010.14810 (cross-list from cs.CV) [pdf, other]

Title: Cycle-Contrast for Self-Supervised Video Representation Learning
Comments: 12 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We present Cycle-Contrastive Learning (CCL), a novel self-supervised method for learning video representation. Building on the natural belonging and inclusion relation between a video and its frames, CCL is designed to find correspondences across frames and videos, considering the contrastive representation in their respective domains. It is different from recent approaches that merely learn correspondences across frames or clips. In our method, the frame and video representations are learned from a single network based on an R3D architecture, with a shared non-linear transformation for embedding both frame and video features before the cycle-contrastive loss. We demonstrate that the video representation learned by CCL can be transferred well to downstream tasks of video understanding, outperforming previous methods in nearest neighbour retrieval and action recognition tasks on UCF101, HMDB51 and MMAct.
 [78] arXiv:2010.14824 (cross-list from cs.CG) [pdf]

Title: Explainable Artificial Intelligence for Manufacturing Cost Estimation and Machining Feature Visualization
Subjects: Computational Geometry (cs.CG); Machine Learning (cs.LG)
Studies on manufacturing cost prediction based on deep learning have begun in recent years, but the cost prediction rationale cannot be explained because the models are still used as a black box. This study proposes a manufacturing cost prediction process for 3D computer-aided design (CAD) models using explainable artificial intelligence. The proposed process can visualize the machining features of the 3D CAD model that are influencing the increase in manufacturing costs. The proposed process consists of (1) data collection and pre-processing, (2) 3D deep learning architecture exploration, and (3) visualization to explain the prediction results. The proposed deep learning model shows high predictability of manufacturing cost for computer numerical control (CNC) machined parts. In particular, using 3D gradient-weighted class activation mapping proves that the proposed model not only can detect the CNC machining features but also can differentiate the machining difficulty for the same feature. Using the proposed process, we can provide design guidance to engineering designers in reducing manufacturing costs during the conceptual design phase. We can also provide real-time quotations and redesign proposals to online manufacturing platform customers.
 [79] arXiv:2010.14860 (cross-list from stat.ML) [pdf, other]

Title: The Evidence Lower Bound of Variational Autoencoders Converges to a Sum of Three Entropies
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The central objective function of a variational autoencoder (VAE) is its variational lower bound. Here we show that for standard VAEs the variational bound is at convergence equal to the sum of three entropies: the (negative) entropy of the latent distribution, the expected (negative) entropy of the observable distribution, and the average entropy of the variational distributions. Our derived analytical results are exact and apply for small as well as complex neural networks for decoder and encoder. Furthermore, they apply for finite and infinitely many data points and at any stationary point (including local and global maxima). As a consequence, we show that the variance parameters of encoder and decoder play the key role in determining the values of variational bounds at convergence. Furthermore, the obtained results can allow for closed-form analytical expressions at convergence, which may be unexpected as neither variational bounds of VAEs nor log-likelihoods of VAEs are closed-form during learning. As our main contribution, we provide the proofs for convergence of standard VAEs to sums of entropies. Furthermore, we numerically verify our analytical results and discuss some potential applications. The obtained equality to entropy sums provides novel information on those points in parameter space that variational learning converges to. As such, we believe they can potentially significantly contribute to our understanding of established as well as novel VAE approaches.
 [80] arXiv:2010.14863 (cross-list from cond-mat.dis-nn) [pdf, other]

Title: High-dimensional inference: a statistical mechanics perspective
Authors: Jean Barbier
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Information Theory (cs.IT); Machine Learning (cs.LG)
Statistical inference is the science of drawing conclusions about some system from data. In modern signal processing and machine learning, inference is done in very high dimension: very many unknown characteristics about the system have to be deduced from a lot of high-dimensional noisy data. This "high-dimensional regime" is reminiscent of statistical mechanics, which aims at describing the macroscopic behavior of a complex system based on the knowledge of its microscopic interactions. It is by now clear that there are many connections between inference and statistical physics. This article aims at emphasizing some of the deep links connecting these apparently separated disciplines through the description of paradigmatic models of high-dimensional inference in the language of statistical mechanics. This article has been published in the issue on artificial intelligence of Ithaca, an Italian popularization-of-science journal. The selected topics and references are highly biased and not intended to be exhaustive in any way. Its purpose is to serve as an introduction to statistical mechanics of inference through a very specific angle that corresponds to my own tastes and limited knowledge.
 [81] arXiv:2010.14877 (cross-list from stat.ML) [pdf, other]

Title: Hierarchical Gaussian Processes with Wasserstein-2 Kernels
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We investigate the usefulness of Wasserstein-2 kernels in the context of hierarchical Gaussian Processes. Stemming from an observation that stacking Gaussian Processes severely diminishes the model's ability to detect outliers, which when combined with non-zero mean functions, further extrapolates low variance to regions with low training data density, we posit that directly taking into account the variance in the computation of Wasserstein-2 kernels is of key importance towards maintaining outlier status as we progress through the hierarchy. We propose two new models operating in Wasserstein space which can be seen as equivalents to Deep Kernel Learning and Deep GPs. Through extensive experiments, we show improved performance on large-scale datasets and improved out-of-distribution detection on both toy and real data.
 [82] arXiv:2010.14881 (cross-list from eess.IV) [pdf]

Title: Medical Deep Learning - A systematic Meta-Review
Comments: 46 pages, 4 tables, 150 references
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Deep learning has had a remarkable impact on different scientific disciplines in recent years. This was demonstrated in numerous tasks, where deep learning algorithms were able to outperform the state-of-the-art methods, including in image processing and analysis. Moreover, deep learning delivers good results in tasks like autonomous driving, which could not have been performed automatically before. There are even applications where deep learning outperformed humans, like object recognition or games. Another field in which this development is showing huge potential is the medical domain. With the collection of large quantities of patient records and data, and a trend towards personalized treatments, there is a great need for automatic and reliable processing and analysis of this information. Patient data is not only collected in clinical centres, like hospitals, but also comes from general practitioners, healthcare smartphone apps or online websites, just to name a few. This trend has resulted in new, massive research efforts in recent years. In Q2/2020, the search engine PubMed already returns over 11,000 results for the search term 'deep learning', and around 90% of these publications are from the last three years. Hence, a complete overview of the field of 'medical deep learning' is almost impossible to obtain, and getting a full overview of medical sub-fields is becoming increasingly difficult. Nevertheless, several review and survey articles about medical deep learning have been presented within the last years. They focused, in general, on specific medical scenarios, like the analysis of medical images containing specific pathologies. With these surveys as a foundation, the aim of this contribution is to provide a very first high-level, systematic meta-review of medical deep learning surveys.
 [83] arXiv:2010.14903 (cross-list from cs.CY) [pdf, other]

Title: A general method for estimating the prevalence of Influenza-Like-Symptoms with Wikipedia data
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Influenza is an acute respiratory seasonal disease that affects millions of people worldwide and causes thousands of deaths in Europe alone. Being able to estimate in a fast and reliable way the impact of an illness on a given country is essential to plan and organize effective countermeasures, which is now possible by leveraging unconventional data sources like web searches and visits. In this study, we show the feasibility of exploiting information about Wikipedia's page views of a selected group of articles and machine learning models to obtain accurate estimates of influenza-like illness incidence in four European countries: Italy, Germany, Belgium, and the Netherlands. We propose a novel language-agnostic method, based on two algorithms, Personalized PageRank and CycleRank, to automatically select the most relevant Wikipedia pages to be monitored without the need for expert supervision. We then show how our model is able to reach state-of-the-art results by comparing it with previous solutions.
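The abstract names Personalized PageRank as one of the two page-selection algorithms but gives no implementation details. As a rough illustration only, the following sketch runs the textbook power-iteration form of Personalized PageRank on a tiny link graph. The page names, the link structure, the damping factor `alpha`, and the handling of pages without outgoing links are all assumptions made for this example, not details from the paper.

```python
def personalized_pagerank(graph, seed, alpha=0.85, iterations=100):
    """Textbook Personalized PageRank by power iteration.

    graph maps each page to the list of pages it links to (every target
    must also be a key). All teleport mass goes to the single seed page.
    """
    scores = {node: (1.0 if node == seed else 0.0) for node in graph}
    for _ in range(iterations):
        updated = {node: 0.0 for node in graph}
        for node, links in graph.items():
            if links:
                # Follow an outgoing link with probability alpha.
                share = alpha * scores[node] / len(links)
                for target in links:
                    updated[target] += share
            else:
                # Assumed convention: a page with no links sends its mass to the seed.
                updated[seed] += alpha * scores[node]
        # Teleport back to the seed with probability 1 - alpha (total mass is 1).
        updated[seed] += 1.0 - alpha
        scores = updated
    return scores


# Hypothetical link graph; real Wikipedia link data would replace this.
toy_links = {
    "Influenza": ["Fever", "Cough"],
    "Fever": ["Influenza"],
    "Cough": ["Influenza", "Fever"],
    "Football": ["Influenza"],
}
ppr_scores = personalized_pagerank(toy_links, seed="Influenza")
ranking = sorted(ppr_scores, key=ppr_scores.get, reverse=True)
print(ranking)
```

Pages that cannot be reached from the seed (here the hypothetical "Football" page) receive a score of zero, which is the property that makes this kind of ranking usable for selecting topic-relevant pages.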
 [84] arXiv:2010.14921 (cross-list from cs.OH) [pdf]

Title: Comparison Analysis of Tree Based and Ensembled Regression Algorithms for Traffic Accident Severity Prediction
Subjects: Other Computer Science (cs.OH); Machine Learning (cs.LG)
The rapid increase of traffic volume on urban roads over time has changed the traffic scenario globally. It has also increased the rate of road accidents, which can be severe and, in the worst case, fatal. To improve traffic safety and its management on urban roads, there is a need for prediction of the severity level of accidents. Various machine learning models are being used for accident prediction. In this study, tree-based ensemble models (Random Forest, AdaBoost, Extra Tree, and Gradient Boosting) and an ensemble of two statistical models (Logistic Regression and Stochastic Gradient Descent) as a voting classifier are compared for prediction of road accident severity. Significant features that are strongly correlated with accident severity are identified by Random Forest. The analysis showed Random Forest to be the best-performing model, with the highest classification results of 0.974 accuracy, 0.954 precision, 0.930 recall, and 0.942 F-score using the 20 most significant features, compared with the other techniques for classifying road accident severity.
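As a quick consistency check on the reported numbers, the sketch below recomputes the F-score from the stated precision and recall. It assumes the reported F-score is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name is illustrative.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Values quoted in the abstract for the Random Forest model.
reported_precision = 0.954
reported_recall = 0.930

f1 = f_score(reported_precision, reported_recall)
print(round(f1, 3))  # 0.942
```

Under that assumption, the recomputed value agrees with the 0.942 F-score quoted in the abstract.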
 [85] arXiv:2010.14925 (cross-list from cs.CV) [pdf, other]

Title: MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis
Comments: Code and dataset are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
We present MedMNIST, a collection of 10 pre-processed medical open datasets. MedMNIST is standardized to perform classification tasks on lightweight 28x28 images, which requires no background knowledge. Covering the primary data modalities in medical image analysis, it is diverse on data scale (from 100 to 100,000) and tasks (binary/multi-class, ordinal regression and multi-label). MedMNIST could be used for educational purposes, rapid prototyping, multi-modal machine learning or AutoML in medical image analysis. Moreover, MedMNIST Classification Decathlon is designed to benchmark AutoML algorithms on all 10 datasets; we have compared several baseline methods, including open-source or commercial AutoML tools. The datasets, evaluation code and baseline methods for MedMNIST are publicly available at https://medmnist.github.io/.
 [86] arXiv:2010.14928 (cross-list from stat.ML) [pdf, other]

Title: Particle gradient descent model for point process generation
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Probability (math.PR)
This paper introduces a generative model for planar point processes in a square window, built upon a single realization of a stationary, ergodic point process observed in this window. Inspired by recent advances in gradient descent methods for maximum entropy models, we propose a method to generate similar point patterns by jointly moving particles of an initial Poisson configuration towards a target counting measure. The target measure is generated via a deterministic gradient descent algorithm, so as to match a set of statistics of the given, observed realization. Our statistics are estimators of the multi-scale wavelet phase harmonic covariance, recently proposed in image modeling. They allow one to capture geometric structures through multi-scale interactions between wavelet coefficients. Both our statistics and the gradient descent algorithm scale better with the number of observed points than the classical k-nearest neighbour distances previously used in generative models for point processes, based on rejection sampling or simulated annealing. The overall quality of our model is evaluated on point processes with various geometric structures through spectral and topological data analysis.
 [87] arXiv:2010.14933 (cross-list from eess.IV) [pdf, other]

Title: Generative Tomography Reconstruction
Comments: Accepted as a poster for the NeurIPS 2020 Workshop on Deep Learning and Inverse Problems
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Numerical Analysis (math.NA)
We propose an end-to-end differentiable architecture for tomography reconstruction that directly maps a noisy sinogram into a denoised reconstruction. Compared to existing approaches, our end-to-end architecture produces more accurate reconstructions while using fewer parameters and less time. We also propose a generative model that, given a noisy sinogram, can sample realistic reconstructions. This generative model can be used as a prior inside an iterative process that, by taking into consideration the physical model, can reduce artifacts and errors in the reconstructions.
 [88] arXiv:2010.14943 (cross-list from eess.SP) [pdf, other]

Title: An Approach for GCI Fusion With Labeled Multitarget Densities
Comments: 12 pages, 6 figures, submitted to IEEE Transactions on Signal Processing
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
This paper addresses the Generalized Covariance Intersection (GCI) fusion method for labeled random finite sets. We propose a joint label space for the support of fused labeled random finite sets to represent the label association between different agents, avoiding the label consistency condition for the label-wise GCI fusion algorithm. Specifically, we devise the joint label space by the direct product of all label spaces for each agent. Then we apply the GCI fusion method to obtain the joint labeled multitarget density. The joint labeled RFS is then marginalized into a general labeled RFS, provided that each target is represented by a single Bernoulli component with a unique label. The joint labeled GCI (JLGCI) for fusing LMB RFSs from different agents is demonstrated. We also propose the simplified JLGCI method given the assumption that targets are well-separated in the scenario. The simulation results demonstrate the method's effectiveness under label inconsistency and excellent performance in challenging tracking scenarios.
 [89] arXiv:2010.14977 (cross-list from cs.CV) [pdf, other]

Title: Real-time Tropical Cyclone Intensity Estimation by Handling Temporally Heterogeneous Satellite Data
Comments: under review at AAAI 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Analyzing big geophysical observational data collected by multiple advanced sensors on various satellite platforms promotes our understanding of the geophysical system. For instance, convolutional neural networks (CNN) have achieved great success in estimating tropical cyclone (TC) intensity based on satellite data with fixed temporal frequency (e.g., 3 h). However, to achieve more timely (under 30 min) and accurate TC intensity estimates, a deep learning model is needed that can handle temporally heterogeneous satellite observations. Specifically, infrared (IR1) and water vapor (WV) images are available at intervals of under 15 minutes, while passive microwave rain rate (PMW) is available about every 3 hours. Meanwhile, the visible (VIS) channel is severely affected by noise and sunlight intensity, making it difficult to utilize. Therefore, we propose a novel framework that combines a generative adversarial network (GAN) with a CNN. The model utilizes all data, including VIS and PMW information, during the training phase and eventually uses only the high-frequency IR1 and WV data for providing intensity estimates during the predicting phase. Experimental results demonstrate that the hybrid GAN-CNN framework achieves comparable precision to the state-of-the-art models, while possessing the capability of increasing the maximum estimation frequency from 3 hours to less than 15 minutes.
 [90] arXiv:2010.15040 (cross-list from stat.ML) [pdf, other]

Title: Training Generative Adversarial Networks by Solving Ordinary Differential Equations
Authors: Chongli Qin, Yan Wu, Jost Tobias Springenberg, Andrew Brock, Jeff Donahue, Timothy P. Lillicrap, Pushmeet Kohli
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The instability of Generative Adversarial Network (GAN) training has frequently been attributed to gradient descent. Consequently, recent methods have aimed to tailor the models and training procedures to stabilise the discrete updates. In contrast, we study the continuous-time dynamics induced by GAN training. Both theory and toy experiments suggest that these dynamics are in fact surprisingly stable. From this perspective, we hypothesise that instabilities in training GANs arise from the integration error in discretising the continuous dynamics. We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training when combined with a regulariser that controls the integration error. Our approach represents a radical departure from previous methods which typically use adaptive optimisation and stabilisation techniques that constrain the functional space (e.g. Spectral Normalisation). Evaluation on CIFAR-10 and ImageNet shows that our method outperforms several strong baselines, demonstrating its efficacy.
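The role of integration error can be illustrated on a toy problem that is not taken from the paper: the bilinear min-max game f(x, y) = x * y, whose continuous-time gradient dynamics dx/dt = -y, dy/dt = x simply rotate around the equilibrium at the origin. The sketch below compares plain Euler steps (equivalent to simultaneous gradient descent-ascent) with a classical fourth-order Runge-Kutta step. The game, step size, and step count are assumptions chosen for illustration, and the regulariser mentioned in the abstract is not included.

```python
import math


def vector_field(state):
    """Gradient dynamics of the toy game f(x, y) = x * y (x descends, y ascends)."""
    x, y = state
    return (-y, x)


def euler_step(state, h):
    """One explicit Euler step, i.e. simultaneous gradient descent-ascent."""
    dx, dy = vector_field(state)
    return (state[0] + h * dx, state[1] + h * dy)


def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(scale, k):
        return (state[0] + scale * k[0], state[1] + scale * k[1])

    k1 = vector_field(state)
    k2 = vector_field(shifted(h / 2, k1))
    k3 = vector_field(shifted(h / 2, k2))
    k4 = vector_field(shifted(h, k3))
    return (
        state[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
        state[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]),
    )


def final_distance(step, h=0.1, steps=1000):
    """Distance from the equilibrium after integrating from (1, 0)."""
    state = (1.0, 0.0)
    for _ in range(steps):
        state = step(state, h)
    return math.hypot(*state)


euler_distance = final_distance(euler_step)
rk4_distance = final_distance(rk4_step)
print(euler_distance, rk4_distance)
```

In this toy setting the exact trajectory stays at distance 1 from the origin. Each Euler step multiplies the distance by sqrt(1 + h^2), so the iterates spiral outwards, while the higher-order solver stays close to the true orbit for the same step size.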
 [91] arXiv:2010.15045 (crosslist from cs.NE) [pdf, other]

Title: A multi-agent model for growing spiking neural networks
Comments: 79 pages. Master's thesis
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Artificial Intelligence has looked to biological systems as a source of inspiration. Although many aspects of the brain are yet to be discovered, neuroscience has found evidence that the connections between neurons continuously grow and reshape as part of the learning process. This differs from the design of Artificial Neural Networks, which learn by adjusting the synaptic weights between neurons while their topology stays unaltered over time.
This project has explored rules for growing the connections between the neurons in Spiking Neural Networks as a learning mechanism. These rules have been implemented on a multi-agent system for creating simple logic functions, which establish a base for building up more complex systems and architectures. Results in a simulation environment showed that, for a given set of parameters, it is possible to reach topologies that reproduce the tested functions.
This project also opens the door to using techniques such as genetic algorithms to obtain the best-suited values for the model parameters, and hence to creating neural networks that can adapt to different functions.
 [92] arXiv:2010.15049 (cross-list from eess.AS) [pdf, other]

Title: Optimizing Short-Time Fourier Transform Parameters via Gradient Descent
Comments: Submitted for ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
The Short-Time Fourier Transform (STFT) has been a staple of signal processing, often being the first step for many audio tasks. A very familiar process when using the STFT is the search for the best STFT parameters, as they often have significant side effects if chosen poorly. These parameters are often defined in terms of an integer number of samples, which makes their optimization non-trivial. In this paper we show an approach that allows us to obtain a gradient for STFT parameters with respect to arbitrary cost functions, and thus to employ gradient descent optimization of quantities like the STFT window length or the STFT hop size. We do so for parameter values that stay constant throughout an input, but also for cases where these parameters have to change dynamically over time to accommodate varying signal characteristics.
 [93] arXiv:2010.15058 (cross-list from cs.NE) [pdf, other]

Title: Measuring non-trivial compositionality in emergent communication
Comments: 4th Workshop on Emergent Communication, NeurIPS 2020
Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL); Machine Learning (cs.LG)
Compositionality is an important explanatory target in emergent communication and language evolution. The vast majority of computational models of communication account for the emergence of only a very basic form of compositionality: trivial compositionality. A compositional protocol is trivially compositional if the meaning of a complex signal (e.g. blue circle) boils down to the intersection of meanings of its constituents (e.g. the intersection of the set of blue objects and the set of circles). A protocol is non-trivially compositional (NTC) if the meaning of a complex signal (e.g. biggest apple) is a more complex function of the meanings of its constituents. In this paper, we review several metrics of compositionality used in emergent communication and experimentally show that most of them fail to detect NTC, i.e. they treat non-trivial compositionality as a failure of compositionality. The one exception is tree reconstruction error, a metric motivated by formal accounts of compositionality. These results emphasise important limitations of emergent communication research that could hamper progress on modelling the emergence of NTC.
 [94] arXiv:2010.15065 (cross-list from q-bio.BM) [pdf, other]
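The distinction between trivial and non-trivial compositionality can be made concrete with a small sketch. Everything below is a made-up toy world for illustration, not data or code from the paper: "blue circle" is computed as a set intersection, whereas "biggest apple" needs a function that takes the noun's meaning as its argument and cannot be obtained by intersecting "apples" with the objects that are biggest overall.

```python
# Hypothetical toy world; all objects and attribute values are invented.
WORLD = [
    {"id": 0, "kind": "circle", "color": "blue", "size": 2},
    {"id": 1, "kind": "circle", "color": "red", "size": 5},
    {"id": 2, "kind": "square", "color": "blue", "size": 9},
    {"id": 3, "kind": "apple", "color": "red", "size": 3},
    {"id": 4, "kind": "apple", "color": "green", "size": 4},
]

def extension(attribute, value):
    """Meaning of a simple signal: the set of object ids it applies to."""
    return frozenset(o["id"] for o in WORLD if o[attribute] == value)

def intersect(meaning_a, meaning_b):
    """Trivial composition: intersect the constituents' meanings."""
    return meaning_a & meaning_b

def biggest(candidates):
    """Non-trivial composition: pick the largest objects *within* a given set."""
    if not candidates:
        return frozenset()
    top = max(o["size"] for o in WORLD if o["id"] in candidates)
    return frozenset(
        o["id"] for o in WORLD if o["id"] in candidates and o["size"] == top
    )

everything = frozenset(o["id"] for o in WORLD)
apples = extension("kind", "apple")

# "blue circle": intersection of blue things and circles.
blue_circle = intersect(extension("color", "blue"), extension("kind", "circle"))

# "biggest apple": a function applied to the meaning of "apple".
biggest_apple = biggest(apples)

# Treating "biggest" as a fixed set and intersecting gives the wrong answer,
# because the biggest object overall is not an apple.
naive_biggest_apple = intersect(biggest(everything), apples)

if __name__ == "__main__":
    print("blue circle:", sorted(blue_circle))
    print("biggest apple:", sorted(biggest_apple))
    print("naive intersection:", sorted(naive_biggest_apple))
```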

Title: Fixed-Length Protein Embeddings using Contextual Lenses
Subjects: Biomolecules (q-bio.BM); Computation and Language (cs.CL); Machine Learning (cs.LG)
The Basic Local Alignment Search Tool (BLAST) is currently the most popular method for searching databases of biological sequences. BLAST compares sequences via similarity defined by a weighted edit distance, which makes it computationally expensive. As opposed to working with edit distance, a vector similarity approach can be accelerated substantially using modern hardware or hashing techniques. Such an approach would require fixed-length embeddings for biological sequences. There has been recent interest in learning fixed-length protein embeddings using deep learning models under the hypothesis that the hidden layers of supervised or semi-supervised models could produce potentially useful vector embeddings. We consider transformer (BERT) protein language models that are pretrained on the TrEMBL data set and learn fixed-length embeddings on top of them with contextual lenses. The embeddings are trained to predict the family a protein belongs to for sequences in the Pfam database. We show that for nearest-neighbor family classification, pretraining offers a noticeable boost in performance and that the corresponding learned embeddings are competitive with BLAST. Furthermore, we show that the raw transformer embeddings, obtained via static pooling, do not perform well on nearest-neighbor family classification, which suggests that learning embeddings in a supervised manner via contextual lenses may be a compute-efficient alternative to fine-tuning.
 [95] arXiv:2010.15067 (cross-list from cs.CL) [pdf, ps, other]
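As a rough illustration of nearest-neighbor family classification over fixed-length embeddings, the sketch below assigns a query the family label of its most similar reference vector. The vectors, family names, and the choice of cosine similarity are all assumptions made for this example; the abstract does not specify the similarity measure, and real embeddings would come from a trained model rather than hand-written tuples.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_family(query, reference):
    """Return the family label of the reference embedding closest to `query`.

    `reference` is a list of (embedding, family_label) pairs.
    """
    best_embedding, best_label = max(
        reference, key=lambda item: cosine(query, item[0])
    )
    return best_label

# Hypothetical 3-dimensional embeddings with invented family labels.
REFERENCE = [
    ((0.9, 0.1, 0.0), "family_A"),
    ((0.8, 0.2, 0.1), "family_A"),
    ((0.0, 0.2, 0.9), "family_B"),
]

if __name__ == "__main__":
    print(nearest_family((0.1, 0.1, 0.8), REFERENCE))
    print(nearest_family((1.0, 0.0, 0.0), REFERENCE))
```

Because every sequence maps to a vector of the same length, the linear scan in `nearest_family` could be swapped for an approximate nearest-neighbor index or a hashing scheme, which is the kind of acceleration the abstract alludes to.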

Title: Graph-based Topic Extraction from Vector Embeddings of Text Documents: Application to a Corpus of News Articles
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Production of news content is growing at an astonishing rate. To help manage and monitor the sheer amount of text, there is an increasing need to develop efficient methods that can provide insights into emerging content areas, and stratify unstructured corpora of text into 'topics' that stem intrinsically from content similarity. Here we present an unsupervised framework that brings together powerful vector embeddings from natural language processing with tools from multiscale graph partitioning that can reveal natural partitions at different resolutions without making a priori assumptions about the number of clusters in the corpus. We show the advantages of graph-based clustering through end-to-end comparisons with other popular clustering and topic modelling methods, and also evaluate different text vector embeddings, from classic Bag-of-Words to Doc2Vec to the recent transformer-based model BERT. This comparative work is showcased through an analysis of a corpus of US news coverage during the presidential election year of 2016.
 [96] arXiv:2010.15090 (cross-list from cs.CL) [pdf, other]

Title: Handling Class Imbalance in Low-Resource Dialogue Systems by Combining Few-Shot Classification and Interpolation
Comments: 5 pages, 4 figures, 3 tables
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Utterance classification performance in low-resource dialogue systems is constrained by an inevitably high degree of data imbalance in class labels. We present a new end-to-end pairwise learning framework that is designed specifically to tackle this phenomenon by inducing a few-shot classification capability in the utterance representations and augmenting data through an interpolation of utterance representations. Our approach is a general-purpose training methodology, agnostic to the neural architecture used for encoding utterances. We show significant improvements in macro-F1 score over standard cross-entropy training for three different neural architectures, demonstrating improvements on a Virtual Patient dialogue dataset as well as a low-resourced emulation of the Switchboard dialogue act classification dataset.
 [97] arXiv:2010.15111 (cross-list from q-fin.ST) [pdf, other]

Title: Evaluating data augmentation for financial time series classification
Subjects: Statistical Finance (q-fin.ST); Machine Learning (cs.LG)
Data augmentation methods in combination with deep neural networks have been used extensively in computer vision on classification tasks, achieving great success; however, their use in time series classification is still at an early stage. This is even more so in the field of financial prediction, where data tends to be small, noisy and non-stationary. In this paper we evaluate several augmentation methods applied to stock datasets using two state-of-the-art deep learning models. The results show that several augmentation methods significantly improve financial performance when used in combination with a trading strategy. For a relatively small dataset ($\approx30K$ samples), augmentation methods achieve up to $400\%$ improvement in risk-adjusted return performance; for a larger stock dataset ($\approx300K$ samples), results show up to $40\%$ improvement.
Replacements for Thu, 29 Oct 20
 [98] arXiv:1811.12823 (replaced) [pdf, other]

Title: Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models
Authors: Daniil Polykovskiy, Alexander Zhebrak, Benjamin Sanchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Artamonov, Vladimir Aladinskiy, Mark Veselov, Artur Kadurin, Simon Johansson, Hongming Chen, Sergey Nikolenko, Alan Aspuru-Guzik, Alex Zhavoronkov
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (stat.ML)
 [99] arXiv:1903.11991 (replaced) [pdf, other]

Title: Parabolic Approximation Line Search for DNNs
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
 [100] arXiv:1910.09089 (replaced) [pdf, other]

Title: Multiplayer Multi-Armed Bandits with non-zero rewards on collisions for uncoordinated spectrum access
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
 [101] arXiv:2001.03985 (replaced) [pdf, other]

Title: Unbiased and Efficient Log-Likelihood Estimation with Inverse Binomial Sampling
Comments: Bas van Opheusden and Luigi Acerbi contributed equally to this work
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM); Computation (stat.CO); Methodology (stat.ME); Machine Learning (stat.ML)
 [102] arXiv:2003.02821 (replaced) [pdf, other]

Title: What went wrong and when? Instance-wise Feature Importance for Time-series Models
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [103] arXiv:2003.03977 (replaced) [pdf, other]

Title: Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [104] arXiv:2004.03083 (replaced) [pdf, other]

Title: Direct loss minimization algorithms for sparse Gaussian processes
Comments: 31 pages, 16 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [105] arXiv:2006.01893 (replaced) [pdf, other]

Title: Unsupervised Discretization by Two-dimensional MDL-based Histogram
Comments: 30 pages, 9 figures, submitted to Machine Learning Journal
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [106] arXiv:2006.07214 (replaced) [pdf, other]

Title: Sparse and Continuous Attention Mechanisms
Authors: André F. T. Martins, Marcos Treviso, António Farinhas, Vlad Niculae, Mário A. T. Figueiredo, Pedro M. Q. Aguiar
Comments: Accepted for spotlight presentation at NeurIPS 2020
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [107] arXiv:2006.07361 (replaced) [pdf, other]

Title: Gaussian Processes on Graphs via Spectral Kernel Learning
Comments: 13 pages, 5 figures
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
 [108] arXiv:2006.07507 (replaced) [pdf, other]

Title: Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [109] arXiv:2006.07710 (replaced) [pdf, other]

Title: The Pitfalls of Simplicity Bias in Neural Networks
Comments: NeurIPS 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [110] arXiv:2006.08149 (replaced) [pdf, other]

Title: GNNGuard: Defending Graph Neural Networks against Adversarial Attacks
Comments: Accepted by NeurIPS 2020. More info about GNNGuard: this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [111] arXiv:2006.08877 (replaced) [pdf, other]

Title: Practical Quasi-Newton Methods for Training Deep Neural Networks
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [112] arXiv:2006.10255 (replaced) [pdf, other]
 [113] arXiv:2006.12972 (replaced) [pdf, ps, other]

Title: Sparse Symplectically Integrated Neural Networks
Comments: Accepted as a conference paper to NeurIPS 2020. Main paper has 9 pages and 4 figures
Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)
 [114] arXiv:2007.00211 (replaced) [pdf, other]

Title: Ultrahyperbolic Representation Learning
Comments: NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [115] arXiv:2007.08199 (replaced) [pdf, other]

Title: Learning from Noisy Labels with Deep Neural Networks: A Survey
Comments: If your paper is highly related, but it is missing, please contact me: songhwanjun@kaist.ac.kr
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [116] arXiv:2007.11838 (replaced) [pdf, other]

Title: PClean: Bayesian Data Cleaning at Scale with Domain-Specific Probabilistic Programming
Comments: Correct formatting error
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation (stat.CO); Machine Learning (stat.ML)
 [117] arXiv:2008.00645 (replaced) [pdf, other]

Title: Active Classification with Uncertainty Comparison Queries
Comments: Code and Dataset: this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [118] arXiv:2008.00938 (replaced) [pdf, other]

Title: Implicit Regularization via Neural Feature Alignment
Authors: Aristide Baratin, Thomas George, César Laurent, R Devon Hjelm, Guillaume Lajoie, Pascal Vincent, Simon Lacoste-Julien
Comments: 27 pages including appendices. Submitted to AISTATS 2021. A preliminary version of this work has been presented at the NeurIPS 2019 Workshops on "Machine Learning with Guarantees" and "Science meets Engineering of Deep Learning"
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [119] arXiv:2009.00707 (replaced) [pdf, ps, other]

Title: Pursuing a Prospective Perspective
Authors: Steven Kearnes
Subjects: Machine Learning (cs.LG)
 [120] arXiv:2009.12682 (replaced) [pdf, other]

Title: Decision-Aware Conditional GANs for Time Series Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [121] arXiv:2010.05113 (replaced) [pdf]

Title: Contrastive Representation Learning: A Framework and Review
Comments: 28 pages, 9 figures, update with the accepted version in IEEE Access
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [122] arXiv:2010.07154 (replaced) [pdf, other]

Title: Learning Deep Features in Instrumental Variable Regression
Authors: Liyuan Xu, Yutian Chen, Siddarth Srinivasan, Nando de Freitas, Arnaud Doucet, Arthur Gretton
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [123] arXiv:2010.09546 (replaced) [pdf, other]

Title: Model-based Policy Optimization with Unsupervised Model Adaptation
Comments: Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [124] arXiv:2010.13764 (replaced) [pdf, other]

Title: Enforcing Interpretability and its Statistical Impacts: Tradeoffs between Accuracy and Interpretability
Comments: 12 pages; minor edits
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [125] arXiv:1902.04495 (replaced) [pdf, other]

Title: The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy
Comments: 33 pages, 4 figures
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
 [126] arXiv:1905.11814 (replaced) [pdf, other]

Title: Shredder: Learning Noise Distributions to Protect Inference Privacy
Authors: Fatemehsadat Mireshghallah, Mohammadkazem Taram, Prakash Ramrakhyani, Dean Tullsen, Hadi Esmaeilzadeh
Comments: Presented in ASPLOS 2020
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [127] arXiv:1906.01558 (replaced) [pdf, other]

Title: Disentangling neural mechanisms for perceptual grouping
Comments: Published in ICLR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
 [128] arXiv:1906.02944 (replaced) [pdf, other]

Title: Learning Adaptive Classifiers Synthesis for Generalized Few-Shot Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [129] arXiv:1908.11472 (replaced) [pdf, other]

Title: Kinematic Single Vehicle Trajectory Prediction Baselines and Applications with the NGSIM Dataset
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [130] arXiv:1909.09541 (replaced) [pdf, other]

Title: A Transfer Learning Approach for Automated Segmentation of Prostate Whole Gland and Transition Zone in Diffusion Weighted MRI
Authors: Saman Motamed, Isha Gujrathi, Dominik Deniffel, Anton Oentoro, Masoom A. Haider, Farzad Khalvati
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
 [131] arXiv:2001.10280 (replaced) [pdf, other]

Title: Reservoir computing model of two-dimensional turbulent convection
Comments: 16 pages, 12 figures
Subjects: Fluid Dynamics (physics.flu-dyn); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
 [132] arXiv:2002.00291 (replaced) [pdf, ps, other]

Title: Oracle lower bounds for stochastic gradient sampling algorithms
Comments: 21 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [133] arXiv:2002.03657 (replaced) [pdf, other]

Title: Semialgebraic Optimization for Lipschitz Constants of ReLU Networks
Comments: NeurIPS 2020
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
 [134] arXiv:2002.05049 (replaced) [pdf, other]

Title: Detect and Correct Bias in Multi-Site Neuroimaging Datasets
Journal-ref: Medical Image Analysis, 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
 [135] arXiv:2002.10107 (replaced) [pdf]

Title: Predicting Subjective Features of Questions of QA Websites using BERT
Comments: 5 pages, 4 figures, 2 tables
Journal-ref: 2020 6th International Conference on Web Research (ICWR), Tehran, Iran, 2020, pp. 240-244
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [136] arXiv:2002.11843 (replaced) [pdf, ps, other]

Title: A Deep Unsupervised Feature Learning Spiking Neural Network with Binarized Classification Layers for EMNIST Classification using SpykeFlow
Comments: A section of this work is submitted to IEEE TETCI 2020 Journal
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)
 [137] arXiv:2003.02395 (replaced) [pdf, other]

Title: A Simple Convergence Proof of Adam and Adagrad
Comments: 24 pages, 1 figure, preprint version
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [138] arXiv:2003.03167 (replaced) [pdf, other]

Title: When Deep Learning Meets Data Alignment: A Review on Deep Registration Networks (DRNs)
Authors: Victor Villena-Martinez, Sergiu Oprea, Marcelo Saval-Calvo, Jorge Azorin-Lopez, Andres Fuster-Guillo, Robert B. Fisher
Comments: Published in Applied Sciences
Journal-ref: Appl. Sci. 2020, 10(21), 7524
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
 [139] arXiv:2003.08355 (replaced) [pdf, other]

Title: Dynamic Point Cloud Denoising via Manifold-to-Manifold Distance
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
 [140] arXiv:2004.08046 (replaced) [pdf, other]

Title: Active Sentence Learning by Adversarial Uncertainty Sampling in Discrete Space
Comments: Accepted to EMNLP 2020 Findings
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
 [141] arXiv:2005.00159 (replaced) [pdf, other]

Title: Why and when should you pool? Analyzing Pooling in Recurrent Architectures
Comments: Accepted to Findings of EMNLP 2020, to be presented at BlackBoxNLP. Updated Version
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
 [142] arXiv:2005.12368 (replaced) [pdf, other]

Title: FT Speech: Danish Parliament Speech Corpus
Comments: Accepted at Interspeech 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
 [143] arXiv:2006.05168 (replaced) [pdf, other]

Title: Manifold structure in graph embeddings
Authors: Patrick Rubin-Delanchy
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [144] arXiv:2006.05645 (replaced) [pdf, other]

Title: Hypergraph Clustering for Finding Diverse and Experienced Groups
Comments: Added new experiments and refocused around diversity
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)
 [145] arXiv:2006.07397 (replaced) [pdf, other]

Title: The DeepFake Detection Challenge (DFDC) Dataset
Authors: Brian Dolhansky, Joanna Bitton, Ben Pflaum, Jikuo Lu, Russ Howes, Menglin Wang, Cristian Canton Ferrer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [146] arXiv:2006.07506 (replaced) [pdf, other]

Title: Uncertainty Quantification for Inferring Hawkes Networks
Comments: 16 pages including appendix, 1 figure, accepted to NeurIPS 2020
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [147] arXiv:2006.11132 (replaced) [pdf, other]

Title: Deep Transformation-Invariant Clustering
Comments: Accepted at NeurIPS 2020 (oral). Project webpage: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [148] arXiv:2006.11313 (replaced) [pdf, other]

Title: Information theoretic limits of learning a sparse rule
Comments: 56 pages, 4 figures, accepted to the 34th Conference on Neural Information Processing Systems (NeurIPS 2020). Extended version that includes the supplementary material
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [149] arXiv:2006.12504 (replaced) [pdf, other]

Title: The GCE in a New Light: Disentangling the $γ$-ray Sky with Bayesian Graph Convolutional Neural Networks
Comments: 7+47 pages, 2+36 figures, accepted by Phys. Rev. Lett
Subjects: High Energy Astrophysical Phenomena (astro-ph.HE); Cosmology and Nongalactic Astrophysics (astro-ph.CO); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); High Energy Physics - Phenomenology (hep-ph)
 [150] arXiv:2007.01722 (replaced) [pdf, ps, other]
 [151] arXiv:2007.03285 (replaced) [pdf, other]

Title: Stochastic Linear Bandits Robust to Adversarial Attacks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [152] arXiv:2007.08926 (replaced) [pdf, ps, other]

Title: Smart Choices and the Selection Monad
Subjects: Logic in Computer Science (cs.LO); Machine Learning (cs.LG); Programming Languages (cs.PL)
 [153] arXiv:2007.09170 (replaced) [pdf, other]

Title: Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation
Comments: Extension of our IVA'19 paper. Submitted to the International Journal of Human-Computer Interaction. arXiv admin note: substantial text overlap with arXiv:1903.03369
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
 [154] arXiv:2009.01325 (replaced) [pdf, other]

Title: Learning to summarize from human feedback
Authors: Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
Comments: NeurIPS 2020 camera ready
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [155] arXiv:2009.05859 (replaced) [pdf, other]

Title: Towards Automatic Manipulation of Intracardiac Echocardiography Catheter
Authors: Young-Ho Kim, Jarrod Collins, Zhongyu Li, Ponraj Chinnadurai, Ankur Kapoor, C. Huie Lin, Tommaso Mansi
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [156] arXiv:2009.07964 (replaced) [pdf, other]

Title: Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis
Comments: EMNLP 2020, long paper
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
 [157] arXiv:2009.08267 (replaced) [pdf, other]

Title: Integration of AI and mechanistic modeling in generative adversarial networks for stochastic inverse problems
Comments: New appendix
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [158] arXiv:2010.00770 (replaced) [pdf, other]

Title: XDA: Accurate, Robust Disassembly with Transfer Learning
Comments: To appear in 2021 Network and Distributed System Security Symposium (NDSS 2021)
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
 [159] arXiv:2010.02847 (replaced) [pdf, other]

Title: Robustness and Reliability of Gender Bias Assessment in Word Embeddings: The Role of Base Pairs
Comments: Accepted at AACL-IJCNLP 2020
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [160] arXiv:2010.09895 (replaced) [pdf, other]

Title: Multi-Window Data Augmentation Approach for Speech Emotion Recognition
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
 [161] arXiv:2010.10569 (replaced) [pdf, other]

Title: Bayesian Algorithms for Decentralized Stochastic Bandits
Comments: Submitted to IEEE Journal on Selected Areas in Information Theory (JSAIT) issue on Sequential, Active, and Reinforcement Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [162] arXiv:2010.11910 (replaced) [pdf, other]

Title: Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning
Authors: Sungkyun Chang, Donmoon Lee, Jeongsoo Park, Hyungui Lim, Kyogu Lee, Karam Ko, Yoonchang Han
Comments: submitted to ICASSP 2021
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
 [163] arXiv:2010.13118 (replaced) [pdf, other]

Title: Monocular Depth Estimation via Listwise Ranking using the Plackett-Luce Model
Comments: 9 pages of content, 11 pages in total, 1 figure, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [164] arXiv:2010.13483 (replaced) [pdf, other]

Title: High Acceleration Reinforcement Learning for Real-World Juggling with Binary Rewards
Comments: Published at Conference on Robot Learning (CoRL) 2020
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [165] arXiv:2010.13787 (replaced) [pdf, other]

Title: Hierarchical Inference With Bayesian Neural Networks: An Application to Strong Gravitational Lensing
Authors: Sebastian Wagner-Carena, Ji Won Park, Simon Birrer, Philip J. Marshall, Aaron Roodman, Risa H. Wechsler (for the LSST Dark Energy Science Collaboration)
Comments: Code available at this https URL
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
 [166] arXiv:2010.13887 (replaced) [pdf, other]

Title: LightSeq: A High Performance Inference Library for Sequence Processing and Generation
Comments: 6 pages, 8 figures
Subjects: Mathematical Software (cs.MS); Machine Learning (cs.LG)