Machine Learning

New submissions

New submissions for Wed, 30 Sep 20

[1]  arXiv:2009.13831 [pdf, other]
Title: Testing for Normality with Neural Networks
Authors: Milos Simic
Comments: 50 pages, 10 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

In this paper, we treat the problem of testing for normality as a binary classification problem and construct a feedforward neural network that can successfully detect normal distributions by inspecting small samples from them. The numerical experiments conducted on small samples with no more than 100 elements indicated that the neural network which we trained was more accurate and far more powerful than the most frequently used and most powerful standard tests of normality: Shapiro-Wilk, Anderson-Darling, Lilliefors and Jarque-Bera, as well as the kernel tests of goodness-of-fit. The neural network achieved an AUROC score of almost 1, which corresponds to a perfect binary classifier. Additionally, the network's accuracy was higher than 96% on a set of larger samples with 250-1000 elements. Since the normality of data is an assumption of numerous techniques for analysis and inference, the neural network constructed in this study has a very high potential for use in everyday practice of statistics, data analysis and machine learning in both science and industry.

[2]  arXiv:2009.13881 [pdf, ps, other]
Title: Lipschitz neural networks are dense in the set of all Lipschitz functions
Authors: Stephan Eckstein
Comments: 7 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA)

This note shows that, for a fixed Lipschitz constant $L > 0$, one layer neural networks that are $L$-Lipschitz are dense in the set of all $L$-Lipschitz functions with respect to the uniform norm on bounded sets.
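
Written out, the density claim reads roughly as follows. The input dimension $d$, the norm $\|\cdot\|$ on $\mathbb{R}^d$, and the symbols $\mathrm{Lip}_L$ and $\mathcal{N}_L$ (the class of $L$-Lipschitz one layer networks) are notational assumptions that the abstract does not fix.

```latex
\[
\mathrm{Lip}_L = \bigl\{ f : \mathbb{R}^d \to \mathbb{R} \;:\; |f(x) - f(y)| \le L \,\|x - y\| \ \text{for all } x, y \bigr\}
\]
\[
\forall f \in \mathrm{Lip}_L,\quad \forall K \subset \mathbb{R}^d \text{ bounded},\quad \forall \varepsilon > 0,\quad
\exists\, g \in \mathcal{N}_L \subseteq \mathrm{Lip}_L :\qquad
\sup_{x \in K} |f(x) - g(x)| \le \varepsilon .
\]
```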

[3]  arXiv:2009.13961 [pdf, ps, other]
Title: Online Action Learning in High Dimensions: A New Exploration Rule for Contextual $\epsilon_t$-Greedy Heuristics
Comments: 43 pages, 9 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Econometrics (econ.EM)

Bandit problems are pervasive in various fields of research and are also present in several practical applications. Examples, including dynamic pricing and assortment and the design of auctions and incentives, permeate a large number of sequential treatment experiments. Different applications impose distinct levels of restrictions on viable actions. Some favor diversity of outcomes, while others require harmful actions to be closely monitored or mainly avoided. In this paper, we extend one of the most popular bandit solutions, the original $\epsilon_t$-greedy heuristics, to high-dimensional contexts. Moreover, we introduce a competing exploration mechanism that relies on searching sets based on order statistics. We view our proposals as alternatives for cases where pluralism is valued or, in the opposite direction, cases where the end-user should carefully tune the range of exploration of new actions. We find reasonable bounds for the cumulative regret of a decaying $\epsilon_t$-greedy heuristic in both cases, and we provide an upper bound for the initialization phase which implies that the regret bounds obtained when order statistics are considered are at most equal to, and mostly better than, those obtained when random searching is the sole exploration mechanism. Additionally, we show that end-users have sufficient flexibility to avoid harmful actions since any cardinality for the higher-order statistics can be used to achieve a stricter upper bound. In a simulation exercise, we show that the algorithms proposed in this paper outperform simple and adapted counterparts.
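
For orientation, a decaying $\epsilon_t$-greedy rule can be sketched on a plain (context-free) Bernoulli bandit. This is only the textbook heuristic that the abstract builds on, not the paper's high-dimensional contextual algorithm or its order-statistics variant. The schedule `eps_t = min(1, c / t)`, the constant `c`, and all function names are assumptions made for illustration.

```python
import random


def epsilon_t(t: int, c: float = 5.0) -> float:
    """Assumed decay schedule: eps_t = min(1, c / t) for round t >= 1."""
    return min(1.0, c / t)


def run_epsilon_greedy(true_means, n_rounds=5000, c=5.0, seed=0):
    """Play a Bernoulli bandit with a decaying epsilon_t-greedy rule.

    Returns the per-arm pull counts and the empirical mean rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if rng.random() < epsilon_t(t, c):
            arm = rng.randrange(n_arms)  # explore: uniformly random arm
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts, means
```

Calling `run_epsilon_greedy([0.2, 0.5, 0.8])` returns how often each arm was pulled; as `t` grows the exploration probability shrinks, so pulls concentrate on the arm with the highest empirical mean.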

Cross-lists for Wed, 30 Sep 20

[4]  arXiv:2009.13562 (cross-list from cs.LG) [pdf, other]
Title: STRATA: Building Robustness with a Simple Method for Generating Black-box Adversarial Attacks for Models of Code
Comments: 13 pages, 3 figures, 10 tables
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

Adversarial examples are imperceptible perturbations in the input to a neural model that result in misclassification. Generating adversarial examples for source code poses an additional challenge compared to the domains of images and natural language, because source code perturbations must adhere to strict semantic guidelines so the resulting programs retain the functional meaning of the code. We propose a simple and efficient black-box method for generating state-of-the-art adversarial examples on models of code. Our method generates untargeted and targeted attacks, and empirically outperforms competing gradient-based methods with less information and less computational effort. We also use adversarial training to construct a model robust to these attacks; our attack reduces the F1 score of code2seq by 42%. Adversarial training brings the F1 score on adversarial examples up to 99% of baseline.

[5]  arXiv:2009.13566 (cross-list from cs.LG) [pdf, other]
Title: Graph Neural Networks with Heterophily
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

Graph Neural Networks (GNNs) have proven to be useful for many different practical applications. However, most existing GNN models have an implicit assumption of homophily among the nodes connected in the graph, and therefore have largely overlooked the important setting of heterophily. In this work, we propose a novel framework called CPGNN that generalizes GNNs for graphs with either homophily or heterophily. The proposed framework incorporates an interpretable compatibility matrix for modeling the heterophily or homophily level in the graph, which can be learned in an end-to-end fashion, enabling it to go beyond the assumption of strong homophily. Theoretically, we show that replacing the compatibility matrix in our framework with the identity (which represents pure homophily) reduces to GCN. Our extensive experiments demonstrate the effectiveness of our approach in more realistic and challenging experimental settings with significantly less training data compared to previous works: CPGNN variants achieve state-of-the-art results in heterophily settings with or without contextual node features, while maintaining comparable performance in homophily settings.

[6]  arXiv:2009.13579 (cross-list from cs.LG) [pdf, other]
Title: Novelty Search in representational space for sample efficient exploration
Comments: 9 pages + references + appendix. Oral presentation at NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards. One key element of our approach is the use of information theoretic principles to shape our representations in a way so that our novelty reward goes beyond pixel similarity. We test our approach on a number of maze tasks, as well as a control problem and show that our exploration approach is more sample-efficient compared to strong baselines.
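
A nearest-neighbour novelty reward of the kind described can be sketched as below. The averaging over `k` neighbours, the Euclidean metric, and the names `knn_novelty` and `memory` are assumptions; the abstract does not specify how the distances are aggregated or how the encoder is trained.

```python
import math


def knn_novelty(z, memory, k=3):
    """Intrinsic reward for latent state z: mean Euclidean distance to its
    k nearest neighbours among previously visited latent states."""
    if not memory:
        return 0.0  # assumption: no reward before anything has been visited
    dists = sorted(math.dist(z, m) for m in memory)
    nearest = dists[:k]  # fewer than k entries if memory is small
    return sum(nearest) / len(nearest)
```

States far from everything in `memory` receive a larger reward than states close to visited ones, which is what drives exploration toward unvisited regions of the representation.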

[7]  arXiv:2009.13586 (cross-list from cs.LG) [pdf, other]
Title: Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization
Authors: Xuezhe Ma
Comments: First version of preprint. 15 pages (plus appendix), 4 figures, 4 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix. Importantly, the update and storage of the diagonal approximation of the Hessian are as efficient as those of adaptive first-order optimization methods, with linear complexity in both time and memory. To handle nonconvexity, we replace the Hessian with its rectified absolute value, which is guaranteed to be positive-definite. Experiments on three tasks of CV and NLP show that Apollo achieves significant improvements over other stochastic optimization methods, including SGD and variants of Adam, in terms of both convergence speed and generalization performance. The implementation of the algorithm is available at https://github.com/XuezheMax/apollo.
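
The rectification step can be illustrated in isolation. The sketch below assumes the diagonal curvature estimate `diag_b` is already given (how Apollo updates it is not shown here), and the floor `sigma`, the learning rate, and both function names are hypothetical choices for illustration rather than the paper's settings.

```python
def rectified_abs(diag_b, sigma=0.01):
    """Element-wise max(|b|, sigma): every entry becomes strictly positive,
    so the resulting diagonal matrix is positive-definite."""
    return [max(abs(b), sigma) for b in diag_b]


def preconditioned_step(params, grads, diag_b, lr=0.1, sigma=0.01):
    """One Newton-like step using the rectified diagonal in place of the
    Hessian: theta <- theta - lr * g / max(|b|, sigma)."""
    d = rectified_abs(diag_b, sigma)
    return [p - lr * g / di for p, g, di in zip(params, grads, d)]
```

Because each coordinate is divided by a positive number, the step always moves against the gradient, even where the raw curvature estimate is negative or zero. Time and memory are linear in the number of parameters.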

[8]  arXiv:2009.13598 (cross-list from cs.LG) [pdf, ps, other]
Title: Anomaly Detection and Sampling Cost Control via Hierarchical GANs
Comments: 6 pages, 7 figures, has been accepted by Globecom 2020
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

Anomaly detection incurs certain sampling and sensing costs and therefore it is of great importance to strike a balance between the detection accuracy and these costs. In this work, we study anomaly detection by considering the detection of threshold crossings in a stochastic time series without the knowledge of its statistics. To reduce the sampling cost in this detection process, we propose the use of hierarchical generative adversarial networks (GANs) to perform nonuniform sampling. In order to improve the detection accuracy and reduce the delay in detection, we introduce a buffer zone in the operation of the proposed GAN-based detector. In the experiments, we analyze the performance of the proposed hierarchical GAN detector considering the metrics of detection delay, miss rates, average cost of error, and sampling ratio. We identify the tradeoffs in the performance as the buffer zone sizes and the number of GAN levels in the hierarchy vary. We also compare the performance with that of a sampling policy that approximately minimizes the sum of average costs of sampling and error given the parameters of the stochastic process. We demonstrate that the proposed GAN-based detector can have significant performance improvements in terms of detection delay and average cost of error with a larger buffer zone but at the cost of increased sampling rates.

[9]  arXiv:2009.13697 (cross-list from cs.LG) [pdf, ps, other]
Title: A Fast Graph Neural Network-Based Method for Winner Determination in Multi-Unit Combinatorial Auctions
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)

The combinatorial auction (CA) is an efficient mechanism for resource allocation in different fields, including cloud computing. It can obtain high economic efficiency and user flexibility by allowing bidders to submit bids for combinations of different items instead of only for individual items. However, the problem of allocating items among the bidders to maximize the auctioneer's revenue, i.e., the winner determination problem (WDP), is NP-complete and inapproximable. Existing works for WDPs are generally based on mathematical optimization techniques and most of them focus on the single-unit WDP, where each item only has one unit. On the contrary, few works consider the multi-unit WDP in which each item may have multiple units. Given that the multi-unit WDP is more complicated but prevalent in cloud computing, we propose leveraging machine learning (ML) techniques to develop a novel low-complexity algorithm for solving this problem with negligible revenue loss. Specifically, we model the multi-unit WDP as an augmented bipartite bid-item graph and use a graph neural network (GNN) with half-convolution operations to learn the probability of each bid belonging to the optimal allocation. To improve the sample generation efficiency and decrease the number of needed labeled instances, we propose two different sample generation processes. We also develop two novel graph-based post-processing algorithms to transform the outputs of the GNN into feasible solutions. Through simulations on both synthetic instances and a specific virtual machine (VM) allocation problem in a cloud computing platform, we validate that our proposed method can approach optimal performance with low complexity and has good generalization ability in terms of problem size and user-type distribution.

[10]  arXiv:2009.13714 (cross-list from cs.LG) [pdf, other]
Title: Learned Fine-Tuner for Incongruous Few-Shot Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Model-agnostic meta-learning (MAML) effectively meta-learns an initialization of model parameters for few-shot learning where all learning problems share the same format of model parameters -- congruous meta-learning. We extend MAML to incongruous meta-learning where different yet related few-shot learning problems may not share any model parameters. In this setup, we propose the use of a Learned Fine Tuner (LFT) to replace hand-designed optimizers (such as SGD) for the task-specific fine-tuning. The meta-learned initialization in MAML is replaced by learned optimizers based on the learning-to-optimize (L2O) framework to meta-learn across incongruous tasks such that models fine-tuned with LFT (even from random initializations) adapt quickly to new tasks. The introduction of LFT within MAML (i) offers the capability to tackle few-shot learning tasks by meta-learning across incongruous yet related problems (e.g., classification over images of different sizes and model architectures), and (ii) can efficiently work with first-order and derivative-free few-shot learning problems. Theoretically, we quantify the difference between LFT (for MAML) and L2O. Empirically, we demonstrate the effectiveness of LFT through both synthetic and real problems and a novel application of generating universal adversarial attacks across different image sources in the few-shot learning regime.

[11]  arXiv:2009.13734 (cross-list from cs.LG) [pdf, other]
Title: New GCNN-Based Architecture for Semi-Supervised Node Classification
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

The nodes of a graph that belong to a specific cluster are more likely to connect to each other than to other nodes in the graph. Hence, once some information about a few nodes is revealed, the structure of the graph (the graph edges) makes it possible to infer more information about the other nodes. From this perspective, this paper revisits the node classification task in a semi-supervised scenario using a graph convolutional neural network. The goal is to benefit from the flow of information that circulates around the revealed node labels. To this end, this paper provides a new graph convolutional neural network architecture. This architecture benefits efficiently from the revealed training nodes, the node features, and the graph structure. On the other hand, in many applications, non-graph observations (side information) exist beside a given graph realization. The non-graph observations are usually independent of the graph structure. This paper shows that the proposed architecture is also powerful in combining a graph realization and independent non-graph observations. For both cases, the experiments on the synthetic and real-world datasets demonstrate that our proposed architecture achieves a higher prediction accuracy in comparison to the existing state-of-the-art methods for the node classification task.

[12]  arXiv:2009.13736 (cross-list from cs.LG) [pdf, other]
Title: Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy
Comments: 20 pages (with appendices), 5 figures, preprint
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Experience replay (ER) improves the data efficiency of off-policy reinforcement learning (RL) algorithms by allowing an agent to store and reuse its past experiences in a replay buffer. While many techniques have been proposed to enhance ER by biasing how experiences are sampled from the buffer, thus far they have not considered strategies for refreshing experiences inside the buffer. In this work, we introduce Lucid Dreaming for Experience Replay (LiDER), a conceptually new framework that allows replay experiences to be refreshed by leveraging the agent's current policy. LiDER 1) moves an agent back to a past state; 2) lets the agent try following its current policy to execute different actions---as if the agent were "dreaming" about the past, but is aware of the situation and can control the dream to encounter new experiences; and 3) stores and reuses the new experience if it turned out better than what the agent previously experienced, i.e., to refresh its memories. LiDER is designed to be easily incorporated into off-policy, multi-worker RL algorithms that use ER; we present in this work a case study of applying LiDER to an actor-critic based algorithm. Results show LiDER consistently improves performance over the baseline in four Atari 2600 games. Our open-source implementation of LiDER and the data used to generate all plots in this paper are available at github.com/duyunshu/lucid-dreaming-for-exp-replay.

[13]  arXiv:2009.13794 (cross-list from cs.SI) [pdf, other]
Title: From Twitter to Traffic Predictor: Next-Day Morning Traffic Prediction Using Social Media Data
Authors: Weiran Yao, Sean Qian
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Machine Learning (stat.ML)

The effectiveness of traditional traffic prediction methods is often extremely limited when forecasting traffic dynamics in early morning. The reason is that traffic can break down drastically during the early morning commute, and the time and duration of this break-down vary substantially from day to day. Early morning traffic forecasts are crucial to inform morning-commute traffic management, but they are generally challenging to make in advance, particularly by midnight. In this paper, we propose to mine Twitter messages as a probing method to understand the impacts of people's work and rest patterns in the evening/midnight of the previous day on the next-day morning traffic. The model is tested on freeway networks in Pittsburgh. The resulting relationship is surprisingly simple and powerful. We find that, in general, the earlier people rest as indicated from Tweets, the more congested roads will be in the next morning. The occurrence of big events in the evening before, represented by higher or lower tweet sentiment than normal, often implies lower travel demand in the next morning than on normal days. Besides, people's tweeting activities in the night before and early morning are statistically associated with congestion in morning peak hours. We make use of such relationships to build a predictive framework which forecasts morning commute congestion using people's tweeting profiles extracted by 5 am or as late as the midnight prior to the morning. The Pittsburgh study supports that our framework can precisely predict morning congestion, particularly for some road segments upstream of roadway bottlenecks with large day-to-day congestion variation. Our approach considerably outperforms existing methods without Twitter message features, and it can learn meaningful representations of demand from tweeting profiles that offer managerial insights.

[14]  arXiv:2009.13801 (cross-list from cs.LG) [pdf, other]
Title: Framework for Designing Filters of Spectral Graph Convolutional Neural Networks in the Context of Regularization Theory
Authors: Asif Salim, Sumitra S
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Graph convolutional neural networks (GCNNs) have been widely used in graph learning. It has been observed that the smoothness functional on graphs can be defined in terms of the graph Laplacian. This fact points in the direction of using the Laplacian to derive regularization operators on graphs and, consequently, to design spectral GCNN filters. In this work, we explore the regularization properties of the graph Laplacian and propose a generalized framework for regularized filter designs in spectral GCNNs. We found that the filters used in many state-of-the-art GCNNs can be derived as a special case of the framework we developed. We designed new filters that are associated with well-defined regularization behavior and tested their performance on semi-supervised node classification tasks. Their performance was found to be superior to that of the other state-of-the-art techniques.

[15]  arXiv:2009.13807 (cross-list from cs.LG) [pdf]
Title: Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Time series anomaly detection has been a perennially important topic in data science, with papers dating back to the 1950s. However, in recent years there has been an explosion of interest in this topic, much of it driven by the success of deep learning in other domains and for other time series tasks. Most of these papers test on one or more of a handful of popular benchmark datasets, created by Yahoo, Numenta, NASA, etc. In this work we make a surprising claim. The majority of the individual exemplars in these datasets suffer from one or more of four flaws. Because of these four flaws, we believe that many published comparisons of anomaly detection algorithms may be unreliable, and more importantly, much of the apparent progress in recent years may be illusionary. In addition to demonstrating these claims, with this paper we introduce the UCR Time Series Anomaly Datasets. We believe that this resource will perform a similar role as the UCR Time Series Classification Archive, by providing the community with a benchmark that allows meaningful comparisons between approaches and a meaningful gauge of overall progress.

[16]  arXiv:2009.13826 (cross-list from cs.LG) [pdf, other]
Title: EEMC: Embedding Enhanced Multi-tag Classification
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

Recently emerged representation learning has achieved attractive performance in NLP and complex networks, and it is becoming a fundamental technology in machine learning and data mining. How to use representation learning to improve the performance of classifiers is a very significant research direction. We use representation learning to map raw data (nodes of a graph) to a low-dimensional feature space. In this space, each raw data item obtains a lower-dimensional vector representation. We apply some simple linear operations to those vectors to produce virtual data, and use the vectors together with the virtual data to train a multi-tag classifier. We then measure the performance of the classifier by F1 score (Macro-F1 and Micro-F1). Our method raises Macro-F1 by 28% to 450% and the average F1 score by 12% to 224%. For comparison, we trained the classifier directly on the lower-dimensional vectors and measured its performance. We validate our algorithm on three public data sets and find that the virtual data helps the classifier greatly improve the F1 score. Therefore, our algorithm is an effective way to improve the performance of a classifier. These results suggest that the virtual data generated by simple linear operations in representation space still retains the information of the raw data. The method is also of great significance for learning from small-sample data sets.

[17]  arXiv:2009.13853 (cross-list from cs.LG) [pdf, other]
Title: Efficient SVDD Sampling with Approximation Guarantees for the Decision Boundary
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Support Vector Data Description (SVDD) is a popular one-class classifier for anomaly and novelty detection. But despite its effectiveness, SVDD does not scale well with data size. To avoid prohibitive training times, sampling methods select small subsets of the training data on which SVDD trains a decision boundary hopefully equivalent to the one obtained on the full data set. According to the literature, a good sample should therefore contain so-called boundary observations that SVDD would select as support vectors on the full data set. However, non-boundary observations are also essential to avoid fragmenting contiguous inlier regions and to avoid poor classification accuracy. Other aspects, such as selecting a sufficiently representative sample, are important as well. But existing sampling methods largely overlook them, resulting in poor classification accuracy. In this article, we study how to select a sample considering these points. Our approach is to frame SVDD sampling as an optimization problem, where constraints guarantee that sampling indeed approximates the original decision boundary. We then propose RAPID, an efficient algorithm to solve this optimization problem. RAPID does not require any tuning of parameters, is easy to implement and scales well to large data sets. We evaluate our approach on real-world and synthetic data. Our evaluation is the most comprehensive one for SVDD sampling so far. Our results show that RAPID outperforms its competitors in classification accuracy, in sample size, and in runtime.

[18]  arXiv:2009.13878 (cross-list from physics.chem-ph) [pdf, other]
Title: Physics-Constrained Predictive Molecular Latent Space Discovery with Graph Scattering Variational Autoencoder
Subjects: Chemical Physics (physics.chem-ph); Machine Learning (stat.ML)

Recent advances in artificial intelligence have propelled the development of innovative computational materials modeling and design techniques. In particular, generative deep learning models have been used for molecular representation, discovery and design with applications ranging from drug discovery to solar cell development. In this work, we assess the predictive capabilities of a molecular generative model developed based on variational inference and graph theory. The encoder network is based on the scattering transform, which allows for a better generalization of the model in the presence of limited training data. The scattering layers incorporate adaptive spectral filters which are tailored to the training dataset based on the molecular graphs' spectra. The decoding network is a one-shot graph generative model that conditions atom types on molecular topology. We present a quantitative assessment of the latent space in terms of its predictive ability for organic molecules in the QM9 dataset. To account for the limited size of the training data set, a Bayesian formalism is considered that allows us to capture the uncertainties in the predicted properties.

[19]  arXiv:2009.13891 (cross-list from cs.LG) [pdf, other]
Title: Towards Effective Context for Meta-Reinforcement Learning: an Approach based on Contrastive Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Context, the embedding of previously collected trajectories, is a powerful construct for Meta-Reinforcement Learning (Meta-RL) algorithms. By conditioning on an effective context, Meta-RL policies can easily generalize to new tasks within a few adaptation steps. We argue that improving the quality of context involves answering two questions: 1. How to train a compact and sufficient encoder that can embed the task-specific information contained in prior trajectories? 2. How to collect informative trajectories whose corresponding context reflects the specification of tasks? To this end, we propose a novel Meta-RL framework called CCM (Contrastive learning augmented Context-based Meta-RL). We first focus on the contrastive nature behind different tasks and leverage it to train a compact and sufficient context encoder. Further, we train a separate exploration policy and theoretically derive a new information-gain-based objective which aims to collect informative trajectories in a few steps. Empirically, we evaluate our approaches on common benchmarks as well as several complex sparse-reward environments. The experimental results show that CCM outperforms state-of-the-art algorithms by addressing the two problems above.

[20]  arXiv:2009.13895 (cross-list from cs.LG) [pdf, other]
Title: Message Passing Neural Processes
Comments: 18 pages, 6 figures. The first two authors contributed equally
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Neural Processes (NPs) are powerful and flexible models able to incorporate uncertainty when representing stochastic processes, while maintaining a linear time complexity. However, NPs produce a latent description by aggregating independent representations of context points and lack the ability to exploit relational information present in many datasets. This renders NPs ineffective in settings where the stochastic process is primarily governed by neighbourhood rules, such as cellular automata (CA), and limits performance for any task where relational information remains unused. We address this shortcoming by introducing Message Passing Neural Processes (MPNPs), the first class of NPs that explicitly makes use of relational structure within the model. Our evaluation shows that MPNPs thrive at lower sampling rates, on existing benchmarks and newly-proposed CA and Cora-Branched tasks. We further report strong generalisation over density-based CA rule-sets and significant gains in challenging arbitrary-labelling and few-shot learning setups.

[21]  arXiv:2009.13939 (cross-list from cs.LG) [pdf, other]
Title: Tackling unsupervised multi-source domain adaptation with optimism and consistency
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

It has been known for a while that the problem of multi-source domain adaptation can be regarded as a single source domain adaptation task where the source domain corresponds to a mixture of the original source domains. Nonetheless, how to adjust the mixture distribution weights remains an open question. Moreover, most existing work on this topic focuses only on minimizing the error on the source domains and achieving domain-invariant representations, which is insufficient to ensure low error on the target domain. In this work, we present a novel framework that addresses both problems and beats the current state of the art by using a mildly optimistic objective function and consistency regularization on the target samples.

[22]  arXiv:2009.13962 (cross-list from cs.LG) [pdf, other]
Title: Think before you act: A simple baseline for compositional generalization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Unlike humans, who have the ability to recombine familiar expressions to create novel ones, modern neural networks struggle to do so. This has been emphasized recently with the introduction of the benchmark dataset "gSCAN" (Ruis et al. 2020), aiming to evaluate models' performance at compositional generalization in grounded language understanding. In this work, we challenge the gSCAN benchmark by proposing a simple model that achieves surprisingly good performance on two of the gSCAN test splits. Our model is based on the observation that, to succeed on gSCAN tasks, the agent must (i) identify the target object (think) before (ii) navigating to it successfully (act). Concretely, we propose an attention-inspired modification of the baseline model from (Ruis et al. 2020), together with an auxiliary loss, that takes into account the sequential nature of steps (i) and (ii). While two compositional tasks are trivially solved with our approach, we also find that the other tasks remain unsolved, validating the relevance of gSCAN as a benchmark for evaluating models' compositional abilities.

[23]  arXiv:2009.13975 (cross-list from cs.LG) [pdf, other]
Title: Identification of Probability weighted ARX models with arbitrary domains
Comments: 6 pages, 3 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Hybrid system identification is a key tool to achieve reliable models of Cyber-Physical Systems from data. PieceWise Affine (PWA) models guarantee universal approximation, local linearity and equivalence to other classes of hybrid systems. Still, PWA identification is a challenging problem, requiring the concurrent solution of regression and classification tasks. In this work, we focus on the identification of PieceWise Auto Regressive with eXogenous input models with arbitrary regions (NPWARX), thus not restricted to polyhedral domains, and characterized by discontinuous maps. To this end, we propose a method based on a probabilistic mixture model, where the discrete state is represented through a multinomial distribution conditioned by the input regressors. The architecture is conceived following the Mixture of Experts concept, developed within the machine learning field. To achieve nonlinear partitioning, we parametrize the discriminant function using a neural network. Then, the parameters of both the ARX submodels and the classifier are concurrently estimated by maximizing the likelihood of the overall model using Expectation Maximization. The proposed method is demonstrated on a nonlinear piece-wise problem with discontinuous maps.

[24]  arXiv:2009.13977 (cross-list from cs.LG) [pdf, other]
Title: What if Neural Networks had SVDs?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Various Neural Networks employ time-consuming matrix operations like matrix inversion. Many such matrix operations are faster to compute given the Singular Value Decomposition (SVD). Previous work allows using the SVD in Neural Networks without computing it. In theory, these techniques can speed up matrix operations; in practice, however, they are not fast enough. We present an algorithm that is fast enough to speed up several matrix operations. The algorithm increases the degree of parallelism of an underlying matrix multiplication $H\cdot X$ where $H$ is an orthogonal matrix represented by a product of Householder matrices. Code is available at www.github.com/AlexanderMath/fasth .
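
As a rough illustration of the product $H\cdot X$ mentioned above, the following Python sketch applies a product of Householder reflections to a single vector without ever forming $H$. It is only the plain sequential baseline, not the paper's algorithm; the function name and the example vectors are assumptions made for illustration.

```python
import math


def apply_householder_product(vs, x):
    """Return H x, where H = H_1 H_2 ... H_k and H_i = I - 2 v_i v_i^T / (v_i^T v_i).

    H is never formed explicitly; each reflection costs O(d) operations.
    """
    y = list(x)
    # In H_1 H_2 ... H_k x the rightmost reflection acts first.
    for v in reversed(vs):
        scale = 2.0 * sum(vi * yi for vi, yi in zip(v, y)) / sum(vi * vi for vi in v)
        y = [yi - scale * vi for vi, yi in zip(v, y)]
    return y


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


# Hypothetical example: two reflections in three dimensions.
vs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
x = [3.0, 4.0, 12.0]
y = apply_householder_product(vs, x)
print(y, norm(x), norm(y))  # an orthogonal H preserves the Euclidean norm
```

Each pass of the loop depends on the result of the previous one, so this baseline is sequential in the number of reflections. That dependency is the kind of bottleneck a more parallel scheme would need to reduce.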

[25]  arXiv:2009.13982 (cross-list from cs.LG) [pdf, other]
Title: A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning
Comments: Accepted to The International Joint Conference on Neural Networks (IJCNN) 2020, to appear
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

We design a low complexity decentralized learning algorithm to train a recently proposed large neural network in distributed processing nodes (workers). We assume the communication network between the workers is synchronized and can be modeled as a doubly-stochastic mixing matrix without having any master node. In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns. Using alternating-direction-method-of-multipliers (ADMM) along with a layerwise convex optimization approach, we propose a decentralized learning algorithm which enjoys low computational complexity and communication cost among the workers. We show that it is possible to achieve learning performance equivalent to that obtained when the data is available in a single place. Finally, we experimentally illustrate the time complexity and convergence behavior of the algorithm.
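
To make the role of a doubly-stochastic mixing matrix concrete, here is a minimal Python sketch of plain consensus averaging: each worker repeatedly replaces its value with a weighted average of the workers' values, and no master node is involved. This is not the ADMM-based algorithm of the paper; the matrix `W` and the initial values are assumptions chosen for illustration.

```python
def consensus_step(W, values):
    """One synchronized mixing round: worker i takes a W-weighted average of all values."""
    n = len(values)
    return [sum(W[i][j] * values[j] for j in range(n)) for i in range(n)]


# Hypothetical mixing matrix: every row and every column sums to 1.
W = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]

values = [1.0, 2.0, 6.0]          # one local quantity per worker
target = sum(values) / len(values)

for _ in range(50):
    values = consensus_step(W, values)

print(values, target)  # every worker ends up close to the global average
```

Because the columns of `W` sum to 1, the sum of the values is preserved in every round, so the workers can only agree on the global average.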

[26]  arXiv:2009.13987 (cross-list from cs.LG) [pdf, other]
Title: Geometric Disentanglement by Random Convex Polytopes
Comments: 21 pages, preprint
Subjects: Machine Learning (cs.LG); Metric Geometry (math.MG); Machine Learning (stat.ML)

Finding and analyzing meaningful representations of data is the purpose of machine learning. The idea of representation learning is to extract representations from the data itself, e.g., by utilizing deep neural networks. In this work, we examine representation learning from a geometric perspective. In particular, we focus on the convexity of classes and clusters as a natural and desirable representation property, for which robust and scalable measures are still lacking. To address this, we propose a new approach called Random Polytope Descriptor that allows a convex description of data points based on the construction of random convex polytopes. This ties in with current methods for statistical disentanglement. We demonstrate the use of our technique on well-known deep learning methods for representation learning. Specifically, we find that popular regularization variants such as the Variational Autoencoder can destroy crucial information that is relevant for tasks such as out-of-distribution detection.

[27]  arXiv:2009.14024 (cross-list from cs.LG) [pdf, other]
Title: Realistic Image Normalization for Multi-Domain Segmentation
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)

Image normalization is a building block in medical image analysis. Conventional approaches are customarily utilized on a per-dataset basis. This strategy, however, prevents the current normalization algorithms from fully exploiting the complex joint information available across multiple datasets. Consequently, ignoring such joint information has a direct impact on the performance of segmentation algorithms. This paper proposes to revisit the conventional image normalization approach by instead learning a common normalizing function across multiple datasets. Jointly normalizing multiple datasets is shown to yield consistent normalized images as well as an improved image segmentation. To do so, a fully automated adversarial and task-driven normalization approach is employed as it facilitates the training of realistic and interpretable images while keeping performance on par with the state-of-the-art. The adversarial training of our network aims at finding the optimal transfer function to improve both the segmentation accuracy and the generation of realistic images. We evaluated the performance of our normalizer on both infant and adult brain images from the iSEG, MRBrainS and ABIDE datasets. Results reveal the potential of our normalization approach for segmentation, with Dice improvements of up to 57.5% over our baseline. Our method can also enhance data availability by increasing the number of samples available when learning from multiple imaging domains.

[28]  arXiv:2009.14061 (cross-list from cs.LG) [pdf, other]
Title: GraphITE: Estimating Individual Effects of Graph-structured Treatments
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Outcome estimation of treatments for target individuals is an important foundation for decision making based on causal relations. Most existing outcome estimation methods deal with binary or multiple-choice treatments; however, in some applications, the number of treatments can be significantly large, while the treatments themselves have rich information. In this study, we considered one important instance of such cases: the outcome estimation problem of graph-structured treatments such as drugs. Owing to the large number of possible treatments, the counterfactual nature of observational data that appears in conventional treatment effect estimation becomes more of a concern for this problem. Our proposed method, GraphITE (pronounced "graphite") learns the representations of graph-structured treatments using graph neural networks while mitigating observation biases using Hilbert-Schmidt Independence Criterion regularization, which increases the independence of the representations of the targets and treatments. Experiments on two real-world datasets show that GraphITE outperforms baselines, especially in cases with a large number of treatments.

[29]  arXiv:2009.14073 (cross-list from cs.LG) [pdf, other]
Title: Estimation of Switched Markov Polynomial NARX models
Comments: 7 pages, 2 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This work targets the identification of a class of models for hybrid dynamical systems characterized by nonlinear autoregressive exogenous (NARX) components, with finite-dimensional polynomial expansions, and by a Markovian switching mechanism. The estimation of the model parameters is performed under a probabilistic framework via Expectation Maximization, including submodel coefficients, hidden state values and transition probabilities. Discrete mode classification and NARX regression tasks are disentangled within the iterations. Soft-labels are assigned to latent states on the trajectories by averaging over the state posteriors and updated using the parametrization obtained from the previous maximization phase. Then, NARX parameters are repeatedly fitted by solving weighted regression subproblems through a cyclical coordinate descent approach with coordinate-wise minimization. Moreover, we investigate a two-stage selection scheme, based on an l1-norm bridge estimation followed by hard-thresholding, to achieve parsimonious models through selection of the polynomial expansion. The proposed approach is demonstrated on a SMNARX problem composed of three nonlinear sub-models with specific regressors.

[30]  arXiv:2009.14075 (cross-list from cs.LG) [pdf, other]
Title: Fast Fréchet Inception Distance
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The Fréchet Inception Distance (FID) has been used to evaluate thousands of generative models. We present a novel algorithm, FastFID, which allows fast computation and backpropagation for FID. FastFID can efficiently (1) evaluate generative models *during* training and (2) construct adversarial examples for FID.
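
For readers unfamiliar with the quantity being accelerated, the sketch below computes the Fréchet distance between two Gaussians fitted to scalar features, where the general expression $\|\mu_1-\mu_2\|^2 + \mathrm{Tr}(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2})$ reduces to $(\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2$. It is a one-dimensional toy, not FastFID: real FID uses high-dimensional Inception features and a matrix square root, and the feature values here are made up for illustration.

```python
import math
from statistics import fmean, variance


def frechet_distance_1d(real, fake):
    """Fréchet distance between Gaussians fitted to two sets of scalar features."""
    mu_r, mu_f = fmean(real), fmean(fake)
    var_r, var_f = variance(real), variance(fake)  # sample variances
    # (mu_r - mu_f)^2 + var_r + var_f - 2 sqrt(var_r var_f) = (mu_r - mu_f)^2 + (sd_r - sd_f)^2
    return (mu_r - mu_f) ** 2 + var_r + var_f - 2.0 * math.sqrt(var_r * var_f)


# Hypothetical scalar "features" of real and generated samples.
real_features = [0.0, 2.0]
fake_features = [1.0, 3.0]

print(frechet_distance_1d(real_features, fake_features))
print(frechet_distance_1d(real_features, real_features))
```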

[31]  arXiv:2009.14096 (cross-list from cs.LG) [pdf, other]
Title: Weakly Supervised-Based Oversampling for High Imbalance and High Dimensionality Data Classification
Authors: Min Qian, Yan-Fu Li
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

With the abundance of industrial datasets, imbalanced classification has become a common problem in several application domains. Oversampling is an effective method to solve imbalanced classification. One of the main challenges of existing oversampling methods is to accurately label the new synthetic samples. Inaccurate labels of synthetic samples would distort the distribution of the dataset and possibly worsen the classification performance. This paper introduces the idea of weakly supervised learning to handle the inaccurate labeling of synthetic samples by traditional oversampling methods. Graph semi-supervised SMOTE is developed to improve the credibility of the synthetic samples' labels. In addition, we propose cost-sensitive neighborhood components analysis for high dimensionality datasets and a bootstrap-based ensemble framework for high imbalance datasets. The proposed method has achieved good classification performance on 8 synthetic datasets and 3 real-world datasets, especially for high imbalance and high dimensionality problems. The average performance and robustness are better than those of the benchmark methods.

[32]  arXiv:2009.14108 (cross-list from cs.LG) [pdf, other]
Title: Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Reinforcement Learning algorithms require a large number of samples to solve complex tasks with sparse and delayed rewards. Complex tasks can often be hierarchically decomposed into sub-tasks. A step in the Q-function can be associated with solving a sub-task, where the expectation of the return increases. RUDDER has been introduced to identify these steps and then redistribute reward to them, thus immediately giving reward if sub-tasks are solved. Since the problem of delayed rewards is mitigated, learning is considerably sped up. However, for complex tasks, current exploration strategies as deployed in RUDDER struggle with discovering episodes with high rewards. Therefore, we assume that episodes with high rewards are given as demonstrations and do not have to be discovered by exploration. Typically the number of demonstrations is small and RUDDER's LSTM model as a deep learning method does not learn well. Hence, we introduce Align-RUDDER, which is RUDDER with two major modifications. First, Align-RUDDER assumes that episodes with high rewards are given as demonstrations, replacing RUDDER's safe exploration and lessons replay buffer. Second, we replace RUDDER's LSTM model by a profile model that is obtained from multiple sequence alignment of demonstrations. Profile models can be constructed from as few as two demonstrations as known from bioinformatics. Align-RUDDER inherits the concept of reward redistribution, which considerably reduces the delay of rewards, thus speeding up learning. Align-RUDDER outperforms competitors on complex artificial tasks with delayed reward and few demonstrations. On the MineCraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently. Github: https://github.com/ml-jku/align-rudder, YouTube: https://youtu.be/HO-_8ZUl-UY

[33]  arXiv:2009.14111 (cross-list from cs.LG) [pdf, other]
Title: Inverse Classification with Limited Budget and Maximum Number of Perturbed Samples
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Most recent machine learning research focuses on developing new classifiers for the sake of improving classification accuracy. With many well-performing state-of-the-art classifiers available, there is a growing need to understand the interpretability of a classifier, driven by practical purposes such as finding the best diet recommendation for a diabetes patient. Inverse classification is a post-modeling process to find changes in input features of samples to alter the initially predicted class. It is useful in many business applications to determine how to adjust a sample's input data such that the classifier predicts it to be in a desired class. In real world applications, a budget on perturbations of samples corresponding to customers or patients is usually considered, and in this setting, the number of successfully perturbed samples is key to increasing benefits. In this study, we propose a new framework to solve inverse classification that maximizes the number of perturbed samples subject to per-feature budget limits and favorable classification classes of the perturbed samples. We design algorithms to solve this optimization problem based on gradient methods, stochastic processes, Lagrangian relaxations, and the Gumbel trick. In experiments, we find that our algorithms based on stochastic processes exhibit excellent performance in different budget settings and scale well.

[34]  arXiv:2009.14131 (cross-list from stat.ME) [pdf, other]
Title: Dynamic sparsity on dynamic regression models
Comments: 31 pages, 5 figures
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)

In the present work, we consider variable selection and shrinkage for the Gaussian dynamic linear regression within a Bayesian framework. In particular, we propose a novel method that allows for time-varying sparsity, based on an extension of spike-and-slab priors for dynamic models. This is done by assigning appropriate Markov switching priors for the time-varying coefficients' variances, extending the previous work of Ishwaran and Rao (2005). Furthermore, we investigate different priors, including the common Inverted gamma prior for the process variances, and other mixture prior distributions such as Gamma priors for both the spike and the slab, which leads to a mixture of Normal-Gamma priors (Griffin and Brown, 2010) for the coefficients. In this sense, our prior can be viewed as a dynamic variable selection prior which induces either smoothness (through the slab) or shrinkage towards zero (through the spike) at each time point. The MCMC method used for posterior computation uses Markov latent variables that can assume binary regimes at each time point to generate the coefficients' variances. In that way, our model is a dynamic mixture model; thus, we could use the algorithm of Gerlach et al. (2000) to generate the latent processes without conditioning on the states. Finally, our approach is exemplified through simulated examples and a real data application.

[35]  arXiv:2009.14133 (cross-list from cs.LG) [pdf, other]
Title: EEG to fMRI Synthesis: Is Deep Learning a candidate?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Advances in signal, image and video generation underlie major breakthroughs in generative medical imaging tasks, including Brain Image Synthesis. Still, the extent to which functional Magnetic Resonance Imaging (fMRI) can be mapped from the brain electrophysiology remains largely unexplored. This work provides the first comprehensive view on how to use state-of-the-art principles from Neural Processing to synthesize fMRI data from electroencephalographic (EEG) data. Given the distinct spatiotemporal nature of haemodynamic and electrophysiological signals, this problem is formulated as the task of learning a mapping function between multivariate time series with highly dissimilar structures. A comparison of state-of-the-art synthesis approaches, including Autoencoders, Generative Adversarial Networks and Pairwise Learning, is undertaken. Results highlight the feasibility of EEG to fMRI brain image mappings, pinpointing the role of current advances in Machine Learning and showing the relevance of upcoming contributions to further improve performance. EEG to fMRI synthesis offers a way to enhance and augment brain image data, and guarantee access to more affordable, portable and long-lasting protocols of brain activity monitoring. The code used in this manuscript is available on GitHub and the datasets are open source.

[36]  arXiv:2009.14136 (cross-list from q-fin.PM) [pdf, other]
Title: Time your hedge with Deep Reinforcement Learning
Comments: 9 pages, 8 figures
Subjects: Portfolio Management (q-fin.PM); Machine Learning (cs.LG); Machine Learning (stat.ML)

Can an asset manager plan the optimal timing for her/his hedging strategies given market conditions? The standard approach based on Markowitz or other more or less sophisticated financial rules aims to find the best portfolio allocation thanks to forecasted expected returns and risk but fails to fully relate market conditions to hedging strategy decisions. In contrast, Deep Reinforcement Learning (DRL) can tackle this challenge by creating a dynamic dependency between market information and hedging strategy allocation decisions. In this paper, we present a realistic and augmented DRL framework that: (i) uses additional contextual information to decide an action, (ii) has a one-period lag between observations and actions to account for the one-day turnover lag that common asset managers need to rebalance their hedge, (iii) is fully tested in terms of stability and robustness thanks to a repetitive train-test method called anchored walk forward training, similar in spirit to k-fold cross validation for time series, and (iv) allows managing leverage of our hedging strategy. Our experiment for an augmented asset manager interested in sizing and timing his hedges shows that our approach achieves superior returns and lower risk.
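
The train-test scheme in item (iii) can be sketched as follows, assuming the common meaning of "anchored" walk forward: the training window always starts at the first observation and grows, while each test block covers the periods that follow it. The function name and window sizes are hypothetical and not taken from the paper.

```python
def anchored_walk_forward(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices); training always starts at index 0."""
    train_end = initial_train
    while train_end + test_size <= n_samples:
        yield range(0, train_end), range(train_end, train_end + test_size)
        train_end += test_size  # the test block joins the next training window


# Hypothetical sizes: 10 periods, first 4 for the initial fit, test blocks of 2.
splits = [(list(tr), list(te)) for tr, te in anchored_walk_forward(10, 4, 2)]
for train_idx, test_idx in splits:
    print(train_idx, test_idx)
```

Unlike shuffled k-fold cross validation, every test block lies strictly after its training window, so no future observation leaks into training.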

[37]  arXiv:2009.14138 (cross-list from cs.LG) [pdf, other]
Title: Selective Cascade of Residual ExtraTrees
Authors: Qimin Liu, Fang Liu
Comments: To appear in SN Computer Science
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose a novel tree-based ensemble method named Selective Cascade of Residual ExtraTrees (SCORE). SCORE draws inspiration from representation learning, incorporates regularized regression with variable selection features, and utilizes boosting to improve prediction and reduce generalization errors. We also develop a variable importance measure to increase the explainability of SCORE. Our computer experiments show that SCORE provides comparable or superior performance in prediction against ExtraTrees, random forest, gradient boosting machine, and neural networks; and the proposed variable importance measure for SCORE is comparable to studied benchmark methods. Finally, the predictive performance of SCORE remains stable across hyper-parameter values, suggesting potential robustness to hyperparameter specification.

[38]  arXiv:2009.14148 (cross-list from cs.LG) [pdf, other]
Title: Unbalanced Sobolev Descent
Comments: NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce Unbalanced Sobolev Descent (USD), a particle descent algorithm for transporting a high dimensional source distribution to a target distribution that does not necessarily have the same mass. We define the Sobolev-Fisher discrepancy between distributions and show that it relates to advection-reaction transport equations and the Wasserstein-Fisher-Rao metric between distributions. USD transports particles along gradient flows of the witness function of the Sobolev-Fisher discrepancy (advection step) and reweighs the mass of particles with respect to this witness function (reaction step). The reaction step can be thought of as a birth-death process of the particles with rate of growth proportional to the witness function. When the Sobolev-Fisher witness function is estimated in a Reproducing Kernel Hilbert Space (RKHS), under mild assumptions we show that USD converges asymptotically (in the limit of infinite particles) to the target distribution in the Maximum Mean Discrepancy (MMD) sense. We then give two methods to estimate the Sobolev-Fisher witness with neural networks, resulting in two Neural USD algorithms. The first one implements the reaction step with mirror descent on the weights, while the second implements it through a birth-death process of particles. We show on synthetic examples that USD transports distributions with or without conservation of mass faster than previous particle descent algorithms, and finally demonstrate its use for molecular biology analyses where our method is naturally suited to match developmental stages of populations of differentiating cells based on their single-cell RNA sequencing profile. Code is available at https://github.com/ibm/usd .

[39]  arXiv:2009.14168 (cross-list from cs.LG) [pdf, other]
Title: Self-Supervised Few-Shot Learning on Point Clouds
Comments: Accepted at NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The increased availability of massive point clouds coupled with their utility in a wide variety of applications such as robotics, shape synthesis, and self-driving cars has attracted increased attention from both industry and academia. Recently, deep neural networks operating on labeled point clouds have shown promising results on supervised learning tasks like classification and segmentation. However, supervised learning leads to the cumbersome task of annotating the point clouds. To combat this problem, we propose two novel self-supervised pre-training tasks that encode a hierarchical partitioning of the point clouds using a cover-tree, where point cloud subsets lie within balls of varying radii at each level of the cover-tree. Furthermore, our self-supervised learning network is restricted to pre-train on the support set (comprising scarce training examples) used to train the downstream network in a few-shot learning (FSL) setting. Finally, the fully-trained self-supervised network's point embeddings are input to the downstream task's network. We present a comprehensive empirical evaluation of our method on both downstream classification and segmentation tasks and show that supervised methods pre-trained with our self-supervised learning method significantly improve the accuracy of state-of-the-art methods. Additionally, our method also outperforms previous unsupervised methods in downstream classification tasks.

[40]  arXiv:2009.14193 (cross-list from cs.CV) [pdf, other]
Title: Uncertainty Sets for Image Classifiers using Conformal Prediction
Comments: Codebase available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST); Machine Learning (stat.ML)

Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings. Existing uncertainty quantification techniques, such as Platt scaling, attempt to calibrate the network's probability estimates, but they do not have formal guarantees. We present an algorithm that modifies any classifier to output a predictive set containing the true label with a user-specified probability, such as 90%. The algorithm is simple and fast like Platt scaling, but provides a formal finite-sample coverage guarantee for every model and dataset. Furthermore, our method generates much smaller predictive sets than alternative methods, since we introduce a regularizer to stabilize the small scores of unlikely classes after Platt scaling. In experiments on both Imagenet and Imagenet-V2 with a ResNet-152 and other classifiers, our scheme outperforms existing approaches, achieving exact coverage with sets that are often factors of 5 to 10 smaller.
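
The idea of a predictive set with a user-specified coverage level can be illustrated with the simplest form of split conformal prediction, sketched below in Python. This is a generic textbook variant that thresholds one minus the class probability; it does not include the regularizer described in the abstract, and the calibration data, function names and `alpha` value are assumptions for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold on held-out data for target coverage 1 - alpha."""
    # Nonconformity score: one minus the probability assigned to the true label.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: every class is included
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Return all classes whose score does not exceed the calibrated threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical calibration set: class probabilities and true labels.
cal_probs = [[0.9, 0.05, 0.05], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]]
cal_labels = [0, 0, 1, 1]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

print(prediction_set([0.5, 0.45, 0.05], threshold))   # ambiguous input, larger set
print(prediction_set([0.97, 0.02, 0.01], threshold))  # confident input, smaller set
```

The set grows when the classifier is unsure and shrinks when it is confident, which is the behavior the coverage guarantee relies on.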

Replacements for Wed, 30 Sep 20

[41]  arXiv:1906.10652 (replaced) [pdf, other]
Title: Monte Carlo Gradient Estimation in Machine Learning
Comments: 62 pages
Journal-ref: Journal of Machine Learning Research, 21(132):1-62, 2020
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
[42]  arXiv:1908.06852 (replaced) [pdf, other]
Title: SIRUS: Stable and Interpretable RUle Set
Authors: Clément Bénard (LPSM (UMR_8001)), Gérard Biau (LPSM (UMR_8001)), Sébastien da Veiga, Erwan Scornet (CMAP)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[43]  arXiv:1908.11133 (replaced) [pdf, ps, other]
Title: On the rate of convergence of fully connected very deep neural network regression estimates
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[44]  arXiv:2005.12123 (replaced) [pdf, other]
Title: Feature Robust Optimal Transport for High-dimensional Data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[45]  arXiv:2009.11713 (replaced) [pdf, ps, other]
Title: Online Structural Change-point Detection of High-dimensional Streaming Data via Dynamic Sparse Subspace Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[46]  arXiv:1805.08045 (replaced) [pdf, other]
Title: A universal framework for learning the elliptical mixture model
Comments: This work has been accepted to IEEE Transactions on Neural Networks and Learning Systems with DOI:10.1109/TNNLS.2020.3010198. The abstract link is this https URL
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[47]  arXiv:1811.01146 (replaced) [pdf, other]
Title: Closed-Loop Memory GAN for Continual Learning
Comments: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-2019). this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[48]  arXiv:1901.08544 (replaced) [pdf, other]
Title: Learning Space Partitions for Nearest Neighbor Search
Comments: ICLR 2020
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
[49]  arXiv:1901.09997 (replaced) [pdf, other]
Title: Quasi-Newton Methods for Deep Learning: Forget the Past, Just Sample
Comments: 49 pages, 23 figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[50]  arXiv:1906.01115 (replaced) [pdf, ps, other]
Title: Convergence Rate of $\mathcal{O}(1/k)$ for Optimistic Gradient and Extra-gradient Methods in Smooth Convex-Concave Saddle Point Problems
Comments: 19 pages
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[51]  arXiv:1906.07549 (replaced) [pdf]
Title: An Attention-Guided Deep Regression Model for Landmark Detection in Cephalograms
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[52]  arXiv:1909.11285 (replaced) [pdf, other]
Title: A Dictionary Approach to Domain-Invariant Learning in Deep Networks
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[53]  arXiv:1909.11373 (replaced) [pdf, other]
Title: Multi-task Batch Reinforcement Learning with Metric Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[54]  arXiv:1909.13403 (replaced) [pdf, other]
Title: Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Comments: Published in IMC 2020. 20 pages, 26 figures
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI); Machine Learning (stat.ML)
[55]  arXiv:1910.00482 (replaced) [pdf, other]
Title: Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[56]  arXiv:1910.03201 (replaced) [pdf, other]
Title: Differentiable Sparsification for Deep Neural Networks
Authors: Yognjin Lee
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[57]  arXiv:2001.08322 (replaced) [pdf, other]
Title: FsNet: Feature Selection Network on High-dimensional Biological Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[58]  arXiv:2001.11818 (replaced) [pdf, other]
Title: Community Detection in Bipartite Networks with Stochastic Blockmodels
Comments: 17 pages, 6 figures. Code is available at this https URL and a documentation at this https URL
Journal-ref: Phys. Rev. E 102, 032309 (2020)
Subjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
[59]  arXiv:2002.01650 (replaced) [pdf, other]
Title: Concept Whitening for Interpretable Image Recognition
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[60]  arXiv:2003.10392 (replaced) [pdf, other]
Title: Steepest Descent Neural Architecture Optimization: Escaping Local Optimum with Signed Neural Splitting
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[61]  arXiv:2003.12169 (replaced) [pdf, other]
Title: A Collective Learning Framework to Boost GNN Expressiveness
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[62]  arXiv:2004.10696 (replaced) [pdf, other]
Title: Spectrally Consistent UNet for High Fidelity Image Transformations
Subjects: Image and Video Processing (eess.IV); Graphics (cs.GR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[63]  arXiv:2005.02505 (replaced) [pdf, other]
Title: A generative adversarial network approach to calibration of local stochastic volatility models
Comments: Replacement for previous version: Major update of previous version to match the content of the published version
Journal-ref: Risks 2020, 8, 101
Subjects: Computational Finance (q-fin.CP); Optimization and Control (math.OC); Machine Learning (stat.ML)
[64]  arXiv:2005.08898 (replaced) [pdf, ps, other]
Title: Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
[65]  arXiv:2005.12844 (replaced) [pdf, other]
Title: Approximation Schemes for ReLU Regression
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
[66]  arXiv:2007.01850 (replaced) [pdf, other]
Title: Variational Autoencoders for Anomalous Jet Tagging
Comments: 32 pages, 20 figures. References added; typos/errors corrected; further editing performed
Subjects: High Energy Physics - Phenomenology (hep-ph); High Energy Physics - Experiment (hep-ex); Machine Learning (stat.ML)
[67]  arXiv:2007.01995 (replaced) [pdf, other]
Title: Bidirectional Model-based Policy Optimization
Comments: Accepted at ICML2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[68]  arXiv:2008.03072 (replaced) [pdf, ps, other]
Title: Optimizing Information Loss Towards Robust Neural Networks
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[69]  arXiv:2008.08755 (replaced) [pdf, other]
Title: On $\ell_p$-norm Robustness of Ensemble Stumps and Trees
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[70]  arXiv:2008.09763 (replaced) [pdf, other]
Title: Variational Autoencoder for Anti-Cancer Drug Response Prediction
Comments: 9 pages
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (stat.ML)
[71]  arXiv:2008.13064 (replaced) [pdf, other]
Title: Towards Demystifying Dimensions of Source Code Embeddings
Comments: 1st ACM SIGSOFT International Workshop on Representation Learning for Software Engineering and Program Languages, Co-located with ESEC/FSE (RL+SE&PL'20)
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL); Software Engineering (cs.SE); Machine Learning (stat.ML)
[72]  arXiv:2009.02296 (replaced) [pdf, other]
Title: Variational Deep Learning for the Identification and Reconstruction of Chaotic and Stochastic Dynamical Systems from Noisy and Partial Observations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[73]  arXiv:2009.07547 (replaced) [pdf, other]
Title: Grassmannian diffusion maps based dimension reduction and classification for high-dimensional data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[74]  arXiv:2009.09590 (replaced) [pdf, other]
Title: Deep Clustering and Representation Learning that Preserves Geometric Structures
Comments: 20 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[75]  arXiv:2009.09991 (replaced) [pdf, other]
Title: Modeling Text with Decision Forests using Categorical-Set Splits
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[76]  arXiv:2009.12981 (replaced) [pdf, other]
Title: Parametric UMAP: learning embeddings with deep neural networks for representation and semi-supervised learning
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
[77]  arXiv:2009.12991 (replaced) [pdf, other]
Title: Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect
Comments: This paper is accepted by NeurIPS 2020. The code is available on GitHub: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
