Machine Learning
New submissions
New submissions for Wed, 30 Sep 20
 [1] arXiv:2009.13831 [pdf, other]

Title: Testing for Normality with Neural Networks
Authors: Milos Simic
Comments: 50 pages, 10 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
In this paper, we treat the problem of testing for normality as a binary classification problem and construct a feedforward neural network that can successfully detect normal distributions by inspecting small samples from them. The numerical experiments conducted on small samples with no more than 100 elements indicated that the neural network which we trained was more accurate and far more powerful than the most frequently used and most powerful standard tests of normality: Shapiro-Wilk, Anderson-Darling, Lilliefors and Jarque-Bera, as well as the kernel tests of goodness-of-fit. The neural network had an AUROC score of almost 1, which corresponds to the perfect binary classifier. Additionally, the network's accuracy was higher than 96% on a set of larger samples with 250-1000 elements. Since the normality of data is an assumption of numerous techniques for analysis and inference, the neural network constructed in this study has a very high potential for use in everyday practice of statistics, data analysis and machine learning in both science and industry.
 [2] arXiv:2009.13881 [pdf, ps, other]

Title: Lipschitz neural networks are dense in the set of all Lipschitz functions
Authors: Stephan Eckstein
Comments: 7 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA)
This note shows that, for a fixed Lipschitz constant $L > 0$, one-layer neural networks that are $L$-Lipschitz are dense in the set of all $L$-Lipschitz functions with respect to the uniform norm on bounded sets.
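The density statement above can be written out symbolically. The following is only a restatement under assumptions not given in the listing: the functions are taken to be real-valued on a bounded set $K \subset \mathbb{R}^d$, the norm $\|\cdot\|$ on $\mathbb{R}^d$ is left unspecified, and $\mathcal{N}_L$ is a hypothetical name for the class of one-layer networks that are $L$-Lipschitz.

```latex
% Assumed notation (not from the listing): K \subset \mathbb{R}^d bounded,
% \mathcal{N}_L = one-layer neural networks that are L-Lipschitz.
\[
  \mathrm{Lip}_L(K) = \bigl\{ f : K \to \mathbb{R} \;:\;
    |f(x) - f(y)| \le L \,\|x - y\| \ \text{ for all } x, y \in K \bigr\}
\]
\[
  \forall f \in \mathrm{Lip}_L(K),\ \forall \varepsilon > 0,\
  \exists g \in \mathcal{N}_L :\quad
  \sup_{x \in K} |f(x) - g(x)| \le \varepsilon .
\]
```

The point of the result is that the approximating network $g$ satisfies the same Lipschitz bound $L$ as the target $f$, rather than a larger one.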
 [3] arXiv:2009.13961 [pdf, ps, other]

Title: Online Action Learning in High Dimensions: A New Exploration Rule for Contextual $\epsilon_t$-Greedy Heuristics
Comments: 43 pages, 9 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Econometrics (econ.EM)
Bandit problems are pervasive in various fields of research and are also present in several practical applications. Examples, including dynamic pricing and assortment and the design of auctions and incentives, permeate a large number of sequential treatment experiments. Different applications impose distinct levels of restrictions on viable actions. Some favor diversity of outcomes, while others require harmful actions to be closely monitored or mainly avoided. In this paper, we extend one of the most popular bandit solutions, the original $\epsilon_t$-greedy heuristics, to high-dimensional contexts. Moreover, we introduce a competing exploration mechanism that relies on searching sets based on order statistics. We view our proposals as alternatives for cases where pluralism is valued or, in the opposite direction, cases where the end-user should carefully tune the range of exploration of new actions. We find reasonable bounds for the cumulative regret of a decaying $\epsilon_t$-greedy heuristic in both cases, and we provide an upper bound for the initialization phase which implies that the regret bounds when order statistics are considered are at most equal to, and mostly better than, those obtained when random searching is the sole exploration mechanism. Additionally, we show that end-users have sufficient flexibility to avoid harmful actions, since any cardinality for the higher-order statistics can be used to achieve a stricter upper bound. In a simulation exercise, we show that the algorithms proposed in this paper outperform simple and adapted counterparts.
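For readers unfamiliar with the baseline being extended, here is a minimal sketch of a plain (non-contextual) decaying $\epsilon_t$-greedy bandit. It is not the paper's high-dimensional or order-statistics variant. The schedule $\epsilon_t = \min(1, c/t)$, the Gaussian rewards, and the names `epsilon_t` and `run_epsilon_greedy` are all assumptions made for illustration.

```python
import random


def epsilon_t(t, c=1.0):
    """Assumed decaying exploration rate for round t >= 1: min(1, c / t)."""
    return min(1.0, c / t)


def run_epsilon_greedy(true_means, n_rounds=5000, c=5.0, seed=0):
    """Simulate a decaying epsilon-greedy policy on a Gaussian bandit.

    true_means: hypothetical mean reward of each arm (unit-variance noise).
    Returns (pull counts per arm, estimated mean reward per arm).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    estimates = [0.0] * k
    for t in range(1, n_rounds + 1):
        if rng.random() < epsilon_t(t, c):
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(k), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates
```

Calling `run_epsilon_greedy([0.1, 0.5, 0.9])` returns the pull counts and running reward estimates. The uniform `rng.randrange(k)` step corresponds to what the abstract calls random searching as the exploration mechanism.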
Cross-lists for Wed, 30 Sep 20
 [4] arXiv:2009.13562 (cross-list from cs.LG) [pdf, other]

Title: STRATA: Building Robustness with a Simple Method for Generating Black-box Adversarial Attacks for Models of Code
Comments: 13 pages, 3 figures, 10 tables
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Adversarial examples are imperceptible perturbations in the input to a neural model that result in misclassification. Generating adversarial examples for source code poses an additional challenge compared to the domains of images and natural language, because source code perturbations must adhere to strict semantic guidelines so the resulting programs retain the functional meaning of the code. We propose a simple and efficient black-box method for generating state-of-the-art adversarial examples on models of code. Our method generates untargeted and targeted attacks, and empirically outperforms competing gradient-based methods with less information and less computational effort. We also use adversarial training to construct a model robust to these attacks; our attack reduces the F1 score of code2seq by 42%. Adversarial training brings the F1 score on adversarial examples up to 99% of baseline.
 [5] arXiv:2009.13566 (cross-list from cs.LG) [pdf, other]

Title: Graph Neural Networks with Heterophily
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Graph Neural Networks (GNNs) have proven to be useful for many different practical applications. However, most existing GNN models have an implicit assumption of homophily among the nodes connected in the graph, and therefore have largely overlooked the important setting of heterophily. In this work, we propose a novel framework called CPGNN that generalizes GNNs for graphs with either homophily or heterophily. The proposed framework incorporates an interpretable compatibility matrix for modeling the heterophily or homophily level in the graph, which can be learned in an end-to-end fashion, enabling it to go beyond the assumption of strong homophily. Theoretically, we show that replacing the compatibility matrix in our framework with the identity (which represents pure homophily) reduces to GCN. Our extensive experiments demonstrate the effectiveness of our approach in more realistic and challenging experimental settings with significantly less training data compared to previous works: CPGNN variants achieve state-of-the-art results in heterophily settings with or without contextual node features, while maintaining comparable performance in homophily settings.
 [6] arXiv:2009.13579 (cross-list from cs.LG) [pdf, other]

Title: Novelty Search in representational space for sample efficient exploration
Comments: 9 pages + references + appendix. Oral presentation at NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low-dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards. One key element of our approach is the use of information theoretic principles to shape our representations in a way so that our novelty reward goes beyond pixel similarity. We test our approach on a number of maze tasks, as well as a control problem and show that our exploration approach is more sample-efficient compared to strong baselines.
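The idea of scoring novelty by nearest-neighbor distance in an encoded space can be sketched as follows. This is an illustrative assumption rather than the paper's exact reward: the function name `knn_novelty`, the use of Euclidean distance, and averaging over the `k` nearest stored encodings are all choices made here for the example, and the encoder that produces the vectors is omitted.

```python
import numpy as np


def knn_novelty(z, memory, k=3):
    """Mean Euclidean distance from encoding z to its k nearest stored encodings.

    z:      1-D array-like, the low-dimensional encoding of the current state.
    memory: 2-D array-like of previously visited encodings (assumed non-empty).
    A larger value means z lies farther from anything seen so far.
    """
    memory = np.asarray(memory, dtype=float)
    z = np.asarray(z, dtype=float)
    dists = np.linalg.norm(memory - z, axis=1)
    k = min(k, len(dists))  # cannot use more neighbors than are stored
    # np.partition places the k smallest distances in the first k slots.
    nearest = np.partition(dists, k - 1)[:k]
    return float(nearest.mean())
```

An agent could add this value as an intrinsic reward to a sparse extrinsic reward, so that states whose encodings are far from the stored ones are preferred during exploration.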
 [7] arXiv:2009.13586 (cross-list from cs.LG) [pdf, other]

Title: Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization
Authors: Xuezhe Ma
Comments: First version of preprint. 15 pages (plus appendix), 4 figures, 4 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix. Importantly, the update and storage of the diagonal approximation of the Hessian are as efficient as adaptive first-order optimization methods, with linear complexity for both time and memory. To handle nonconvexity, we replace the Hessian with its rectified absolute value, which is guaranteed to be positive-definite. Experiments on three tasks of CV and NLP show that Apollo achieves significant improvements over other stochastic optimization methods, including SGD and variants of Adam, in terms of both convergence speed and generalization performance. The implementation of the algorithm is available at https://github.com/XuezheMax/apollo.
 [8] arXiv:2009.13598 (cross-list from cs.LG) [pdf, ps, other]

Title: Anomaly Detection and Sampling Cost Control via Hierarchical GANs
Comments: 6 pages, 7 figures, has been accepted by Globecom 2020
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
Anomaly detection incurs certain sampling and sensing costs and therefore it is of great importance to strike a balance between the detection accuracy and these costs. In this work, we study anomaly detection by considering the detection of threshold crossings in a stochastic time series without the knowledge of its statistics. To reduce the sampling cost in this detection process, we propose the use of hierarchical generative adversarial networks (GANs) to perform non-uniform sampling. In order to improve the detection accuracy and reduce the delay in detection, we introduce a buffer zone in the operation of the proposed GAN-based detector. In the experiments, we analyze the performance of the proposed hierarchical GAN detector considering the metrics of detection delay, miss rates, average cost of error, and sampling ratio. We identify the trade-offs in the performance as the buffer zone sizes and the number of GAN levels in the hierarchy vary. We also compare the performance with that of a sampling policy that approximately minimizes the sum of average costs of sampling and error given the parameters of the stochastic process. We demonstrate that the proposed GAN-based detector can have significant performance improvements in terms of detection delay and average cost of error with a larger buffer zone but at the cost of increased sampling rates.
 [9] arXiv:2009.13697 (cross-list from cs.LG) [pdf, ps, other]

Title: A Fast Graph Neural Network-Based Method for Winner Determination in Multi-Unit Combinatorial Auctions
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
The combinatorial auction (CA) is an efficient mechanism for resource allocation in different fields, including cloud computing. It can obtain high economic efficiency and user flexibility by allowing bidders to submit bids for combinations of different items instead of only for individual items. However, the problem of allocating items among the bidders to maximize the auctioneer's revenue, i.e., the winner determination problem (WDP), is NP-complete to solve and inapproximable. Existing works for WDPs are generally based on mathematical optimization techniques and most of them focus on the single-unit WDP, where each item only has one unit. On the contrary, few works consider the multi-unit WDP in which each item may have multiple units. Given that the multi-unit WDP is more complicated but prevalent in cloud computing, we propose leveraging machine learning (ML) techniques to develop a novel low-complexity algorithm for solving this problem with negligible revenue loss. Specifically, we model the multi-unit WDP as an augmented bipartite bid-item graph and use a graph neural network (GNN) with half-convolution operations to learn the probability of each bid belonging to the optimal allocation. To improve the sample generation efficiency and decrease the number of needed labeled instances, we propose two different sample generation processes. We also develop two novel graph-based post-processing algorithms to transform the outputs of the GNN into feasible solutions. Through simulations on both synthetic instances and a specific virtual machine (VM) allocation problem in a cloud computing platform, we validate that our proposed method can approach optimal performance with low complexity and has good generalization ability in terms of problem size and user-type distribution.
 [10] arXiv:2009.13714 (cross-list from cs.LG) [pdf, other]

Title: Learned Fine-Tuner for Incongruous Few-Shot Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Model-agnostic meta-learning (MAML) effectively meta-learns an initialization of model parameters for few-shot learning where all learning problems share the same format of model parameters (congruous meta-learning). We extend MAML to incongruous meta-learning where different yet related few-shot learning problems may not share any model parameters. In this setup, we propose the use of a Learned Fine Tuner (LFT) to replace hand-designed optimizers (such as SGD) for the task-specific fine-tuning. The meta-learned initialization in MAML is replaced by learned optimizers based on the learning-to-optimize (L2O) framework to meta-learn across incongruous tasks such that models fine-tuned with LFT (even from random initializations) adapt quickly to new tasks. The introduction of LFT within MAML (i) offers the capability to tackle few-shot learning tasks by meta-learning across incongruous yet related problems (e.g., classification over images of different sizes and model architectures), and (ii) can efficiently work with first-order and derivative-free few-shot learning problems. Theoretically, we quantify the difference between LFT (for MAML) and L2O. Empirically, we demonstrate the effectiveness of LFT through both synthetic and real problems and a novel application of generating universal adversarial attacks across different image sources in the few-shot learning regime.
 [11] arXiv:2009.13734 (cross-list from cs.LG) [pdf, other]

Title: New GCNN-Based Architecture for Semi-Supervised Node Classification
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
The nodes of a graph that belong to a specific cluster are more likely to connect to each other than to other nodes in the graph. Thus, once some information about certain nodes is revealed, the structure of the graph (the graph edges) makes it possible to learn more about the other nodes. From this perspective, this paper revisits the node classification task in a semi-supervised scenario using a graph convolutional neural network. The goal is to benefit from the flow of information that circulates around the revealed node labels. To this end, this paper provides a new graph convolutional neural network architecture. This architecture benefits efficiently from the revealed training nodes, the node features, and the graph structure. In addition, in many applications, non-graph observations (side information) exist beside a given graph realization. The non-graph observations are usually independent of the graph structure. This paper shows that the proposed architecture is also powerful in combining a graph realization and independent non-graph observations. For both cases, the experiments on synthetic and real-world datasets demonstrate that our proposed architecture achieves a higher prediction accuracy in comparison to the existing state-of-the-art methods for the node classification task.
 [12] arXiv:2009.13736 (cross-list from cs.LG) [pdf, other]

Title: Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy
Comments: 20 pages (with appendices), 5 figures, preprint
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Experience replay (ER) improves the data efficiency of off-policy reinforcement learning (RL) algorithms by allowing an agent to store and reuse its past experiences in a replay buffer. While many techniques have been proposed to enhance ER by biasing how experiences are sampled from the buffer, thus far they have not considered strategies for refreshing experiences inside the buffer. In this work, we introduce Lucid Dreaming for Experience Replay (LiDER), a conceptually new framework that allows replay experiences to be refreshed by leveraging the agent's current policy. LiDER 1) moves an agent back to a past state; 2) lets the agent try following its current policy to execute different actions, as if the agent were "dreaming" about the past, but is aware of the situation and can control the dream to encounter new experiences; and 3) stores and reuses the new experience if it turned out better than what the agent previously experienced, i.e., to refresh its memories. LiDER is designed to be easily incorporated into off-policy, multi-worker RL algorithms that use ER; we present in this work a case study of applying LiDER to an actor-critic based algorithm. Results show LiDER consistently improves performance over the baseline in four Atari 2600 games. Our open-source implementation of LiDER and the data used to generate all plots in this paper are available at github.com/duyunshu/luciddreamingforexpreplay.
 [13] arXiv:2009.13794 (cross-list from cs.SI) [pdf, other]

Title: From Twitter to Traffic Predictor: Next-Day Morning Traffic Prediction Using Social Media Data
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Machine Learning (stat.ML)
The effectiveness of traditional traffic prediction methods is often extremely limited when forecasting traffic dynamics in the early morning. The reason is that traffic can break down drastically during the early morning commute, and the time and duration of this breakdown vary substantially from day to day. Early morning traffic forecasts are crucial to inform morning-commute traffic management, but they are generally challenging to make in advance, particularly by midnight. In this paper, we propose to mine Twitter messages as a probing method to understand the impacts of people's work and rest patterns in the evening/midnight of the previous day on the next-day morning traffic. The model is tested in experiments on freeway networks in Pittsburgh. The resulting relationship is surprisingly simple and powerful. We find that, in general, the earlier people rest as indicated by tweets, the more congested roads will be the next morning. The occurrence of big events in the evening before, represented by higher or lower tweet sentiment than normal, often implies lower travel demand the next morning than on normal days. Besides, people's tweeting activities in the night before and early morning are statistically associated with congestion in morning peak hours. We make use of such relationships to build a predictive framework which forecasts morning commute congestion using people's tweeting profiles extracted by 5 am or as late as the midnight prior to the morning. The Pittsburgh study supports that our framework can precisely predict morning congestion, particularly for some road segments upstream of roadway bottlenecks with large day-to-day congestion variation. Our approach considerably outperforms existing methods without Twitter message features, and it can learn meaningful representations of demand from tweeting profiles that offer managerial insights.
 [14] arXiv:2009.13801 (cross-list from cs.LG) [pdf, other]

Title: Framework for Designing Filters of Spectral Graph Convolutional Neural Networks in the Context of Regularization Theory
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Graph convolutional neural networks (GCNNs) have been widely used in graph learning. It has been observed that the smoothness functional on graphs can be defined in terms of the graph Laplacian. This fact points in the direction of using the Laplacian in deriving regularization operators on graphs and its consequent use with spectral GCNN filter designs. In this work, we explore the regularization properties of the graph Laplacian and propose a generalized framework for regularized filter designs in spectral GCNNs. We found that the filters used in many state-of-the-art GCNNs can be derived as a special case of the framework we developed. We designed new filters that are associated with well-defined regularization behavior and tested their performance on semi-supervised node classification tasks. Their performance was found to be superior to that of the other state-of-the-art techniques.
 [15] arXiv:2009.13807 (cross-list from cs.LG) [pdf]

Title: Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Time series anomaly detection has been a perennially important topic in data science, with papers dating back to the 1950s. However, in recent years there has been an explosion of interest in this topic, much of it driven by the success of deep learning in other domains and for other time series tasks. Most of these papers test on one or more of a handful of popular benchmark datasets, created by Yahoo, Numenta, NASA, etc. In this work we make a surprising claim. The majority of the individual exemplars in these datasets suffer from one or more of four flaws. Because of these four flaws, we believe that many published comparisons of anomaly detection algorithms may be unreliable, and more importantly, much of the apparent progress in recent years may be illusionary. In addition to demonstrating these claims, with this paper we introduce the UCR Time Series Anomaly Datasets. We believe that this resource will perform a similar role as the UCR Time Series Classification Archive, by providing the community with a benchmark that allows meaningful comparisons between approaches and a meaningful gauge of overall progress.
 [16] arXiv:2009.13826 (cross-list from cs.LG) [pdf, other]

Title: EEMC: Embedding Enhanced Multi-tag Classification
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Representation learning has recently achieved attractive performance in NLP and complex networks, and it is becoming a fundamental technology in machine learning and data mining. How to use representation learning to improve the performance of classifiers is a significant research direction. We use representation learning to map raw data (nodes of a graph) to a low-dimensional feature space. In this space, each raw data point obtains a low-dimensional vector representation; we apply simple linear operations to those vectors to produce virtual data, and use the vectors together with the virtual data to train a multi-tag classifier. We then measure the performance of the classifier by F1 score (Macro-F1 and Micro-F1). Our method raises Macro-F1 by 28%-450% and the average F1 score by 12%-224%. As a baseline, we also train the classifier directly on the low-dimensional vectors and measure its performance. We validate our algorithm on three public data sets and find that the virtual data helps the classifier greatly improve its F1 score. Therefore, our algorithm is an effective way to improve the performance of classifiers. These results suggest that the virtual data generated by simple linear operations in representation space still retains the information of the raw data. This is also of great significance for learning on small-sample data sets.
 [17] arXiv:2009.13853 (cross-list from cs.LG) [pdf, other]

Title: Efficient SVDD Sampling with Approximation Guarantees for the Decision Boundary
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Support Vector Data Description (SVDD) is a popular one-class classifier for anomaly and novelty detection. But despite its effectiveness, SVDD does not scale well with data size. To avoid prohibitive training times, sampling methods select small subsets of the training data on which SVDD trains a decision boundary hopefully equivalent to the one obtained on the full data set. According to the literature, a good sample should therefore contain so-called boundary observations that SVDD would select as support vectors on the full data set. However, non-boundary observations are also essential to avoid fragmenting contiguous inlier regions and to avoid poor classification accuracy. Other aspects, such as selecting a sufficiently representative sample, are important as well. But existing sampling methods largely overlook them, resulting in poor classification accuracy. In this article, we study how to select a sample considering these points. Our approach is to frame SVDD sampling as an optimization problem, where constraints guarantee that sampling indeed approximates the original decision boundary. We then propose RAPID, an efficient algorithm to solve this optimization problem. RAPID does not require any tuning of parameters, is easy to implement and scales well to large data sets. We evaluate our approach on real-world and synthetic data. Our evaluation is the most comprehensive one for SVDD sampling so far. Our results show that RAPID outperforms its competitors in classification accuracy, in sample size, and in runtime.
 [18] arXiv:2009.13878 (cross-list from physics.chem-ph) [pdf, other]

Title: Physics-Constrained Predictive Molecular Latent Space Discovery with Graph Scattering Variational Autoencoder
Subjects: Chemical Physics (physics.chem-ph); Machine Learning (stat.ML)
Recent advances in artificial intelligence have propelled the development of innovative computational materials modeling and design techniques. In particular, generative deep learning models have been used for molecular representation, discovery and design with applications ranging from drug discovery to solar cell development. In this work, we assess the predictive capabilities of a molecular generative model developed based on variational inference and graph theory. The encoder network is based on the scattering transform, which allows for a better generalization of the model in the presence of limited training data. The scattering layers incorporate adaptive spectral filters which are tailored to the training dataset based on the molecular graphs' spectra. The decoding network is a one-shot graph generative model that conditions atom types on molecular topology. We present a quantitative assessment of the latent space in terms of its predictive ability for organic molecules in the QM9 dataset. To account for the limited-size training data set, a Bayesian formalism is considered that allows us to capture the uncertainties in the predicted properties.
 [19] arXiv:2009.13891 (cross-list from cs.LG) [pdf, other]

Title: Towards Effective Context for Meta-Reinforcement Learning: an Approach based on Contrastive Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Context, the embedding of previously collected trajectories, is a powerful construct for Meta-Reinforcement Learning (Meta-RL) algorithms. By conditioning on an effective context, Meta-RL policies can easily generalize to new tasks within a few adaptation steps. We argue that improving the quality of context involves answering two questions: 1. How to train a compact and sufficient encoder that can embed the task-specific information contained in prior trajectories? 2. How to collect informative trajectories of which the corresponding context reflects the specification of tasks? To this end, we propose a novel Meta-RL framework called CCM (Contrastive learning augmented Context-based Meta-RL). We first focus on the contrastive nature behind different tasks and leverage it to train a compact and sufficient context encoder. Further, we train a separate exploration policy and theoretically derive a new information-gain-based objective which aims to collect informative trajectories in a few steps. Empirically, we evaluate our approaches on common benchmarks as well as several complex sparse-reward environments. The experimental results show that CCM outperforms state-of-the-art algorithms by addressing the two problems mentioned above.
 [20] arXiv:2009.13895 (cross-list from cs.LG) [pdf, other]

Title: Message Passing Neural Processes
Comments: 18 pages, 6 figures. The first two authors contributed equally
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Neural Processes (NPs) are powerful and flexible models able to incorporate uncertainty when representing stochastic processes, while maintaining a linear time complexity. However, NPs produce a latent description by aggregating independent representations of context points and lack the ability to exploit relational information present in many datasets. This renders NPs ineffective in settings where the stochastic process is primarily governed by neighbourhood rules, such as cellular automata (CA), and limits performance for any task where relational information remains unused. We address this shortcoming by introducing Message Passing Neural Processes (MPNPs), the first class of NPs that explicitly makes use of relational structure within the model. Our evaluation shows that MPNPs thrive at lower sampling rates, on existing benchmarks and newly-proposed CA and Cora-Branched tasks. We further report strong generalisation over density-based CA rulesets and significant gains in challenging arbitrary-labelling and few-shot learning setups.
 [21] arXiv:2009.13939 (cross-list from cs.LG) [pdf, other]

Title: Tackling unsupervised multi-source domain adaptation with optimism and consistency
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
It has been known for a while that the problem of multi-source domain adaptation can be regarded as a single source domain adaptation task where the source domain corresponds to a mixture of the original source domains. Nonetheless, how to adjust the mixture distribution weights remains an open question. Moreover, most existing work on this topic focuses only on minimizing the error on the source domains and achieving domain-invariant representations, which is insufficient to ensure low error on the target domain. In this work, we present a novel framework that addresses both problems and beats the current state of the art by using a mildly optimistic objective function and consistency regularization on the target samples.
 [22] arXiv:2009.13962 (cross-list from cs.LG) [pdf, other]

Title: Think before you act: A simple baseline for compositional generalization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Contrary to humans, who have the ability to recombine familiar expressions to create novel ones, modern neural networks struggle to do so. This has been emphasized recently with the introduction of the benchmark dataset "gSCAN" (Ruis et al. 2020), which aims to evaluate models' performance at compositional generalization in grounded language understanding. In this work, we challenge the gSCAN benchmark by proposing a simple model that achieves surprisingly good performance on two of the gSCAN test splits. Our model is based on the observation that, to succeed on gSCAN tasks, the agent must (i) identify the target object (think) before (ii) navigating to it successfully (act). Concretely, we propose an attention-inspired modification of the baseline model from (Ruis et al. 2020), together with an auxiliary loss, that takes into account the sequential nature of steps (i) and (ii). While two compositional tasks are trivially solved with our approach, we also find that the other tasks remain unsolved, validating the relevance of gSCAN as a benchmark for evaluating models' compositional abilities.
 [23] arXiv:2009.13975 (cross-list from cs.LG) [pdf, other]

Title: Identification of Probability weighted ARX models with arbitrary domains
Comments: 6 pages, 3 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Hybrid system identification is a key tool to achieve reliable models of Cyber-Physical Systems from data. PieceWise Affine (PWA) models guarantee universal approximation, local linearity and equivalence to other classes of hybrid systems. Still, PWA identification is a challenging problem, requiring the concurrent solution of regression and classification tasks. In this work, we focus on the identification of PieceWise Auto Regressive with eXogenous input models with arbitrary regions (NPWARX), thus not restricted to polyhedral domains, and characterized by discontinuous maps. To this end, we propose a method based on a probabilistic mixture model, where the discrete state is represented through a multinomial distribution conditioned by the input regressors. The architecture is conceived following the Mixture of Experts concept, developed within the machine learning field. To achieve nonlinear partitioning, we parametrize the discriminant function using a neural network. Then, the parameters of both the ARX submodels and the classifier are concurrently estimated by maximizing the likelihood of the overall model using Expectation Maximization. The proposed method is demonstrated on a nonlinear piecewise problem with discontinuous maps.
 [24] arXiv:2009.13977 (cross-list from cs.LG) [pdf, other]

Title: What if Neural Networks had SVDs?
Authors: Alexander Mathiasen, Frederik Hvilshøj, Jakob Rødsgaard Jørgensen, Anshul Nasery, Davide Mottin
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Various Neural Networks employ time-consuming matrix operations like matrix inversion. Many such matrix operations are faster to compute given the Singular Value Decomposition (SVD). Previous work allows using the SVD in Neural Networks without computing it. In theory, the techniques can speed up matrix operations; in practice, however, they are not fast enough. We present an algorithm that is fast enough to speed up several matrix operations. The algorithm increases the degree of parallelism of an underlying matrix multiplication $H\cdot X$ where $H$ is an orthogonal matrix represented by a product of Householder matrices. Code is available at www.github.com/AlexanderMath/fasth .
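To make the multiplication $H\cdot X$ concrete, the following is a minimal sketch of the plain sequential approach, assuming each Householder matrix has the standard form $H_i = I - 2 v_i v_i^T / (v_i^T v_i)$. It is not the algorithm from the paper: it only illustrates the baseline in which the reflections are applied one after another without ever forming $H$. The function name and the row-wise layout of the vectors are assumptions made for this illustration.

```python
import numpy as np

def apply_householder_product(V, X):
    """Return (H_1 H_2 ... H_k) @ X without forming the d x d matrix H.

    V has shape (k, d); row i holds the vector v_i that defines
    H_i = I - 2 v_i v_i^T / (v_i^T v_i). X has shape (d, m).
    (Hypothetical helper: name and layout are illustrative assumptions.)
    """
    out = np.array(X, dtype=float)  # work on a float copy of X
    # The rightmost factor H_k acts on X first, so iterate in reverse.
    for v in V[::-1]:
        # H_i @ out = out - v * (2 / v^T v) * (v^T out), a rank-one update.
        out -= np.outer(v, (2.0 / (v @ v)) * (v @ out))
    return out
```

Each reflection costs O(dm) operations, but the loop is inherently sequential: reflection i cannot start before reflection i+1 has finished. That dependency is the limited parallelism the abstract refers to.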
 [25] arXiv:2009.13982 (cross-list from cs.LG) [pdf, other]

Title: A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layerwise Learning
Comments: Accepted to The International Joint Conference on Neural Networks (IJCNN) 2020, to appear
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
We design a low complexity decentralized learning algorithm to train a recently proposed large neural network in distributed processing nodes (workers). We assume the communication network between the workers is synchronized and can be modeled as a doubly-stochastic mixing matrix without having any master node. In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns. Using alternating-direction-method-of-multipliers (ADMM) along with a layerwise convex optimization approach, we propose a decentralized learning algorithm which enjoys low computational complexity and communication cost among the workers. We show that it is possible to achieve learning performance equivalent to that obtained when the data is available in a single place. Finally, we experimentally illustrate the time complexity and convergence behavior of the algorithm.
 [26] arXiv:2009.13987 (cross-list from cs.LG) [pdf, other]

Title: Geometric Disentanglement by Random Convex Polytopes
Comments: 21 pages, preprint
Subjects: Machine Learning (cs.LG); Metric Geometry (math.MG); Machine Learning (stat.ML)
Finding and analyzing meaningful representations of data is the purpose of machine learning. The idea of representation learning is to extract representations from the data itself, e.g., by utilizing deep neural networks. In this work, we examine representation learning from a geometric perspective. In particular, we focus on the convexity of classes and clusters as a natural and desirable representation property, for which robust and scalable measures are still lacking. To address this, we propose a new approach called Random Polytope Descriptor that allows a convex description of data points based on the construction of random convex polytopes. This ties in with current methods for statistical disentanglement. We demonstrate the use of our technique on well-known deep learning methods for representation learning. Specifically, we find that popular regularization variants such as the Variational Autoencoder can destroy crucial information that is relevant for tasks such as out-of-distribution detection.
 [27] arXiv:2009.14024 (cross-list from cs.LG) [pdf, other]

Title: Realistic Image Normalization for Multi-Domain Segmentation
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Image normalization is a building block in medical image analysis. Conventional approaches are customarily utilized on a per-dataset basis. This strategy, however, prevents the current normalization algorithms from fully exploiting the complex joint information available across multiple datasets. Consequently, ignoring such joint information has a direct impact on the performance of segmentation algorithms. This paper proposes to revisit the conventional image normalization approach by instead learning a common normalizing function across multiple datasets. Jointly normalizing multiple datasets is shown to yield consistent normalized images as well as an improved image segmentation. To do so, a fully automated adversarial and task-driven normalization approach is employed as it facilitates the training of realistic and interpretable images while keeping performance on par with the state of the art. The adversarial training of our network aims at finding the optimal transfer function to improve both the segmentation accuracy and the generation of realistic images. We evaluated the performance of our normalizer on both infant and adult brain images from the iSEG, MRBrainS and ABIDE datasets. Results reveal the potential of our normalization approach for segmentation, with Dice improvements of up to 57.5% over our baseline. Our method can also enhance data availability by increasing the number of samples available when learning from multiple imaging domains.
 [28] arXiv:2009.14061 (cross-list from cs.LG) [pdf, other]

Title: GraphITE: Estimating Individual Effects of Graph-structured Treatments
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Outcome estimation of treatments for target individuals is an important foundation for decision making based on causal relations. Most existing outcome estimation methods deal with binary or multiple-choice treatments; however, in some applications, the number of treatments can be significantly large, while the treatments themselves have rich information. In this study, we considered one important instance of such cases: the outcome estimation problem of graph-structured treatments such as drugs. Owing to the large number of possible treatments, the counterfactual nature of observational data that appears in conventional treatment effect estimation becomes more of a concern for this problem. Our proposed method, GraphITE (pronounced "graphite"), learns the representations of graph-structured treatments using graph neural networks while mitigating observation biases using Hilbert-Schmidt Independence Criterion regularization, which increases the independence of the representations of the targets and treatments. Experiments on two real-world datasets show that GraphITE outperforms baselines, especially in cases with a large number of treatments.
 [29] arXiv:2009.14073 (cross-list from cs.LG) [pdf, other]

Title: Estimation of Switched Markov Polynomial NARX models
Comments: 7 pages, 2 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This work targets the identification of a class of models for hybrid dynamical systems characterized by nonlinear autoregressive exogenous (NARX) components, with finite-dimensional polynomial expansions, and by a Markovian switching mechanism. The estimation of the model parameters is performed under a probabilistic framework via Expectation Maximization, including submodel coefficients, hidden state values and transition probabilities. Discrete mode classification and NARX regression tasks are disentangled within the iterations. Soft labels are assigned to latent states on the trajectories by averaging over the state posteriors and updated using the parametrization obtained from the previous maximization phase. Then, the NARX parameters are repeatedly fitted by solving weighted regression subproblems through a cyclical coordinate descent approach with coordinate-wise minimization. Moreover, we investigate a two-stage selection scheme, based on an l1-norm bridge estimation followed by hard-thresholding, to achieve parsimonious models through selection of the polynomial expansion. The proposed approach is demonstrated on a SMNARX problem composed of three nonlinear submodels with specific regressors.
 [30] arXiv:2009.14075 (cross-list from cs.LG) [pdf, other]

Title: Fast Fréchet Inception Distance
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The Fréchet Inception Distance (FID) has been used to evaluate thousands of generative models. We present a novel algorithm, FastFID, which allows fast computation and backpropagation for FID. FastFID can efficiently (1) evaluate generative models *during* training and (2) construct adversarial examples for FID.
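For orientation, the quantity being accelerated is the Fréchet distance between two Gaussians fitted to feature vectors, $\|\mu_a - \mu_b\|^2 + \mathrm{Tr}(\Sigma_a + \Sigma_b - 2(\Sigma_a \Sigma_b)^{1/2})$. The sketch below computes that standard formula with NumPy. It is not the FastFID algorithm from the paper, and it assumes the feature vectors are already available (no Inception network is involved). The function name is hypothetical.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a and feats_b have shape (n_samples, n_features).
    (Hypothetical helper; the inputs stand in for Inception features.)
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # For positive semi-definite covariances, the trace of the matrix
    # square root of cov_a @ cov_b equals the sum of the square roots of
    # its eigenvalues, so no explicit matrix square root is needed.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

Two identical feature sets give a distance of zero up to rounding, and shifting one set by a constant vector adds the squared norm of that shift.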
 [31] arXiv:2009.14096 (cross-list from cs.LG) [pdf, other]

Title: Weakly Supervised-Based Oversampling for High Imbalance and High Dimensionality Data Classification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
With the abundance of industrial datasets, imbalanced classification has become a common problem in several application domains. Oversampling is an effective method to solve imbalanced classification. One of the main challenges of existing oversampling methods is to accurately label the new synthetic samples. Inaccurate labels of synthetic samples would distort the distribution of the dataset and possibly worsen the classification performance. This paper introduces the idea of weakly supervised learning to handle the inaccurate labeling of synthetic samples by traditional oversampling methods. Graph semi-supervised SMOTE is developed to improve the credibility of the synthetic samples' labels. In addition, we propose cost-sensitive neighborhood components analysis for high dimensionality datasets and a bootstrap-based ensemble framework for high imbalance datasets. The proposed method has achieved good classification performance on 8 synthetic datasets and 3 real-world datasets, especially for high imbalance and high dimensionality problems. The average performance and robustness are better than those of the benchmark methods.
 [32] arXiv:2009.14108 (cross-list from cs.LG) [pdf, other]

Title: Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
Authors: Vihang P. Patil, Markus Hofmarcher, Marius-Constantin Dinu, Matthias Dorfer, Patrick M. Blies, Johannes Brandstetter, Jose A. Arjona-Medina, Sepp Hochreiter
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Reinforcement Learning algorithms require a large number of samples to solve complex tasks with sparse and delayed rewards. Complex tasks can often be hierarchically decomposed into sub-tasks. A step in the Q-function can be associated with solving a sub-task, where the expectation of the return increases. RUDDER has been introduced to identify these steps and then redistribute reward to them, thus immediately giving reward if sub-tasks are solved. Since the problem of delayed rewards is mitigated, learning is considerably sped up. However, for complex tasks, current exploration strategies as deployed in RUDDER struggle with discovering episodes with high rewards. Therefore, we assume that episodes with high rewards are given as demonstrations and do not have to be discovered by exploration. Typically, the number of demonstrations is small, and RUDDER's LSTM model, as a deep learning method, does not learn well. Hence, we introduce Align-RUDDER, which is RUDDER with two major modifications. First, Align-RUDDER assumes that episodes with high rewards are given as demonstrations, replacing RUDDER's safe exploration and lessons replay buffer. Second, we replace RUDDER's LSTM model by a profile model that is obtained from multiple sequence alignment of demonstrations. Profile models can be constructed from as few as two demonstrations, as known from bioinformatics. Align-RUDDER inherits the concept of reward redistribution, which considerably reduces the delay of rewards, thus speeding up learning. Align-RUDDER outperforms competitors on complex artificial tasks with delayed reward and few demonstrations. On the MineCraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently. GitHub: https://github.com/mljku/alignrudder, YouTube: https://youtu.be/HO_8ZUlUY
 [33] arXiv:2009.14111 (cross-list from cs.LG) [pdf, other]

Title: Inverse Classification with Limited Budget and Maximum Number of Perturbed Samples
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Most recent machine learning research focuses on developing new classifiers for the sake of improving classification accuracy. With many well-performing state-of-the-art classifiers available, there is a growing need for understanding the interpretability of a classifier, necessitated by practical purposes such as finding the best diet recommendation for a diabetes patient. Inverse classification is a post-modeling process to find changes in input features of samples to alter the initially predicted class. It is useful in many business applications to determine how to adjust a sample's input data such that the classifier predicts it to be in a desired class. In real-world applications, a budget on perturbations of samples corresponding to customers or patients is usually considered, and in this setting, the number of successfully perturbed samples is key to increasing benefits. In this study, we propose a new framework to solve inverse classification that maximizes the number of perturbed samples subject to per-feature-budget limits and favorable classification classes of the perturbed samples. We design algorithms to solve this optimization problem based on gradient methods, stochastic processes, Lagrangian relaxations, and the Gumbel trick. In experiments, we find that our algorithms based on stochastic processes exhibit excellent performance in different budget settings and scale well.
 [34] arXiv:2009.14131 (cross-list from stat.ME) [pdf, other]

Title: Dynamic sparsity on dynamic regression models
Comments: 31 pages, 5 figures
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)
In the present work, we consider variable selection and shrinkage for the Gaussian dynamic linear regression within a Bayesian framework. In particular, we propose a novel method that allows for time-varying sparsity, based on an extension of spike-and-slab priors for dynamic models. This is done by assigning appropriate Markov switching priors for the time-varying coefficients' variances, extending the previous work of Ishwaran and Rao (2005). Furthermore, we investigate different priors, including the common Inverted gamma prior for the process variances, and other mixture prior distributions such as Gamma priors for both the spike and the slab, which leads to a mixture of Normal-Gammas priors (Griffin and Brown, 2010) for the coefficients. In this sense, our prior can be viewed as a dynamic variable selection prior which induces either smoothness (through the slab) or shrinkage towards zero (through the spike) at each time point. The MCMC method used for posterior computation uses Markov latent variables that can assume binary regimes at each time point to generate the coefficients' variances. In that way, our model is a dynamic mixture model; thus, we could use the algorithm of Gerlach et al (2000) to generate the latent processes without conditioning on the states. Finally, our approach is exemplified through simulated examples and a real data application.
 [35] arXiv:2009.14133 (cross-list from cs.LG) [pdf, other]

Title: EEG to fMRI Synthesis: Is Deep Learning a candidate?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Advances in signal, image and video generation underlie major breakthroughs on generative medical imaging tasks, including Brain Image Synthesis. Still, the extent to which functional Magnetic Resonance Imaging (fMRI) can be mapped from the brain electrophysiology remains largely unexplored. This work provides the first comprehensive view on how to use state-of-the-art principles from Neural Processing to synthesize fMRI data from electroencephalographic (EEG) data. Given the distinct spatiotemporal nature of haemodynamic and electrophysiological signals, this problem is formulated as the task of learning a mapping function between multivariate time series with highly dissimilar structures. A comparison of state-of-the-art synthesis approaches, including Autoencoders, Generative Adversarial Networks and Pairwise Learning, is undertaken. Results highlight the feasibility of EEG to fMRI brain image mappings, pinpointing the role of current advances in Machine Learning and showing the relevance of upcoming contributions to further improve performance. EEG to fMRI synthesis offers a way to enhance and augment brain image data, and guarantee access to more affordable, portable and long-lasting protocols of brain activity monitoring. The code used in this manuscript is available on GitHub and the datasets are open source.
 [36] arXiv:2009.14136 (cross-list from q-fin.PM) [pdf, other]

Title: Time your hedge with Deep Reinforcement Learning
Comments: 9 pages, 8 figures
Subjects: Portfolio Management (q-fin.PM); Machine Learning (cs.LG); Machine Learning (stat.ML)
Can an asset manager plan the optimal timing for her/his hedging strategies given market conditions? The standard approach based on Markowitz or other more or less sophisticated financial rules aims to find the best portfolio allocation thanks to forecasted expected returns and risk but fails to fully relate market conditions to hedging strategy decisions. In contrast, Deep Reinforcement Learning (DRL) can tackle this challenge by creating a dynamic dependency between market information and hedging strategy allocation decisions. In this paper, we present a realistic and augmented DRL framework that: (i) uses additional contextual information to decide an action, (ii) has a one-period lag between observations and actions to account for the one-day lag turnover of common asset managers to rebalance their hedge, (iii) is fully tested in terms of stability and robustness thanks to a repetitive train-test method called anchored walk forward training, similar in spirit to k-fold cross validation for time series and (iv) allows managing leverage of our hedging strategy. Our experiment for an augmented asset manager interested in sizing and timing his hedges shows that our approach achieves superior returns and lower risk.
 [37] arXiv:2009.14138 (cross-list from cs.LG) [pdf, other]

Title: Selective Cascade of Residual ExtraTrees
Comments: To appear in SN Computer Science
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We propose a novel tree-based ensemble method named Selective Cascade of Residual ExtraTrees (SCORE). SCORE draws inspiration from representation learning, incorporates regularized regression with variable selection features, and utilizes boosting to improve prediction and reduce generalization errors. We also develop a variable importance measure to increase the explainability of SCORE. Our computer experiments show that SCORE provides comparable or superior performance in prediction against ExtraTrees, random forest, gradient boosting machine, and neural networks; and the proposed variable importance measure for SCORE is comparable to studied benchmark methods. Finally, the predictive performance of SCORE remains stable across hyperparameter values, suggesting potential robustness to hyperparameter specification.
 [38] arXiv:2009.14148 (cross-list from cs.LG) [pdf, other]

Title: Unbalanced Sobolev Descent
Comments: NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We introduce Unbalanced Sobolev Descent (USD), a particle descent algorithm for transporting a high dimensional source distribution to a target distribution that does not necessarily have the same mass. We define the Sobolev-Fisher discrepancy between distributions and show that it relates to advection-reaction transport equations and the Wasserstein-Fisher-Rao metric between distributions. USD transports particles along gradient flows of the witness function of the Sobolev-Fisher discrepancy (advection step) and reweighs the mass of particles with respect to this witness function (reaction step). The reaction step can be thought of as a birth-death process of the particles with rate of growth proportional to the witness function. When the Sobolev-Fisher witness function is estimated in a Reproducing Kernel Hilbert Space (RKHS), under mild assumptions we show that USD converges asymptotically (in the limit of infinite particles) to the target distribution in the Maximum Mean Discrepancy (MMD) sense. We then give two methods to estimate the Sobolev-Fisher witness with neural networks, resulting in two Neural USD algorithms. The first one implements the reaction step with mirror descent on the weights, while the second implements it through a birth-death process of particles. We show on synthetic examples that USD transports distributions with or without conservation of mass faster than previous particle descent algorithms, and finally demonstrate its use for molecular biology analyses where our method is naturally suited to match developmental stages of populations of differentiating cells based on their single-cell RNA sequencing profile. Code is available at https://github.com/ibm/usd .
 [39] arXiv:2009.14168 (cross-list from cs.LG) [pdf, other]

Title: Self-Supervised Few-Shot Learning on Point Clouds
Comments: Accepted at NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The increased availability of massive point clouds coupled with their utility in a wide variety of applications such as robotics, shape synthesis, and self-driving cars has attracted increased attention from both industry and academia. Recently, deep neural networks operating on labeled point clouds have shown promising results on supervised learning tasks like classification and segmentation. However, supervised learning leads to the cumbersome task of annotating the point clouds. To combat this problem, we propose two novel self-supervised pre-training tasks that encode a hierarchical partitioning of the point clouds using a cover-tree, where point cloud subsets lie within balls of varying radii at each level of the cover-tree. Furthermore, our self-supervised learning network is restricted to pre-train on the support set (comprising scarce training examples) used to train the downstream network in a few-shot learning (FSL) setting. Finally, the fully-trained self-supervised network's point embeddings are input to the downstream task's network. We present a comprehensive empirical evaluation of our method on both downstream classification and segmentation tasks and show that supervised methods pre-trained with our self-supervised learning method significantly improve the accuracy of state-of-the-art methods. Additionally, our method also outperforms previous unsupervised methods in downstream classification tasks.
 [40] arXiv:2009.14193 (cross-list from cs.CV) [pdf, other]

Title: Uncertainty Sets for Image Classifiers using Conformal Prediction
Comments: Codebase available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST); Machine Learning (stat.ML)
Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings. Existing uncertainty quantification techniques, such as Platt scaling, attempt to calibrate the network's probability estimates, but they do not have formal guarantees. We present an algorithm that modifies any classifier to output a predictive set containing the true label with a user-specified probability, such as 90%. The algorithm is simple and fast like Platt scaling, but provides a formal finite-sample coverage guarantee for every model and dataset. Furthermore, our method generates much smaller predictive sets than alternative methods, since we introduce a regularizer to stabilize the small scores of unlikely classes after Platt scaling. In experiments on both Imagenet and Imagenet-V2 with a ResNet-152 and other classifiers, our scheme outperforms existing approaches, achieving exact coverage with sets that are often factors of 5 to 10 smaller.
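As a rough illustration of how a predictive set with a user-specified coverage level can be built, here is a minimal sketch of basic split conformal prediction, assuming held-out calibration data and the simple score "one minus the probability of the true class". This is a swapped-in textbook variant, not the regularized procedure the abstract describes. The function names and the choice of score are assumptions made for this example.

```python
import math
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold on held-out data.

    cal_probs: (n, K) predicted class probabilities; cal_labels: (n,) ints.
    The score 1 - p(true class) is an illustrative assumption.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return 1.0  # too few calibration points: every class gets included
    return float(np.sort(scores)[k - 1])

def prediction_sets(probs, qhat):
    """Return, for each row, the class indices whose score is at most qhat."""
    return [np.flatnonzero(1.0 - p <= qhat) for p in probs]
```

Under exchangeability of calibration and test points, sets built this way contain the true label with probability at least 1 - alpha on average. The sets can be large when the classifier is uncertain, which is the inefficiency that motivates regularized variants.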
Replacements for Wed, 30 Sep 20
 [41] arXiv:1906.10652 (replaced) [pdf, other]

Title: Monte Carlo Gradient Estimation in Machine Learning
Comments: 62 pages
Journal-ref: Journal of Machine Learning Research, 21(132):1-62, 2020
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
 [42] arXiv:1908.06852 (replaced) [pdf, other]

Title: SIRUS: Stable and Interpretable RUle Set
Authors: Clément Bénard (LPSM (UMR_8001)), Gérard Biau (LPSM (UMR_8001)), Sébastien da Veiga, Erwan Scornet (CMAP)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [43] arXiv:1908.11133 (replaced) [pdf, ps, other]

Title: On the rate of convergence of fully connected very deep neural network regression estimates
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [44] arXiv:2005.12123 (replaced) [pdf, other]

Title: Feature Robust Optimal Transport for High-dimensional Data
Authors: Mathis Petrovich, Chao Liang, Ryoma Sato, Yanbin Liu, Yao-Hung Hubert Tsai, Linchao Zhu, Yi Yang, Ruslan Salakhutdinov, Makoto Yamada
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [45] arXiv:2009.11713 (replaced) [pdf, ps, other]

Title: Online Structural Change-point Detection of High-dimensional Streaming Data via Dynamic Sparse Subspace Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [46] arXiv:1805.08045 (replaced) [pdf, other]

Title: A universal framework for learning the elliptical mixture model
Comments: This work has been accepted to IEEE Transactions on Neural Networks and Learning Systems with DOI:10.1109/TNNLS.2020.3010198. The abstract link is this https URL
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
 [47] arXiv:1811.01146 (replaced) [pdf, other]

Title: Closed-Loop Memory GAN for Continual Learning
Comments: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI2019). this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [48] arXiv:1901.08544 (replaced) [pdf, other]

Title: Learning Space Partitions for Nearest Neighbor Search
Comments: ICLR 2020
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
 [49] arXiv:1901.09997 (replaced) [pdf, other]

Title: Quasi-Newton Methods for Deep Learning: Forget the Past, Just Sample
Comments: 49 pages, 23 figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [50] arXiv:1906.01115 (replaced) [pdf, ps, other]

Title: Convergence Rate of $\mathcal{O}(1/k)$ for Optimistic Gradient and Extra-gradient Methods in Smooth Convex-Concave Saddle Point Problems
Comments: 19 pages
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [51] arXiv:1906.07549 (replaced) [pdf]

Title: An Attention-Guided Deep Regression Model for Landmark Detection in Cephalograms
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [52] arXiv:1909.11285 (replaced) [pdf, other]

Title: A Dictionary Approach to Domain-Invariant Learning in Deep Networks
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [53] arXiv:1909.11373 (replaced) [pdf, other]

Title: Multitask Batch Reinforcement Learning with Metric Learning
Authors: Jiachen Li, Quan Vuong, Shuang Liu, Minghua Liu, Kamil Ciosek, Henrik Iskov Christensen, Hao Su
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [54] arXiv:1909.13403 (replaced) [pdf, other]

Title: Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Comments: Published in IMC 2020. 20 pages, 26 figures
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI); Machine Learning (stat.ML)
 [55] arXiv:1910.00482 (replaced) [pdf, other]

Title: Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
 [56] arXiv:1910.03201 (replaced) [pdf, other]

Title: Differentiable Sparsification for Deep Neural Networks
Authors: Yognjin Lee
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [57] arXiv:2001.08322 (replaced) [pdf, other]

Title: FsNet: Feature Selection Network on High-dimensional Biological Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [58] arXiv:2001.11818 (replaced) [pdf, other]

Title: Community Detection in Bipartite Networks with Stochastic Blockmodels
Comments: 17 pages, 6 figures. Code is available at this https URL and a documentation at this https URL
Journal-ref: Phys. Rev. E 102, 032309 (2020)
Subjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
 [59] arXiv:2002.01650 (replaced) [pdf, other]

Title: Concept Whitening for Interpretable Image Recognition
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [60] arXiv:2003.10392 (replaced) [pdf, other]

Title: Steepest Descent Neural Architecture Optimization: Escaping Local Optimum with Signed Neural Splitting
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [61] arXiv:2003.12169 (replaced) [pdf, other]

Title: A Collective Learning Framework to Boost GNN Expressiveness
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [62] arXiv:2004.10696 (replaced) [pdf, other]

Title: Spectrally Consistent U-Net for High Fidelity Image Transformations
Subjects: Image and Video Processing (eess.IV); Graphics (cs.GR); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [63] arXiv:2005.02505 (replaced) [pdf, other]

Title: A generative adversarial network approach to calibration of local stochastic volatility models
Comments: Replacement for previous version: Major update of previous version to match the content of the published version
Journal-ref: Risks 2020, 8, 101
Subjects: Computational Finance (q-fin.CP); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [64] arXiv:2005.08898 (replaced) [pdf, ps, other]

Title: Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
 [65] arXiv:2005.12844 (replaced) [pdf, other]

Title: Approximation Schemes for ReLU Regression
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
 [66] arXiv:2007.01850 (replaced) [pdf, other]

Title: Variational Autoencoders for Anomalous Jet Tagging
Authors: Taoli Cheng, Jean-François Arguin, Julien Leissner-Martin, Jacinthe Pilette, Tobias Golling
Comments: 32 pages, 20 figures. References added; typos/errors corrected; further editing performed
Subjects: High Energy Physics - Phenomenology (hep-ph); High Energy Physics - Experiment (hep-ex); Machine Learning (stat.ML)
 [67] arXiv:2007.01995 (replaced) [pdf, other]

Title: Bidirectional Model-based Policy Optimization
Comments: Accepted at ICML2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [68] arXiv:2008.03072 (replaced) [pdf, ps, other]

Title: Optimizing Information Loss Towards Robust Neural Networks
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [69] arXiv:2008.08755 (replaced) [pdf, other]

Title: On $\ell_p$-norm Robustness of Ensemble Stumps and Trees
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [70] arXiv:2008.09763 (replaced) [pdf, other]

Title: Variational Autoencoder for Anti-Cancer Drug Response Prediction
Comments: 9 pages
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (stat.ML)
 [71] arXiv:2008.13064 (replaced) [pdf, other]

Title: Towards Demystifying Dimensions of Source Code Embeddings
Comments: 1st ACM SIGSOFT International Workshop on Representation Learning for Software Engineering and Program Languages, Co-located with ESEC/FSE (RL+SE&PL'20)
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL); Software Engineering (cs.SE); Machine Learning (stat.ML)
 [72] arXiv:2009.02296 (replaced) [pdf, other]

Title: Variational Deep Learning for the Identification and Reconstruction of Chaotic and Stochastic Dynamical Systems from Noisy and Partial Observations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [73] arXiv:2009.07547 (replaced) [pdf, other]

Title: Grassmannian diffusion maps based dimension reduction and classification for high-dimensional data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [74] arXiv:2009.09590 (replaced) [pdf, other]

Title: Deep Clustering and Representation Learning that Preserves Geometric Structures
Comments: 20 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [75] arXiv:2009.09991 (replaced) [pdf, other]

Title: Modeling Text with Decision Forests using Categorical-Set Splits
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [76] arXiv:2009.12981 (replaced) [pdf, other]

Title: Parametric UMAP: learning embeddings with deep neural networks for representation and semi-supervised learning
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
 [77] arXiv:2009.12991 (replaced) [pdf, other]

Title: Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect
Comments: This paper is accepted by NeurIPS 2020. The code is available on GitHub: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)