Machine Learning
New submissions
New submissions for Fri, 6 Dec 19
 [1] arXiv:1912.02290 [pdf, other]

Title: Indian Buffet Neural Networks for Continual Learning
Comments: Camera-ready submission
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We place an Indian Buffet Process (IBP) prior over the neural structure of a Bayesian Neural Network (BNN), thus allowing the complexity of the BNN to increase and decrease automatically. We apply this methodology to the problem of resource allocation in continual learning, where new tasks occur and the network requires extra resources. Our BNN exploits online variational inference with relaxations to the Bernoulli and Beta distributions (which constitute the IBP prior), thereby allowing the use of the reparameterisation trick to learn variational posteriors via gradient-based methods. As we automatically learn the number of weights in the BNN, overfitting and underfitting problems are largely overcome. We show empirically that the method offers competitive results compared to Variational Continual Learning (VCL) in some settings.
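As a rough illustration of the prior described above, the sketch below draws from a truncated stick-breaking construction of the IBP, which is one common way to represent it. Under this construction $\nu_k \sim \mathrm{Beta}(\alpha, 1)$, $\pi_k = \prod_{j \le k} \nu_j$, and hidden unit $k$ is active with probability $\pi_k$. The sketch uses exact Beta and Bernoulli draws, whereas the abstract's method replaces them with relaxed, reparameterisable distributions. The function names, `alpha` and the truncation level are illustrative assumptions.

```python
import random
from typing import List


def ibp_stick_breaking(alpha: float, truncation: int, seed: int = 0) -> List[float]:
    """Truncated IBP stick-breaking: pi_k = prod_{j<=k} nu_j with nu_j ~ Beta(alpha, 1)."""
    rng = random.Random(seed)
    probs: List[float] = []
    running = 1.0
    for _ in range(truncation):
        running *= rng.betavariate(alpha, 1.0)  # each factor lies in [0, 1] ...
        probs.append(running)                    # ... so probs is non-increasing
    return probs


def sample_active_units(probs: List[float], seed: int = 1) -> List[int]:
    """Binary mask over hidden units: z_k ~ Bernoulli(pi_k)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for p in probs]


pis = ibp_stick_breaking(alpha=4.0, truncation=10)
mask = sample_active_units(pis)
print([round(p, 3) for p in pis])
print(mask)
```

A larger `alpha` makes each $\nu_k$ closer to 1 on average, so the activation probabilities decay more slowly and more units stay active. This is how such a prior lets the effective width adapt.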
 [2] arXiv:1912.02399 [pdf, other]

Title: A sparse negative binomial mixture model for clustering RNA-seq count data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Clustering with variable selection is a challenging but critical task for modern small-n-large-p data. Existing methods based on Gaussian mixture models or sparse K-means provide solutions for continuous data. With the prevalence of RNA-seq technology and the lack of count data models for clustering, the current practice is to normalize count expression data into continuous measures and apply existing models with a Gaussian assumption. In this paper, we develop a negative binomial mixture model with gene regularization to cluster samples (small $n$) with high-dimensional gene features (large $p$). The EM algorithm and the Bayesian information criterion are used for inference and for determining tuning parameters. The method is compared with the sparse Gaussian mixture model and sparse K-means using extensive simulations and two real transcriptomic applications in breast cancer and rat brain studies. The results show superior performance of the proposed count data model in clustering accuracy, feature selection and biological interpretation by pathway enrichment analysis.
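The abstract mentions EM inference. Below is a minimal sketch of the E-step for a negative binomial mixture. It assumes a single shared size (inverse dispersion) parameter `r` and conditional independence of genes given the cluster. The gene regularization penalty, the M-step and BIC-based tuning described by the authors are not shown. The function names and toy counts are hypothetical.

```python
import math
from typing import List, Sequence


def nb_logpmf(y: int, mu: float, r: float) -> float:
    """Log-pmf of a negative binomial count with mean mu and size (inverse dispersion) r."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def e_step(counts: Sequence[int], weights: Sequence[float],
           means: Sequence[Sequence[float]], r: float) -> List[float]:
    """Posterior cluster probabilities for one sample's vector of gene counts.

    Genes are assumed conditionally independent given the cluster, so their
    log-likelihoods add; the normalisation uses log-sum-exp for stability.
    """
    logs = [math.log(w) + sum(nb_logpmf(y, m, r) for y, m in zip(counts, mu_k))
            for w, mu_k in zip(weights, means)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


counts = [120, 3, 45]                             # one sample, three genes (toy data)
means = [[100.0, 5.0, 40.0], [10.0, 50.0, 5.0]]   # per-cluster gene means
resp = e_step(counts, weights=[0.5, 0.5], means=means, r=10.0)
print(resp)
```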
 [3] arXiv:1912.02493 [pdf, other]

Title: Ordinal Bayesian Optimisation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
Bayesian optimisation is a powerful tool for solving expensive black-box problems, but it fails when the stationarity assumption made on the objective function is strongly violated, as is the case in particular for ill-conditioned or discontinuous objectives. We tackle this problem by proposing a new Bayesian optimisation framework that only considers the ordering of variables, both in the input and output spaces, to fit a Gaussian process in a latent space. By doing so, our approach is agnostic to the metrics of the original spaces. We propose two algorithms, based respectively on an optimistic strategy and on Thompson sampling. For the optimistic strategy we prove optimal performance under the measure of regret in the latent space. We illustrate the capability of our framework on several challenging toy problems.
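Because only the ordering of observations is used, any strictly increasing transform of the objective leaves the ranks unchanged. The same holds however badly scaled the transform is. The sketch below shows only this rank-invariance. The latent-space Gaussian process and the two acquisition strategies are not reproduced, and `ordinal_ranks` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def ordinal_ranks(values: Sequence[float]) -> List[int]:
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


ys = [0.3, 2.0, -1.5, 0.9]
warped = [math.exp(10.0 * y) for y in ys]  # strictly increasing but badly scaled
print(ordinal_ranks(ys))      # [1, 3, 0, 2]
print(ordinal_ranks(warped))  # [1, 3, 0, 2]
```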
 [4] arXiv:1912.02527 [pdf, other]

Title: Warped Input Gaussian Processes for Time Series Forecasting
Authors: David Tolpin
Comments: 6 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We introduce a Gaussian process-based model for handling non-stationarity. The warping is achieved non-parametrically, by imposing a prior on the relative change of distance between subsequent observation inputs. The model allows the use of general gradient optimization algorithms for training and incurs only a small computational overhead in training and prediction. The model is applicable to forecasting non-stationary time series with gradually varying volatility, change points, or a combination of both. We evaluate the model on synthetic and real-world time series data, comparing against both baseline and known state-of-the-art approaches, and show that the model exhibits state-of-the-art forecasting performance at a lower implementation and computation cost.
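The abstract does not give the exact parameterisation, so the following is a hypothetical sketch of the idea. Each gap between subsequent inputs is multiplied by a positive factor, the quantity on which a prior would be placed. A standard stationary kernel is then applied to the warped inputs. The function names and the fixed factors below are assumptions for illustration. In the model, the factors would be learned.

```python
import math
from typing import List, Sequence


def warp_inputs(xs: Sequence[float], gap_scales: Sequence[float]) -> List[float]:
    """Multiply each gap xs[i] - xs[i-1] by a positive factor and re-accumulate.

    Positive factors preserve the order of the inputs; a factor > 1 stretches
    the gap (less correlation across it), a factor < 1 compresses it.
    """
    if any(s <= 0.0 for s in gap_scales):
        raise ValueError("gap scales must be positive")
    warped = [xs[0]]
    for i in range(1, len(xs)):
        warped.append(warped[-1] + gap_scales[i - 1] * (xs[i] - xs[i - 1]))
    return warped


def rbf(a: float, b: float, lengthscale: float = 1.0) -> float:
    """Stationary squared-exponential kernel, applied to warped inputs."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


xs = [0.0, 1.0, 2.0, 3.0]
z = warp_inputs(xs, [1.0, 3.0, 1.0])  # e.g. a change point between t=1 and t=2
print(z)                               # [0.0, 1.0, 4.0, 5.0]
print(rbf(z[1], z[2]) < rbf(xs[1], xs[2]))  # True
```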
 [5] arXiv:1912.02644 [pdf, other]

Title: Representing Closed Transformation Paths in Encoded Network Latent Space
Comments: Accepted at AAAI 2020
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Deep generative networks have been widely used for learning mappings from a low-dimensional latent space to a high-dimensional data space. In many cases, data transformations are defined by linear paths in this latent space. However, the Euclidean structure of the latent space may be a poor match for the underlying latent structure in the data. In this work, we incorporate a generative manifold model into the latent space of an autoencoder in order to learn the low-dimensional manifold structure from the data and adapt the latent space to accommodate this structure. In particular, we focus on applications in which the data has closed transformation paths which extend from a starting point and return to nearly the same point. Through experiments on data with natural closed transformation paths, we show that this model introduces the ability to learn the latent dynamics of complex systems, generate transformation paths, and classify samples that belong on the same transformation path.
 [6] arXiv:1912.02724 [pdf, other]

Title: Causal structure based root cause analysis of outliers
Comments: 11 pages, 9 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
We describe a formal approach to identify 'root causes' of outliers observed in $n$ variables $X_1,\dots,X_n$ in a scenario where the causal relation between the variables is a known directed acyclic graph (DAG). To this end, we first introduce a systematic way to define outlier scores. Further, we introduce the concept of 'conditional outlier score' which measures whether a value of some variable is unexpected *given the value of its parents* in the DAG, if one were to assume that the causal structure and the corresponding conditional distributions are also valid for the anomaly. Finally, we quantify to what extent the high outlier score of some target variable can be attributed to outliers of its ancestors. This quantification is defined via Shapley values from cooperative game theory.
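The attribution step relies on Shapley values. The sketch below computes exact Shapley values for a small set of "players", here standing in for ancestor variables, by enumerating all coalitions. The value function `toy_score` is a hypothetical table. It is not the paper's outlier-score-based game, and exact enumeration is only feasible for a handful of variables.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values by enumerating coalitions (cost grows as 2**n)."""
    n = len(players)
    result: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))  # marginal contribution
        result[p] = total
    return result


def toy_score(coalition: FrozenSet[str]) -> float:
    """Hypothetical target outlier score when the variables in `coalition` are anomalous."""
    score = 0.0
    if "X1" in coalition:
        score += 2.0
    if "X2" in coalition:
        score += 1.0
    if {"X1", "X2"} <= coalition:
        score += 1.0  # interaction term, split equally by Shapley
    return score


print(shapley_values(["X1", "X2"], toy_score))  # {'X1': 2.5, 'X2': 1.5}
```

The attributions sum to the score with all variables anomalous minus the score with none (the efficiency property). This is what makes Shapley values a natural choice for splitting a target's outlier score among its ancestors.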
 [7] arXiv:1912.02738 [pdf, other]

Title: MetaFun: Meta-Learning with Iterative Functional Updates
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Few-shot supervised learning leverages experience from previous learning tasks to solve new tasks where only a few labelled examples are available. One successful approach to this problem is to use an encoder-decoder meta-learning pipeline, whereby labelled data in a task is encoded to produce a task representation, and this representation is used to condition the decoder to make predictions on unlabelled data. We propose an approach that uses this pipeline with two important features. 1) We use infinite-dimensional functional representations of the task rather than fixed-dimensional representations. 2) We iteratively apply functional updates to the representation. We show that our approach can be interpreted as extending functional gradient descent, and delivers performance that is comparable to or outperforms the previous state-of-the-art on few-shot classification benchmarks such as miniImageNet and tieredImageNet.
 [8] arXiv:1912.02757 [pdf, other]

Title: Deep Ensembles: A Loss Landscape Perspective
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Deep ensembles have been empirically shown to be a promising approach for improving accuracy, uncertainty and out-of-distribution robustness of deep learning models. While deep ensembles were theoretically motivated by the bootstrap, non-bootstrap ensembles trained with just random initialization also perform well in practice, which suggests that there could be other explanations for why deep ensembles work well. Bayesian neural networks, which learn distributions over the parameters of the network, are theoretically well-motivated by Bayesian principles, but do not perform as well as deep ensembles in practice, particularly under dataset shift. One possible explanation for this gap between theory and practice is that popular scalable approximate Bayesian methods tend to focus on a single mode, whereas deep ensembles tend to explore diverse modes in function space. We investigate this hypothesis by building on recent work on understanding the loss landscape of neural networks and adding our own exploration to measure the similarity of functions in the space of predictions. Our results show that random initializations explore entirely different modes, while functions along an optimization trajectory or sampled from the subspace thereof cluster within a single mode prediction-wise, while often deviating significantly in the weight space. We demonstrate that while low-loss connectors between modes exist, they are not connected in the space of predictions. Developing the concept of the diversity-accuracy plane, we show that the decorrelation power of random initializations is unmatched by popular subspace sampling methods.
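One simple way to compare two trained networks in the space of predictions, rather than weights, is the fraction of test inputs on which their predicted labels differ. The sketch below assumes hard class predictions. The label lists are invented for illustration and do not come from the paper.

```python
from typing import Sequence


def prediction_disagreement(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of test points on which two models predict different labels."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have equal length")
    diff = sum(1 for a, b in zip(preds_a, preds_b) if a != b)
    return diff / len(preds_a)


# Hypothetical labels from two independently initialised runs
run_1 = [0, 1, 2, 2, 1, 0, 3, 3]
run_2 = [0, 1, 2, 1, 1, 3, 3, 0]
print(prediction_disagreement(run_1, run_2))  # 0.375
```

Two networks can be far apart in weight space yet have near-zero disagreement, or vice versa. This is why a prediction-space measure is needed to tell whether ensemble members occupy different modes.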
 [9] arXiv:1912.02762 [pdf, other]

Title: Normalizing Flows for Probabilistic Modeling and Inference
Authors: George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, Balaji Lakshminarayanan
Comments: Review article. 60 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Normalizing flows provide a general mechanism for defining expressive probability distributions, only requiring the specification of a (usually simple) base distribution and a series of bijective transformations. There has been much recent work on normalizing flows, ranging from improving their expressive power to expanding their application. We believe the field has now matured and is in need of a unified perspective. In this review, we attempt to provide such a perspective by describing flows through the lens of probabilistic modeling and inference. We place special emphasis on the fundamental principles of flow design, and discuss foundational topics such as expressive power and computational tradeoffs. We also broaden the conceptual framing of flows by relating them to more general probability transformations. Lastly, we summarize the use of flows for tasks such as generative modeling, approximate inference, and supervised learning.
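The identity underlying normalizing flows is the change-of-variables formula, $\log p_x(x) = \log p_z(f^{-1}(x)) + \log\lvert\det J_{f^{-1}}(x)\rvert$. The minimal one-dimensional sketch below uses a single affine bijection $x = \mu + \sigma z$ over a standard normal base. This trivial flow was chosen so the result can be checked against the closed-form Gaussian density. Practical flows stack many learned bijections.

```python
import math


def standard_normal_logpdf(z: float) -> float:
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_logpdf(x: float, shift: float, scale: float) -> float:
    """log p_x(x) for x = shift + scale * z with z ~ N(0, 1), scale != 0.

    f^{-1}(x) = (x - shift) / scale and |d f^{-1} / dx| = 1 / |scale|.
    """
    z = (x - shift) / scale
    return standard_normal_logpdf(z) - math.log(abs(scale))


x, mu, sigma = 1.7, 0.5, 2.0
closed_form = -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)
print(affine_flow_logpdf(x, mu, sigma), closed_form)  # equal up to rounding
```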
 [10] arXiv:1912.02771 [pdf, other]

Title: Label-Consistent Backdoor Attacks
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Deep neural networks have been demonstrated to be vulnerable to backdoor attacks. Specifically, by injecting a small number of maliciously constructed inputs into the training set, an adversary is able to plant a backdoor into the trained model. This backdoor can then be activated during inference by a backdoor trigger to fully control the model's behavior. While such attacks are very effective, they crucially rely on the adversary injecting arbitrary inputs that are, often blatantly, mislabeled. Such samples would raise suspicion upon human inspection, potentially revealing the attack. Thus, for backdoor attacks to remain undetected, it is crucial that they maintain label-consistency: the condition that injected inputs are consistent with their labels. In this work, we leverage adversarial perturbations and generative models to execute efficient, yet label-consistent, backdoor attacks. Our approach is based on injecting inputs that appear plausible, yet are hard to classify, hence causing the model to rely on the (easier-to-learn) backdoor trigger.
 [11] arXiv:1912.02781 [pdf, other]

Title: AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
Authors: Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan
Comments: Code available at this https URL
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Modern deep neural networks can achieve high accuracy when the training distribution and test distribution are identically distributed, but this assumption is frequently violated in practice. When the train and test distributions are mismatched, accuracy can plummet. Currently there are few techniques that improve robustness to unforeseen data shifts encountered during deployment. In this work, we propose a technique to improve the robustness and uncertainty estimates of image classifiers. We propose AugMix, a data processing technique that is simple to implement, adds limited computational overhead, and helps models withstand unforeseen corruptions. AugMix significantly improves robustness and uncertainty measures on challenging image classification benchmarks, closing the gap between previous methods and the best possible performance in some cases by more than half.
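The abstract does not spell out the mixing procedure. The sketch below assumes the scheme from the full AugMix paper as we understand it. Several chains of randomly chosen augmentations are combined with Dirichlet weights, and the result is blended with the original image using a Beta-distributed weight. Images are toy lists of intensities, the operations `brighten` and `darken` are hypothetical, and the chain depth is fixed for simplicity. The consistency loss used in training is not shown.

```python
import random
from typing import Callable, List, Sequence

Image = List[float]  # toy stand-in: flattened pixel intensities in [0, 1]


def dirichlet(k: int, alpha: float, rng: random.Random) -> List[float]:
    """Dirichlet(alpha, ..., alpha) sample via normalised Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def augmix(image: Image, ops: Sequence[Callable[[Image], Image]],
           width: int = 3, depth: int = 2, alpha: float = 1.0, seed: int = 0) -> Image:
    """Mix `width` random augmentation chains, then blend with the original."""
    rng = random.Random(seed)
    weights = dirichlet(width, alpha, rng)
    mixed = [0.0] * len(image)
    for w in weights:
        chain = list(image)
        for _ in range(depth):
            chain = rng.choice(ops)(chain)  # compose randomly chosen operations
        mixed = [m + w * c for m, c in zip(mixed, chain)]
    m = rng.betavariate(alpha, alpha)  # weight kept on the original image
    return [m * x + (1.0 - m) * y for x, y in zip(image, mixed)]


def brighten(img: Image) -> Image:
    return [min(1.0, p + 0.1) for p in img]


def darken(img: Image) -> Image:
    return [max(0.0, p - 0.1) for p in img]


out = augmix([0.2, 0.5, 0.9], [brighten, darken])
print(out)
```

Because both mixing steps are convex combinations, the output stays within the range of the augmented images. If every operation is the identity, the original image is returned up to rounding.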
 [12] arXiv:1912.02803 [pdf, other]

Title: Neural Tangents: Fast and Easy Infinite Neural Networks in Python
Authors: Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Alexander A. Alemi, Jascha Sohl-Dickstein, Samuel S. Schoenholz
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Neural Tangents is a library designed to enable research into infinite-width neural networks. It provides a high-level API for specifying complex and hierarchical neural network architectures. These networks can then be trained and evaluated either at finite width as usual or in their infinite-width limit. Infinite-width networks can be trained analytically using exact Bayesian inference or using gradient descent via the Neural Tangent Kernel. Additionally, Neural Tangents provides tools to study gradient descent training dynamics of wide but finite networks in either function space or weight space.
The entire library runs out-of-the-box on CPU, GPU, or TPU. All computations can be automatically distributed over multiple accelerators with near-linear scaling in the number of devices. Neural Tangents is available at www.github.com/google/neural-tangents. We also provide an accompanying interactive Colab notebook.
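Neural Tangents computes such kernels compositionally for whole architectures. The standard-library sketch below does not use its API. It only illustrates, for a single hidden ReLU layer, the closed-form infinite-width (NNGP) kernel that exact Bayesian inference would use. The $1/d$ input scaling and the variance parameter names `w_var` and `b_var` are assumptions.

```python
import math
from typing import Sequence


def relu_nngp_kernel(x: Sequence[float], y: Sequence[float],
                     w_var: float = 1.0, b_var: float = 0.0) -> float:
    """Infinite-width (NNGP) kernel of a one-hidden-layer ReLU network.

    First layer: k0(a, b) = w_var * <a, b> / d + b_var.
    Readout: w_var * E[relu(u) relu(v)] + b_var, where (u, v) is Gaussian with
    covariance given by k0; the expectation has the arc-cosine closed form.
    """
    d = len(x)

    def k0(a: Sequence[float], b: Sequence[float]) -> float:
        return w_var * sum(ai * bi for ai, bi in zip(a, b)) / d + b_var

    kxx, kyy, kxy = k0(x, x), k0(y, y), k0(x, y)
    norm = math.sqrt(kxx * kyy)
    if norm == 0.0:
        return b_var
    cos_t = max(-1.0, min(1.0, kxy / norm))  # clamp tiny rounding errors
    theta = math.acos(cos_t)
    expectation = norm * ((math.sin(theta) + (math.pi - theta) * cos_t) / (2.0 * math.pi))
    return w_var * expectation + b_var


print(relu_nngp_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.25
print(relu_nngp_kernel([1.0, 0.0], [0.0, 1.0]))  # orthogonal inputs still correlate
```

On the diagonal the kernel is half the first-layer variance, since $E[\mathrm{relu}(u)^2] = \mathrm{Var}(u)/2$. Orthogonal inputs still get a positive covariance because ReLU outputs are non-negative.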
Cross-lists for Fri, 6 Dec 19
 [13] arXiv:1912.02233 (cross-list from cs.LG) [pdf, ps, other]

Title: Large-Scale Semi-Supervised Learning via Graph Structure Learning over High-Dense Points
Comments: 25 pages, 2 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We focus on developing a novel scalable graph-based semi-supervised learning (SSL) method for a small number of labeled data and a large amount of unlabeled data. Due to the lack of labeled data and the availability of large-scale unlabeled data, existing SSL methods usually suffer either from suboptimal performance because of an improper graph or from the high computational complexity of the large-scale optimization problem. In this paper, we propose to address both problems by constructing a proper graph for graph-based SSL methods. Unlike existing approaches, we simultaneously learn a small set of vertices that characterize the high-density regions of the input data and a graph that depicts the relationships among these vertices. A novel approach is then proposed to construct the graph of the input data, with some preferred properties, from the learned graph of the small set of vertices. Without explicitly computing the constructed graph of the inputs, two transductive graph-based SSL approaches are presented whose computational complexity is linear in the number of input data points. Extensive experiments on synthetic data and real datasets of varied sizes demonstrate that the proposed method is not only scalable to large-scale data but also achieves good classification performance, especially with an extremely small number of labels.
 [14] arXiv:1912.02254 (cross-list from cs.LG) [pdf, other]

Title: Deep Model Compression via Deep Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Besides accuracy, the storage size of convolutional neural network (CNN) models is another important factor given the limited hardware resources in practical applications. For example, autonomous driving requires accurate yet fast CNNs for low-latency object detection and classification. To fulfill this need, we aim to obtain CNN models with both high testing accuracy and small size/storage to address resource constraints in many embedded systems. In particular, this paper proposes a generic reinforcement learning-based model compression approach in a two-stage compression pipeline: pruning and quantization. The first stage of compression, i.e., pruning, is achieved by exploiting deep reinforcement learning (DRL) to co-learn two quantities. The first is the accuracy of CNN models updated after layer-wise channel pruning on a testing dataset. The second is the FLOPs (number of floating point operations) in each layer, updated after kernel-wise variational pruning using information dropout. Layer-wise channel pruning removes unimportant kernels along the input channel dimension, while kernel-wise variational pruning removes unimportant kernels along the 2D kernel dimensions, namely height and width. The second stage, i.e., quantization, is achieved via a similar DRL approach but focuses on obtaining the optimal weight bit widths for individual layers. We further report experimental results on the CIFAR-10 and ImageNet datasets. On CIFAR-10, the proposed method can reduce the size of VGGNet by 9x, from 20.04MB to 2.2MB, with a 0.2% accuracy increase. On ImageNet, the proposed method can reduce the size of VGG16 by 33x, from 138MB to 4.14MB, with no accuracy loss.
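The quantization stage picks a bit width per layer. The sketch below shows symmetric uniform quantization of one layer's weights for a given bit width. This is one common scheme and an assumption here, since the abstract does not name the quantizer. The DRL agent that selects the bit widths is not shown. The last lines check the compression ratios implied by the reported model sizes.

```python
from typing import List, Sequence


def uniform_quantize(weights: Sequence[float], bits: int) -> List[float]:
    """Symmetric uniform quantization of one layer's weights to `bits` bits."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric grid")
    levels = 2 ** (bits - 1) - 1          # positive levels; 2*levels + 1 values in total
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0] * len(weights)
    step = max_abs / levels
    return [round(w / step) * step for w in weights]


layer = [0.9, -0.35, 0.1, -0.8, 0.02]
q = uniform_quantize(layer, bits=3)
print(q)

# Compression ratios implied by the reported model sizes (in MB)
print(round(20.04 / 2.2, 1), round(138 / 4.14, 1))  # 9.1 33.3
```

Fewer bits shrink storage roughly in proportion to the bit width, at the cost of a larger rounding step. Trading this off against accuracy per layer is the decision the abstract delegates to reinforcement learning.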
 [15] arXiv:1912.02258 (cross-list from cs.CR) [pdf, ps, other]

Title: A Survey of Game Theoretic Approaches for Adversarial Machine Learning in Cybersecurity Tasks
Comments: 13 pages, 2 figures, 1 table
Journal-ref: AI Magazine, 40(2), 31-43 (2019)
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Machine learning techniques are currently used extensively for automating various cybersecurity tasks. Most of these techniques utilize supervised learning algorithms that rely on training the algorithm to classify incoming data into different categories, using data encountered in the relevant domain. A critical vulnerability of these algorithms is that they are susceptible to adversarial attacks where a malicious entity called an adversary deliberately alters the training data to misguide the learning algorithm into making classification errors. Adversarial attacks could render the learning algorithm unsuitable to use and leave critical systems vulnerable to cybersecurity attacks. Our paper provides a detailed survey of the state-of-the-art techniques that are used to make a machine learning algorithm robust against adversarial attacks using the computational framework of game theory. We also discuss open problems and challenges and possible directions for further research that would make deep machine learning-based systems more robust and reliable for cybersecurity tasks.
 [16] arXiv:1912.02260 (cross-list from cs.LG) [pdf, other]

Title: The effect of task and training on intermediate representations in convolutional neural networks revealed with modified RV similarity analysis
Comments: 4 pages, 4 figures, Conference on Cognitive Computational Neuroscience 2019
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Centered Kernel Alignment (CKA) was recently proposed as a similarity metric for comparing activation patterns in deep networks. Here we experiment with the modified RV coefficient (RV2), which has very similar properties to CKA while being less sensitive to dataset size. We compare the representations of networks that received varying amounts of training on different layers: a standard trained network (all parameters updated at every step), a freeze-trained network (layers gradually frozen during training), random networks (only some layers trained), and a completely untrained network. We found that RV2 was able to recover expected similarity patterns and provide interpretable similarity matrices that suggested hypotheses about how representations are affected by different training recipes. We propose that the superior performance achieved by freeze training can be attributed to representational differences in the penultimate layer. Our comparisons of random networks suggest that the inputs and targets serve as anchors on the representations in the lowest and highest layers.
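RV2 compares two representations of the same inputs through their input-by-input cross-product matrices with the diagonals removed. The sketch below follows the standard definition of the modified RV coefficient, assuming column-centred activations (rows are inputs, columns are units). The two layers may have different numbers of units. The activation values are toy numbers.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def center_columns(a: Matrix) -> List[List[float]]:
    """Subtract each column's mean (rows are inputs, columns are units)."""
    n = len(a)
    means = [sum(row[j] for row in a) / n for j in range(len(a[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in a]


def offdiag_gram(a: Matrix) -> List[List[float]]:
    """A A^T with its diagonal set to zero, the modification that defines RV2."""
    n = len(a)
    return [[0.0 if i == k else sum(x * y for x, y in zip(a[i], a[k]))
             for k in range(n)] for i in range(n)]


def frobenius_inner(p: Matrix, q: Matrix) -> float:
    """Sum of element-wise products of two equally sized matrices."""
    return sum(x * y for rp, rq in zip(p, q) for x, y in zip(rp, rq))


def rv2(x: Matrix, y: Matrix) -> float:
    """Modified RV coefficient between two layers' activations on the same inputs."""
    gx = offdiag_gram(center_columns(x))
    gy = offdiag_gram(center_columns(y))
    return frobenius_inner(gx, gy) / math.sqrt(
        frobenius_inner(gx, gx) * frobenius_inner(gy, gy))


layer_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [3.0, 0.5]]                    # 4 x 2
layer_b = [[0.5, 1.0, 0.0], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.0, 3.0, 0.5]]  # 4 x 3
print(rv2(layer_a, layer_a), rv2(layer_a, layer_b))
```

Dropping the diagonal removes each input's squared norm from the comparison. That term inflates the classical RV coefficient when the number of units is large relative to the number of inputs.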
 [17] arXiv:1912.02276 (cross-list from cs.LG) [pdf, other]

Title: Enhancing Stratospheric Weather Analyses and Forecasts by Deploying Sensors from a Weather Balloon
Comments: NeurIPS 2019 Workshop: Tackling Climate Change with Machine Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The ability to analyze and forecast stratospheric weather conditions is fundamental to addressing climate change. However, our capacity to collect data in the stratosphere is limited by sparsely deployed weather balloons. We propose a framework to collect stratospheric data by releasing a contrail of tiny sensor devices as a weather balloon ascends. The key machine learning challenges are determining when and how to deploy a finite collection of sensors to produce a useful data set. We decide when to release sensors by modeling the deviation of a forecast from actual stratospheric conditions as a Gaussian process. We then implement a novel hardware system that is capable of optimally releasing sensors from a rising weather balloon. We show that this data engineering framework is effective through real weather balloon flights, as well as simulations.
 [18] arXiv:1912.02279 (cross-list from cs.LG) [pdf, other]

Title: Angular Visual Hardness
Authors: Beidi Chen, Weiyang Liu, Animesh Garg, Zhiding Yu, Anshumali Shrivastava, Jan Kautz, Anima Anandkumar
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Although convolutional neural networks (CNNs) are inspired by the mechanisms behind human visual systems, they diverge from them on many measures such as ambiguity or hardness. In this paper, we make a surprising discovery: there exists a (nearly) universal score function for CNNs whose correlation with human visual hardness is statistically more significant than that of the widely used model confidence. We term this function angular visual hardness (AVH); it is given by the normalized angular distance between a feature embedding and the classifier weights of the corresponding target category in a CNN. We conduct an in-depth scientific study. We observe that CNN models with the highest accuracy also have the best AVH scores. This agrees with an earlier finding that state-of-the-art models tend to improve on the classification of harder training examples. We find that AVH displays interesting dynamics during training: it quickly reaches a plateau even though the training loss keeps improving. This suggests the need for designing better loss functions that can target harder examples more effectively. Finally, we empirically show significant improvement in performance by using AVH as a measure of hardness in self-training methods for domain adaptation.
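The abstract defines AVH only in words. The sketch below assumes that "normalized angular distance" means the angle between the feature embedding and the target class's weight vector, divided by the sum of the angles to all class weight vectors. This normalisation is our assumption. The classifier weights are toy vectors.

```python
import math
from typing import Sequence


def angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def angular_visual_hardness(feature: Sequence[float],
                            class_weights: Sequence[Sequence[float]],
                            target: int) -> float:
    """Angle to the target class weight divided by the sum of angles to all class weights."""
    angles = [angle(feature, w) for w in class_weights]
    return angles[target] / sum(angles)


W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy classifier weights, 3 classes
print(angular_visual_hardness([2.0, 0.0], W, target=0))  # 0.0 (aligned with class 0)
print(angular_visual_hardness([1.0, 1.0], W, target=0))  # about 0.2
```

Unlike softmax confidence, this score ignores the feature norm entirely. This is consistent with the abstract's observation that AVH can plateau while the training loss keeps improving.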
 [19] arXiv:1912.02280 (cross-list from cs.LG) [pdf, other]

Title: Natural Alpha Embeddings
Comments: 23 pages, 6 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Learning an embedding for a large collection of items is a popular approach to overcome the computational limitations associated with one-hot encodings. The aim of item embedding is to learn a low-dimensional space for the representations, able to capture with its geometry relevant features or relationships for the data at hand. This can be achieved, for example, by exploiting adjacencies among items in large sets of unlabelled data. In this paper we interpret, in an Information Geometric framework, the item embeddings obtained from conditional models. By exploiting the $\alpha$-geometry of the exponential family, first introduced by Amari, we introduce a family of natural $\alpha$-embeddings represented by vectors in the tangent space of the probability simplex, which includes as a special case standard approaches available in the literature. A typical example is given by word embeddings, commonly used in natural language processing, such as Word2Vec and GloVe. In our analysis, we show how the $\alpha$-deformation parameter can impact standard evaluation tasks.
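The $\alpha$-geometry is built on Amari's $\alpha$-representation of probabilities. It maps $p$ to $\frac{2}{1-\alpha} p^{(1-\alpha)/2}$ for $\alpha \ne 1$ and to $\log p$ for $\alpha = 1$. The sketch below implements only this map. It does not reproduce the paper's embedding construction.

```python
import math


def alpha_representation(p: float, alpha: float) -> float:
    """Amari's alpha-representation of a probability p > 0.

    l_alpha(p) = 2 / (1 - alpha) * p ** ((1 - alpha) / 2)  for alpha != 1,
    l_1(p) = log p.
    """
    if p <= 0.0:
        raise ValueError("p must be positive")
    if alpha == 1.0:
        return math.log(p)
    return 2.0 / (1.0 - alpha) * p ** ((1.0 - alpha) / 2.0)


print(alpha_representation(0.3, -1.0))  # 0.3 (alpha = -1 is the identity)
near_one = 1.0 - 1e-6
gap = alpha_representation(0.2, near_one) - alpha_representation(0.05, near_one)
print(gap, math.log(0.2 / 0.05))        # close to each other
```

As $\alpha \to 1$, differences of $\alpha$-representations approach differences of log-probabilities. Log-probability-based embeddings therefore sit at one end of the $\alpha$ family.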
 [20] arXiv:1912.02292 (cross-list from cs.LG) [pdf, other]

Title: Deep Double Descent: Where Bigger Models and More Data Hurt
Comments: G.K. and Y.B. contributed equally
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a function of model size, but also as a function of the number of training epochs. We unify the above phenomena by defining a new complexity measure we call the effective model complexity and conjecture a generalized double descent with respect to this measure. Furthermore, our notion of model complexity allows us to identify certain regimes where increasing (even quadrupling) the number of training samples actually hurts test performance.
 [21] arXiv:1912.02338 (cross-list from cs.LG) [pdf, ps, other]

Title: RoNGBa: A Robustly Optimized Natural Gradient Boosting Training Approach with Leaf Number Clipping
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Natural gradient has recently been introduced to the field of boosting to enable generic probabilistic prediction capability. Natural gradient boosting shows promising performance improvements on small datasets due to better training dynamics, but it suffers from slow training, especially on large datasets. We present a replication study of NGBoost (Duan et al., 2019) training that carefully examines the impact of key hyperparameters in the setting of best-first decision tree learning. We find that with the regularization of leaf number clipping, the performance of NGBoost can be largely improved via a better choice of hyperparameters. Experiments show that our approach significantly beats the state-of-the-art performance on various kinds of datasets from the UCI Machine Learning Repository while still achieving up to a 4.85x speedup compared with the original NGBoost approach.
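NGBoost fits its base learners to natural gradients of a scoring rule. Consider a Normal predictive distribution parameterised by $(\mu, \log\sigma)$ under the log score. The Fisher information is then $\mathrm{diag}(1/\sigma^2, 2)$, so the natural gradient is the ordinary gradient rescaled per coordinate. The minimal per-sample sketch below covers only that step. Tree fitting and leaf number clipping are not shown.

```python
import math
from typing import Tuple


def normal_nll_grad(y: float, mu: float, log_sigma: float) -> Tuple[float, float]:
    """Gradient of -log N(y | mu, sigma^2) with respect to (mu, log_sigma)."""
    sigma2 = math.exp(2.0 * log_sigma)
    r = y - mu
    return (-r / sigma2, 1.0 - r * r / sigma2)


def natural_gradient(y: float, mu: float, log_sigma: float) -> Tuple[float, float]:
    """Precondition by the inverse Fisher information diag(sigma^2, 1/2)."""
    g_mu, g_ls = normal_nll_grad(y, mu, log_sigma)
    sigma2 = math.exp(2.0 * log_sigma)
    return (g_mu * sigma2, g_ls / 2.0)


print(natural_gradient(y=2.0, mu=0.0, log_sigma=0.0))  # (-2.0, -1.5)
```

The natural gradient for $\mu$ reduces to $-(y - \mu)$ regardless of $\sigma$. This scale invariance is what gives the better-conditioned training dynamics that the abstract refers to.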
 [22] arXiv:1912.02351 (cross-list from cs.LG) [pdf, other]

Title: Probabilistically-autoencoded horseshoe-disentangled multidomain item-response theory models
Comments: Presented as a poster at the NeurIPS 2019 Bayesian Deep Learning workshop
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Item response theory (IRT) is a nonlinear generative probabilistic paradigm for using exams to identify, quantify, and compare latent traits of individuals, relative to their peers, within a population of interest. In pre-existing multidimensional IRT methods, one requires a factorization of the test items. For this task, linear exploratory factor analysis is used, making IRT a post-hoc model. We propose skipping the initial factor analysis by using a sparsity-promoting horseshoe prior to perform factorization directly within the IRT model, so that all training occurs in a single self-consistent step. Because the model is a hierarchical Bayesian model, we adapt the WAIC to the problem of dimensionality selection. IRT models are analogous to probabilistic autoencoders. By binding the generative IRT model to a Bayesian neural network (forming a probabilistic autoencoder), one obtains a scoring algorithm consistent with the interpretable Bayesian model. In some IRT applications the black-box nature of a neural network scoring machine is desirable. In this manuscript, we demonstrate within-IRT factorization and comment on scoring approaches.
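As a concrete instance of the kind of generative IRT model discussed, the sketch below implements a multidimensional two-parameter logistic (2PL) item response function. The logistic link is an assumption, since the abstract does not state it. The zero in the discrimination vector stands in for a loading that a horseshoe prior would shrink toward zero, so the item measures only one trait.

```python
import math
from typing import Sequence


def mirt_prob(theta: Sequence[float], discrimination: Sequence[float],
              difficulty: float) -> float:
    """P(correct) = sigmoid(sum_k a_k * theta_k - b) for one item and one person."""
    logit = sum(a * t for a, t in zip(discrimination, theta)) - difficulty
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical item that loads only on the first of two latent traits
print(mirt_prob(theta=[0.5, -1.0], discrimination=[1.2, 0.0], difficulty=0.6))  # 0.5
```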
 [23] arXiv:1912.02365 (cross-list from math.OC) [pdf, other]

Title: Lower Bounds for Non-Convex Stochastic Optimization
Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least $\epsilon^{-4}$ queries to find an $\epsilon$-stationary point. The lower bound is tight, and establishes that stochastic gradient descent is minimax optimal in this model. In a more restrictive model where the noisy gradient estimates satisfy a mean-squared smoothness property, we prove a lower bound of $\epsilon^{-3}$ queries, establishing the optimality of recently proposed variance reduction techniques.
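To make the gap between the two rates concrete, here is the arithmetic at a target accuracy of $\epsilon = 10^{-2}$, ignoring the problem-dependent constant factors the bounds carry:

```latex
\epsilon = 10^{-2}:\qquad
\epsilon^{-4} = 10^{8}\ \text{queries (bounded-variance oracle)},\qquad
\epsilon^{-3} = 10^{6}\ \text{queries (mean-squared smoothness)}
```

At this accuracy, the stronger assumption lowers the worst-case query requirement by a factor of $\epsilon^{-1} = 100$.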
 [24] arXiv:1912.02373 (cross-list from econ.GN) [pdf]

Title: Modeling and Prediction of Iran's Steel Consumption Based on Economic Activity Using Support Vector Machines
Comments: 13 pages, 13 figures
Subjects: General Economics (econ.GN); Machine Learning (cs.LG); Machine Learning (stat.ML)
The steel industry has great impacts on the economy and the environment of both developed and underdeveloped countries. The importance of this industry and these impacts have led many researchers to investigate the relationship between a country's steel consumption and its economic activity, resulting in the so-called intensity of use model. This paper investigates the validity of the intensity of use model for the case of Iran's steel consumption and extends this hypothesis by using indices of economic activity to model steel consumption. We use the proposed model to train support vector machines and predict future values of Iran's steel consumption. The paper provides detailed correlation tests for the factors used in the model to check their relationships with steel consumption. The results indicate that Iran's steel consumption is strongly correlated with its economic activity, following the same pattern as the economy over the last four decades.
 [25] arXiv:1912.02379 (cross-list from cs.LG) [pdf, other]

Title: Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Prior work in visual dialog has focused on training deep neural models on the VisDial dataset in isolation, which has led to great progress, but is limiting and wasteful. In this work, following recent trends in representation learning for language, we introduce an approach to leverage pretraining on related large-scale vision-language datasets before transferring to visual dialog. Specifically, we adapt the recently proposed ViLBERT (Lu et al., 2019) model for multi-turn visually-grounded conversation sequences. Our model is pretrained on the Conceptual Captions and Visual Question Answering datasets, and finetuned on VisDial with a VisDial-specific input representation and the masked language modeling and next sentence prediction objectives (as in BERT). Our best single model achieves state-of-the-art on Visual Dialog, outperforming prior published work (including model ensembles) by more than 1% absolute on NDCG and MRR.
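For reference, the two metrics reported here can be sketched as follows. MRR looks only at where the single ground-truth answer is ranked. NDCG instead rewards placing all answers with high (dense) relevance near the top of the list. The cutoff `k` is left as a parameter rather than fixed to the VisDial evaluation's choice, and the relevance values are toy numbers.

```python
import math
from typing import Sequence


def mean_reciprocal_rank(gt_ranks: Sequence[int]) -> float:
    """Mean of 1 / rank of the ground-truth answer (rank 1 = top of the list)."""
    return sum(1.0 / r for r in gt_ranks) / len(gt_ranks)


def ndcg(relevances: Sequence[float], k: int) -> float:
    """NDCG@k, with `relevances` listed in the order the model ranked the answers."""
    def dcg(rels: Sequence[float]) -> float:
        return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(list(relevances)) / ideal if ideal > 0 else 0.0


# Toy example: ground truth ranked 1st, 3rd and 2nd for three questions
print(mean_reciprocal_rank([1, 3, 2]))
# Dense relevance scores of one question's answers, in the model's ranked order
print(ndcg([0.5, 1.0, 0.0, 0.0], k=4))
```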
Next, we carefully analyse our model and find that additional fine-tuning using 'dense' annotations, i.e., relevance scores for all 100 answer options corresponding to each question on a subset of the training set, leads to even higher NDCG (more than 10% over our base model) but hurts MRR (more than 17% below our base model)! This highlights a stark trade-off between the two primary metrics for this task, NDCG and MRR. We find that this is because dense annotations in the dataset do not correlate well with the original ground-truth answers to questions, often rewarding the model for generic responses (e.g. "can't tell").
 [26] arXiv:1912.02386 (cross-list from cs.LG) [pdf, other]

Title: The Search for Sparse, Robust Neural Networks
Comments: The Safety and Robustness in Decision Making Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Recent work on deep neural network pruning has shown that there exist sparse subnetworks that achieve equal or improved accuracy, training time, and loss using fewer network parameters than their dense counterparts. Orthogonal to the pruning literature, deep neural networks are known to be susceptible to adversarial examples, which may pose risks in security- or safety-critical applications. Intuition suggests that there is an inherent trade-off between sparsity and robustness such that these characteristics could not coexist. We perform an extensive empirical evaluation and analysis testing the Lottery Ticket Hypothesis with adversarial training and show this approach enables us to find sparse, robust neural networks. Code for reproducing experiments is available here: https://github.com/justincosentino/robustsparsenetworks.
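The Lottery Ticket procedure that this work combines with adversarial training alternates training, magnitude pruning, and rewinding the surviving weights to their initial values. A minimal NumPy sketch of one prune-and-rewind round follows. The training step itself (standard or adversarial) is omitted, the weights are made-up stand-ins, and the pruning fraction and schedule used in the paper may differ.

```python
import numpy as np


def prune_by_magnitude(trained: np.ndarray, mask: np.ndarray, frac: float) -> np.ndarray:
    """Zero out `frac` of the currently surviving weights with the smallest |w|."""
    alive = np.flatnonzero(mask)                      # flat indices of unpruned weights
    n_prune = int(round(frac * alive.size))
    new_mask = mask.astype(float).ravel()             # astype returns a copy
    if n_prune > 0:
        order = np.argsort(np.abs(trained.ravel()[alive]))
        new_mask[alive[order[:n_prune]]] = 0.0
    return new_mask.reshape(mask.shape)


def rewind(init: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Reset surviving weights to their original initialization (the candidate 'winning ticket')."""
    return init * mask


w_init = np.array([[0.5, -0.1], [0.3, -0.9]])
w_trained = np.array([[0.4, -0.05], [0.6, -1.2]])    # stand-in for (adversarially) trained weights
mask = prune_by_magnitude(w_trained, np.ones_like(w_trained), frac=0.5)
print(mask)                     # keeps only the two largest-magnitude trained weights
print(rewind(w_init, mask))     # those two positions restored to their initial values
```

Iterating this round (retrain the rewound subnetwork, prune again) yields progressively sparser networks; in the robust setting, each retraining step would use adversarial examples.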
 [27] arXiv:1912.02390 (cross-list from cs.LG) [pdf, other]

Title: Towards Robust Relational Causal Discovery
Comments: 14 pages
Journal-ref: Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence, UAI 2019
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We consider the problem of learning causal relationships from relational data. Existing approaches rely on queries to a relational conditional independence (RCI) oracle to establish and orient causal relations in such a setting. In practice, queries to an RCI oracle have to be replaced by reliable tests for RCI against available data. Relational data present several unique challenges in testing for RCI. We study the conditions under which traditional i.i.d.-based conditional independence (CI) tests yield reliable answers to RCI queries against relational data. We show how to conduct CI tests against relational data to robustly recover the underlying relational causal structure. Results of our experiments demonstrate the effectiveness of our proposed approach.
 [28] arXiv:1912.02392 (cross-list from math.ST) [pdf, other]

Title: KoPA: Automated Kronecker Product Approximation
Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
We consider matrix approximation induced by the Kronecker product decomposition. Similar to low-rank approximation, which seeks to approximate a given matrix by the sum of a few rank-1 matrices, we propose to use the approximation by the sum of a few Kronecker products, which we refer to as the Kronecker product approximation (KoPA). Although it can be transformed into an SVD problem, KoPA offers greater flexibility than low-rank approximation, since it allows the user to choose the configuration of the Kronecker product. On the other hand, the configuration (the dimensions of the two smaller matrices forming the Kronecker product) to be used is usually unknown, and has to be determined from the data in order to obtain the optimal balance between accuracy and complexity. We propose to use an extended information criterion to select the configuration. Under the paradigm of high dimensionality, we show that the proposed procedure is able to select the true configuration with probability tending to one, under suitable conditions on the signal-to-noise ratio. We demonstrate the performance and superiority of KoPA over low-rank approximation through numerical studies and a real example in image analysis.
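The reduction to an SVD mentioned above can be sketched with the classical rearrangement trick (Van Loan and Pitsianis): each block of $A$ becomes one row of a rearranged matrix $\mathcal{R}(A)$, so that $\mathcal{R}(B \otimes C) = \mathrm{vec}(B)\,\mathrm{vec}(C)^\top$. The Python sketch below assumes the configuration $(m_1, n_1, m_2, n_2)$ is already known; the paper's information criterion for selecting it is not implemented, and the function names are hypothetical.

```python
import numpy as np


def rearrange(A, m1, n1, m2, n2):
    """Map A (m1*m2 x n1*n2) to R(A) (m1*n1 x m2*n2): block (i, j) of A becomes row i*n1 + j."""
    assert A.shape == (m1 * m2, n1 * n2)
    blocks = A.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3)  # blocks[i, j] is the (i, j) block
    return blocks.reshape(m1 * n1, m2 * n2)


def kopa(A, m1, n1, m2, n2, K=1):
    """K-term Kronecker approximation A ~ sum_k kron(B_k, C_k) for a *given* configuration.

    R(kron(B, C)) = outer(B.ravel(), C.ravel()) and R only permutes entries, so the
    best rank-K approximation of R(A) (truncated SVD) gives the best K-term
    Kronecker approximation of A in Frobenius norm.
    """
    U, s, Vt = np.linalg.svd(rearrange(A, m1, n1, m2, n2), full_matrices=False)
    terms = []
    for k in range(K):
        B = (np.sqrt(s[k]) * U[:, k]).reshape(m1, n1)
        C = (np.sqrt(s[k]) * Vt[k]).reshape(m2, n2)
        terms.append((B, C))
    return terms


rng = np.random.default_rng(0)
B0, C0 = rng.normal(size=(2, 3)), rng.normal(size=(4, 2))
A = np.kron(B0, C0) + 0.01 * rng.normal(size=(8, 6))    # noisy single Kronecker product
B, C = kopa(A, m1=2, n1=3, m2=4, n2=2, K=1)[0]
print(np.linalg.norm(A - np.kron(B, C)) / np.linalg.norm(A))   # small relative error
```

With $K = 1$ and a configuration of $(m_1 m_2) \times 1$ by $1 \times (n_1 n_2)$ blocks, this reduces to an ordinary rank-1 approximation, which is one way to see why KoPA generalizes low-rank approximation.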
 [29] arXiv:1912.02400 (cross-list from cs.LG) [pdf, other]

Title: Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Quality Diversity (QD) algorithms like Novelty Search with Local Competition (NSLC) and MAP-Elites are a new class of population-based stochastic algorithms designed to generate a diverse collection of quality solutions. Meanwhile, variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are among the best-performing derivative-free optimizers in single-objective continuous domains. This paper proposes a new QD algorithm called Covariance Matrix Adaptation MAP-Elites (CMA-ME). Our new algorithm combines the dynamic self-adaptation techniques of CMA-ES with archiving and mapping techniques for maintaining diversity in QD. Results from experiments with standard continuous optimization benchmarks show that CMA-ME finds better-quality solutions than MAP-Elites; similarly, results on the strategic game Hearthstone show that CMA-ME finds both a higher overall quality and broader diversity of strategies than both CMA-ES and MAP-Elites. Overall, CMA-ME more than doubles the performance of MAP-Elites using standard QD performance metrics. These results suggest that QD algorithms augmented by operators from state-of-the-art optimization algorithms can yield high-performing methods for simultaneously exploring and optimizing continuous search spaces, with significant applications to design, testing, and reinforcement learning among other domains. Code is available for both the continuous optimization benchmark (https://github.com/tehqin/QualDivBenchmark) and Hearthstone (https://github.com/tehqin/EvoStone) domains.
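For readers unfamiliar with MAP-Elites, its core data structure is an archive that discretizes behavior space into cells and keeps the best solution found in each. The Python sketch below shows that archive with a plain Gaussian-mutation loop; it is not CMA-ME, which replaces the mutation step with CMA-ES emitters. The `'new'`/`'improved'`/`'rejected'` return value is a hypothetical convenience, loosely mirroring our understanding that CMA-ME's emitters rank candidates by how they change the archive; the toy fitness and behavior descriptor are also assumptions.

```python
import numpy as np
from typing import Dict, Tuple


class GridArchive:
    """MAP-Elites archive: keeps the best (highest-fitness) solution in each behavior cell."""

    def __init__(self, lows, highs, bins):
        self.lows = np.asarray(lows, dtype=float)
        self.highs = np.asarray(highs, dtype=float)
        self.bins = np.asarray(bins, dtype=int)
        self.elites: Dict[Tuple[int, ...], Tuple[float, np.ndarray]] = {}

    def cell(self, behavior) -> Tuple[int, ...]:
        """Discretize a behavior descriptor into a grid cell; out-of-range values are clipped."""
        frac = (np.asarray(behavior, dtype=float) - self.lows) / (self.highs - self.lows)
        idx = np.clip(np.floor(frac * self.bins).astype(int), 0, self.bins - 1)
        return tuple(int(i) for i in idx)

    def insert(self, solution, fitness: float, behavior) -> str:
        """Return 'new' (empty cell filled), 'improved' (elite replaced) or 'rejected'."""
        key = self.cell(behavior)
        current = self.elites.get(key)
        if current is not None and fitness <= current[0]:
            return "rejected"
        self.elites[key] = (fitness, np.array(solution, dtype=float))
        return "new" if current is None else "improved"


def evaluate(x):
    """Toy problem (hypothetical): maximize -||x||^2; behavior = first two coordinates."""
    return -float(np.sum(x ** 2)), x[:2]


rng = np.random.default_rng(0)
archive = GridArchive(lows=[-2, -2], highs=[2, 2], bins=[10, 10])
for x in rng.uniform(-2, 2, size=(20, 4)):
    archive.insert(x, *evaluate(x))
# Plain MAP-Elites loop: mutate a random elite with isotropic Gaussian noise.
# CMA-ME would instead draw candidates from CMA-ES emitters.
for _ in range(500):
    keys = list(archive.elites)
    parent = archive.elites[keys[rng.integers(len(keys))]][1]
    child = parent + 0.2 * rng.standard_normal(parent.shape)
    archive.insert(child, *evaluate(child))
print(len(archive.elites), "of", 10 * 10, "cells filled")
```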
 [30] arXiv:1912.02405 (cross-list from cs.LG) [pdf]

Title: Clustering Time-Series by a Novel Slope-Based Similarity Measure Considering Particle Swarm Optimization
Comments: 27 pages, 8 figures, 12 tables
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Recently, there has been an increase in studies on time-series data mining, specifically time-series clustering, due to the prevalence of time series in various domains. The large volume of data in the form of time series makes it necessary to employ techniques such as clustering to understand the data and to extract information and hidden patterns. In clustering, and specifically in time-series clustering, the most important aspects are the similarity measure used and the algorithm employed to conduct the clustering. In this paper, a new similarity measure for time-series clustering is developed based on a combination of a simple representation of time series, the slope of each segment of the time series, Euclidean distance, and the so-called dynamic time warping. It is proved in this paper that the proposed distance measure is a metric, and thus indexing can be applied. For the task of clustering, the Particle Swarm Optimization algorithm is employed. The proposed similarity measure is compared to three existing measures in terms of various criteria used for the evaluation of clustering algorithms. The results indicate that the proposed similarity measure outperforms the rest in almost every dataset used in this paper.
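The abstract does not give the exact formula that combines segment slopes, Euclidean distance, and dynamic time warping, so the Python sketch below is only one plausible reading under stated assumptions: fixed-length segments, slope taken from each segment's endpoints, and classic DTW with absolute-difference cost over the slope sequences. Note that plain DTW does not satisfy the triangle inequality, so unlike the paper's proven metric this sketch cannot safely be used for metric indexing; the PSO clustering step is also omitted.

```python
import numpy as np


def segment_slopes(series, seg_len: int = 4) -> np.ndarray:
    """Piecewise representation: slope of each consecutive segment of seg_len points
    (trailing points that do not fill a whole segment are dropped)."""
    assert seg_len >= 2
    x = np.asarray(series, dtype=float)
    n_seg = len(x) // seg_len
    segs = x[: n_seg * seg_len].reshape(n_seg, seg_len)
    return (segs[:, -1] - segs[:, 0]) / (seg_len - 1)


def dtw(a, b) -> float:
    """Classic O(n*m) dynamic time warping with absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def slope_dtw(s1, s2, seg_len: int = 4) -> float:
    """Hypothetical slope-based distance: DTW over the two series' segment slopes."""
    return dtw(segment_slopes(s1, seg_len), segment_slopes(s2, seg_len))


t = np.arange(16, dtype=float)
print(slope_dtw(t, t + 5.0))   # 0.0: a vertical offset leaves every segment slope unchanged
print(slope_dtw(t, -t))        # 8.0: four segments with slopes +1 vs -1
```

Working on slopes makes the comparison insensitive to vertical offsets, which is one motivation for a slope-based representation; the cost is that two series with the same shape at different levels become indistinguishable.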
 [31] arXiv:1912.02427 (cross-list from cs.LG) [pdf, other]

Title: Analysis of the Optimization Landscapes for Overcomplete Representation Learning
Comments: 68 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV); Signal Processing (eess.SP); Machine Learning (stat.ML)
We study nonconvex optimization landscapes for learning overcomplete representations, including learning \emph{(i)} sparsely used overcomplete dictionaries and \emph{(ii)} convolutional dictionaries, where these unsupervised learning problems find many applications in high-dimensional data analysis. Despite the empirical success of simple nonconvex algorithms, theoretical justifications of why these methods work so well are far from satisfactory. In this work, we show these problems can be formulated as $\ell^4$-norm optimization problems with spherical constraint, and study the geometric properties of their nonconvex optimization landscapes. For both problems, we show the nonconvex objectives have benign geometric structures: every local minimizer is close to one of the target solutions and every saddle point exhibits negative curvature, either in the entire space or within a sufficiently large region. This discovery ensures that local search algorithms (such as Riemannian gradient descent) with simple initializations approximately find the target solutions. Finally, numerical experiments justify our theoretical discoveries.
 [32] arXiv:1912.02494 (cross-list from cs.LG) [pdf, other]

Title: MetalGAN: Multi-Domain Label-Less Image Synthesis Using cGANs and Meta-Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Image synthesis is currently one of the most studied topics in the computer vision and deep learning fields. Researchers have tackled this problem by focusing on its many challenging aspects, e.g., image quality and size, domain and pose changes, network architecture, and so on. Above all, producing images belonging to different domains using a single architecture is a very relevant goal for image generation. In fact, a single multi-domain network would allow greater flexibility and robustness in the image synthesis task than other approaches. This paper proposes a novel architecture and a training algorithm that are able to produce multi-domain outputs using a single network. Only a small portion of a dataset is intentionally used, and there are no hard-coded labels (or classes). This is achieved by combining a conditional Generative Adversarial Network (cGAN) for image generation and a Meta-Learning algorithm for domain switching; we call our approach MetalGAN. The approach has proved appropriate for solving the multi-domain problem, and it is validated on facial attribute transfer using the CelebA dataset.
 [33] arXiv:1912.02503 (cross-list from cs.LG) [pdf, other]

Title: Hindsight Credit Assignment
Authors: Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Remi Munos
Comments: NeurIPS 2019
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We consider the problem of efficient credit assignment in reinforcement learning. In order to efficiently and meaningfully utilize new data, we propose to explicitly assign credit to past decisions based on the likelihood of them having led to the observed outcome. This approach uses new information in hindsight, rather than employing foresight. Somewhat surprisingly, we show that value functions can be rewritten through this lens, yielding a new family of algorithms. We study the properties of these algorithms, and empirically show that they successfully address important credit assignment challenges, through a set of illustrative tasks.
 [34] arXiv:1912.02522 (cross-list from cs.SD) [pdf, other]

Title: VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge
Authors: Joon Son Chung, Arsha Nagrani, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A Reynolds, Andrew Zisserman
Comments: ISCA Archive
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
The VoxCeleb Speaker Recognition Challenge 2019 aimed to assess how well current speaker recognition technology is able to identify speakers in unconstrained or 'in the wild' data. It consisted of: (i) a publicly available speaker recognition dataset from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a public challenge and workshop held at Interspeech 2019 in Graz, Austria. This paper outlines the challenge and provides its baselines, results and discussions.
 [35] arXiv:1912.02532 (cross-list from cs.LG) [pdf, other]

Title: Iterative Policy-Space Expansion in Reinforcement Learning
Comments: Workshop on Biological and Artificial Reinforcement Learning at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Humans and animals solve a difficult problem much more easily when they are presented with a sequence of problems that starts simple and slowly increases in difficulty. We explore this idea in the context of reinforcement learning. Rather than providing the agent with an externally provided curriculum of progressively more difficult tasks, the agent solves a single task utilizing a decreasingly constrained policy space. The algorithm we propose first learns to categorize features into positive and negative before gradually learning a more refined policy. Experimental results in Tetris demonstrate a superior learning rate for our approach compared to existing algorithms.
 [36] arXiv:1912.02566 (cross-list from cs.LG) [pdf, other]

Title: Screening Data Points in Empirical Risk Minimization via Ellipsoidal Regions and Safe Loss Function
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We design simple screening tests to automatically discard data samples in empirical risk minimization without losing optimization guarantees. We derive loss functions that produce dual objectives with a sparse solution. We also show how to regularize convex losses to ensure such a dual sparsity-inducing property, and propose a general method to design screening tests for classification or regression based on ellipsoidal approximations of the optimal set. In addition to producing computational gains, our approach also allows us to compress a dataset into a subset of representative points.
 [37] arXiv:1912.02572 (cross-list from cs.LG) [pdf, other]

Title: Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning
Comments: 9 pages, 7 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we present an end-to-end framework for addressing the problem of dynamic pricing on an E-commerce platform using methods based on deep reinforcement learning (DRL). By using four groups of different business data to represent the states of each time period, we model the dynamic pricing problem as a Markov Decision Process (MDP). Compared with state-of-the-art DRL-based dynamic pricing algorithms, our approaches make the following three contributions. First, we extend the problem from a discrete price set to a continuous price set. Second, instead of using revenue as the reward function directly, we define a new function named difference of revenue conversion rates (DRCR). Third, the cold-start problem of MDP is tackled by pretraining and evaluation using carefully chosen historical sales data. Our approaches are evaluated both offline, using a real dataset from Alibaba Inc., and in online field experiments on Tmall.com, a major online shopping website owned by Alibaba Inc. In particular, experiment results suggest that DRCR is a more appropriate reward function than revenue, which is widely used in current literature. Finally, field experiments lasting months on 1000 stock keeping units (SKUs) demonstrate that continuous price sets perform better than discrete sets and that our approaches significantly outperform manual pricing by operation experts.
 [38] arXiv:1912.02574 (cross-list from cs.NE) [pdf, other]

Title: Data-Driven Optimization of Public Transit Schedule
Comments: 20 pages, 6 figures, 2 tables
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Machine Learning (stat.ML)
Bus transit systems are the backbone of public transportation in the United States. An important indicator of the quality of service in such infrastructures is on-time performance at stops, with published transit schedules playing an integral role in governing the level of success of the service. However, there are relatively few optimization architectures leveraging stochastic search that focus on optimizing bus timetables with the objective of maximizing the probability of bus arrivals at timepoints with delays within desired on-time ranges. In addition, there is a lack of substantial research considering monthly and seasonal variations of delay patterns integrated with such optimization strategies. To address these gaps, this paper makes the following contributions to the corpus of studies on transit on-time performance optimization: (a) an unsupervised clustering mechanism is presented which groups months with similar seasonal delay patterns, (b) the problem is formulated as a single-objective optimization task, and a greedy algorithm, a genetic algorithm (GA), and a particle swarm optimization (PSO) algorithm are employed to solve it, and (c) a detailed discussion of empirical results comparing the algorithms is provided, and a sensitivity analysis of the heuristics' hyperparameters is presented along with execution times, which will help practitioners looking at similar problems. The analyses conducted are insightful in the local context of improving public transit scheduling in the Nashville metro region, as well as informative from a global perspective as an elaborate case study that builds upon the growing corpus of empirical studies using nature-inspired approaches to transit schedule optimization.
 [39] arXiv:1912.02580 (cross-list from cs.LG) [pdf, other]

Title: Collective Learning
Authors: Francesco Farina
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Optimization and Control (math.OC); Machine Learning (stat.ML)
In this paper, we introduce the concept of collective learning (CL), which exploits the notion of collective intelligence in the field of distributed semi-supervised learning. The proposed framework draws inspiration from the learning behavior of human beings, who alternate phases involving collaboration, confrontation, and exchange of views with others with phases of studying and learning on their own. In this regard, CL comprises two main phases: a self-training phase in which learning is performed on local private (labeled) data only, and a collective training phase in which proxy-labels are assigned to shared (unlabeled) data by means of a consensus-based algorithm. In the considered framework, heterogeneous systems, each with different computational capabilities and resources, can be connected over the same network, and every agent in the network may take advantage of the cooperation and eventually reach higher performance than it could reach on its own. An extensive experimental campaign on an image classification problem emphasizes the properties of CL by analyzing the performance achieved by the cooperating agents.
 [40] arXiv:1912.02588 (cross-list from cs.LG) [pdf, other]

Title: Tensor Recovery from Noisy and Multi-Level Quantized Measurements
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
Higher-order tensors can represent scores in a rating system, frames in a video, and images of the same subject. In practice, the measurements are often highly quantized due to the sampling strategies or the quality of devices. Existing works on tensor recovery have focused on data losses and random noise. Only a few works consider tensor recovery from quantized measurements, and they are restricted to binary measurements. This paper, for the first time, addresses the problem of tensor recovery from multi-level quantized measurements. Leveraging the low-rank property of the tensor, this paper proposes a nonconvex optimization problem for tensor recovery. We provide a theoretical upper bound on the recovery error, which diminishes to zero when the sizes of dimensions increase to infinity. Our error bound significantly improves over the existing results in one-bit tensor recovery and quantized matrix recovery. A tensor-based alternating proximal gradient descent algorithm with a convergence guarantee is proposed to solve the nonconvex problem. Our recovery method can handle data losses and does not need information about the quantization rule. The method is validated on synthetic data, image datasets, and music recommender datasets.
 [41] arXiv:1912.02590 (cross-list from hep-ex) [pdf, other]

Title: Machine Learning on sWeighted Data
Comments: Submitted to Journal of Physics: Conference Series (ACAT2019 proceedings)
Subjects: High Energy Physics - Experiment (hep-ex); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
Data analysis in high energy physics has to deal with data samples produced from different sources. One of the most widely used ways to unfold their contributions is the sPlot technique. It uses the results of a maximum likelihood fit to assign weights to events. Some weights produced by sPlot are, by design, negative. Negative weights make it difficult to apply machine learning methods: the loss function becomes unbounded, which leads to divergent neural network training. In this paper, we propose a mathematically rigorous way to transform the weights obtained by sPlot into class probabilities conditioned on observables, thus enabling any machine learning algorithm to be applied out-of-the-box.
 [42] arXiv:1912.02591 (cross-list from eess.AS) [pdf, other]

Title: Investigating Deep Neural Transformations for Spectrogram-based Musical Source Separation
Comments: 8 pages, 8 tables, 9 figures; under review at ECAI 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Machine Learning (stat.ML)
Musical Source Separation (MSS) is a signal processing task that tries to separate a mixed musical signal into each acoustic sound source, such as singing voice or drums. Recently, many machine learning-based methods have been proposed for the MSS task, but no existing work has evaluated and directly compared various types of networks. In this paper, we aim to design a variety of neural transformation methods, including time-invariant methods, time-frequency methods, and mixtures of two different transformations. Our experiments provide abundant material for future works by comparing several transformation methods. We train our models on raw complex-valued STFT outputs and achieve state-of-the-art SDR performance in the MUSDB18 singing voice separation task by a large margin of 1.0 dB.
 [43] arXiv:1912.02605 (cross-list from cs.LG) [pdf, other]

Title: Towards Understanding Residual and Dilated Dense Neural Networks via Convolutional Sparse Coding
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Convolutional neural networks (CNNs) and their variants have led to many state-of-the-art results in various fields. However, a clear theoretical understanding of them is still lacking. Recently, multi-layer convolutional sparse coding (ML-CSC) has been proposed and proved to be equivalent to such simply stacked networks (plain networks). Here, we argue that three factors in each of its layers, namely the initialization, the dictionary design, and the number of iterations, greatly affect the performance of ML-CSC. Inspired by these considerations, we propose two novel multi-layer models, the residual convolutional sparse coding model (Res-CSC) and the mixed-scale dense convolutional sparse coding model (MSD-CSC), which have a close relationship with the residual neural network (ResNet) and the mixed-scale (dilated) dense neural network (MSDNet), respectively. Mathematically, we derive the shortcut connection in ResNet as a special case of a new forward propagation rule on ML-CSC. We find a theoretical interpretation of the dilated convolution and dense connection in MSDNet by analyzing MSD-CSC, which gives a clear mathematical understanding of them. We implement the iterative soft thresholding algorithm (ISTA) and its fast version to solve Res-CSC and MSD-CSC, which can employ the unfolding operation for further improvements. Finally, extensive numerical experiments and comparisons with competing methods on three typical datasets demonstrate their effectiveness.
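ISTA and its accelerated variant FISTA, mentioned above, are short enough to sketch directly for the sparse coding problem $\min_z \tfrac12\|x - Dz\|_2^2 + \lambda\|z\|_1$. The Python version below uses a dense (non-convolutional) dictionary for brevity and does not reproduce the paper's residual or mixed-scale forward rules; the toy dictionary and signal are made up. The `z0` argument and `n_iter` loosely correspond to the initialization and number of iterations the abstract singles out (our mapping, not the authors').

```python
import numpy as np


def soft_threshold(v: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def ista(D, x, lam, n_iter=100, z0=None):
    """ISTA for min_z 0.5 * ||x - D z||^2 + lam * ||z||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth term's gradient
    z = np.zeros(D.shape[1]) if z0 is None else np.array(z0, dtype=float)
    for _ in range(n_iter):
        z = soft_threshold(z + D.T @ (x - D @ z) / L, lam / L)
    return z


def fista(D, x, lam, n_iter=100):
    """FISTA: ISTA plus Nesterov-style momentum (Beck and Teboulle)."""
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    y, t = z.copy(), 1.0
    for _ in range(n_iter):
        z_new = soft_threshold(y + D.T @ (x - D @ y) / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = z_new + ((t - 1.0) / t_new) * (z_new - z)
        z, t = z_new, t_new
    return z


def objective(D, x, lam, z):
    return 0.5 * np.sum((x - D @ z) ** 2) + lam * np.sum(np.abs(z))


# With z0 = 0, a single ISTA step is soft_threshold(D.T @ x / L, lam / L): a linear layer
# followed by a shrinkage nonlinearity, which is the layer-like view that unfolding exploits.
rng = np.random.default_rng(0)
D = rng.normal(size=(20, 40)) / np.sqrt(20)
x = D[:, :3] @ np.array([1.0, -2.0, 1.5])
z_ista, z_fista = ista(D, x, lam=0.05), fista(D, x, lam=0.05)
print(objective(D, x, 0.05, z_ista), objective(D, x, 0.05, z_fista))
```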
 [44] arXiv:1912.02606 (cross-list from eess.AS) [pdf, other]

Title: Predominant Musical Instrument Classification based on Spectral Features
Comments: 9 pages, 5 figures
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
This work examines one of the cornerstone problems of Musical Instrument Recognition, namely instrument classification. We use the IRMAS (Instrument Recognition in Musical Audio Signals) data set. The data includes music from various decades of the last century, and thus has a wide variety in audio quality. We present a concise summary of past work in this domain. Having implemented various supervised learning algorithms for this classification task, we find that the SVM classifier outperformed the other state-of-the-art models, with an accuracy of 79%. The classifier had major difficulty distinguishing between flute and organ. We also implemented unsupervised techniques, of which hierarchical clustering performed well. We have included most of the code (Jupyter notebook) for easy reproducibility.
 [45] arXiv:1912.02610 (cross-list from eess.AS) [pdf, other]

Title: Bimodal Speech Emotion Recognition Using Pre-Trained Language Models
Comments: Life-Long Learning for Spoken Language Systems ASRU 2019
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
Speech emotion recognition is a challenging task and an important step towards more natural human-machine interaction. We show that pre-trained language models can be fine-tuned for text emotion recognition, achieving an accuracy of 69.5% on Task 4A of SemEval 2017, improving upon the previous state of the art by over 3% absolute. We combine these language models with speech emotion recognition, achieving results of 73.5% accuracy when using provided transcriptions and speech data on a subset of four classes of the IEMOCAP dataset. The use of noise-induced transcriptions and speech data results in an accuracy of 71.4%. For our experiments, we created IEmoNet, a modular and adaptable bimodal framework for speech emotion recognition based on pre-trained language models. Lastly, we discuss the idea of using an emotional classifier as a reward for reinforcement learning as a step towards more successful and convenient human-machine interaction.
 [46] arXiv:1912.02613 (cross-list from eess.AS) [pdf, other]

Title: Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
We propose a flexible framework that deals with both singer conversion and singers' vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances in variational autoencoders. It employs separate encoders to learn disentangled latent representations of singer identity and vocal technique, with a joint decoder for reconstruction. Conversion is carried out by simple vector arithmetic in the learned latent spaces. Both a quantitative analysis and a visualization of the converted spectrograms show that our model is able to disentangle singer identity and vocal technique and successfully perform conversion of these attributes. To the best of our knowledge, this is the first work to jointly tackle conversion of singer identity and vocal technique based on a deep learning approach.
 [47] arXiv:1912.02615 (cross-list from eess.AS) [pdf, other]

Title: Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events
Journal-ref: Proceedings of the 27th ACM International Conference on Multimedia (MM '19). ACM, New York, NY, USA, 1961-1969
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
We tackle the task of environmental event classification by drawing inspiration from the transformer neural network architecture used in machine translation. We modify this attention-based feed-forward structure in such a way that allows the resulting model to use audio as well as video to compute sound event predictions. We perform extensive experiments with these adapted transformers on an audiovisual data set, obtained by appending relevant visual information to an existing large-scale weakly labeled audio collection. The employed multi-label data contains clip-level annotation indicating the presence or absence of 17 classes of environmental sounds, and does not include temporal information. We show that the proposed modified transformers strongly improve upon previously introduced models and in fact achieve state-of-the-art results. We also make a compelling case for devoting more attention to research in multimodal audiovisual classification by demonstrating the usefulness of visual information for the task at hand, namely audio event recognition. In addition, we visualize internal attention patterns of the audiovisual transformers and in doing so demonstrate their potential for performing multimodal synchronization.
 [48] arXiv:1912.02624 (cross-list from cs.LG) [pdf, other]

Title: Learning Efficient Representation for Intrinsic Motivation
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)
Mutual Information between agent Actions and environment States (MIAS) quantifies the influence of an agent on its environment. Recently, it was found that the maximization of MIAS can be used as an intrinsic motivation for artificial agents. In the literature, the term empowerment is used to represent the maximum of MIAS at a certain state. While empowerment has been shown to solve a broad range of reinforcement learning problems, its calculation in arbitrary dynamics is a challenging problem because it relies on the estimation of mutual information. Existing approaches, which rely on sampling, are limited to low-dimensional spaces, because high-confidence distribution-free lower bounds for mutual information require an exponential number of samples. In this work, we develop a novel approach for the estimation of empowerment in unknown dynamics from visual observation only, without the need to sample for MIAS. The core idea is to represent the relation between action sequences and future states using a stochastic dynamic model in latent space with a specific form. This allows us to efficiently compute empowerment with the "Water-Filling" algorithm from information theory. We construct this embedding with deep neural networks trained on a sophisticated objective function. Our experimental results show that the designed embedding preserves information-theoretic properties of the original dynamics.
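Water-filling is the classical power-allocation solution for parallel Gaussian channels. The Python sketch below assumes, purely for illustration, that the latent model makes the map from an action sequence $a$ to the future latent state linear-Gaussian, $y = G a + \eta$ with $\eta \sim \mathcal{N}(0, \sigma^2 I)$ and an average-power constraint on $a$; the abstract says only that the model has "a specific form". Under that assumption, empowerment at a state is the capacity of this channel, obtained by water-filling over the singular values of $G$. Function names are hypothetical.

```python
import numpy as np


def water_filling(noise, total_power: float) -> np.ndarray:
    """Power allocation p_i = max(mu - noise_i, 0) with sum(p) == total_power."""
    noise = np.asarray(noise, dtype=float)
    n = np.sort(noise)
    for k in range(len(n), 0, -1):              # try the k quietest channels as the active set
        mu = (total_power + n[:k].sum()) / k    # water level if exactly k channels are active
        if mu > n[k - 1]:                       # consistent: the k-th quietest channel is under water
            break
    return np.maximum(mu - noise, 0.0)


def gaussian_channel_capacity(G, total_power: float, noise_var: float = 1.0) -> float:
    """Capacity (nats) of y = G a + eta, eta ~ N(0, noise_var * I), E||a||^2 <= total_power.

    The SVD of G decouples the channel into parallel scalar channels with gains s_i,
    whose effective noise levels noise_var / s_i^2 are then water-filled.
    """
    s = np.linalg.svd(np.atleast_2d(G), compute_uv=False)
    s = s[s > 1e-12]                            # zero-gain directions carry no information
    if s.size == 0:
        return 0.0
    p = water_filling(noise_var / s ** 2, total_power)
    return float(0.5 * np.sum(np.log1p(p * s ** 2 / noise_var)))


print(water_filling([1.0, 2.0, 3.0], 3.0))                 # [2. 1. 0.]
print(gaussian_channel_capacity(np.eye(2), 2.0))           # log(2), about 0.693
```

The noisiest channel in the first example receives no power at all, which is the characteristic behavior of water-filling: power goes only where the "floor" (effective noise) lies below the water level.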
 [49] arXiv:1912.02628 (cross-list from cs.LG) [pdf, ps, other]

Title: Fundamental Limitations in Sequential Prediction and Recursive Algorithms: $\mathcal{L}_{p}$ Bounds via an Entropic Analysis
Comments: arXiv admin note: substantial text overlap with arXiv:1910.06742
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP); Statistics Theory (math.ST); Machine Learning (stat.ML)
In this paper, we obtain fundamental $\mathcal{L}_{p}$ bounds in sequential prediction and recursive algorithms via an entropic analysis. Both classes of problems are examined by investigating the underlying entropic relationships of the data and/or noises involved, and the derived lower bounds may all be quantified in a conditional entropy characterization. We also study the conditions to achieve the generic bounds from an innovations' viewpoint.
 [50] arXiv:1912.02631 (cross-list from cs.LG) [pdf, ps, other]

Title: Trident: Efficient 4PC Framework for Privacy Preserving Machine Learning
Comments: To appear in 26th Annual Network and Distributed System Security Symposium (NDSS) 2020
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Machine learning has started to be deployed in fields such as healthcare and finance, which has propelled the need for and growth of privacy-preserving machine learning (PPML). We propose an actively secure four-party protocol (4PC) and a framework for PPML, showcasing its applications on four of the most widely known machine learning algorithms: Linear Regression, Logistic Regression, Neural Networks, and Convolutional Neural Networks.
Our 4PC protocol, which tolerates at most one malicious corruption, is practically efficient compared to existing works. We use the protocol to build an efficient mixed-world framework (Trident) to switch between the Arithmetic, Boolean, and Garbled worlds. Our framework operates in the offline-online paradigm over rings and is instantiated in an outsourced setting for machine learning. We also propose conversions especially relevant to privacy-preserving machine learning.
The highlights of our framework include using a minimal number of expensive circuits overall as compared to ABY3. This can be seen in our technique for truncation, which does not affect the online cost of multiplication and removes the need for any circuits in the offline phase. Our B2A conversion has an improvement of $\mathbf{7} \times$ in rounds and $\mathbf{18} \times$ in the communication complexity. In addition to these, all of the special conversions for machine learning, e.g. Secure Comparison, achieve constant round complexity.
The practicality of our framework is argued through improvements in the benchmarking of the aforementioned algorithms when compared with ABY3. All the protocols are implemented over a 64-bit ring in both LAN and WAN settings. Our improvements go up to $\mathbf{187} \times$ for the training phase and $\mathbf{158} \times$ for the prediction phase when observed over LAN and WAN.
 [51] arXiv:1912.02641 (cross-list from cs.LG) [pdf, other]

Title: Scalable Variational Bayesian Kernel Selection for Sparse Gaussian Process Regression
Comments: 34th AAAI Conference on Artificial Intelligence (AAAI 2020), Extended version with derivations, 12 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper presents a variational Bayesian kernel selection (VBKS) algorithm for sparse Gaussian process regression (SGPR) models. In contrast to existing GP kernel selection algorithms that aim to select only one kernel with the highest model evidence, our proposed VBKS algorithm considers the kernel as a random variable and learns its belief from data such that the uncertainty of the kernel can be interpreted and exploited to avoid overconfident GP predictions. To achieve this, we represent the probabilistic kernel as an additional variational variable in a variational inference (VI) framework for SGPR models where its posterior belief is learned together with that of the other variational variables (i.e., inducing variables and kernel hyperparameters). In particular, we transform the discrete kernel belief into a continuous parametric distribution via reparameterization in order to apply VI. Though it is computationally challenging to jointly optimize a large number of hyperparameters due to many kernels being evaluated simultaneously by our VBKS algorithm, we show that the variational lower bound of the log-marginal likelihood can be decomposed into an additive form such that each additive term depends only on a disjoint subset of the variational variables and can thus be optimized independently. Stochastic optimization is then used to maximize the variational lower bound by iteratively improving the variational approximation of the exact posterior belief via stochastic gradient ascent, which incurs constant time per iteration and hence scales to big data. We empirically evaluate the performance of our VBKS algorithm on synthetic and massive real-world datasets.
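One standard way to reparameterize a discrete choice so that it admits gradient-based VI is the Gumbel-softmax (Concrete) relaxation. The sketch below uses it to form a relaxed mixture of candidate kernels; the relaxation, the three candidate kernels, the logits and the temperature are all assumptions for illustration and may differ from the reparameterization the paper actually uses.

```python
import numpy as np


def rbf(X, Y, lengthscale=1.0):
    # Squared-exponential kernel.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def linear(X, Y):
    return X @ Y.T


def periodic(X, Y, period=1.0, lengthscale=1.0):
    d = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale**2)


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    z = (logits + g) / temperature
    z -= z.max()
    w = np.exp(z)
    return w / w.sum()


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 1))
kernels = [rbf, linear, periodic]
logits = np.log(np.array([0.5, 0.3, 0.2]))  # hypothetical variational kernel belief
w = gumbel_softmax(logits, temperature=0.5, rng=rng)
# Nonnegative weights keep the mixture a valid (positive semidefinite) kernel.
K = sum(wi * k(X, X) for wi, k in zip(w, kernels))
print(w.round(3), K.shape)
```

Because `w` is a differentiable function of `logits` given the noise, gradients of a downstream objective can flow back into the kernel belief, which is the point of reparameterizing.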
 [52] arXiv:1912.02686 (cross-list from cs.LG) [pdf, ps, other]

Title: Binarized Canonical Polyadic Decomposition for Knowledge Graph Completion
Comments: arXiv admin note: substantial text overlap with arXiv:1902.02970
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Methods based on vector embeddings of knowledge graphs have been actively pursued as a promising approach to knowledge graph completion. However, embedding models generate storage-inefficient representations, particularly when the number of entities and relations, and the dimensionality of the real-valued embedding vectors, are large. We present a binarized CANDECOMP/PARAFAC (CP) decomposition algorithm, which we refer to as BCP, where real-valued parameters are replaced by binary values to reduce model size. Moreover, we show that a fast score computation technique can be developed with bitwise operations. We prove that BCP is fully expressive by deriving a bound on the size of its embeddings. Experimental results on several benchmark datasets demonstrate that the proposed method successfully reduces model size by more than an order of magnitude while maintaining task performance at the same level as the real-valued CP model.
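With every embedding entry restricted to ±1, the trilinear CP score can be computed with XOR and popcount. The sketch below assumes a hypothetical encoding in which bit i is set when entry i equals -1; it illustrates bitwise score computation in general and is not the paper's implementation.

```python
import numpy as np


def to_bits(v):
    """Pack a ±1 vector into an int; bit i is set when v[i] == -1."""
    return sum(1 << i for i, x in enumerate(v) if x == -1)


def cp_score_bitwise(h_bits, r_bits, t_bits, dim):
    # h_i * r_i * t_i == -1 exactly when an odd number of the three entries is -1,
    # i.e. when the XOR of the three bits is 1.
    neg = bin(h_bits ^ r_bits ^ t_bits).count("1")
    return dim - 2 * neg


rng = np.random.default_rng(0)
dim = 64
h, r, t = (rng.choice([-1, 1], size=dim) for _ in range(3))

dense = int(np.sum(h * r * t))  # reference trilinear CP score
fast = cp_score_bitwise(to_bits(h), to_bits(r), to_bits(t), dim)
print(dense, fast)
```

For 64-dimensional embeddings each score reduces to two XORs on machine words and one popcount, instead of 64 multiply-adds on floats.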
 [53] arXiv:1912.02696 (cross-list from cs.LG) [pdf, other]

Title: Optimizing Norm-Bounded Weighted Ambiguity Sets for Robust MDPs
Comments: arXiv admin note: substantial text overlap with arXiv:1910.10786
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Optimal policies in Markov decision processes (MDPs) are very sensitive to model misspecification. This raises serious concerns about deploying them in high-stakes domains. Robust MDPs (RMDPs) provide a promising framework to mitigate vulnerabilities by computing policies with worst-case guarantees in reinforcement learning. The solution quality of an RMDP depends on the ambiguity set, which is a quantification of model uncertainties. In this paper, we propose a new approach for optimizing the shape of the ambiguity sets for RMDPs. Our method departs from the conventional idea of constructing a norm-bounded uniform and symmetric ambiguity set. We instead argue that the structure of a near-optimal ambiguity set is problem specific. Our proposed method computes a weight parameter from the value functions, and these weights then drive the shape of the ambiguity sets. Our theoretical analysis demonstrates the rationale of the proposed idea. We apply our method to several different problem domains, and the empirical results further demonstrate the practical promise of weighted near-optimal ambiguity sets.
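For a single state-action pair, a robust Bellman update needs the worst-case expected next-state value over an ambiguity set around the nominal transition probabilities. A minimal sketch of that inner problem for a weighted L1 ball, written as a linear program with SciPy's `linprog`; the weighted-L1 form, the weights and the budget `kappa` are illustrative assumptions, and the paper's procedure for computing the weights is not reproduced.

```python
import numpy as np
from scipy.optimize import linprog


def worst_case_value(v, p_bar, w, kappa):
    """min_p p @ v  s.t.  sum_i w_i |p_i - p_bar_i| <= kappa, sum(p) = 1, p >= 0.

    Variables are stacked as x = [p, t], with t_i >= |p_i - p_bar_i|.
    """
    v, p_bar, w = (np.asarray(a, dtype=float) for a in (v, p_bar, w))
    n = len(v)
    c = np.concatenate([v, np.zeros(n)])
    I = np.eye(n)
    A_ub = np.vstack([
        np.hstack([I, -I]),                         #  p - t <=  p_bar
        np.hstack([-I, -I]),                        # -p - t <= -p_bar
        np.concatenate([np.zeros(n), w])[None, :],  #  w @ t <= kappa
    ])
    b_ub = np.concatenate([p_bar, -p_bar, [kappa]])
    A_eq = np.concatenate([np.ones(n), np.zeros(n)])[None, :]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.fun, res.x[:n]


v = [1.0, 2.0, 3.0]  # hypothetical next-state values
p_bar = [1 / 3, 1 / 3, 1 / 3]
val, p = worst_case_value(v, p_bar, w=[1.0, 1.0, 1.0], kappa=0.4)
print(round(val, 4), p.round(4))
```

With uniform weights the adversary shifts probability mass from the best next state to the worst one until the budget is used up; non-uniform weights make some deviations more expensive than others, which is what changes the shape of the set.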
 [54] arXiv:1912.02703 (cross-list from cs.LG) [pdf]

Title: Self-Supervised Contextual Language Representation of Radiology Reports to Improve the Identification of Communication Urgency
Comments: Accepted in AMIA 2020 Informatics Summit
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Machine learning methods have recently achieved high performance in biomedical text analysis. However, a major bottleneck in the widespread application of these methods is obtaining the required large amounts of annotated training data, which is resource intensive and time consuming. Recent progress in self-supervised learning has shown promise in leveraging large text corpora without explicit annotations. In this work, we built a self-supervised contextual language representation model using BERT, a deep bidirectional transformer architecture, to identify radiology reports requiring prompt communication to the referring physicians. We pre-trained the BERT model on a large unlabeled corpus of radiology reports and used the resulting contextual representations in a final text classifier for communication urgency. Our model achieved a precision of 97.0%, recall of 93.3%, and F-measure of 95.1% on an independent test set in identifying radiology reports for prompt communication, and significantly outperformed the previous state-of-the-art model based on word2vec representations.
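The reported F-measure agrees with the reported precision and recall if it is the usual F1 score, the harmonic mean of the two (an assumption, since the abstract does not say which F-measure is used):

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


score = f1(0.970, 0.933)
print(f"{score:.1%}")  # prints 95.1%
```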
 [55] arXiv:1912.02707 (cross-list from cs.CV) [pdf, other]

Title: A Novel Hybrid Scheme Using Genetic Algorithms and Deep Learning for the Reconstruction of Portuguese Tile Panels
Journal-ref: ACM Genetic and Evolutionary Computation Conference (GECCO), pages 1319-1327, Prague, Czech Republic, July 2019
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
This paper presents a novel scheme, based on a unique combination of genetic algorithms (GAs) and deep learning (DL), for the automatic reconstruction of Portuguese tile panels, a challenging real-world variant of the jigsaw puzzle problem (JPP) with important national heritage implications. Specifically, we introduce an enhanced GA-based puzzle solver, whose integration with a novel DL-based compatibility measure (DLCM) yields state-of-the-art performance on the above application. Current compatibility measures typically consider (the chromatic information of) edge pixels (between adjacent tiles), and help achieve high accuracy for the synthetic JPP variant. However, such measures exhibit rather poor performance when applied to the Portuguese tile panels, which are susceptible to various real-world effects, e.g., monochromatic panels, non-squared tiles, edge degradation, etc. To overcome such difficulties, we have developed a novel DLCM to extract high-level texture/color statistics from the entire tile information.
Integrating this measure with our enhanced GA-based puzzle solver, we have demonstrated, for the first time, how to deal most effectively with large-scale real-world problems, such as the Portuguese tile problem. Specifically, we have achieved 82% accuracy for the reconstruction of Portuguese tile panels with unknown piece rotation and puzzle dimension (compared to merely 3.5% average accuracy achieved by the best method known for solving this problem variant). The proposed method outperforms even human experts in several cases, correcting their mistakes in the manual tile assembly.
 [56] arXiv:1912.02714 (cross-list from cs.LG) [pdf, other]

Title: Inferring the Optimal Policy using Markov Chain Monte Carlo
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
This paper investigates methods for estimating the optimal stochastic control policy for a Markov Decision Process with unknown transition dynamics and an unknown reward function. This form of model-free reinforcement learning covers many real-world systems such as playing video games, simulated control tasks, and real robot locomotion. Existing methods for estimating the optimal stochastic control policy rely on high-variance estimates of the policy descent. However, these methods are not guaranteed to find the optimal stochastic policy, and the high-variance gradient estimates make convergence unstable. In order to resolve these problems, we propose a technique using Markov Chain Monte Carlo to generate samples from the posterior distribution of the parameters conditioned on being optimal. Our method provably converges to the globally optimal stochastic policy, and empirically exhibits variance similar to that of the policy gradient.
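One common reading of "the posterior distribution of the parameters conditioned on being optimal" is the control-as-inference view, where the likelihood of optimality is taken proportional to exp(beta * J(theta)). The sketch below runs random-walk Metropolis-Hastings on a hypothetical one-dimensional objective `J` with a flat prior; the likelihood form, the objective and every tuning constant are assumptions, not the paper's formulation.

```python
import numpy as np


def J(theta):
    # Hypothetical expected return, maximized at theta = 2.
    return -(theta - 2.0) ** 2


def metropolis_hastings(log_target, theta0, n_samples, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    samples = np.empty(n_samples)
    theta, logp = theta0, log_target(theta0)
    for i in range(n_samples):
        proposal = theta + step * rng.normal()
        logp_prop = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(theta)).
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
        samples[i] = theta
    return samples


BETA = 5.0  # hypothetical inverse temperature


def log_post(theta):
    # Flat prior assumed, so the log posterior is beta * J up to a constant.
    return BETA * J(theta)


rng = np.random.default_rng(0)
chain = metropolis_hastings(log_post, theta0=-3.0, n_samples=20000, step=0.5, rng=rng)
kept = chain[2000:]  # discard burn-in
print(kept.mean(), kept.std())
```

With this choice the target is Gaussian with mean 2 and variance 1/(2 * BETA), so larger `BETA` concentrates the samples more tightly around the optimal parameter.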
 [57] arXiv:1912.02729 (cross-list from cond-mat.dis-nn) [pdf, ps, other]

Title: Rademacher complexity and spin glasses: A link between the replica and statistical theories of learning
Comments: 15 + 10 pages
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Statistical learning theory provides bounds on the generalization gap, using in particular the Vapnik-Chervonenkis dimension and the Rademacher complexity. An alternative approach, mainly studied in the statistical physics literature, is the study of generalization in simple synthetic-data models. Here we discuss the connections between these approaches and focus on the link between the Rademacher complexity in statistical learning and the theories of generalization for typical-case synthetic models from statistical physics, involving quantities known as Gardner capacity and ground state energy. We show that in these models the Rademacher complexity is closely related to the ground state energy computed by replica theories. Using this connection, one may reinterpret many results of the literature as rigorous Rademacher bounds in a variety of models in the high-dimensional statistics limit. Somewhat surprisingly, we also show that statistical learning theory provides predictions for the behavior of the ground-state energies in some full replica symmetry breaking models.
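As a concrete instance of the quantity discussed, the empirical Rademacher complexity of the unit-norm linear class {x -> w·x : ||w||_2 <= 1} on a sample x_1, ..., x_n equals E_sigma ||(1/n) sum_i sigma_i x_i||_2, and Jensen's inequality bounds it by sqrt(sum_i ||x_i||^2) / n. The sketch below computes it exactly by enumerating every sign vector for a small hypothetical sample; it illustrates the definition only, not the replica computation in the paper.

```python
import itertools

import numpy as np


def rademacher_linear_unit_ball(X):
    """Exact empirical Rademacher complexity of {x -> w @ x : ||w||_2 <= 1}.

    For each sign vector sigma, the supremum over the unit ball of
    (1/n) * sum_i sigma_i * (w @ x_i) equals ||(1/n) * sum_i sigma_i * x_i||_2.
    """
    n = X.shape[0]
    total = 0.0
    for signs in itertools.product([-1.0, 1.0], repeat=n):
        total += np.linalg.norm(np.asarray(signs) @ X) / n
    return total / 2**n


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))  # hypothetical sample: n = 10 points in R^3
r_hat = rademacher_linear_unit_ball(X)
bound = np.sqrt((X**2).sum()) / X.shape[0]
print(r_hat, bound)
```

Exhaustive enumeration costs 2^n norm evaluations, so it only serves to check the definition on tiny samples; in practice the expectation over signs is estimated by sampling.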
 [58] arXiv:1912.02765 (cross-list from cs.LG) [pdf, other]

Title: On the Sample Complexity of Learning Sum-Product Networks
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Sum-Product Networks (SPNs) can be regarded as a form of deep graphical models that compactly represent deeply factored and mixed distributions. An SPN is a rooted directed acyclic graph (DAG) consisting of a set of leaves (corresponding to base distributions), a set of sum nodes (which represent mixtures of their children's distributions) and a set of product nodes (representing the products of their children's distributions).
In this work, we initiate the study of the sample complexity of PAC-learning the set of distributions that correspond to SPNs. We show that the sample complexity of learning tree-structured SPNs with the usual type of leaves (i.e., Gaussian or discrete) grows at most linearly (up to logarithmic factors) with the number of parameters of the SPN. More specifically, we show that the class of distributions that corresponds to tree-structured Gaussian SPNs with $k$ mixing weights and $e$ ($d$-dimensional Gaussian) leaves can be learned within Total Variation error $\epsilon$ using at most $\widetilde{O}(\frac{ed^2+k}{\epsilon^2})$ samples. A similar result holds for tree-structured SPNs with discrete leaves.
We obtain the upper bounds based on the recently proposed notion of distribution compression schemes. More specifically, we show that if a (base) class of distributions $\mathcal{F}$ admits an "efficient" compression, then the class of tree-structured SPNs with leaves from $\mathcal{F}$ also admits an efficient compression.
 [59] arXiv:1912.02794 (cross-list from cs.LG) [pdf, other]

Title: Adversarial Risk via Optimal Transport and Optimal Couplings
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
The accuracy of modern machine learning algorithms deteriorates severely on adversarially manipulated test data. Optimal adversarial risk quantifies the best error rate of any classifier in the presence of adversaries, and optimal adversarial classifiers are sought that minimize adversarial risk. In this paper, we investigate the optimal adversarial risk and optimal adversarial classifiers from an optimal transport perspective. We present a new and simple approach to show that the optimal adversarial risk for binary classification with the $0$-$1$ loss function is completely characterized by an optimal transport cost between the probability distributions of the two classes, for a suitably defined cost function. We propose a novel coupling strategy that achieves the optimal transport cost for several univariate distributions such as Gaussian, uniform and triangular. Using the optimal couplings, we obtain the optimal adversarial classifiers in these settings and show how they differ from optimal classifiers in the absence of adversaries. Based on our analysis, we evaluate algorithm-independent fundamental limits on adversarial risk for CIFAR-10, MNIST, Fashion-MNIST and SVHN datasets, and Gaussian mixtures based on them. In addition to the $0$-$1$ loss, we also derive bounds on the deviation of optimal risk and optimal classifier in the presence of adversaries for continuous loss functions, that are based on the convexity and smoothness of the loss functions.
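For a simple worked case, take two equally likely classes N(-mu, 1) and N(+mu, 1) on the real line and an adversary that may move each test point by at most eps. The threshold classifier sign(x) errs on a class +1 point whenever x - eps < 0, so by symmetry its adversarial 0-1 risk is Phi(eps - mu). The sketch below evaluates this; it is only the risk of this particular classifier, not a reproduction of the paper's optimal-transport characterization.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def threshold_adversarial_risk(mu, eps):
    """0-1 adversarial risk of sign(x) for classes N(-mu, 1) vs N(+mu, 1), equal priors.

    A class +1 point x is pushed to x - eps, so it is misclassified when x < eps;
    the class -1 case is symmetric.
    """
    return std_normal_cdf(eps - mu)


for eps in (0.0, 0.5, 1.0):
    print(eps, round(threshold_adversarial_risk(mu=1.0, eps=eps), 4))
```

At eps = 0 this is the ordinary Bayes error Phi(-mu); once eps reaches mu, the risk of this classifier is 1/2, no better than guessing.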
 [60] arXiv:1912.02807 (cross-list from cs.LG) [pdf, other]

Title: Combining Q-Learning and Search with Amortized Value Estimates
Authors: Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Tobias Pfaff, Theophane Weber, Lars Buesing, Peter W. Battaglia
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We introduce "Search with Amortized Value Estimates" (SAVE), an approach for combining model-free Q-learning with model-based Monte-Carlo Tree Search (MCTS). In SAVE, a learned prior over state-action values is used to guide MCTS, which estimates an improved set of state-action values. The new Q-estimates are then used in combination with real experience to update the prior. This effectively amortizes the value computation performed by MCTS, resulting in a cooperative relationship between model-free learning and model-based search. SAVE can be implemented on top of any Q-learning agent with access to a model, which we demonstrate by incorporating it into agents that perform challenging physical reasoning tasks and Atari. SAVE consistently achieves higher rewards with fewer training steps and, in contrast to typical model-based search approaches, yields strong performance with very small search budgets. By combining real experience with information computed during search, SAVE demonstrates that it is possible to improve on both the performance of model-free learning and the computational cost of planning.
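A tabular caricature of the two updates described above: a standard Q-learning step on a real transition, plus an amortization step that pulls the learned Q-values at a state toward the values returned by search. MCTS itself is not implemented; `q_search` is a hypothetical input, and the squared-error amortization term and its weight `beta` are assumptions that may differ from the loss SAVE actually uses.

```python
from collections import defaultdict

import numpy as np

N_ACTIONS = 3
Q = defaultdict(lambda: np.zeros(N_ACTIONS))  # tabular Q-values, zero-initialized


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    # One-step TD target from real experience.
    target = r if done else r + gamma * Q[s_next].max()
    Q[s][a] += alpha * (target - Q[s][a])


def amortization_update(Q, s, q_search, beta=0.5):
    # Step Q[s] toward the search estimates (gradient step on 0.5 * ||Q[s] - q_search||^2).
    Q[s] += beta * (np.asarray(q_search) - Q[s])


# Hypothetical search output at state "s0", followed by a real transition from it.
amortization_update(Q, "s0", q_search=[1.0, 0.2, -0.5], beta=1.0)
q_learning_update(Q, "s0", a=0, r=2.0, s_next="s1")
print(Q["s0"])
```

The amortization step lets values computed once by search persist in the learned table, so later searches start from a better prior instead of recomputing them.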
Replacements for Fri, 6 Dec 19
 [61] arXiv:1711.01558 (replaced) [pdf, other]

Title: Wasserstein Auto-Encoders
Comments: Published at ICLR 2018. Included a much wider hyperparameter sweep, resulting in significant improvements in FIDs on CelebA
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [62] arXiv:1806.09178 (replaced) [pdf, other]

Title: Towards A Unified Analysis of Random Fourier Features
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [63] arXiv:1902.01073 (replaced) [pdf, other]

Title: Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-based Approach
Comments: 37 pages, 11 figures
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [64] arXiv:1906.07125 (replaced) [pdf, other]

Title: Replacing the do-calculus with Bayes rule
Comments: 10 pages, 10 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [65] arXiv:1907.01660 (replaced) [pdf, other]

Title: A flexible EM-like clustering algorithm for noisy data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [66] arXiv:1910.08032 (replaced) [pdf, other]

Title: Notes on Margin Training and Margin p-Values for Deep Neural Network Classifiers
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [67] arXiv:1910.08880 (replaced) [pdf, other]

Title: Sparse (group) learning with Lipschitz loss functions: a unified analysis
Authors: Antoine Dedieu
Comments: arXiv admin note: text overlap with arXiv:1810.03081
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Other Statistics (stat.OT)
 [68] arXiv:1710.05114 (replaced) [pdf, ps, other]

Title: Deep Learning in a Generalized HJM-type Framework Through Arbitrage-Free Regularization
Comments: 23 Pages + References
Subjects: Mathematical Finance (q-fin.MF); Probability (math.PR); Pricing of Securities (q-fin.PR); Machine Learning (stat.ML)
 [69] arXiv:1809.03062 (replaced) [pdf, ps, other]

Title: Analysis of the generalization error: Empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
 [70] arXiv:1811.08117 (replaced) [pdf, ps, other]

Title: Limited Gradient Descent: Learning With Noisy Labels
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [71] arXiv:1812.09707 (replaced) [pdf, other]

Title: Increasing the adversarial robustness and explainability of capsule networks with $γ$-capsules
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [72] arXiv:1902.00091 (replaced) [pdf, other]

Title: Neural Networks Predict Fluid Dynamics Solutions from Tiny Datasets
Subjects: Computational Physics (physics.comp-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [73] arXiv:1902.06278 (replaced) [pdf, other]

Title: ODIN: ODE-Informed Regression for Parameter and State Inference in Time-Continuous Dynamical Systems
Authors: Philippe Wenk, Gabriele Abbati, Michael A Osborne, Bernhard Schölkopf, Andreas Krause, Stefan Bauer
Comments: Published at the Thirty-fourth AAAI Conference on Artificial Intelligence
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)
 [74] arXiv:1902.08179 (replaced) [pdf, other]

Title: Online Sampling from Log-Concave Distributions
Comments: 42 pages
Journal-ref: NeurIPS 2019
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Probability (math.PR); Computation (stat.CO); Machine Learning (stat.ML)
 [75] arXiv:1904.07482 (replaced) [pdf, other]

Title: Object-Oriented Dynamics Learning through Multi-Level Abstraction
Comments: Accepted to the Thirty-Fourth AAAI Conference On Artificial Intelligence (AAAI), 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [76] arXiv:1904.13262 (replaced) [pdf, other]

Title: Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks
Comments: 19 pages, to appear in NeurIPS 2019 proceedings
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [77] arXiv:1905.09570 (replaced) [pdf, other]

Title: Gravity-Inspired Graph Autoencoders for Directed Link Prediction
Comments: ACM International Conference on Information and Knowledge Management (CIKM 2019)
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
 [78] arXiv:1905.12962 (replaced) [pdf, other]

Title: Learning Nonsymmetric Determinantal Point Processes
Comments: NeurIPS 2019
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [79] arXiv:1905.13725 (replaced) [pdf, other]

Title: Are Labels Required for Improving Adversarial Robustness?
Authors: Jonathan Uesato, Jean-Baptiste Alayrac, Po-Sen Huang, Robert Stanforth, Alhussein Fawzi, Pushmeet Kohli
Comments: Appears in the Thirty-Third Annual Conference on Neural Information Processing Systems (NeurIPS 2019)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [80] arXiv:1906.00150 (replaced) [pdf, other]

Title: Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks
Comments: 27 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [81] arXiv:1906.00454 (replaced) [pdf, other]

Title: Classification of Crop Tolerance to Heat and Drought: A Deep Convolutional Neural Networks Approach
Comments: Won the Best Paper Award of the Second International Workshop on Machine Learning for Cyber-Agricultural Systems (Ames, IA, USA). One of the winning solutions to the 2019 INFORMS Syngenta Crop Challenge. Presented at 2019 INFORMS Conference on Business Analytics and Operations Research (Austin, TX, USA). Published in the Agronomy Journal
Journal-ref: Agronomy 2019, 9, 833
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Applications (stat.AP); Machine Learning (stat.ML)
 [82] arXiv:1906.01687 (replaced) [pdf, other]

Title: Stochastic Gradients for Large-Scale Tensor Decomposition
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [83] arXiv:1906.02629 (replaced) [pdf, other]

Title: When Does Label Smoothing Help?
Comments: Accepted at NeurIPS 2019
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [84] arXiv:1906.04279 (replaced) [pdf, other]

Title: Exploration via Hindsight Goal Generation
Comments: Thirty-third Conference on Neural Information Processing Systems (NeurIPS 2019)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [85] arXiv:1906.10720 (replaced) [pdf, other]

Title: Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
Comments: Presented at NeurIPS 2019
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [86] arXiv:1907.11452 (replaced) [pdf, other]

Title: An Information-theoretic Online Learning Principle for Specialization in Hierarchical Decision-Making Systems
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
 [87] arXiv:1908.01580 (replaced) [pdf, other]

Title: The HSIC Bottleneck: Deep Learning without Back-Propagation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [88] arXiv:1908.09092 (replaced) [pdf, other]

Title: Fairness Warnings and Fair-MAML: Learning Fairly with Minimal Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [89] arXiv:1909.05447 (replaced) [pdf, other]

Title: Feature Engineering and Forecasting via Derivative-free Optimization and Ensemble of Sequence-to-sequence Networks with Applications in Renewable Energy
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [90] arXiv:1909.11201 (replaced) [pdf, other]

Title: Matrix Sketching for Secure Collaborative Machine Learning
Authors: Shusen Wang
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
 [91] arXiv:1910.04930 (replaced) [pdf, other]

Title: Random Quadratic Forms with Dependence: Applications to Restricted Isometry and Beyond
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [92] arXiv:1910.05933 (replaced) [pdf]

Title: DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering
Comments: under review
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [93] arXiv:1910.12824 (replaced) [pdf, other]

Title: Neural Architecture Evolution in Deep Reinforcement Learning for Continuous Control
Comments: NeurIPS 2019 MetaLearn Workshop
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
 [94] arXiv:1910.12827 (replaced) [pdf, other]

Title: Entity Abstraction in Visual Model-Based Reinforcement Learning
Authors: Rishi Veerapaneni, John D. Co-Reyes, Michael Chang, Michael Janner, Chelsea Finn, Jiajun Wu, Joshua B. Tenenbaum, Sergey Levine
Comments: Accepted at CoRL 2019
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
 [95] arXiv:1910.13003 (replaced) [pdf, other]

Title: Neural Similarity Learning
Comments: NeurIPS 2019
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [96] arXiv:1911.00630 (replaced) [pdf, other]

Title: Predicting Weather Uncertainty with Deep Convnets
Authors: Peter Grönquist, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Luca Lavarini, Shigang Li, Torsten Hoefler
Comments: Poster presentation at NeurIPS 2019 "Machine Learning and the Physical Sciences" Workshop
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (stat.ML)
 [97] arXiv:1911.05954 (replaced) [pdf, other]

Title: Hierarchical Graph Pooling with Structure Learning
Comments: Accepted to AAAI-2020; Code is available at this https URL; Corrected typos
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [98] arXiv:1911.07819 (replaced) [pdf, other]

Title: Drug Repurposing for Cancer: An NLP Approach to Identify Low-Cost Therapies
Authors: Shivashankar Subramanian, Ioana Baldini, Sushma Ravichandran, Dmitriy A. Katz-Rogozhnikov, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Kush R. Varshney, Annmarie Wang, Pradeep Mangalath, Laura B. Kleiman
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [99] arXiv:1911.12607 (replaced) [pdf, ps, other]

Title: The Weighted Tsetlin Machine: Compressed Representations with Weighted Clauses
Comments: 15 pages, 10 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [100] arXiv:1911.13162 (replaced) [pdf, other]

Title: Deep autofocus with cone-beam CT consistency constraint
Authors: Alexander Preuhs, Michael Manhart, Philipp Roser, Bernhard Stimpel, Christopher Syben, Marios Psychogios, Markus Kowarschik, Andreas Maier
Comments: Accepted at BVM 2020, review score under Top-6 of the conference
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
 [101] arXiv:1911.13270 (replaced) [pdf, other]

Title: Transflow Learning: Repurposing Flow Models Without Retraining
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [102] arXiv:1912.00183 (replaced) [pdf, other]

Title: [Re] Learning to Learn By Self-Critique
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [103] arXiv:1912.00350 (replaced) [pdf, ps, other]

Title: Online Knowledge Distillation with Diverse Peers
Comments: Accepted to AAAI-2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [104] arXiv:1912.01139 (replaced) [pdf, other]
 [105] arXiv:1912.01192 (replaced) [pdf, ps, other]

Title: Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
Comments: 14 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)