Machine Learning

New submissions

[ total of 123 entries: 1-123 ]

New submissions for Tue, 2 Mar 21

[1]  arXiv:2103.00025 [pdf, ps, other]
Title: TEC: Tensor Ensemble Classifier for Big Data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The tensor (multidimensional array) classification problem has become very popular in modern applications such as image recognition and high-dimensional spatio-temporal data analysis. The Support Tensor Machine (STM) classifier, which extends the support vector machine, takes the CANDECOMP/PARAFAC (CP) form of tensor data as input and predicts the data labels. The distribution-free and statistically consistent properties of STM highlight its potential for handling a wide variety of data applications. Training an STM can be computationally expensive with high-dimensional tensors. However, reducing the size of the tensor with a random projection technique can reduce the computational time and cost, making it feasible to handle large tensors on regular machines. We call an STM estimated with a randomly projected tensor a Random Projection-based Support Tensor Machine (RPSTM). In this work, we propose a Tensor Ensemble Classifier (TEC), which aggregates multiple RPSTMs for big tensor classification. TEC uses the ensemble idea to minimize the excessive classification risk brought by random projection, providing statistically consistent predictions while retaining the computational advantage of RPSTM. Since each RPSTM can be estimated independently, TEC can further take advantage of parallel computing techniques and be more computationally efficient. The theoretical and numerical results demonstrate the decent performance of the TEC model in high-dimensional tensor classification problems. The model prediction is statistically consistent, as its risk is shown to converge to the optimal Bayes risk. Besides, we highlight the trade-off between the computational cost and the prediction risk for the TEC model. The method is validated by extensive simulation and a real data example. We provide a Python package for applying TEC, which is available at our GitHub.

[2]  arXiv:2103.00034 [pdf, other]
Title: Beyond Perturbation Stability: LP Recovery Guarantees for MAP Inference on Noisy Stable Instances
Comments: 25 pages, 2 figures, 2 tables. To appear in AISTATS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Several works have shown that perturbation stable instances of the MAP inference problem in Potts models can be solved exactly using a natural linear programming (LP) relaxation. However, most of these works give few (or no) guarantees for the LP solutions on instances that do not satisfy the relatively strict perturbation stability definitions. In this work, we go beyond these stability results by showing that the LP approximately recovers the MAP solution of a stable instance even after the instance is corrupted by noise. This "noisy stable" model realistically fits with practical MAP inference problems: we design an algorithm for finding "close" stable instances, and show that several real-world instances from computer vision have nearby instances that are perturbation stable. These results suggest a new theoretical explanation for the excellent performance of this LP relaxation in practice.

[3]  arXiv:2103.00083 [pdf, other]
Title: Deep Quantile Aggregation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Conditional quantile estimation is a key statistical learning challenge motivated by the need to quantify uncertainty in predictions or to model a diverse population without being overly reductive. As such, many models have been developed for this problem. Adopting a meta viewpoint, we propose a general framework (inspired by neural network optimization) for aggregating any number of conditional quantile models in order to boost predictive accuracy. We consider weighted ensembling strategies of increasing flexibility where the weights may vary over individual models, quantile levels, and feature values. An appeal of our approach is its portability: we ensure that estimated quantiles at adjacent levels do not cross by applying simple transformations through which gradients can be backpropagated, and this allows us to leverage the modern deep learning toolkit for building quantile ensembles. Our experiments confirm that ensembling can lead to big gains in accuracy, even when the constituent models are themselves powerful and flexible.
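
The aggregation idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names are hypothetical, the weights are fixed rather than learned, and sorting is used here as one simple non-crossing transformation (an assumption, since the abstract does not say which transformations are applied).

```python
def pinball_loss(y, q_pred, tau):
    """Pinball (quantile) loss for predicting q_pred at level tau when y is observed."""
    diff = y - q_pred
    return max(tau * diff, (tau - 1.0) * diff)


def aggregate(preds, weights):
    """Weighted average of the base models' predictions at one quantile level."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, preds)) / total


def enforce_noncrossing(quantiles):
    """Sort the estimates over increasing levels so that they cannot cross.

    Sorting only permutes its inputs, so it is differentiable almost everywhere.
    """
    return sorted(quantiles)


# Two hypothetical base models predicting the 0.1, 0.5 and 0.9 quantiles.
levels = [0.1, 0.5, 0.9]
model_a = [1.0, 2.0, 3.5]
model_b = [1.6, 1.4, 3.0]
# One weight pair per quantile level (weights may differ across levels).
weights = [[0.0, 1.0], [0.0, 1.0], [0.5, 0.5]]

raw = [aggregate([a, b], w) for a, b, w in zip(model_a, model_b, weights)]
ensemble = enforce_noncrossing(raw)  # raw crosses at the first two levels
print(raw, ensemble)
print([pinball_loss(2.0, q, tau) for q, tau in zip(ensemble, levels)])
```

In a learned ensemble the weights would be trained by minimizing the pinball loss summed over levels and observations.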

[4]  arXiv:2103.00222 [pdf, other]
Title: Variational Laplace for Bayesian neural networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We develop variational Laplace for Bayesian neural networks (BNNs) which exploits a local approximation of the curvature of the likelihood to estimate the ELBO without the need for stochastic sampling of the neural-network weights. Variational Laplace performs better on image classification tasks than MAP inference and far better than standard variational inference with stochastic sampling despite using the same mean-field Gaussian approximate posterior. The Variational Laplace objective is simple to evaluate, as it is (in essence) the log-likelihood, plus weight-decay, plus a squared-gradient regularizer. Finally, we emphasise care needed in benchmarking standard VI as there is a risk of stopping before the variance parameters have converged. We show that early-stopping can be avoided by increasing the learning rate for the variance parameters.

[5]  arXiv:2103.00373 [pdf, other]
Title: Communication-efficient Byzantine-robust distributed learning with statistical guarantee
Comments: 34 pages
Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Communication efficiency and robustness are two major issues in modern distributed learning frameworks. This is due to practical situations where some computing nodes may have limited communication power or may behave adversarially. To address the two issues simultaneously, this paper develops two communication-efficient and robust distributed learning algorithms for convex problems. Our motivation is based on the surrogate likelihood framework and the median and trimmed mean operations. In particular, the proposed algorithms are provably robust against Byzantine failures, and also achieve optimal statistical rates for strongly convex losses and convex (non-smooth) penalties. For typical statistical models such as generalized linear models, our results show that statistical errors dominate optimization errors in finite iterations. Simulated and real data experiments are conducted to demonstrate the numerical performance of our algorithms.
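
The median and trimmed mean operations named in this abstract can be sketched as coordinate-wise aggregators of worker gradients. This is a minimal illustration with hypothetical function names and made-up gradient values, not the paper's algorithms, which also involve the surrogate likelihood framework.

```python
from statistics import median


def coordinate_median(vectors):
    """Coordinate-wise median of a list of equal-length vectors."""
    return [median(coords) for coords in zip(*vectors)]


def trimmed_mean(vectors, trim):
    """Coordinate-wise mean after dropping the `trim` smallest and largest values."""
    n = len(vectors)
    if 2 * trim >= n:
        raise ValueError("trim is too large for the number of vectors")
    result = []
    for coords in zip(*vectors):
        kept = sorted(coords)[trim:n - trim]
        result.append(sum(kept) / len(kept))
    return result


# Four honest workers report similar gradients; one Byzantine worker lies.
honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0]]
byzantine = [[1000.0, -1000.0]]
grads = honest + byzantine

plain_mean = [sum(c) / len(c) for c in zip(*grads)]
print(plain_mean)                  # dominated by the corrupted report
print(coordinate_median(grads))    # stays near the honest values
print(trimmed_mean(grads, trim=1)) # stays near the honest values
```

Both robust aggregators ignore extreme values in each coordinate, which is why a single corrupted report cannot move them arbitrarily far.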

[6]  arXiv:2103.00500 [pdf, other]
Title: Asymptotic Risk of Overparameterized Likelihood Models: Double Descent Theory for Deep Neural Networks
Comments: 33 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

We investigate the asymptotic risk of a general class of overparameterized likelihood models, including deep models. The recent empirical success of large-scale models has motivated several theoretical studies to investigate a scenario wherein both the number of samples, $n$, and parameters, $p$, diverge to infinity and derive an asymptotic risk at the limit. However, these theorems are only valid for linear-in-feature models, such as generalized linear regression, kernel regression, and shallow neural networks. Hence, it is difficult to investigate a wider class of nonlinear models, including deep neural networks with three or more layers. In this study, we consider a likelihood maximization problem without the model constraints and analyze the upper bound of an asymptotic risk of an estimator with penalization. Technically, we combine a property of the Fisher information matrix with an extended Marchenko-Pastur law and associate the combination with empirical process techniques. The derived bound is general, as it describes both the double descent and the regularized risk curves, depending on the penalization. Our results are valid without the linear-in-feature constraints on models and allow us to derive the general spectral distributions of a Fisher information matrix from the likelihood. We demonstrate that several explicit models, such as parallel deep neural networks and ensemble learning, are in agreement with our theory. This result indicates that even large and deep models have a small asymptotic risk if they exhibit a specific structure, such as divisibility. To verify this finding, we conduct a real-data experiment with parallel deep neural networks. Our results expand the applicability of the asymptotic risk analysis, and may also contribute to the understanding and application of deep learning.

[7]  arXiv:2103.00654 [pdf, other]
Title: Feedback Coding for Active Learning
Comments: AISTATS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The iterative selection of examples for labeling in active machine learning is conceptually similar to feedback channel coding in information theory: in both tasks, the objective is to seek a minimal sequence of actions to encode information in the presence of noise. While this high-level overlap has been previously noted, there remain open questions on how to best formulate active learning as a communications system to leverage existing analysis and algorithms in feedback coding. In this work, we formally identify and leverage the structural commonalities between the two problems, including the characterization of encoder and noisy channel components, to design a new algorithm. Specifically, we develop an optimal transport-based feedback coding scheme called Approximate Posterior Matching (APM) for the task of active example selection and explore its application to Bayesian logistic regression, a popular model in active learning. We evaluate APM on a variety of datasets and demonstrate learning performance comparable to existing active learning methods, at a reduced computational cost. These results demonstrate the potential of directly deploying concepts from feedback channel coding to design efficient active learning strategies.

[8]  arXiv:2103.00668 [pdf, other]
Title: Learning Proposals for Probabilistic Programs with Inference Combinators
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Programming Languages (cs.PL)

We develop operators for construction of proposals in probabilistic programs, which we refer to as inference combinators. Inference combinators define a grammar over importance samplers that compose primitive operations such as application of a transition kernel and importance resampling. Proposals in these samplers can be parameterized using neural networks, which in turn can be trained by optimizing variational objectives. The result is a framework for user-programmable variational methods that are correct by construction and can be tailored to specific models. We demonstrate the flexibility of this framework in applications to advanced variational methods based on amortized Gibbs sampling and annealing.

[9]  arXiv:2103.00684 [pdf, other]
Title: Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Neural network-based anomaly detection methods have been shown to achieve high performance. However, they require a large amount of training data for each task. We propose a neural network-based meta-learning method for supervised anomaly detection. The proposed method improves the anomaly detection performance on unseen tasks, which contain a few labeled normal and anomalous instances, by meta-training with various datasets. With a meta-learning framework, quick adaptation to each task and its effective backpropagation are important since the model is trained by the adaptation for each epoch. Our model enables them by formulating adaptation as a generalized eigenvalue problem with one-class classification; its global optimum solution is obtained, and the solver is differentiable. We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods on various datasets.

[10]  arXiv:2103.00694 [pdf, other]
Title: Meta-learning representations for clustering with infinite Gaussian mixture models
Authors: Tomoharu Iwata
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

For better clustering performance, appropriate representations are critical. Although many neural network-based metric learning methods have been proposed, they do not directly train neural networks to improve clustering performance. We propose a meta-learning method that trains neural networks for obtaining representations such that clustering performance improves when the representations are clustered by the variational Bayesian (VB) inference with an infinite Gaussian mixture model. The proposed method can cluster unseen unlabeled data using knowledge meta-learned with labeled data that are different from the unlabeled data. For the objective function, we propose a continuous approximation of the adjusted Rand index (ARI), by which we can evaluate the clustering performance from soft clustering assignments. Since the approximated ARI and the VB inference procedure are differentiable, we can backpropagate the objective function through the VB inference procedure to train the neural networks. With experiments using text and image data sets, we demonstrate that our proposed method has a higher adjusted Rand index than existing methods do.

[11]  arXiv:2103.00704 [pdf, other]
Title: Privacy-Preserving Distributed SVD via Federated Power
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Singular value decomposition (SVD) is one of the most fundamental tools in machine learning and statistics. The modern machine learning community usually assumes that data come from and belong to small-scale device users. The low communication and computation power of such devices, and the possible privacy breaches of users' sensitive data make the computation of SVD challenging. Federated learning (FL) is a paradigm enabling a large number of devices to jointly learn a model in a communication-efficient way without data sharing. In the FL framework, we develop a class of algorithms called FedPower for the computation of partial SVD in the modern setting. Based on the well-known power method, the local devices alternate between multiple local power iterations and one global aggregation to improve communication efficiency. In the aggregation, we propose to weight each local eigenvector matrix with Orthogonal Procrustes Transformation (OPT). Considering the practical stragglers' effect, the aggregation can be fully participated or partially participated, where for the latter we propose two sampling and aggregation schemes. Further, to ensure strong privacy protection, we add Gaussian noise whenever the communication happens by adopting the notion of differential privacy (DP). We theoretically show the convergence bound for FedPower. The resulting bound is interpretable with each part corresponding to the effect of Gaussian noise, parallelization, and random sampling of devices, respectively. We also conduct experiments to demonstrate the merits of FedPower. In particular, the local iterations not only improve communication efficiency but also reduce the chance of privacy breaches.

[12]  arXiv:2103.00711 [pdf, ps, other]
Title: Panel semiparametric quantile regression neural network for electricity consumption forecasting
Comments: 30
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Econometrics (econ.EM)

China has made great achievements in the electric power industry during the long-term deepening of reform and opening up. However, owing to complex regional economic, social, and natural conditions, electricity resources are not evenly distributed, which accounts for the electricity deficiency in some regions of China. It is therefore desirable to develop a robust electricity forecasting model. Motivated by this, we propose a Panel Semiparametric Quantile Regression Neural Network (PSQRNN) that combines an artificial neural network with semiparametric quantile regression. The PSQRNN can explore potential linear and nonlinear relationships among the variables, interpret the unobserved provincial heterogeneity, and maintain the interpretability of parametric models simultaneously. The PSQRNN is trained by combining penalized quantile regression with LASSO, ridge regression, and the backpropagation algorithm. To evaluate the prediction accuracy, an empirical analysis of provincial electricity consumption in China from 1999 to 2018 is conducted under three scenarios. The analysis shows that the PSQRNN model performs better for electricity consumption forecasting when economic and climatic factors are considered. Finally, forecasts of provincial electricity consumption in China for the next $5$ years (2019-2023) are reported.

[13]  arXiv:2103.00733 [pdf, ps, other]
Title: The Mathematics Behind Spectral Clustering And The Equivalence To PCA
Authors: T Shen
Comments: 6 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Spectral clustering is a popular algorithm that clusters points using the eigenvalues and eigenvectors of Laplacian matrices derived from the data. For years, spectral clustering has been working mysteriously. This paper explains spectral clustering by dividing it into two categories based on whether the graph Laplacian is fully connected or not. For a fully connected graph, this paper demonstrates the dimension reduction part by offering an objective function: the covariance between the original data points' similarities and the mapped data points' similarities. For a multi-connected graph, this paper proves that with a proper $k$, the first $k$ eigenvectors are the indicators of the connected components. This paper also proves there is an equivalence between spectral embedding and PCA.
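
The claim about connected components can be checked on a toy graph. The sketch below uses the standard unnormalized Laplacian L = D - A (an assumption, since the abstract does not say which Laplacian is used) and verifies that the indicator vector of each component is mapped to zero, so it is an eigenvector with eigenvalue 0. The graph and helper names are hypothetical.

```python
def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Two connected components: a triangle on nodes {0, 1, 2} and an edge on {3, 4}.
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
L = laplacian(adj)

indicator_a = [1, 1, 1, 0, 0]  # membership in the triangle
indicator_b = [0, 0, 0, 1, 1]  # membership in the edge

print(matvec(L, indicator_a))  # zero vector: eigenvalue 0
print(matvec(L, indicator_b))  # zero vector: eigenvalue 0
print(matvec(L, [1, 0, 0, 0, 0]))  # a non-indicator vector is not mapped to zero
```

With $k$ equal to the number of components, the zero eigenvalue has multiplicity $k$ and the component indicators span its eigenspace.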

[14]  arXiv:2103.01126 [pdf, ps, other]
Title: BERT based patent novelty search by training claims to their own description
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Machine Learning (cs.LG); Econometrics (econ.EM)

In this paper we present a method to concatenate patent claims to their own description. By applying this method, BERT trains suitable descriptions for claims. Such a trained BERT (claim-to-description BERT) could be able to identify novelty-relevant descriptions for patents. In addition, we introduce a new scoring scheme, relevance scoring or novelty scoring, to process the output of BERT in a meaningful way. We tested the method on patent applications by training BERT on the first claims of patents and corresponding descriptions. BERT's output was processed according to the relevance score, and the results were compared with the cited X documents in the search reports. The test showed that BERT scored some of the cited X documents as highly relevant.

Cross-lists for Tue, 2 Mar 21

[15]  arXiv:2102.13135 (cross-list from math.ST) [pdf, other]
Title: Graph Community Detection from Coarse Measurements: Recovery Conditions for the Coarsened Weighted Stochastic Block Model
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

We study the problem of community recovery from coarse measurements of a graph. In contrast to the problem of community recovery of a fully observed graph, one often encounters situations when measurements of a graph are made at low-resolution, each measurement integrating across multiple graph nodes. Such low-resolution measurements effectively induce a coarse graph with its own communities. Our objective is to develop conditions on the graph structure, the quantity, and properties of measurements, under which we can recover the community organization in this coarse graph. In this paper, we build on the stochastic block model by mathematically formalizing the coarsening process, and characterizing its impact on the community members and connections. Through this novel setup and modeling, we characterize an error bound for community recovery. The error bound yields simple and closed-form asymptotic conditions to achieve the perfect recovery of the coarse graph communities.

[16]  arXiv:2103.00065 (cross-list from cs.LG) [pdf, other]
Title: Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
Comments: To appear in ICLR 2021. 72 pages, 107 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We empirically demonstrate that full-batch gradient descent on neural network training objectives typically operates in a regime we call the Edge of Stability. In this regime, the maximum eigenvalue of the training loss Hessian hovers just above the numerical value $2 / \text{(step size)}$, and the training loss behaves non-monotonically over short timescales, yet consistently decreases over long timescales. Since this behavior is inconsistent with several widespread presumptions in the field of optimization, our findings raise questions as to whether these presumptions are relevant to neural network training. We hope that our findings will inspire future efforts aimed at rigorously understanding optimization at the Edge of Stability. Code is available at https://github.com/locuslab/edge-of-stability.
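
The value $2 / \text{(step size)}$ is the classical stability threshold for gradient descent on a quadratic, which the following sketch illustrates. It is a one-dimensional toy under stated assumptions (loss $\frac{1}{2} \lambda x^2$ with curvature $\lambda$, hypothetical function name), not a reproduction of the paper's neural network experiments.

```python
def run_gradient_descent(curvature, step_size, steps=100, x0=1.0):
    """Gradient descent on f(x) = 0.5 * curvature * x**2; returns the final iterate."""
    x = x0
    for _ in range(steps):
        grad = curvature * x
        x = x - step_size * grad  # equivalent to x *= (1 - step_size * curvature)
    return x


step_size = 0.1
threshold = 2.0 / step_size  # curvature above this value makes the iteration diverge

below = run_gradient_descent(curvature=threshold - 1.0, step_size=step_size)
above = run_gradient_descent(curvature=threshold + 1.0, step_size=step_size)
print(abs(below))  # shrinks toward 0 while oscillating in sign
print(abs(above))  # grows without bound
```

Each step multiplies the iterate by $1 - \text{(step size)} \cdot \lambda$, whose magnitude is below one exactly when $\lambda < 2 / \text{(step size)}$.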

[17]  arXiv:2103.00107 (cross-list from cs.LG) [pdf, other]
Title: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning
Comments: 26 pages, 7 figures, 2 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Off-policy multi-step reinforcement learning algorithms consist of conservative and non-conservative algorithms: the former actively cut traces, whereas the latter do not. Recently, Munos et al. (2016) proved the convergence of conservative algorithms to an optimal Q-function. In contrast, non-conservative algorithms are thought to be unsafe and have a limited or no theoretical guarantee. Nonetheless, recent studies have shown that non-conservative algorithms empirically outperform conservative ones. Motivated by the empirical results and the lack of theory, we carry out theoretical analyses of Peng's Q($\lambda$), a representative example of non-conservative algorithms. We prove that it also converges to an optimal policy provided that the behavior policy slowly tracks a greedy policy in a way similar to conservative policy iteration. Such a result has been conjectured to be true but has not been proven. We also experiment with Peng's Q($\lambda$) in complex continuous control tasks, confirming that Peng's Q($\lambda$) often outperforms conservative algorithms despite its simplicity. These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically-sound and practically effective algorithm.
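
As a rough illustration of a non-conservative target, the sketch below computes the return commonly associated with Peng's Q($\lambda$): a $\lambda$-weighted mixture of multi-step returns, each bootstrapped with a greedy value, with no trace cutting. The recursive form, the function name, and the argument layout are assumptions made for this example and are not taken from the abstract.

```python
def peng_q_lambda_returns(rewards, next_max_q, gamma, lam):
    """Targets G_t = r_t + gamma * ((1 - lam) * max_a Q(s_{t+1}, a) + lam * G_{t+1}).

    rewards[t] is r_t and next_max_q[t] is max_a Q(s_{t+1}, a), which should be
    0.0 when s_{t+1} is terminal. The trajectory end is bootstrapped greedily.
    """
    g = next_max_q[-1]
    returns = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1.0 - lam) * next_max_q[t] + lam * g)
        returns[t] = g
    return returns


rewards = [1.0, 1.0, 1.0]
next_max_q = [2.0, 4.0, 0.0]

# lam = 0 recovers the one-step Q-learning target at every step.
print(peng_q_lambda_returns(rewards, next_max_q, gamma=0.5, lam=0.0))
# lam = 1 gives the uncorrected multi-step return.
print(peng_q_lambda_returns(rewards, next_max_q, gamma=0.5, lam=1.0))
```

Unlike conservative methods, nothing in this recursion depends on whether the behavior policy's actions were greedy.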

[18]  arXiv:2103.00136 (cross-list from cs.LG) [pdf, other]
Title: Incorporating Causal Graphical Prior Knowledge into Predictive Modeling via Simple Data Augmentation
Comments: 24 pages, 5 figures, 2 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Causal graphs (CGs) are compact representations of the knowledge of the data generating processes behind the data distributions. When a CG is available, e.g., from the domain knowledge, we can infer the conditional independence (CI) relations that should hold in the data distribution. However, it is not straightforward how to incorporate this knowledge into predictive modeling. In this work, we propose a model-agnostic data augmentation method that allows us to exploit the prior knowledge of the CI encoded in a CG for supervised machine learning. We theoretically justify the proposed method by providing an excess risk bound indicating that the proposed method suppresses overfitting by reducing the apparent complexity of the predictor hypothesis class. Using real-world data with CGs provided by domain experts, we experimentally show that the proposed method is effective in improving the prediction accuracy, especially in the small-data regime.

[19]  arXiv:2103.00139 (cross-list from cs.LG) [pdf, other]
Title: Scalable Causal Transfer Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

One of the most important problems in transfer learning is the task of domain adaptation, where the goal is to apply an algorithm trained in one or more source domains to a different (but related) target domain. This paper deals with domain adaptation in the presence of covariate shift while there exist invariances across domains. A main limitation of existing causal inference methods for solving this problem is scalability. To overcome this difficulty, we propose SCTL, an algorithm that avoids an exhaustive search and identifies invariant causal features across the source and target domains based on Markov blanket discovery. SCTL does not require prior knowledge of the causal structure, the type of interventions, or the intervention targets. There is an intrinsic locality associated with SCTL that makes SCTL practically scalable and robust because local causal discovery increases the power of computational independence tests and makes the task of domain adaptation computationally tractable. We show the scalability and robustness of SCTL for domain adaptation using synthetic and real data sets in low-dimensional and high-dimensional settings.

[20]  arXiv:2103.00299 (cross-list from math.OC) [pdf, ps, other]
Title: Parallel Stochastic Mirror Descent for MDPs
Subjects: Optimization and Control (math.OC); Machine Learning (stat.ML)

We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, a variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints. We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method. Using this algorithm, we get the first parallel algorithm for average-reward MDPs with a generative model. One of the main features of the presented method is low communication costs in a distributed centralized setting.

[21]  arXiv:2103.00349 (cross-list from cs.LG) [pdf, other]
Title: High-Dimensional Bayesian Optimization with Sparse Axis-Aligned Subspaces
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Bayesian optimization (BO) is a powerful paradigm for efficient optimization of black-box objective functions. High-dimensional BO presents a particular challenge, in part because the curse of dimensionality makes it difficult to define as well as do inference over a suitable class of surrogate models. We argue that Gaussian process surrogate models defined on sparse axis-aligned subspaces offer an attractive compromise between flexibility and parsimony. We demonstrate that our approach, which relies on Hamiltonian Monte Carlo for inference, can rapidly identify sparse subspaces relevant to modeling the unknown objective function, enabling sample-efficient high-dimensional BO. In an extensive suite of experiments comparing to existing methods for high-dimensional BO we demonstrate that our algorithm, Sparse Axis-Aligned Subspace BO (SAASBO), achieves excellent performance on several synthetic and real-world problems without the need to set problem-specific hyperparameters.

[22]  arXiv:2103.00381 (cross-list from cs.LG) [pdf, other]
Title: Adversarial Information Bottleneck
Comments: 10 pages, 7 figures, 2 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The information bottleneck (IB) principle has been adopted to explain deep learning in terms of information compression and prediction, which are balanced by a trade-off hyperparameter. How to optimize the IB principle for better robustness and figure out the effects of compression through the trade-off hyperparameter are two challenging problems. Previous methods attempted to optimize the IB principle by introducing random noise into learning the representation and achieved state-of-the-art performance in nuisance information compression and semantic information extraction. However, their performance in resisting adversarial perturbations is far less impressive. To this end, we propose an adversarial information bottleneck (AIB) method without any explicit assumptions about the underlying distribution of the representations, which can be optimized effectively by solving a Min-Max optimization problem. Numerical experiments on synthetic and real-world datasets demonstrate its effectiveness in learning more invariant representations and mitigating adversarial perturbations compared to several competing IB methods. In addition, we analyse the adversarial robustness of diverse IB methods in contrast with their IB curves, and reveal that IB models with the hyperparameter $\beta$ corresponding to the knee point in the IB curve achieve the best trade-off between compression and prediction, and have the best robustness against various attacks.

[23]  arXiv:2103.00393 (cross-list from cs.LG) [pdf, other]
Title: Hierarchical Inducing Point Gaussian Process for Inter-domain Observations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We examine the general problem of inter-domain Gaussian Processes (GPs): problems where the GP realization and the noisy observations of that realization lie on different domains. When the mapping between those domains is linear, such as integration or differentiation, inference is still closed form. However, many of the scaling and approximation techniques that our community has developed do not apply to this setting. In this work, we introduce the hierarchical inducing point GP (HIP-GP), a scalable inter-domain GP inference method that enables us to improve the approximation accuracy by increasing the number of inducing points to the millions. HIP-GP, which relies on inducing points with grid structure and a stationary kernel assumption, is suitable for low-dimensional problems. In developing HIP-GP, we introduce (1) a fast whitening strategy, and (2) a novel preconditioner for conjugate gradients which can be helpful in general GP settings.

[24]  arXiv:2103.00394 (cross-list from cs.LG) [pdf, ps, other]
Title: Convergence of Gaussian-smoothed optimal transport distance with sub-gamma distributions and dependent samples
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The Gaussian-smoothed optimal transport (GOT) framework, recently proposed by Goldfeld et al., scales to high dimensions in estimation and provides an alternative to entropy regularization. This paper provides convergence guarantees for estimating the GOT distance under more general settings. For the Gaussian-smoothed $p$-Wasserstein distance in $d$ dimensions, our results require only the existence of a moment greater than $d + 2p$. For the special case of sub-gamma distributions, we quantify the dependence on the dimension $d$ and establish a phase transition with respect to the scale parameter. We also prove convergence for dependent samples, only requiring a condition on the pairwise dependence of the samples measured by the covariance of the feature map of a kernel space.
A key step in our analysis is to show that the GOT distance is dominated by a family of kernel maximum mean discrepancy (MMD) distances with a kernel that depends on the cost function as well as the amount of Gaussian smoothing. This insight provides further interpretability for the GOT framework and also introduces a class of kernel MMD distances with desirable properties. The theoretical results are supported by numerical experiments.

[25]  arXiv:2103.00396 (cross-list from cs.LG) [pdf, other]
Title: A Minimax Probability Machine for Non-Decomposable Performance Measures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Imbalanced classification tasks are widespread in many real-world applications. For such classification tasks, in comparison with the accuracy rate, it is usually much more appropriate to use non-decomposable performance measures such as the Area Under the receiver operating characteristic Curve (AUC) and the $F_\beta$ measure as the classification criterion since the label class is imbalanced. On the other hand, the minimax probability machine is a popular method for binary classification problems and aims at learning a linear classifier by maximizing the accuracy rate, which makes it unsuitable to deal with imbalanced classification tasks. The purpose of this paper is to develop a new minimax probability machine for the $F_\beta$ measure, called MPMF, which can be used to deal with imbalanced classification tasks. A brief discussion is also given on how to extend the MPMF model for several other non-decomposable performance measures listed in the paper. To solve the MPMF model effectively, we derive its equivalent form which can then be solved by an alternating descent method to learn a linear classifier. Further, the kernel trick is employed to derive a nonlinear MPMF model to learn a nonlinear classifier. Several experiments on real-world benchmark datasets demonstrate the effectiveness of our new model.
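
As a small illustration of why accuracy can mislead on imbalanced labels, the sketch below computes the standard $F_\beta$ measure from confusion-matrix counts. It is not the MPMF model itself; the function name `f_beta` and the example counts are assumptions chosen for illustration.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """Standard F_beta score from true positives, false positives, false negatives.

    F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP).
    Returns 0.0 when the denominator is zero (no positives predicted or present).
    """
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    return (1.0 + b2) * tp / denom if denom > 0 else 0.0


def accuracy(tp, fp, fn, tn):
    """Plain accuracy rate, shown for comparison."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical imbalanced example: 10 positives among 1000 samples,
# and a classifier that predicts "negative" for everything.
print(accuracy(tp=0, fp=0, fn=10, tn=990))  # high accuracy
print(f_beta(tp=0, fp=0, fn=10, beta=1.0))  # F_1 is zero
```

Because the score depends on the counts jointly rather than as a sum of per-sample losses, it cannot be optimized sample by sample, which is what "non-decomposable" refers to.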

[26]  arXiv:2103.00445 (cross-list from cs.LG) [pdf, other]
Title: Ensemble Bootstrapping for Q-Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Q-learning (QL), a common reinforcement learning algorithm, suffers from over-estimation bias due to the maximization term in the optimal Bellman operator. This bias may lead to sub-optimal behavior. Double-Q-learning tackles this issue by utilizing two estimators, yet results in an under-estimation bias. Similar to over-estimation in Q-learning, in certain scenarios, the under-estimation bias may degrade performance. In this work, we introduce a new bias-reduced algorithm called Ensemble Bootstrapped Q-Learning (EBQL), a natural extension of Double-Q-learning to ensembles. We analyze our method both theoretically and empirically. Theoretically, we prove that EBQL-like updates yield lower MSE when estimating the maximal mean of a set of independent random variables. Empirically, we show that there exist domains where both over- and under-estimation result in sub-optimal performance. Finally, we demonstrate the superior performance of a deep RL variant of EBQL over other deep QL algorithms for a suite of ATARI games.
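
The maximal-mean estimation problem mentioned above can be sketched as follows. This is a minimal illustration, assuming the ensemble estimator selects the best index on one fold of the samples and evaluates it on the remaining folds (with two folds this reduces to the double estimator). The function names and the fold-splitting scheme are assumptions, not the authors' code.

```python
from statistics import mean


def single_estimator(samples):
    """Max of per-variable sample means (the Q-learning-style estimator)."""
    return max(mean(s) for s in samples)


def ensemble_estimator(samples, k, select_fold=0):
    """Select the argmax on one fold, then evaluate it on the other k - 1 folds.

    `samples` is a list with one list of observations per random variable;
    each variable needs at least k observations. With k = 2 this is the
    double estimator.
    """
    # Round-robin split of every variable's observations into k disjoint folds.
    folds = [[s[i::k] for s in samples] for i in range(k)]
    select = folds[select_fold]
    best = max(range(len(samples)), key=lambda a: mean(select[a]))
    # Pool the chosen variable's observations from all remaining folds.
    rest = [v for i, fold in enumerate(folds) if i != select_fold
            for v in fold[best]]
    return mean(rest)


# Hypothetical observations of two random variables.
samples = [[0, 4, 0, 4], [1, 1, 1, 1]]
print(single_estimator(samples))       # selection and evaluation share data
print(ensemble_estimator(samples, 2))  # selection and evaluation are disjoint
```

Separating selection from evaluation removes the upward bias of the single estimator, and using more folds leaves more data for the evaluation step.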

[27]  arXiv:2103.00476 (cross-list from cs.NE) [pdf, other]
Title: Optimal Conversion of Conventional Artificial Neural Networks to Spiking Neural Networks
Authors: Shikuang Deng, Shi Gu
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Spiking neural networks (SNNs) are biology-inspired artificial neural networks (ANNs) that comprise spiking neurons to process asynchronous discrete signals. While more efficient in power consumption and inference speed on neuromorphic hardware, SNNs are usually difficult to train directly from scratch with spikes due to the discreteness. As an alternative, many efforts have been devoted to converting conventional ANNs into SNNs by copying the weights from ANNs and adjusting the spiking threshold potential of neurons in SNNs. Researchers have designed new SNN architectures and conversion algorithms to diminish the conversion error. However, an effective conversion should address the difference between the SNN and ANN architectures with an efficient approximation of the loss function, which is missing in the field. In this work, we analyze the conversion error by recursive reduction to layer-wise summation and propose a novel strategic pipeline that transfers the weights to the target SNN by combining threshold balance and soft-reset mechanisms. This pipeline enables almost no accuracy loss between the converted SNNs and conventional ANNs with only $\sim1/10$ of the typical SNN simulation time. Our method is promising for deployment on embedded platforms with better support of SNNs with limited energy and memory.
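
To make the soft-reset idea concrete, here is a minimal sketch of a generic integrate-and-fire neuron driven by a constant input. It is not the authors' conversion pipeline; the function name `simulate_if_neuron` and all parameter values are assumptions for illustration. With soft reset the membrane potential is reduced by the threshold after a spike, so the residual charge carries over; with hard reset it is set to zero and the residual is lost.

```python
def simulate_if_neuron(input_current, threshold=1.0, steps=100, soft_reset=True):
    """Return the firing rate (spikes per step) of an integrate-and-fire neuron.

    soft_reset=True : subtract the threshold after a spike (reset by subtraction).
    soft_reset=False: reset the membrane potential to zero after a spike.
    """
    v = 0.0
    spikes = 0
    for _ in range(steps):
        v += input_current  # integrate the constant input
        if v >= threshold:
            spikes += 1
            v = v - threshold if soft_reset else 0.0
    return spikes / steps


# Hypothetical input equal to 0.75 of the threshold.
print(simulate_if_neuron(0.75, soft_reset=True))   # rate tracks input / threshold
print(simulate_if_neuron(0.75, soft_reset=False))  # rate falls short of it
```

In this toy setting the soft-reset firing rate matches the ratio of input to threshold, which is the quantity a rate-coded SNN uses to approximate a ReLU activation.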

[28]  arXiv:2103.00486 (cross-list from cs.SI) [pdf, other]
Title: Community Detection in Weighted Multilayer Networks with Ambient Noise
Comments: 27 pages
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph); Applications (stat.AP); Methodology (stat.ME); Machine Learning (stat.ML)

We introduce a novel class of stochastic blockmodel for multilayer weighted networks that accounts for the presence of a global ambient noise that governs between-block interactions. In weighted multilayer noisy networks, a natural hierarchy is induced by the assumption of all but one block being under the 'umbrella' of signal, while a single block is classified as exhibiting the same characteristics as interstitial interactions and subsumed into ambient noise. We use hierarchical variational inference to jointly detect block-structures and differentiate the communities' local signals from global noise. We propose a novel algorithm to find clusters in Gaussian weighted multilayer networks with a focus on these principles called Stochastic Block (with) Ambient Noise Model (SBANM). We apply this method to several different domains. We focus on the Philadelphia Neurodevelopmental Cohort to discover communities of subjects that form diagnostic categories relating psychopathological symptoms to psychosis.

[29]  arXiv:2103.00502 (cross-list from cs.LG) [pdf, other]
Title: Optimal Approximation Rate of ReLU Networks in terms of Width and Depth
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth. It is proved by construction that ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a H\"older continuous function on $[0,1]^d$ with an approximation rate $\mathcal{O}\big(\lambda\sqrt{d} (N^2L^2\ln N)^{-\alpha/d}\big)$, where $\alpha\in (0,1]$ and $\lambda>0$ are H\"older order and constant, respectively. Such a rate is optimal up to a constant in terms of width and depth separately, while existing results are only nearly optimal without the logarithmic factor in the approximation rate. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$, the approximation rate becomes $\mathcal{O}\big(\,\sqrt{d}\,\omega_f\big( (N^2L^2\ln N)^{-1/d}\big)\,\big)$, where $\omega_f(\cdot)$ is the modulus of continuity. We also extend our analysis to any continuous function $f$ on a bounded set. Particularly, if ReLU networks with depth $31$ and width $\mathcal{O}(N)$ are used to approximate one-dimensional Lipschitz continuous functions on $[0,1]$ with a Lipschitz constant $\lambda>0$, the approximation rate in terms of the total number of parameters, $W=\mathcal{O}(N^2)$, becomes $\mathcal{O}(\tfrac{\lambda}{W\ln W})$, which has not been discovered in the literature for fixed-depth ReLU networks.
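
The step from the general rate to the fixed-depth, one-dimensional statement can be sketched as follows, treating the depth as a constant and the parameter count $W$ as proportional to $N^2$ (both are simplifying assumptions made only for this illustration):

```latex
% Set d = 1 and \alpha = 1 (Lipschitz case), with the depth L held constant:
\lambda \sqrt{d}\,\bigl(N^2 L^2 \ln N\bigr)^{-\alpha/d}
  \;=\; \frac{\lambda}{N^2 L^2 \ln N}
  \;=\; \mathcal{O}\!\Bigl(\frac{\lambda}{N^2 \ln N}\Bigr).
% With W proportional to N^2 we have \ln N = \tfrac{1}{2}\ln(N^2) = \tfrac{1}{2}\ln W + \mathcal{O}(1), hence
\frac{\lambda}{N^2 \ln N}
  \;=\; \mathcal{O}\!\Bigl(\frac{\lambda}{W \ln W}\Bigr).
```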

[30]  arXiv:2103.00580 (cross-list from stat.ME) [pdf, other]
Title: A Stein Goodness of fit Test for Exponential Random Graph Models
Journal-ref: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)

We propose and analyse a novel nonparametric goodness of fit testing procedure for exchangeable exponential random graph models (ERGMs) when a single network realisation is observed. The test determines how likely it is that the observation is generated from a target unnormalised ERGM density. Our test statistics are derived from a kernel Stein discrepancy, a divergence constructed via Stein's method using functions in a reproducing kernel Hilbert space, combined with a discrete Stein operator for ERGMs. The test is a Monte Carlo test based on simulated networks from the target ERGM. We show theoretical properties for the testing procedure for a class of ERGMs. Simulation studies and real network applications are presented.

[31]  arXiv:2103.00674 (cross-list from stat.ME) [pdf, other]
Title: BEAUTY Powered BEAST
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

We study inference about the uniform distribution with the proposed binary expansion approximation of uniformity (BEAUTY) approach. Through an extension of the celebrated Euler's formula, we approximate the characteristic function of any copula distribution with a linear combination of means of binary interactions from marginal binary expansions. This novel characterization enables a unification of many important existing tests through an approximation from some quadratic form of symmetry statistics, where the deterministic weight matrix characterizes the power properties of each test. To achieve a uniformly high power, we study test statistics with data-adaptive weights through an oracle approach, referred to as the binary expansion adaptive symmetry test (BEAST). By utilizing the properties of the binary expansion filtration, we show that the Neyman-Pearson test of uniformity can be approximated by an oracle weighted sum of symmetry statistics. The BEAST with this oracle leads all existing tests we considered in empirical power against all complex forms of alternatives. This oracle therefore sheds light on the potential of substantial improvements in power and on the form of optimal weights under each alternative. By approximating this oracle with data-adaptive weights, we develop the BEAST that improves the empirical power of many existing tests against a wide spectrum of common alternatives while providing clear interpretation of the form of non-uniformity upon rejection. We illustrate the BEAST with a study of the relationship between the location and brightness of stars.

[32]  arXiv:2103.00719 (cross-list from cs.LG) [pdf, ps, other]
Title: LocalDrop: A Hybrid Regularization for Deep Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In neural networks, developing regularization algorithms to mitigate overfitting is a major area of study. We propose a new approach, called LocalDrop, for the regularization of neural networks via the local Rademacher complexity. A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs), including drop rates and weight matrices, has been developed based on the proposed upper bound of the local Rademacher complexity through rigorous mathematical derivation. The analyses of dropout in FCNs and DropBlock in CNNs with keep rate matrices in different layers are also included in the complexity analyses. With the new regularization function, we establish a two-stage procedure to obtain the optimal keep rate matrix and weight matrix to realize the whole training model. Extensive experiments have been conducted to demonstrate the effectiveness of LocalDrop in different models by comparing it with several algorithms and by examining the effects of different hyperparameters on the final performance.

[33]  arXiv:2103.00959 (cross-list from cs.SI) [pdf, other]
Title: CogDL: An Extensive Toolkit for Deep Learning on Graphs
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Machine Learning (stat.ML)

Graph representation learning aims to learn low-dimensional node embeddings for graphs. It is used in several real-world applications such as social network analysis and large-scale recommender systems. In this paper, we introduce CogDL, an extensive research toolkit for deep learning on graphs that allows researchers and developers to easily conduct experiments and build applications. It provides standard training and evaluation for the most important tasks in the graph domain, including node classification, link prediction, graph classification, and other graph tasks. For each task, it offers implementations of state-of-the-art models. The models in our toolkit are divided into two major parts, graph embedding methods and graph neural networks. Most of the graph embedding methods learn node-level or graph-level representations in an unsupervised way and preserve graph properties such as structural information, while graph neural networks capture node features and work in semi-supervised or self-supervised settings. All models implemented in our toolkit can be easily reproduced for leaderboard results. Most models in CogDL are developed on top of PyTorch, and users can leverage the advantages of PyTorch to implement their own models. Furthermore, we demonstrate the effectiveness of CogDL for real-world applications in AMiner, which is a large academic database and system.

[34]  arXiv:2103.00988 (cross-list from cs.LG) [pdf, other]
Title: Moment-Based Variational Inference for Stochastic Differential Equations
Comments: Appearing in Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Existing deterministic variational inference approaches for diffusion processes use simple proposals and target the marginal density of the posterior. We construct the variational process as a controlled version of the prior process and approximate the posterior by a set of moment functions. In combination with moment closure, the smoothing problem is reduced to a deterministic optimal control problem. Exploiting the path-wise Fisher information, we propose an optimization procedure that corresponds to a natural gradient descent in the variational parameters. Our approach allows for richer variational approximations that extend to state-dependent diffusion terms. The classical Gaussian process approximation is recovered as a special case.

[35]  arXiv:2103.01030 (cross-list from cs.LG) [pdf, other]
Title: An Easy to Interpret Diagnostic for Approximate Inference: Symmetric Divergence Over Simulations
Authors: Justin Domke
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

It is important to estimate the errors of probabilistic inference algorithms. Existing diagnostics for Markov chain Monte Carlo methods assume inference is asymptotically exact, and are not appropriate for approximate methods like variational inference or Laplace's method. This paper introduces a diagnostic based on repeatedly simulating datasets from the prior and performing inference on each. The central observation is that it is possible to estimate a symmetric KL-divergence defined over these simulations.

[36]  arXiv:2103.01043 (cross-list from cs.LG) [pdf, other]
Title: Persistent Message Passing
Comments: 7 pages, 2 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

Graph neural networks (GNNs) are a powerful inductive bias for modelling algorithmic reasoning procedures and data structures. Their prowess was mainly demonstrated on tasks featuring Markovian dynamics, where querying any associated data structure depends only on its latest state. For many tasks of interest, however, it may be highly beneficial to support efficient data structure queries dependent on previous states. This requires tracking the data structure's evolution through time, placing significant pressure on the GNN's latent representations. We introduce Persistent Message Passing (PMP), a mechanism which endows GNNs with the capability of querying past states by explicitly persisting them: rather than overwriting node representations, it creates new nodes whenever required. PMP generalises out-of-distribution to more than 2x larger test inputs on dynamic temporal range queries, significantly outperforming GNNs which overwrite states.

[37]  arXiv:2103.01085 (cross-list from cs.LG) [pdf, other]
Title: Challenges and Opportunities in High-dimensional Variational Inference
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

We explore the limitations of and best practices for using black-box variational inference to estimate posterior summaries of the model parameters. By taking an importance sampling perspective, we are able to explain and empirically demonstrate: 1) why the intuitions about the behavior of approximate families and divergences for low-dimensional posteriors fail for higher-dimensional posteriors, 2) how we can diagnose the pre-asymptotic reliability of variational inference in practice by examining the behavior of the density ratios (i.e., importance weights), 3) why the choice of variational objective is not as relevant for higher-dimensional posteriors, and 4) why, although flexible variational families can provide some benefits in higher dimensions, they also introduce additional optimization challenges. Based on these findings, for high-dimensional posteriors we recommend using the exclusive KL divergence that is most stable and easiest to optimize, and then focusing on improving the variational family or using model parameter transformations to make the posterior more similar to the approximating family. Our results also show that in low to moderate dimensions, heavy-tailed variational families and mass-covering divergences can increase the chances that the approximation can be improved by importance sampling.

[38]  arXiv:2103.01148 (cross-list from cs.LG) [pdf, other]
Title: Class Means as an Early Exit Decision Mechanism
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

State-of-the-art neural networks with early exit mechanisms often need a considerable amount of training and fine-tuning to achieve good performance with low computational cost. We propose a novel early exit technique based on the class means of samples. Unlike most existing schemes, our method does not require gradient-based training of internal classifiers. This makes our method particularly useful for neural network training on low-power devices, as in wireless edge networks. In particular, given a fixed training time budget, our scheme achieves higher accuracy as compared to existing early exit mechanisms. Moreover, if there are no limitations on the training time budget, our method can be combined with an existing early exit scheme to boost its performance, achieving a better trade-off between computational cost and network accuracy.
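
One way such a gradient-free exit rule could look is sketched below. The abstract does not specify the decision rule, so the ratio test used here (exit when the nearest class mean is much closer than the runner-up), the function names, and the threshold value are all assumptions for illustration, not the authors' method.

```python
import math


def class_means(features, labels):
    """Average the feature vectors of each class; no gradient steps needed."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def early_exit_decision(x, means, ratio_threshold=0.5):
    """Return (nearest_label, exit_now) for an intermediate feature vector x.

    Hypothetical rule: exit when the distance to the nearest class mean is at
    most `ratio_threshold` times the distance to the second nearest one.
    Requires at least two classes.
    """
    dists = sorted((math.dist(x, m), y) for y, m in means.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    return label, d1 <= ratio_threshold * d2


# Hypothetical 2-D features taken from some intermediate layer.
means = class_means([[0, 0], [0, 2], [10, 10], [10, 12]], [0, 0, 1, 1])
print(early_exit_decision([0, 1.5], means))  # close to one mean: can exit
print(early_exit_decision([4, 5], means))    # ambiguous: continue to next layer
```

Computing the means takes a single pass over the training features, which is why a rule of this kind suits devices with a tight training budget.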

[39]  arXiv:2103.01197 (cross-list from cs.LG) [pdf, other]
Title: Coordination Among Neural Modules Through a Shared Global Workspace
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Deep learning has seen a movement away from representing examples with a monolithic hidden state towards a richly structured state. For example, Transformers segment by position, and object-centric architectures decompose images into entities. In all these architectures, interactions between different elements are modeled via pairwise interactions: Transformers make use of self-attention to incorporate information from other positions; object-centric architectures make use of graph neural networks to model interactions among entities. However, pairwise interactions may not achieve global coordination or a coherent, integrated representation that can be used for downstream tasks. In cognitive science, a global workspace architecture has been proposed in which functionally specialized components share information through a common, bandwidth-limited communication channel. We explore the use of such a communication channel in the context of deep learning for modeling the structure of complex environments. The proposed method includes a shared workspace through which communication among different specialist modules takes place, but due to limits on the communication bandwidth, specialist modules must compete for access. We show that capacity limitations have a rational basis in that (1) they encourage specialization and compositionality and (2) they facilitate the synchronization of otherwise independent specialists.

[40]  arXiv:2103.01201 (cross-list from econ.EM) [pdf, other]
Title: Can Machine Learning Catch the COVID-19 Recession?
Subjects: Econometrics (econ.EM); Applications (stat.AP); Machine Learning (stat.ML)

Based on evidence gathered from a newly built large macroeconomic data set for the UK, labeled UK-MD and comparable to similar datasets for the US and Canada, it seems the most promising avenue for forecasting during the pandemic is to allow for general forms of nonlinearity by using machine learning (ML) methods. But not all nonlinear ML methods are alike. For instance, some cannot extrapolate (like regular trees and forests) and some can (when complemented with linear dynamic components). This and other crucial aspects of ML-based forecasting in unprecedented times are studied in an extensive pseudo-out-of-sample exercise.

[41]  arXiv:2103.01210 (cross-list from cs.LG) [pdf, ps, other]
Title: Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the relative power of learning with gradient descent on differentiable models, such as neural networks, versus using the corresponding tangent kernels. We show that under certain conditions, gradient descent achieves small error only if a related tangent kernel method achieves a non-trivial advantage over random guessing (a.k.a. weak learning), though this advantage might be very small even when gradient descent can achieve arbitrarily high accuracy. Complementing this, we show that without these conditions, gradient descent can in fact learn with small error even when no kernel method, in particular using the tangent kernel, can achieve a non-trivial advantage over random guessing.

Replacements for Tue, 2 Mar 21

[42]  arXiv:1806.02935 (replaced) [pdf, other]
Title: Causal effects based on distributional distances
Comments: 46 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[43]  arXiv:1905.05022 (replaced) [pdf, other]
Title: Inferring Hierarchical Mixture Structures: A Bayesian Nonparametric Approach
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[44]  arXiv:1910.03834 (replaced) [pdf, other]
Title: Estimating Density Models with Truncation Boundaries using Score Matching
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[45]  arXiv:1912.04439 (replaced) [pdf, other]
Title: Privacy-preserving data sharing via probabilistic modelling
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[46]  arXiv:1912.10708 (replaced) [pdf]
Title: Recreation of the Periodic Table with an Unsupervised Machine Learning Algorithm
Comments: 28 pages, 14 figures, complete version of this paper is available at this https URL (Published: 26 February 2021)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[47]  arXiv:2001.05484 (replaced) [pdf, other]
Title: Bridging Convex and Nonconvex Optimization in Robust PCA: Noise, Outliers, and Missing Data
Comments: accepted to the Annals of Statistics
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC); Statistics Theory (math.ST)
[48]  arXiv:2002.11394 (replaced) [pdf, other]
Title: Bayesian Nonparametric Space Partitions: A Survey
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[49]  arXiv:2006.08063 (replaced) [pdf, other]
Title: Gradient Estimation with Stochastic Softmax Tricks
Comments: NeurIPS 2020, final copy
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[50]  arXiv:2006.11695 (replaced) [pdf, other]
Title: Uncertainty-Aware (UNA) Bases for Bayesian Regression Using Multi-Headed Auxiliary Networks
Comments: ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning, 30 pages, 21 Figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[51]  arXiv:2007.04813 (replaced) [pdf, other]
Title: Graph-Based Continual Learning
Comments: Published as a conference paper at ICLR 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[52]  arXiv:2008.06344 (replaced) [pdf, other]
Title: COVID-19 mortality analysis from soft-data multivariate curve regression and machine learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP); Methodology (stat.ME)
[53]  arXiv:2011.05836 (replaced) [pdf, other]
Title: Neural Empirical Bayes: Source Distribution Estimation and its Applications to Simulation-Based Inference
Comments: Camera-ready version presented at AISTATS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); High Energy Physics - Phenomenology (hep-ph); Data Analysis, Statistics and Probability (physics.data-an)
[54]  arXiv:2012.08073 (replaced) [pdf, other]
Title: Generalized Chernoff Sampling for Active Testing, Active Regression and Structured Bandit Algorithms
Comments: 46 pages, 9 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[55]  arXiv:2101.01134 (replaced) [pdf, other]
Title: Does Invariant Risk Minimization Capture Invariance?
Comments: Code is available in the arXiv ancillary files, linked from this page
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[56]  arXiv:2101.07957 (replaced) [pdf, other]
Title: Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[57]  arXiv:2101.10643 (replaced) [pdf, other]
Title: CDSM -- Casual Inference using Deep Bayesian Dynamic Survival Models
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[58]  arXiv:1807.11876 (replaced) [pdf, other]
Title: Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[59]  arXiv:1901.07935 (replaced) [src]
Title: Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information
Comments: Same as arXiv:1807.11876, added by mistake
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[60]  arXiv:1901.10517 (replaced) [pdf, other]
Title: Reparameterizable Subset Sampling via Continuous Relaxations
Comments: IJCAI 2019
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[61]  arXiv:1901.10995 (replaced) [pdf, other]
Title: Go-Explore: a New Approach for Hard-Exploration Problems
Comments: 37 pages, 14 figures; added references to Goyal et al. and Oh et al., updated reference to Colas et al; updated author emails; point readers to updated paper
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[62]  arXiv:1907.04670 (replaced) [pdf, other]
Title: Improving the Performance of the LSTM and HMM Model via Hybridization
Comments: Working Manuscript
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computation (stat.CO); Machine Learning (stat.ML)
[63]  arXiv:1910.08644 (replaced) [pdf, other]
Title: Initialization methods for optimum average silhouette width clustering
Authors: Fatima Batool
Comments: 41 pages
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
[64]  arXiv:1912.00832 (replaced) [pdf, other]
Title: Epistemic Uncertainty Quantification in Deep Learning Classification by the Delta Method
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[65]  arXiv:2001.03040 (replaced) [pdf, other]
Title: Deep Network Approximation for Smooth Functions
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
[66]  arXiv:2002.03532 (replaced) [pdf, other]
Title: Understanding and Improving Knowledge Distillation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[67]  arXiv:2002.06673 (replaced) [pdf, other]
Title: Performative Prediction
Comments: published at ICML'20; fixed some typos
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
[68]  arXiv:2002.07024 (replaced) [pdf, ps, other]
Title: Gaming Helps! Learning from Strategic Interactions in Natural Dynamics
Comments: The Conference version of this paper is to appear in the Proceedings of AISTATS 2021. 27 pages
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
[69]  arXiv:2002.08517 (replaced) [pdf, other]
Title: Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite Networks
Comments: AAAI camera ready version. 18 pages, 9 figures, 2 tables. Corrected name particle capitalisation and formatting
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[70]  arXiv:2002.08526 (replaced) [pdf, other]
Title: Scalable Constrained Bayesian Optimization
Comments: To appear in Proceedings of AISTATS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[71]  arXiv:2002.12047 (replaced) [pdf, other]
Title: FMix: Enhancing Mixed Sample Data Augmentation
Comments: Code available at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Machine Learning (stat.ML)
[72]  arXiv:2003.06113 (replaced) [pdf, ps, other]
Title: Ultra Efficient Transfer Learning with Meta Update for Cross Subject EEG Classification
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[73]  arXiv:2003.13441 (replaced) [pdf]
Title: Introduction to Rare-Event Predictive Modeling for Inferential Statisticians -- A Hands-On Application in the Prediction of Breakthrough Patents
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[74]  arXiv:2004.01646 (replaced) [pdf, other]
Title: M2: Mixed Models with Preferences, Popularities and Transitions for Next-Basket Recommendation
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
[75]  arXiv:2004.02786 (replaced) [pdf, other]
Title: Adaptive Partial Scanning Transmission Electron Microscopy with Reinforcement Learning
Authors: Jeffrey M. Ede
Comments: 13 pages, 3 figures + 1 algorithm
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[76]  arXiv:2004.07320 (replaced) [pdf, other]
Title: Training with Quantization Noise for Extreme Model Compression
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[77]  arXiv:2005.07587 (replaced) [pdf, ps, other]
Title: Non-Sparse PCA in High Dimensions via Cone Projected Power Iteration
Subjects: Statistics Theory (math.ST); Computation (stat.CO); Methodology (stat.ME); Machine Learning (stat.ML)
[78]  arXiv:2005.10531 (replaced) [pdf, ps, other]
Title: Supervised Learning in the Presence of Concept Drift: A modelling framework
Comments: 17 pages in twocolumn
Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (stat.ML)
[79]  arXiv:2005.14708 (replaced) [pdf, other]
Title: Online DR-Submodular Maximization with Stochastic Cumulative Constraints
Comments: To appear in proceedings of AAAI 2021
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[80]  arXiv:2006.06147 (replaced) [pdf, other]
Title: Implicit Kernel Attention
Comments: AAAI-21
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[81]  arXiv:2006.06853 (replaced) [pdf, ps, other]
Title: Training a Single Bandit Arm
Comments: 32 pages, 3 figures, 1 table
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[82]  arXiv:2006.06981 (replaced) [pdf, other]
Title: Kernel Distributionally Robust Optimization
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[83]  arXiv:2006.07886 (replaced) [pdf, other]
Title: On Disentangled Representations Learned From Correlated Data
Comments: 31 pages, 17 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[84]  arXiv:2006.08875 (replaced) [pdf, other]
Title: Model-based Adversarial Meta-Reinforcement Learning
Comments: Accepted by NeurIPS 2020. Code at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[85]  arXiv:2006.09255 (replaced) [pdf, other]
Title: Corralling Stochastic Bandit Algorithms
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[86]  arXiv:2006.12369 (replaced) [pdf, other]
Title: Controlling for sparsity in sparse factor analysis models: adaptive latent feature sharing for piecewise linear dimensionality reduction
Comments: Interactive demo available at this https URL
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[87]  arXiv:2006.13026 (replaced) [pdf, other]
Title: Deep Polynomial Neural Networks
Comments: Published in IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI). Code: this https URL. arXiv admin note: substantial text overlap with arXiv:2003.03828
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[88]  arXiv:2006.13636 (replaced) [pdf, other]
Title: Approximating a Target Distribution using Weight Queries
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[89]  arXiv:2007.00339 (replaced) [pdf, other]
Title: Multi-Task Variational Information Bottleneck
Comments: 10 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[90]  arXiv:2007.04169 (replaced) [pdf, other]
Title: An exploration of the influence of path choice in game-theoretic attribution algorithms
Comments: 21 pages, 12 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[91]  arXiv:2007.04640 (replaced) [pdf, other]
Title: Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate
Comments: In 35th AAAI Conference on Artificial Intelligence (AAAI 2021)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[92]  arXiv:2007.07177 (replaced) [pdf, other]
Title: MosAIc: Finding Artistic Connections across Culture with Conditional Image Retrieval
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Information Retrieval (cs.IR); Machine Learning (stat.ML)
[93]  arXiv:2008.02839 (replaced) [pdf, other]
Title: Learned convex regularizers for inverse problems
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[94]  arXiv:2008.06081 (replaced) [pdf, other]
Title: Adversarial Training and Provable Robustness: A Tale of Two Objectives
Comments: Accepted at AAAI 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[95]  arXiv:2008.07283 (replaced) [pdf, other]
Title: Estimating Causal Effects with the Neural Autoregressive Density Estimator
Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
[96]  arXiv:2008.10183 (replaced) [pdf, other]
Title: HALO: Learning to Prune Neural Networks with Shrinkage
Comments: Accepted at SDM 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[97]  arXiv:2009.00387 (replaced) [pdf, other]
Title: Boosting Share Routing for Multi-task Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[98]  arXiv:2009.07888 (replaced) [pdf, other]
Title: Transfer Learning in Deep Reinforcement Learning: A Survey
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[99]  arXiv:2009.07938 (replaced) [pdf, ps, other]
Title: Type-augmented Relation Prediction in Knowledge Graphs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
[100]  arXiv:2009.10780 (replaced) [pdf, other]
Title: Independent finite approximations for Bayesian nonparametric inference
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
[101]  arXiv:2009.10990 (replaced) [pdf, other]
Title: Accurate and Interpretable Machine Learning for Transparent Pricing of Health Insurance Plans
Comments: Accepted for publication in The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), in the Innovative Applications of Artificial Intelligence track. This is the extended version with some stylistic fixes from the first posting and complete author list
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Machine Learning (stat.ML)
[102]  arXiv:2010.00381 (replaced) [pdf, other]
Title: Student-Initiated Action Advising via Advice Novelty
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[103]  arXiv:2010.00743 (replaced) [pdf, other]
Title: Cell Complex Neural Networks
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Algebraic Topology (math.AT); Machine Learning (stat.ML)
[104]  arXiv:2010.00904 (replaced) [pdf, other]
Title: Autoregressive Entity Retrieval
Comments: Accepted (spotlight) at International Conference on Learning Representations (ICLR) 2021. Code at this https URL. 20 pages, 9 figures, 8 tables
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[105]  arXiv:2010.02459 (replaced) [pdf, other]
Title: Usable Information and Evolution of Optimal Representations During Training
Comments: ICLR 2021
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
[106]  arXiv:2010.04050 (replaced) [pdf, other]
Title: A survey of algorithmic recourse: definitions, formulations, solutions, and prospects
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[107]  arXiv:2010.05820 (replaced) [pdf, other]
Title: Permutation invariant networks to learn Wasserstein metrics
Comments: Fixed typos. Accepted as a spotlight at the Topological Data Analysis and Beyond Workshop at NeurIPS 2020. Added more experiments and results. Comments welcome
Subjects: Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
[108]  arXiv:2010.10346 (replaced) [pdf, other]
Title: Deep Importance Sampling based on Regression for Model Inversion and Emulation
Subjects: Computation (stat.CO); Machine Learning (cs.LG); Machine Learning (stat.ML)
[109]  arXiv:2010.14763 (replaced) [pdf, other]
Title: Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes
Comments: arXiv admin note: substantial text overlap with arXiv:2007.09208. AISTATS 2021
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[110]  arXiv:2010.15285 (replaced) [pdf, other]
Title: Intrinsic Sliced Wasserstein Distances for Comparing Collections of Probability Distributions on Manifolds and Graphs
Comments: Improved exposition; added resampling-based test and source code
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
[111]  arXiv:2011.00364 (replaced) [pdf, ps, other]
Title: Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization
Comments: in Proc. AISTATS'21
Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
[112]  arXiv:2011.00576 (replaced) [pdf, other]
Title: Experimental Design for Regret Minimization in Linear Bandits
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[113]  arXiv:2011.03178 (replaced) [pdf, other]
Title: Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations?
Comments: AISTATS 2021 (Oral)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[114]  arXiv:2011.05824 (replaced) [pdf, other]
Title: Semi-Structured Deep Piecewise Exponential Models
Comments: 8 pages, 3 figures, Accepted at the AAAI spring symposium: Survival Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[115]  arXiv:2012.01205 (replaced) [pdf, other]
Title: VisEvol: Visual Analytics to Support Hyperparameter Search through Evolutionary Optimization
Comments: This manuscript is currently under review
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
[116]  arXiv:2101.05201 (replaced) [pdf, other]
Title: Optimisation of Spectral Wavelets for Persistence-based Graph Classification
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)
[117]  arXiv:2101.09436 (replaced) [pdf, other]
Title: Hierarchical Variational Auto-Encoding for Unsupervised Domain Generalization
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[118]  arXiv:2101.09763 (replaced) [pdf, other]
Title: Analysing the Noise Model Error for Realistic Noisy Label Data
Comments: Accepted at AAAI 2021, additional material at this https URL
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
[119]  arXiv:2102.07206 (replaced) [pdf, other]
Title: Sample Efficient Subspace-based Representations for Nonlinear Meta-Learning
Comments: To appear in ICASSP '21
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[120]  arXiv:2102.08248 (replaced) [pdf, other]
Title: Hierarchical VAEs Know What They Don't Know
Comments: 18 pages, source code available at this https URL and this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[121]  arXiv:2102.09972 (replaced) [pdf, other]
Title: Implicit Regularization in Tensor Factorization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[122]  arXiv:2102.11860 (replaced) [pdf, other]
Title: Automated Discovery of Adaptive Attacks on Adversarial Defenses
Comments: 16 pages, 4 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[123]  arXiv:2102.11866 (replaced) [pdf, ps, other]
Title: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Comments: Submitted for publication
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)