Machine Learning
New submissions
New submissions for Tue, 2 Mar 21
 [1] arXiv:2103.00025 [pdf, ps, other]

Title: TEC: Tensor Ensemble Classifier for Big Data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Tensor (multidimensional array) classification has become very popular in modern applications such as image recognition and high-dimensional spatiotemporal data analysis. The Support Tensor Machine (STM) classifier, which extends the support vector machine, takes the CANDECOMP/PARAFAC (CP) form of tensor data as input and predicts the data labels. The distribution-free and statistically consistent properties of STM highlight its potential in successfully handling a wide variety of data applications. Training an STM can be computationally expensive with high-dimensional tensors. However, reducing the size of the tensor with a random projection technique can reduce the computational time and cost, making it feasible to handle large tensors on regular machines. We name an STM estimated with randomly projected tensors a Random Projection-based Support Tensor Machine (RPSTM). In this work, we propose a Tensor Ensemble Classifier (TEC), which aggregates multiple RPSTMs for big tensor classification. TEC utilizes the ensemble idea to minimize the excessive classification risk brought by random projection, providing statistically consistent predictions while taking the computational advantage of RPSTM. Since each RPSTM can be estimated independently, TEC can further take advantage of parallel computing techniques and be more computationally efficient. The theoretical and numerical results demonstrate the decent performance of the TEC model in high-dimensional tensor classification problems. The model prediction is statistically consistent, as its risk is shown to converge to the optimal Bayes risk. In addition, we highlight the trade-off between the computational cost and the prediction risk for the TEC model. The method is validated by extensive simulation and a real data example. We prepare a Python package for applying TEC, which is available at our GitHub.
 [2] arXiv:2103.00034 [pdf, other]

Title: Beyond Perturbation Stability: LP Recovery Guarantees for MAP Inference on Noisy Stable Instances
Comments: 25 pages, 2 figures, 2 tables. To appear in AISTATS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Several works have shown that perturbation stable instances of the MAP inference problem in Potts models can be solved exactly using a natural linear programming (LP) relaxation. However, most of these works give few (or no) guarantees for the LP solutions on instances that do not satisfy the relatively strict perturbation stability definitions. In this work, we go beyond these stability results by showing that the LP approximately recovers the MAP solution of a stable instance even after the instance is corrupted by noise. This "noisy stable" model realistically fits with practical MAP inference problems: we design an algorithm for finding "close" stable instances, and show that several real-world instances from computer vision have nearby instances that are perturbation stable. These results suggest a new theoretical explanation for the excellent performance of this LP relaxation in practice.
 [3] arXiv:2103.00083 [pdf, other]

Title: Deep Quantile Aggregation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Conditional quantile estimation is a key statistical learning challenge motivated by the need to quantify uncertainty in predictions or to model a diverse population without being overly reductive. As such, many models have been developed for this problem. Adopting a meta viewpoint, we propose a general framework (inspired by neural network optimization) for aggregating any number of conditional quantile models in order to boost predictive accuracy. We consider weighted ensembling strategies of increasing flexibility where the weights may vary over individual models, quantile levels, and feature values. An appeal of our approach is its portability: we ensure that estimated quantiles at adjacent levels do not cross by applying simple transformations through which gradients can be backpropagated, and this allows us to leverage the modern deep learning toolkit for building quantile ensembles. Our experiments confirm that ensembling can lead to big gains in accuracy, even when the constituent models are themselves powerful and flexible.
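One way to picture a non-crossing transformation of the kind the abstract mentions is the minimal sketch below. It assumes (the abstract does not say this) that the lowest quantile is left unconstrained and that each higher level adds a softplus increment; `softplus`, `monotone_quantiles`, `pinball_loss`, and the sample scores are hypothetical names and values chosen for illustration, not the paper's method.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)); non-negative, and positive
    # except when exp(-|x|) underflows for very negative x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def monotone_quantiles(raw):
    """Map unconstrained scores to non-decreasing quantile estimates.

    raw[0] is used directly as the lowest quantile; every later entry
    contributes a softplus increment, so adjacent levels cannot cross.
    """
    out = [raw[0]]
    for r in raw[1:]:
        out.append(out[-1] + softplus(r))
    return out

def pinball_loss(y, q_hat, tau):
    # Standard quantile (pinball) loss at level tau.
    diff = y - q_hat
    return max(tau * diff, (tau - 1.0) * diff)

raw_scores = [1.5, -2.0, 0.3, -0.7]   # hypothetical unconstrained outputs
quantiles = monotone_quantiles(raw_scores)
```

Both the cumulative sum and the softplus are differentiable almost everywhere, so a transformation like this could sit at the end of a network and be trained with the pinball loss.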
 [4] arXiv:2103.00222 [pdf, other]

Title: Variational Laplace for Bayesian neural networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We develop variational Laplace for Bayesian neural networks (BNNs), which exploits a local approximation of the curvature of the likelihood to estimate the ELBO without the need for stochastic sampling of the neural-network weights. Variational Laplace performs better on image classification tasks than MAP inference and far better than standard variational inference with stochastic sampling, despite using the same mean-field Gaussian approximate posterior. The Variational Laplace objective is simple to evaluate, as it is (in essence) the log-likelihood, plus weight-decay, plus a squared-gradient regularizer. Finally, we emphasise the care needed in benchmarking standard VI, as there is a risk of stopping before the variance parameters have converged. We show that early stopping can be avoided by increasing the learning rate for the variance parameters.
 [5] arXiv:2103.00373 [pdf, other]

Title: Communication-efficient Byzantine-robust distributed learning with statistical guarantee
Comments: 34 pages
Subjects: Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Communication efficiency and robustness are two major issues in modern distributed learning frameworks. They arise in practical situations where some computing nodes may have limited communication power or may behave adversarially. To address the two issues simultaneously, this paper develops two communication-efficient and robust distributed learning algorithms for convex problems. Our motivation is based on the surrogate likelihood framework and the median and trimmed mean operations. In particular, the proposed algorithms are provably robust against Byzantine failures, and also achieve optimal statistical rates for strongly convex losses and convex (nonsmooth) penalties. For typical statistical models such as generalized linear models, our results show that statistical errors dominate optimization errors in finite iterations. Simulated and real data experiments are conducted to demonstrate the numerical performance of our algorithms.
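The median and trimmed mean operations named in this abstract can be sketched as follows. This shows only the generic coordinate-wise aggregation step, not the paper's algorithms; the function names and the toy gradient reports are hypothetical.

```python
import statistics

def coordinate_median(vectors):
    # Median of each coordinate across workers.
    return [statistics.median(col) for col in zip(*vectors)]

def trimmed_mean(vectors, trim_fraction):
    # Per coordinate: drop the k smallest and k largest values, then average.
    n = len(vectors)
    k = int(trim_fraction * n)
    if 2 * k >= n:
        raise ValueError("trim_fraction removes every worker")
    result = []
    for col in zip(*vectors):
        kept = sorted(col)[k:n - k]
        result.append(sum(kept) / len(kept))
    return result

# Four honest workers report gradients near (1, -2); one Byzantine worker lies.
gradients = [
    [1.0, -2.0],
    [1.1, -1.9],
    [0.9, -2.1],
    [1.0, -2.0],
    [1000.0, 1000.0],  # corrupted report
]
naive_mean = [sum(col) / len(col) for col in zip(*gradients)]
robust_median = coordinate_median(gradients)
robust_trimmed = trimmed_mean(gradients, trim_fraction=0.2)
```

In this toy run the plain average is pulled far away by the single corrupted report, while both robust aggregates stay close to the honest value.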
 [6] arXiv:2103.00500 [pdf, other]

Title: Asymptotic Risk of Overparameterized Likelihood Models: Double Descent Theory for Deep Neural Networks
Comments: 33 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
We investigate the asymptotic risk of a general class of overparameterized likelihood models, including deep models. The recent empirical success of large-scale models has motivated several theoretical studies to investigate a scenario wherein both the number of samples, $n$, and parameters, $p$, diverge to infinity and derive an asymptotic risk at the limit. However, these theorems are only valid for linear-in-feature models, such as generalized linear regression, kernel regression, and shallow neural networks. Hence, it is difficult to investigate a wider class of nonlinear models, including deep neural networks with three or more layers. In this study, we consider a likelihood maximization problem without the model constraints and analyze the upper bound of an asymptotic risk of an estimator with penalization. Technically, we combine a property of the Fisher information matrix with an extended Marchenko-Pastur law and associate the combination with empirical process techniques. The derived bound is general, as it describes both the double descent and the regularized risk curves, depending on the penalization. Our results are valid without the linear-in-feature constraints on models and allow us to derive the general spectral distributions of a Fisher information matrix from the likelihood. We demonstrate that several explicit models, such as parallel deep neural networks and ensemble learning, are in agreement with our theory. This result indicates that even large and deep models have a small asymptotic risk if they exhibit a specific structure, such as divisibility. To verify this finding, we conduct a real-data experiment with parallel deep neural networks. Our results expand the applicability of the asymptotic risk analysis, and may also contribute to the understanding and application of deep learning.
 [7] arXiv:2103.00654 [pdf, other]

Title: Feedback Coding for Active Learning
Comments: AISTATS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The iterative selection of examples for labeling in active machine learning is conceptually similar to feedback channel coding in information theory: in both tasks, the objective is to seek a minimal sequence of actions to encode information in the presence of noise. While this high-level overlap has been previously noted, there remain open questions on how to best formulate active learning as a communications system to leverage existing analysis and algorithms in feedback coding. In this work, we formally identify and leverage the structural commonalities between the two problems, including the characterization of encoder and noisy channel components, to design a new algorithm. Specifically, we develop an optimal transport-based feedback coding scheme called Approximate Posterior Matching (APM) for the task of active example selection and explore its application to Bayesian logistic regression, a popular model in active learning. We evaluate APM on a variety of datasets and demonstrate learning performance comparable to existing active learning methods, at a reduced computational cost. These results demonstrate the potential of directly deploying concepts from feedback channel coding to design efficient active learning strategies.
 [8] arXiv:2103.00668 [pdf, other]

Title: Learning Proposals for Probabilistic Programs with Inference Combinators
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Programming Languages (cs.PL)
We develop operators for the construction of proposals in probabilistic programs, which we refer to as inference combinators. Inference combinators define a grammar over importance samplers that compose primitive operations such as application of a transition kernel and importance resampling. Proposals in these samplers can be parameterized using neural networks, which in turn can be trained by optimizing variational objectives. The result is a framework for user-programmable variational methods that are correct by construction and can be tailored to specific models. We demonstrate the flexibility of this framework in applications to advanced variational methods based on amortized Gibbs sampling and annealing.
 [9] arXiv:2103.00684 [pdf, other]

Title: Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Neural network-based anomaly detection methods have been shown to achieve high performance. However, they require a large amount of training data for each task. We propose a neural network-based meta-learning method for supervised anomaly detection. The proposed method improves the anomaly detection performance on unseen tasks, which contain a few labeled normal and anomalous instances, by meta-training with various datasets. With a meta-learning framework, quick adaptation to each task and its effective backpropagation are important since the model is trained by the adaptation for each epoch. Our model enables them by formulating adaptation as a generalized eigenvalue problem with one-class classification; its global optimum solution is obtained, and the solver is differentiable. We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods on various datasets.
 [10] arXiv:2103.00694 [pdf, other]

Title: Meta-learning representations for clustering with infinite Gaussian mixture models
Authors: Tomoharu Iwata
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
For better clustering performance, appropriate representations are critical. Although many neural network-based metric learning methods have been proposed, they do not directly train neural networks to improve clustering performance. We propose a meta-learning method that trains neural networks for obtaining representations such that clustering performance improves when the representations are clustered by variational Bayesian (VB) inference with an infinite Gaussian mixture model. The proposed method can cluster unseen unlabeled data using knowledge meta-learned with labeled data that are different from the unlabeled data. For the objective function, we propose a continuous approximation of the adjusted Rand index (ARI), by which we can evaluate the clustering performance from soft clustering assignments. Since the approximated ARI and the VB inference procedure are differentiable, we can backpropagate the objective function through the VB inference procedure to train the neural networks. With experiments using text and image data sets, we demonstrate that our proposed method has a higher adjusted Rand index than existing methods do.
 [11] arXiv:2103.00704 [pdf, other]

Title: Privacy-Preserving Distributed SVD via Federated Power
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Singular value decomposition (SVD) is one of the most fundamental tools in machine learning and statistics. The modern machine learning community usually assumes that data come from and belong to small-scale device users. The low communication and computation power of such devices, and the possible privacy breaches of users' sensitive data, make the computation of SVD challenging. Federated learning (FL) is a paradigm enabling a large number of devices to jointly learn a model in a communication-efficient way without data sharing. In the FL framework, we develop a class of algorithms called FedPower for the computation of partial SVD in the modern setting. Based on the well-known power method, the local devices alternate between multiple local power iterations and one global aggregation to improve communication efficiency. In the aggregation, we propose to weight each local eigenvector matrix with the Orthogonal Procrustes Transformation (OPT). Considering the practical stragglers' effect, the aggregation can involve full or partial participation, and for the latter we propose two sampling and aggregation schemes. Further, to ensure strong privacy protection, we add Gaussian noise whenever communication happens by adopting the notion of differential privacy (DP). We theoretically show the convergence bound for FedPower. The resulting bound is interpretable, with each part corresponding to the effect of Gaussian noise, parallelization, and random sampling of devices, respectively. We also conduct experiments to demonstrate the merits of FedPower. In particular, the local iterations not only improve communication efficiency but also reduce the chance of privacy breaches.
 [12] arXiv:2103.00711 [pdf, ps, other]

Title: Panel semiparametric quantile regression neural network for electricity consumption forecasting
Comments: 30
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Econometrics (econ.EM)
China has made great achievements in the electric power industry during the long-term deepening of reform and opening up. However, owing to complex regional economic, social and natural conditions, electricity resources are not evenly distributed, which accounts for the electricity deficiency in some regions of China. It is therefore desirable to develop a robust electricity forecasting model. Motivated by this, we propose a Panel Semiparametric Quantile Regression Neural Network (PSQRNN) by utilizing the artificial neural network and semiparametric quantile regression. The PSQRNN can explore potential linear and nonlinear relationships among the variables, interpret the unobserved provincial heterogeneity, and maintain the interpretability of parametric models simultaneously. The PSQRNN is trained by combining penalized quantile regression with LASSO, ridge regression and the backpropagation algorithm. To evaluate the prediction accuracy, an empirical analysis is conducted on provincial electricity consumption in China from 1999 to 2018 based on three scenarios. The analysis shows that the PSQRNN model performs better for electricity consumption forecasting when economic and climatic factors are considered. Finally, forecasts of provincial electricity consumption in China for the next $5$ years (2019-2023) are reported.
 [13] arXiv:2103.00733 [pdf, ps, other]

Title: The Mathematics Behind Spectral Clustering And The Equivalence To PCA
Authors: T Shen
Comments: 6 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Spectral clustering is a popular algorithm that clusters points using the eigenvalues and eigenvectors of Laplacian matrices derived from the data. For years, spectral clustering has been working mysteriously. This paper explains spectral clustering by dividing it into two categories based on whether the graph Laplacian is fully connected or not. For a fully connected graph, this paper demonstrates the dimension reduction part by offering an objective function: the covariance between the original data points' similarities and the mapped data points' similarities. For a multi-connected graph, this paper proves that with a proper $k$, the first $k$ eigenvectors are the indicators of the connected components. This paper also proves there is an equivalence between spectral embedding and PCA.
 [14] arXiv:2103.01126 [pdf, ps, other]

Title: BERT based patent novelty search by training claims to their own description
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Machine Learning (cs.LG); Econometrics (econ.EM)
In this paper we present a method to concatenate patent claims to their own description. By applying this method, BERT trains suitable descriptions for claims. Such a trained BERT (claim-to-description BERT) could be able to identify novelty-relevant descriptions for patents. In addition, we introduce a new scoring scheme, relevance scoring or novelty scoring, to process the output of BERT in a meaningful way. We tested the method on patent applications by training BERT on the first claims of patents and corresponding descriptions. BERT's output has been processed according to the relevance score and the results compared with the cited X documents in the search reports. The test showed that BERT has scored some of the cited X documents as highly relevant.
Cross-lists for Tue, 2 Mar 21
 [15] arXiv:2102.13135 (cross-list from math.ST) [pdf, other]

Title: Graph Community Detection from Coarse Measurements: Recovery Conditions for the Coarsened Weighted Stochastic Block Model
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
We study the problem of community recovery from coarse measurements of a graph. In contrast to the problem of community recovery of a fully observed graph, one often encounters situations when measurements of a graph are made at low resolution, each measurement integrating across multiple graph nodes. Such low-resolution measurements effectively induce a coarse graph with its own communities. Our objective is to develop conditions on the graph structure, the quantity, and properties of measurements, under which we can recover the community organization in this coarse graph. In this paper, we build on the stochastic block model by mathematically formalizing the coarsening process, and characterizing its impact on the community members and connections. Through this novel setup and modeling, we characterize an error bound for community recovery. The error bound yields simple and closed-form asymptotic conditions to achieve the perfect recovery of the coarse graph communities.
 [16] arXiv:2103.00065 (cross-list from cs.LG) [pdf, other]

Title: Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
Comments: To appear in ICLR 2021. 72 pages, 107 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We empirically demonstrate that full-batch gradient descent on neural network training objectives typically operates in a regime we call the Edge of Stability. In this regime, the maximum eigenvalue of the training loss Hessian hovers just above the numerical value $2 / \text{(step size)}$, and the training loss behaves non-monotonically over short timescales, yet consistently decreases over long timescales. Since this behavior is inconsistent with several widespread presumptions in the field of optimization, our findings raise questions as to whether these presumptions are relevant to neural network training. We hope that our findings will inspire future efforts aimed at rigorously understanding optimization at the Edge of Stability. Code is available at https://github.com/locuslab/edge-of-stability.
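The value $2 / \text{(step size)}$ is the classical stability threshold for gradient descent on a quadratic, which the sketch below illustrates in one dimension. It covers only that textbook fact, not the neural network phenomenon the abstract reports; the function name and the numbers are hypothetical.

```python
def gd_on_quadratic(sharpness, step_size, x0=1.0, steps=50):
    # f(x) = 0.5 * sharpness * x**2 has gradient sharpness * x, so each
    # update multiplies x by (1 - step_size * sharpness).
    x = x0
    for _ in range(steps):
        x -= step_size * sharpness * x
    return x

step_size = 0.1
threshold = 2.0 / step_size   # curvature above this makes the iterates diverge

below = gd_on_quadratic(sharpness=19.0, step_size=step_size)  # factor -0.9
above = gd_on_quadratic(sharpness=21.0, step_size=step_size)  # factor -1.1
```

With curvature just under the threshold the iterates oscillate in sign but shrink; just over it they oscillate and grow.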
 [17] arXiv:2103.00107 (cross-list from cs.LG) [pdf, other]

Title: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning
Authors: Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel
Comments: 26 pages, 7 figures, 2 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Off-policy multi-step reinforcement learning algorithms consist of conservative and non-conservative algorithms: the former actively cut traces, whereas the latter do not. Recently, Munos et al. (2016) proved the convergence of conservative algorithms to an optimal Q-function. In contrast, non-conservative algorithms are thought to be unsafe and have a limited or no theoretical guarantee. Nonetheless, recent studies have shown that non-conservative algorithms empirically outperform conservative ones. Motivated by the empirical results and the lack of theory, we carry out theoretical analyses of Peng's Q($\lambda$), a representative example of non-conservative algorithms. We prove that it also converges to an optimal policy provided that the behavior policy slowly tracks a greedy policy in a way similar to conservative policy iteration. Such a result has been conjectured to be true but has not been proven. We also experiment with Peng's Q($\lambda$) in complex continuous control tasks, confirming that Peng's Q($\lambda$) often outperforms conservative algorithms despite its simplicity. These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically sound and practically effective algorithm.
 [18] arXiv:2103.00136 (cross-list from cs.LG) [pdf, other]

Title: Incorporating Causal Graphical Prior Knowledge into Predictive Modeling via Simple Data Augmentation
Comments: 24 pages, 5 figures, 2 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Causal graphs (CGs) are compact representations of the knowledge of the data generating processes behind the data distributions. When a CG is available, e.g., from the domain knowledge, we can infer the conditional independence (CI) relations that should hold in the data distribution. However, it is not straightforward how to incorporate this knowledge into predictive modeling. In this work, we propose a model-agnostic data augmentation method that allows us to exploit the prior knowledge of the CI encoded in a CG for supervised machine learning. We theoretically justify the proposed method by providing an excess risk bound indicating that the proposed method suppresses overfitting by reducing the apparent complexity of the predictor hypothesis class. Using real-world data with CGs provided by domain experts, we experimentally show that the proposed method is effective in improving the prediction accuracy, especially in the small-data regime.
 [19] arXiv:2103.00139 (cross-list from cs.LG) [pdf, other]

Title: Scalable Causal Transfer Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
One of the most important problems in transfer learning is the task of domain adaptation, where the goal is to apply an algorithm trained in one or more source domains to a different (but related) target domain. This paper deals with domain adaptation in the presence of covariate shift while there exist invariances across domains. A main limitation of existing causal inference methods for solving this problem is scalability. To overcome this difficulty, we propose SCTL, an algorithm that avoids an exhaustive search and identifies invariant causal features across the source and target domains based on Markov blanket discovery. SCTL does not require prior knowledge of the causal structure, the type of interventions, or the intervention targets. There is an intrinsic locality associated with SCTL that makes it practically scalable and robust, because local causal discovery increases the power of computational independence tests and makes the task of domain adaptation computationally tractable. We show the scalability and robustness of SCTL for domain adaptation using synthetic and real data sets in low-dimensional and high-dimensional settings.
 [20] arXiv:2103.00299 (cross-list from math.OC) [pdf, ps, other]

Title: Parallel Stochastic Mirror Descent for MDPs
Subjects: Optimization and Control (math.OC); Machine Learning (stat.ML)
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, a variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints. We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method. Using this algorithm, we get the first parallel algorithm for average-reward MDPs with a generative model. One of the main features of the presented method is low communication costs in a distributed centralized setting.
 [21] arXiv:2103.00349 (cross-list from cs.LG) [pdf, other]

Title: High-Dimensional Bayesian Optimization with Sparse Axis-Aligned Subspaces
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Bayesian optimization (BO) is a powerful paradigm for efficient optimization of black-box objective functions. High-dimensional BO presents a particular challenge, in part because the curse of dimensionality makes it difficult to define as well as do inference over a suitable class of surrogate models. We argue that Gaussian process surrogate models defined on sparse axis-aligned subspaces offer an attractive compromise between flexibility and parsimony. We demonstrate that our approach, which relies on Hamiltonian Monte Carlo for inference, can rapidly identify sparse subspaces relevant to modeling the unknown objective function, enabling sample-efficient high-dimensional BO. In an extensive suite of experiments comparing to existing methods for high-dimensional BO we demonstrate that our algorithm, Sparse Axis-Aligned Subspace BO (SAASBO), achieves excellent performance on several synthetic and real-world problems without the need to set problem-specific hyperparameters.
 [22] arXiv:2103.00381 (cross-list from cs.LG) [pdf, other]

Title: Adversarial Information Bottleneck
Comments: 10 pages, 7 figures, 2 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The information bottleneck (IB) principle has been adopted to explain deep learning in terms of information compression and prediction, which are balanced by a trade-off hyperparameter. How to optimize the IB principle for better robustness and how to determine the effects of compression through the trade-off hyperparameter are two challenging problems. Previous methods attempted to optimize the IB principle by introducing random noise into learning the representation and achieved state-of-the-art performance in nuisance information compression and semantic information extraction. However, their performance in resisting adversarial perturbations is far less impressive. To this end, we propose an adversarial information bottleneck (AIB) method without any explicit assumptions about the underlying distribution of the representations, which can be optimized effectively by solving a Min-Max optimization problem. Numerical experiments on synthetic and real-world datasets demonstrate its effectiveness in learning more invariant representations and mitigating adversarial perturbations compared to several competing IB methods. In addition, we analyse the adversarial robustness of diverse IB methods in contrast with their IB curves, and reveal that IB models with the hyperparameter $\beta$ corresponding to the knee point in the IB curve achieve the best trade-off between compression and prediction, and have the best robustness against various attacks.
 [23] arXiv:2103.00393 (cross-list from cs.LG) [pdf, other]

Title: Hierarchical Inducing Point Gaussian Process for Inter-domain Observations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We examine the general problem of inter-domain Gaussian Processes (GPs): problems where the GP realization and the noisy observations of that realization lie on different domains. When the mapping between those domains is linear, such as integration or differentiation, inference is still closed form. However, many of the scaling and approximation techniques that our community has developed do not apply to this setting. In this work, we introduce the hierarchical inducing point GP (HIP-GP), a scalable inter-domain GP inference method that enables us to improve the approximation accuracy by increasing the number of inducing points to the millions. HIP-GP, which relies on inducing points with grid structure and a stationary kernel assumption, is suitable for low-dimensional problems. In developing HIP-GP, we introduce (1) a fast whitening strategy, and (2) a novel preconditioner for conjugate gradients which can be helpful in general GP settings.
 [24] arXiv:2103.00394 (cross-list from cs.LG) [pdf, ps, other]

Title: Convergence of Gaussian-smoothed optimal transport distance with sub-gamma distributions and dependent samples
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The Gaussian-smoothed optimal transport (GOT) framework, recently proposed by Goldfeld et al., scales to high dimensions in estimation and provides an alternative to entropy regularization. This paper provides convergence guarantees for estimating the GOT distance under more general settings. For the Gaussian-smoothed $p$-Wasserstein distance in $d$ dimensions, our results require only the existence of a moment greater than $d + 2p$. For the special case of sub-gamma distributions, we quantify the dependence on the dimension $d$ and establish a phase transition with respect to the scale parameter. We also prove convergence for dependent samples, only requiring a condition on the pairwise dependence of the samples measured by the covariance of the feature map of a kernel space.
A key step in our analysis is to show that the GOT distance is dominated by a family of kernel maximum mean discrepancy (MMD) distances with a kernel that depends on the cost function as well as the amount of Gaussian smoothing. This insight provides further interpretability for the GOT framework and also introduces a class of kernel MMD distances with desirable properties. The theoretical results are supported by numerical experiments.
 [25] arXiv:2103.00396 (cross-list from cs.LG) [pdf, other]

Title: A Minimax Probability Machine for Non-Decomposable Performance Measures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Imbalanced classification tasks are widespread in many real-world applications. For such tasks, because the label classes are imbalanced, it is usually much more appropriate to use non-decomposable performance measures, such as the Area Under the receiver operating characteristic Curve (AUC) and the $F_\beta$ measure, as the classification criterion rather than the accuracy rate. On the other hand, the minimax probability machine is a popular method for binary classification problems and aims at learning a linear classifier by maximizing the accuracy rate, which makes it unsuitable for imbalanced classification tasks. The purpose of this paper is to develop a new minimax probability machine for the $F_\beta$ measure, called MPMF, which can be used to deal with imbalanced classification tasks. A brief discussion is also given on how to extend the MPMF model to several other non-decomposable performance measures listed in the paper. To solve the MPMF model effectively, we derive its equivalent form, which can then be solved by an alternating descent method to learn a linear classifier. Further, the kernel trick is employed to derive a nonlinear MPMF model to learn a nonlinear classifier. Several experiments on real-world benchmark datasets demonstrate the effectiveness of our new model.
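To make the contrast between the accuracy rate and the $F_\beta$ measure concrete, here is a minimal Python sketch. It is not the MPMF model from the abstract; it only evaluates the standard $F_\beta$ formula on confusion counts. The two classifiers and their counts are hypothetical numbers chosen for illustration.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F_beta from confusion counts: (1+b^2)TP / ((1+b^2)TP + b^2 FN + FP)."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention used here: return 0.0 when the measure is undefined.
    return (1 + b2) * tp / denom if denom else 0.0


def accuracy(tp, fp, fn, tn):
    """Fraction of all examples that are classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical imbalanced test set: 990 negatives, 10 positives.
# Classifier A always predicts "negative".
always_negative = dict(tp=0, fp=0, fn=10, tn=990)
# Classifier B finds 8 of the 10 positives at the cost of 20 false alarms.
useful = dict(tp=8, fp=20, fn=2, tn=970)

for name, c in [("always_negative", always_negative), ("useful", useful)]:
    acc = accuracy(**c)
    f1 = f_beta(c["tp"], c["fp"], c["fn"], beta=1.0)
    f2 = f_beta(c["tp"], c["fp"], c["fn"], beta=2.0)
    print(f"{name}: accuracy={acc:.3f} F1={f1:.3f} F2={f2:.3f}")
```

In this toy setting the trivial classifier has the higher accuracy but an $F_\beta$ of zero, which is the kind of mismatch that motivates optimizing $F_\beta$ directly.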
 [26] arXiv:2103.00445 (cross-list from cs.LG) [pdf, other]

Title: Ensemble Bootstrapping for Q-Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Q-learning (QL), a common reinforcement learning algorithm, suffers from overestimation bias due to the maximization term in the optimal Bellman operator. This bias may lead to sub-optimal behavior. Double-Q-learning tackles this issue by utilizing two estimators, yet results in an underestimation bias. Similar to overestimation in Q-learning, in certain scenarios, the underestimation bias may degrade performance. In this work, we introduce a new bias-reduced algorithm called Ensemble Bootstrapped Q-Learning (EBQL), a natural extension of Double-Q-learning to ensembles. We analyze our method both theoretically and empirically. Theoretically, we prove that EBQL-like updates yield lower MSE when estimating the maximal mean of a set of independent random variables. Empirically, we show that there exist domains where both over- and underestimation result in sub-optimal performance. Finally, we demonstrate the superior performance of a deep RL variant of EBQL over other deep QL algorithms for a suite of ATARI games.
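The over- and underestimation described above can be seen in the simpler problem the abstract mentions: estimating the maximal mean of a set of independent random variables. The Python sketch below compares a single estimator (max of sample means), a double estimator (select on one half, evaluate on the other), and an ensemble variant. The ensemble rule shown (select with one fold, evaluate with the remaining folds, average over folds) is an assumption about what an "EBQL-like" update looks like, not the authors' code. All names, means, and sample sizes are hypothetical.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)


def single_estimate(samples):
    # Max over sample means: the same data selects and evaluates.
    return max(mean(s) for s in samples)


def double_estimate(samples):
    # Select the best index on the first half, evaluate it on the second half.
    half = len(samples[0]) // 2
    best = argmax([mean(s[:half]) for s in samples])
    return mean(samples[best][half:])


def ensemble_estimate(samples, k):
    # Assumed ensemble rule: each fold selects an index, the other folds evaluate it.
    fold = len(samples[0]) // k
    total = 0.0
    for j in range(k):
        lo, hi = j * fold, (j + 1) * fold
        best = argmax([mean(s[lo:hi]) for s in samples])
        total += mean(samples[best][:lo] + samples[best][hi:])
    return total / k


def average_bias(estimator, true_means, n=10, trials=4000, seed=0):
    # Monte Carlo estimate of E[estimate] - max(true_means), unit-variance Gaussians.
    rng = random.Random(seed)
    target = max(true_means)
    errors = []
    for _ in range(trials):
        samples = [[rng.gauss(m, 1.0) for _ in range(n)] for m in true_means]
        errors.append(estimator(samples) - target)
    return mean(errors)


TRUE_MEANS = [0.0, -0.2, -0.2, -0.2, -0.2]  # hypothetical action values
single_bias = average_bias(single_estimate, TRUE_MEANS)
double_bias = average_bias(double_estimate, TRUE_MEANS)
ensemble_bias = average_bias(lambda s: ensemble_estimate(s, k=5), TRUE_MEANS)
print(single_bias, double_bias, ensemble_bias)
```

In expectation the single estimator is biased upward and the double estimator downward. The sketch makes no claim about which estimator has the lowest MSE; that is the subject of the paper's analysis.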
 [27] arXiv:2103.00476 (cross-list from cs.NE) [pdf, other]

Title: Optimal Conversion of Conventional Artificial Neural Networks to Spiking Neural Networks
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Spiking neural networks (SNNs) are biology-inspired artificial neural networks (ANNs) that comprise spiking neurons to process asynchronous discrete signals. While more efficient in power consumption and inference speed on neuromorphic hardware, SNNs are usually difficult to train directly from scratch with spikes due to the discreteness. As an alternative, many efforts have been devoted to converting conventional ANNs into SNNs by copying the weights from ANNs and adjusting the spiking threshold potential of neurons in SNNs. Researchers have designed new SNN architectures and conversion algorithms to diminish the conversion error. However, an effective conversion should address the difference between the SNN and ANN architectures with an efficient approximation of the loss function, which is missing in the field. In this work, we analyze the conversion error by recursive reduction to layer-wise summation and propose a novel strategic pipeline that transfers the weights to the target SNN by combining threshold balance and soft-reset mechanisms. This pipeline enables almost no accuracy loss between the converted SNNs and conventional ANNs with only $\sim1/10$ of the typical SNN simulation time. Our method is promising for deployment on embedded platforms with better support of SNNs with limited energy and memory.
 [28] arXiv:2103.00486 (cross-list from cs.SI) [pdf, other]

Title: Community Detection in Weighted Multilayer Networks with Ambient Noise
Comments: 27 pages
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph); Applications (stat.AP); Methodology (stat.ME); Machine Learning (stat.ML)
We introduce a novel class of stochastic blockmodel for multilayer weighted networks that accounts for the presence of a global ambient noise that governs between-block interactions. In weighted multilayer noisy networks, a natural hierarchy is induced by the assumption of all but one block being under the `umbrella' of signal, while a single block is classified as exhibiting the same characteristics as interstitial interactions and subsumed into ambient noise. We use hierarchical variational inference to jointly detect block-structures and differentiate the communities' local signals from global noise. We propose a novel algorithm to find clusters in Gaussian weighted multilayer networks with a focus on these principles called Stochastic Block (with) Ambient Noise Model (SBANM). We apply this method to several different domains. We focus on the Philadelphia Neurodevelopmental Cohort to discover communities of subjects that form diagnostic categories relating psychopathological symptoms to psychosis.
 [29] arXiv:2103.00502 (cross-list from cs.LG) [pdf, other]

Title: Optimal Approximation Rate of ReLU Networks in terms of Width and Depth
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth. It is proved by construction that ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a H\"older continuous function on $[0,1]^d$ with an approximation rate $\mathcal{O}\big(\lambda\sqrt{d} (N^2L^2\ln N)^{-\alpha/d}\big)$, where $\alpha\in (0,1]$ and $\lambda>0$ are H\"older order and constant, respectively. Such a rate is optimal up to a constant in terms of width and depth separately, while existing results are only nearly optimal without the logarithmic factor in the approximation rate. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$, the approximation rate becomes $\mathcal{O}\big(\,\sqrt{d}\,\omega_f\big( (N^2L^2\ln N)^{-1/d}\big)\,\big)$, where $\omega_f(\cdot)$ is the modulus of continuity. We also extend our analysis to any continuous function $f$ on a bounded set. Particularly, if ReLU networks with depth $31$ and width $\mathcal{O}(N)$ are used to approximate one-dimensional Lipschitz continuous functions on $[0,1]$ with a Lipschitz constant $\lambda>0$, the approximation rate in terms of the total number of parameters, $W=\mathcal{O}(N^2)$, becomes $\mathcal{O}(\tfrac{\lambda}{W\ln W})$, which has not been discovered in the literature for fixed-depth ReLU networks.
 [30] arXiv:2103.00580 (cross-list from stat.ME) [pdf, other]

Title: A Stein Goodness-of-fit Test for Exponential Random Graph Models
Journal-ref: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
We propose and analyse a novel nonparametric goodness-of-fit testing procedure for exchangeable exponential random graph models (ERGMs) when a single network realisation is observed. The test determines how likely it is that the observation is generated from a target unnormalised ERGM density. Our test statistics are derived from a kernel Stein discrepancy, a divergence constructed via Stein's method using functions in a reproducing kernel Hilbert space, combined with a discrete Stein operator for ERGMs. The test is a Monte Carlo test based on simulated networks from the target ERGM. We show theoretical properties for the testing procedure for a class of ERGMs. Simulation studies and real network applications are presented.
 [31] arXiv:2103.00674 (cross-list from stat.ME) [pdf, other]

Title: BEAUTY Powered BEAST
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
We study inference about the uniform distribution with the proposed binary expansion approximation of uniformity (BEAUTY) approach. Through an extension of the celebrated Euler's formula, we approximate the characteristic function of any copula distribution with a linear combination of means of binary interactions from marginal binary expansions. This novel characterization enables a unification of many important existing tests through an approximation from some quadratic form of symmetry statistics, where the deterministic weight matrix characterizes the power properties of each test. To achieve a uniformly high power, we study test statistics with data-adaptive weights through an oracle approach, referred to as the binary expansion adaptive symmetry test (BEAST). By utilizing the properties of the binary expansion filtration, we show that the Neyman-Pearson test of uniformity can be approximated by an oracle weighted sum of symmetry statistics. The BEAST with this oracle leads all existing tests we considered in empirical power against all complex forms of alternatives. This oracle therefore sheds light on the potential of substantial improvements in power and on the form of optimal weights under each alternative. By approximating this oracle with data-adaptive weights, we develop the BEAST that improves the empirical power of many existing tests against a wide spectrum of common alternatives while providing clear interpretation of the form of non-uniformity upon rejection. We illustrate the BEAST with a study of the relationship between the location and brightness of stars.
 [32] arXiv:2103.00719 (cross-list from cs.LG) [pdf, ps, other]

Title: LocalDrop: A Hybrid Regularization for Deep Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In neural networks, developing regularization algorithms to mitigate overfitting is one of the major study areas. We propose a new approach for the regularization of neural networks by the local Rademacher complexity called LocalDrop. A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs), including drop rates and weight matrices, has been developed based on the proposed upper bound of the local Rademacher complexity through rigorous mathematical deduction. The analyses of dropout in FCNs and DropBlock in CNNs with keep rate matrices in different layers are also included in the complexity analyses. With the new regularization function, we establish a two-stage procedure to obtain the optimal keep rate matrix and weight matrix to realize the whole training model. Extensive experiments have been conducted to demonstrate the effectiveness of LocalDrop in different models by comparing it with several algorithms and the effects of different hyperparameters on the final performances.
 [33] arXiv:2103.00959 (cross-list from cs.SI) [pdf, other]

Title: CogDL: An Extensive Toolkit for Deep Learning on Graphs
Authors: Yukuo Cen, Zhenyu Hou, Yan Wang, Qibin Chen, Yizhen Luo, Xingcheng Yao, Aohan Zeng, Shiguang Guo, Peng Zhang, Guohao Dai, Yu Wang, Chang Zhou, Hongxia Yang, Jie Tang
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Graph representation learning aims to learn low-dimensional node embeddings for graphs. It is used in several real-world applications such as social network analysis and large-scale recommender systems. In this paper, we introduce CogDL, an extensive research toolkit for deep learning on graphs that allows researchers and developers to easily conduct experiments and build applications. It provides standard training and evaluation for the most important tasks in the graph domain, including node classification, link prediction, graph classification, and other graph tasks. For each task, it offers implementations of state-of-the-art models. The models in our toolkit are divided into two major parts, graph embedding methods and graph neural networks. Most of the graph embedding methods learn node-level or graph-level representations in an unsupervised way and preserve graph properties such as structural information, while graph neural networks capture node features and work in semi-supervised or self-supervised settings. All models implemented in our toolkit are easily reproducible for leaderboard results. Most models in CogDL are developed on top of PyTorch, and users can leverage the advantages of PyTorch to implement their own models. Furthermore, we demonstrate the effectiveness of CogDL for real-world applications in AMiner, which is a large academic database and system.
 [34] arXiv:2103.00988 (cross-list from cs.LG) [pdf, other]

Title: Moment-Based Variational Inference for Stochastic Differential Equations
Comments: Appearing in Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Existing deterministic variational inference approaches for diffusion processes use simple proposals and target the marginal density of the posterior. We construct the variational process as a controlled version of the prior process and approximate the posterior by a set of moment functions. In combination with moment closure, the smoothing problem is reduced to a deterministic optimal control problem. Exploiting the path-wise Fisher information, we propose an optimization procedure that corresponds to a natural gradient descent in the variational parameters. Our approach allows for richer variational approximations that extend to state-dependent diffusion terms. The classical Gaussian process approximation is recovered as a special case.
 [35] arXiv:2103.01030 (cross-list from cs.LG) [pdf, other]

Title: An Easy to Interpret Diagnostic for Approximate Inference: Symmetric Divergence Over Simulations
Authors: Justin Domke
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
It is important to estimate the errors of probabilistic inference algorithms. Existing diagnostics for Markov chain Monte Carlo methods assume inference is asymptotically exact, and are not appropriate for approximate methods like variational inference or Laplace's method. This paper introduces a diagnostic based on repeatedly simulating datasets from the prior and performing inference on each. The central observation is that it is possible to estimate a symmetric KL-divergence defined over these simulations.
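One way to see why a symmetric KL-divergence over simulations can be estimated is the short derivation below. The notation is an assumption made for illustration and may not match the paper: $p(z,x)$ is the model's joint density over latents $z$ and simulated data $x$, and $q(z \mid x)$ is the output of the inference algorithm on $x$. The two distributions being compared are $P(z,x) = p(z,x)$ and $Q(z,x) = p(x)\,q(z \mid x)$.

```latex
\begin{align*}
\mathrm{KL}(P \,\|\, Q) + \mathrm{KL}(Q \,\|\, P)
  &= \mathbb{E}_{P}\!\left[\log \frac{p(z,x)}{p(x)\,q(z \mid x)}\right]
   - \mathbb{E}_{Q}\!\left[\log \frac{p(z,x)}{p(x)\,q(z \mid x)}\right] \\
  &= \mathbb{E}_{P}\!\left[\log p(z,x) - \log q(z \mid x)\right]
   - \mathbb{E}_{Q}\!\left[\log p(z,x) - \log q(z \mid x)\right].
\end{align*}
```

The $\log p(x)$ terms cancel because $P$ and $Q$ share the same marginal over $x$, so the intractable evidence never has to be computed. Both remaining expectations can be approximated by Monte Carlo: draw $(z, x)$ from the model for the first, and draw $x$ from the model followed by $z \sim q(\cdot \mid x)$ for the second.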
 [36] arXiv:2103.01043 (cross-list from cs.LG) [pdf, other]

Title: Persistent Message Passing
Comments: 7 pages, 2 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Graph neural networks (GNNs) are a powerful inductive bias for modelling algorithmic reasoning procedures and data structures. Their prowess was mainly demonstrated on tasks featuring Markovian dynamics, where querying any associated data structure depends only on its latest state. For many tasks of interest, however, it may be highly beneficial to support efficient data structure queries dependent on previous states. This requires tracking the data structure's evolution through time, placing significant pressure on the GNN's latent representations. We introduce Persistent Message Passing (PMP), a mechanism which endows GNNs with the capability of querying past state by explicitly persisting it: rather than overwriting node representations, it creates new nodes whenever required. PMP generalises out-of-distribution to more than 2x larger test inputs on dynamic temporal range queries, significantly outperforming GNNs which overwrite states.
 [37] arXiv:2103.01085 (cross-list from cs.LG) [pdf, other]

Title: Challenges and Opportunities in High-dimensional Variational Inference
Authors: Akash Kumar Dhaka, Alejandro Catalina, Manushi Welandawe, Michael Riis Andersen, Jonathan Huggins, Aki Vehtari
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
We explore the limitations of and best practices for using black-box variational inference to estimate posterior summaries of the model parameters. By taking an importance sampling perspective, we are able to explain and empirically demonstrate: 1) why the intuitions about the behavior of approximate families and divergences for low-dimensional posteriors fail for higher-dimensional posteriors, 2) how we can diagnose the pre-asymptotic reliability of variational inference in practice by examining the behavior of the density ratios (i.e., importance weights), 3) why the choice of variational objective is not as relevant for higher-dimensional posteriors, and 4) why, although flexible variational families can provide some benefits in higher dimensions, they also introduce additional optimization challenges. Based on these findings, for high-dimensional posteriors we recommend using the exclusive KL divergence that is most stable and easiest to optimize, and then focusing on improving the variational family or using model parameter transformations to make the posterior more similar to the approximating family. Our results also show that in low to moderate dimensions, heavy-tailed variational families and mass-covering divergences can increase the chances that the approximation can be improved by importance sampling.
 [38] arXiv:2103.01148 (cross-list from cs.LG) [pdf, other]

Title: Class Means as an Early Exit Decision Mechanism
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
State-of-the-art neural networks with early exit mechanisms often need a considerable amount of training and fine-tuning to achieve good performance with low computational cost. We propose a novel early exit technique based on the class means of samples. Unlike most existing schemes, our method does not require gradient-based training of internal classifiers. This makes our method particularly useful for neural network training in low-power devices, as in wireless edge networks. In particular, given a fixed training time budget, our scheme achieves higher accuracy as compared to existing early exit mechanisms. Moreover, if there are no limitations on the training time budget, our method can be combined with an existing early exit scheme to boost its performance, achieving a better trade-off between computational cost and network accuracy.
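A class-mean exit rule needs no gradient-based training because the class means are plain averages of intermediate features. The Python sketch below illustrates that idea under stated assumptions: the abstract does not give the decision rule, so the distance-ratio test and the `ratio_threshold` parameter are hypothetical, and the feature vectors stand in for one intermediate layer's outputs.

```python
import math


def class_means(features, labels):
    """Average the feature vectors of each class (no gradient steps involved)."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def early_exit(x, means, ratio_threshold=0.5):
    """Return a label if the nearest class mean is clearly closer than the
    runner-up; return None to signal "continue to the next layer".

    The ratio rule is an assumed, illustrative criterion. It needs at
    least two classes.
    """
    dists = sorted((math.dist(x, m), y) for y, m in means.items())
    (d1, y1), (d2, _) = dists[0], dists[1]
    if d2 > 0 and d1 / d2 < ratio_threshold:
        return y1
    return None


# Hypothetical intermediate-layer features for two classes.
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = [0, 0, 1, 1]
means = class_means(train_x, train_y)

confident = early_exit([0.1, 0.4], means)     # very close to the class-0 mean
ambiguous = early_exit([2.75, 2.75], means)   # equidistant from both means
print(confident, ambiguous)
```

A sample that returns `None` would simply be passed on to deeper layers, where the same test could be repeated with that layer's class means.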
 [39] arXiv:2103.01197 (cross-list from cs.LG) [pdf, other]

Title: Coordination Among Neural Modules Through a Shared Global Workspace
Authors: Anirudh Goyal, Aniket Didolkar, Alex Lamb, Kartikeya Badola, Nan Rosemary Ke, Nasim Rahaman, Jonathan Binas, Charles Blundell, Michael Mozer, Yoshua Bengio
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Deep learning has seen a movement away from representing examples with a monolithic hidden state towards a richly structured state. For example, Transformers segment by position, and object-centric architectures decompose images into entities. In all these architectures, interactions between different elements are modeled via pairwise interactions: Transformers make use of self-attention to incorporate information from other positions; object-centric architectures make use of graph neural networks to model interactions among entities. However, pairwise interactions may not achieve global coordination or a coherent, integrated representation that can be used for downstream tasks. In cognitive science, a global workspace architecture has been proposed in which functionally specialized components share information through a common, bandwidth-limited communication channel. We explore the use of such a communication channel in the context of deep learning for modeling the structure of complex environments. The proposed method includes a shared workspace through which communication among different specialist modules takes place, but due to limits on the communication bandwidth, specialist modules must compete for access. We show that capacity limitations have a rational basis in that (1) they encourage specialization and compositionality and (2) they facilitate the synchronization of otherwise independent specialists.
 [40] arXiv:2103.01201 (cross-list from econ.EM) [pdf, other]

Title: Can Machine Learning Catch the COVID-19 Recession?
Subjects: Econometrics (econ.EM); Applications (stat.AP); Machine Learning (stat.ML)
Based on evidence gathered from a newly built large macroeconomic data set for the UK, labeled UK-MD and comparable to similar datasets for the US and Canada, it seems the most promising avenue for forecasting during the pandemic is to allow for general forms of nonlinearity by using machine learning (ML) methods. But not all nonlinear ML methods are alike. For instance, some cannot extrapolate (like regular trees and forests) and some can (when complemented with linear dynamic components). This and other crucial aspects of ML-based forecasting in unprecedented times are studied in an extensive pseudo-out-of-sample exercise.
 [41] arXiv:2103.01210 (cross-list from cs.LG) [pdf, ps, other]

Title: Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study the relative power of learning with gradient descent on differentiable models, such as neural networks, versus using the corresponding tangent kernels. We show that under certain conditions, gradient descent achieves small error only if a related tangent kernel method achieves a nontrivial advantage over random guessing (a.k.a. weak learning), though this advantage might be very small even when gradient descent can achieve arbitrarily high accuracy. Complementing this, we show that without these conditions, gradient descent can in fact learn with small error even when no kernel method, in particular using the tangent kernel, can achieve a nontrivial advantage over random guessing.
Replacements for Tue, 2 Mar 21
 [42] arXiv:1806.02935 (replaced) [pdf, other]

Title: Causal effects based on distributional distances
Comments: 46 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
 [43] arXiv:1905.05022 (replaced) [pdf, other]

Title: Inferring Hierarchical Mixture Structures: A Bayesian Nonparametric Approach
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [44] arXiv:1910.03834 (replaced) [pdf, other]

Title: Estimating Density Models with Truncation Boundaries using Score Matching
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [45] arXiv:1912.04439 (replaced) [pdf, other]

Title: Privacy-preserving data sharing via probabilistic modelling
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
 [46] arXiv:1912.10708 (replaced) [pdf]

Title: Recreation of the Periodic Table with an Unsupervised Machine Learning Algorithm
Comments: 28 pages, 14 figures, complete version of this paper is available at this https URL (Published: 26 February 2021)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [47] arXiv:2001.05484 (replaced) [pdf, other]

Title: Bridging Convex and Nonconvex Optimization in Robust PCA: Noise, Outliers, and Missing Data
Comments: accepted to the Annals of Statistics
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC); Statistics Theory (math.ST)
 [48] arXiv:2002.11394 (replaced) [pdf, other]

Title: Bayesian Nonparametric Space Partitions: A Survey
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [49] arXiv:2006.08063 (replaced) [pdf, other]

Title: Gradient Estimation with Stochastic Softmax Tricks
Comments: NeurIPS 2020, final copy
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [50] arXiv:2006.11695 (replaced) [pdf, other]

Title: Uncertainty-Aware (UNA) Bases for Bayesian Regression Using Multi-Headed Auxiliary Networks
Comments: ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning, 30 pages, 21 Figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [51] arXiv:2007.04813 (replaced) [pdf, other]

Title: Graph-Based Continual Learning
Comments: Published as a conference paper at ICLR 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [52] arXiv:2008.06344 (replaced) [pdf, other]

Title: COVID-19 mortality analysis from soft-data multivariate curve regression and machine learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP); Methodology (stat.ME)
 [53] arXiv:2011.05836 (replaced) [pdf, other]

Title: Neural Empirical Bayes: Source Distribution Estimation and its Applications to Simulation-Based Inference
Comments: Camera-ready version presented at AISTATS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); High Energy Physics - Phenomenology (hep-ph); Data Analysis, Statistics and Probability (physics.data-an)
 [54] arXiv:2012.08073 (replaced) [pdf, other]

Title: Generalized Chernoff Sampling for Active Testing, Active Regression and Structured Bandit Algorithms
Comments: 46 pages, 9 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [55] arXiv:2101.01134 (replaced) [pdf, other]

Title: Does Invariant Risk Minimization Capture Invariance?
Comments: Code is available in the arXiv ancillary files, linked from this page
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [56] arXiv:2101.07957 (replaced) [pdf, other]

Title: Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions
Authors: Kei Takemura, Shinji Ito, Daisuke Hatano, Hanna Sumita, Takuro Fukunaga, Naonori Kakimura, Kenichi Kawarabayashi
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [57] arXiv:2101.10643 (replaced) [pdf, other]

Title: CDSM - Casual Inference using Deep Bayesian Dynamic Survival Models
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
 [58] arXiv:1807.11876 (replaced) [pdf, other]

Title: Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information
Authors: Eric Larsen, Sébastien Lachapelle, Yoshua Bengio, Emma Frejinger, Simon Lacoste-Julien, Andrea Lodi
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [59] arXiv:1901.07935 (replaced) [src]

Title: Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information
Authors: Eric Larsen, Sébastien Lachapelle, Yoshua Bengio, Emma Frejinger, Simon Lacoste-Julien, Andrea Lodi
Comments: Same as arXiv:1807.11876, added by mistake
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [60] arXiv:1901.10517 (replaced) [pdf, other]

Title: Reparameterizable Subset Sampling via Continuous Relaxations
Comments: IJCAI 2019
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [61] arXiv:1901.10995 (replaced) [pdf, other]

Title: Go-Explore: a New Approach for Hard-Exploration Problems
Comments: 37 pages, 14 figures; added references to Goyal et al. and Oh et al., updated reference to Colas et al; updated author emails; point readers to updated paper
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [62] arXiv:1907.04670 (replaced) [pdf, other]

Title: Improving the Performance of the LSTM and HMM Model via Hybridization
Comments: Working Manuscript
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computation (stat.CO); Machine Learning (stat.ML)
 [63] arXiv:1910.08644 (replaced) [pdf, other]

Title: Initialization methods for optimum average silhouette width clustering
Authors: Fatima Batool
Comments: 41 pages
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [64] arXiv:1912.00832 (replaced) [pdf, other]

Title: Epistemic Uncertainty Quantification in Deep Learning Classification by the Delta Method
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [65] arXiv:2001.03040 (replaced) [pdf, other]

Title: Deep Network Approximation for Smooth Functions
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
 [66] arXiv:2002.03532 (replaced) [pdf, other]

Title: Understanding and Improving Knowledge Distillation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [67] arXiv:2002.06673 (replaced) [pdf, other]

Title: Performative Prediction
Comments: published at ICML'20; fixed some typos
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
 [68] arXiv:2002.07024 (replaced) [pdf, ps, other]

Title: Gaming Helps! Learning from Strategic Interactions in Natural Dynamics
Comments: The Conference version of this paper is to appear in the Proceedings of AISTATS 2021. 27 pages
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
 [69] arXiv:2002.08517 (replaced) [pdf, other]

Title: Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite Networks
Comments: AAAI camera ready version. 18 pages, 9 figures, 2 tables. Corrected name particle capitalisation and formatting
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [70] arXiv:2002.08526 (replaced) [pdf, other]

Title: Scalable Constrained Bayesian Optimization
Comments: To appear in Proceedings of AISTATS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [71] arXiv:2002.12047 (replaced) [pdf, other]

Title: FMix: Enhancing Mixed Sample Data Augmentation
Authors: Ethan Harris, Antonia Marcu, Matthew Painter, Mahesan Niranjan, Adam Prügel-Bennett, Jonathon Hare
Comments: Code available at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT); Machine Learning (stat.ML)
 [72] arXiv:2003.06113 (replaced) [pdf, ps, other]

Title: Ultra Efficient Transfer Learning with Meta Update for Cross Subject EEG Classification
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
 [73] arXiv:2003.13441 (replaced) [pdf]

Title: Introduction to Rare-Event Predictive Modeling for Inferential Statisticians - A Hands-On Application in the Prediction of Breakthrough Patents
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [74] arXiv:2004.01646 (replaced) [pdf, other]

Title: M2: Mixed Models with Preferences, Popularities and Transitions for Next-Basket Recommendation
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
 [75] arXiv:2004.02786 (replaced) [pdf, other]

Title: Adaptive Partial Scanning Transmission Electron Microscopy with Reinforcement Learning
Authors: Jeffrey M. Ede
Comments: 13 pages, 3 figures + 1 algorithm
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
 [76] arXiv:2004.07320 (replaced) [pdf, other]

Title: Training with Quantization Noise for Extreme Model Compression
Authors: Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [77] arXiv:2005.07587 (replaced) [pdf, ps, other]

Title: Non-Sparse PCA in High Dimensions via Cone Projected Power Iteration
Subjects: Statistics Theory (math.ST); Computation (stat.CO); Methodology (stat.ME); Machine Learning (stat.ML)
 [78] arXiv:2005.10531 (replaced) [pdf, ps, other]

Title: Supervised Learning in the Presence of Concept Drift: A modelling framework
Comments: 17 pages in two-column
Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (stat.ML)
 [79] arXiv:2005.14708 (replaced) [pdf, other]

Title: Online DR-Submodular Maximization with Stochastic Cumulative Constraints
Comments: To appear in proceedings of AAAI 2021
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [80] arXiv:2006.06147 (replaced) [pdf, other]

Title: Implicit Kernel Attention
Comments: AAAI-21
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [81] arXiv:2006.06853 (replaced) [pdf, ps, other]

Title: Training a Single Bandit Arm
Comments: 32 pages, 3 figures, 1 table
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [82] arXiv:2006.06981 (replaced) [pdf, other]

Title: Kernel Distributionally Robust Optimization
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [83] arXiv:2006.07886 (replaced) [pdf, other]

Title: On Disentangled Representations Learned From Correlated Data
Authors: Frederik Träuble, Elliot Creager, Niki Kilbertus, Francesco Locatello, Andrea Dittadi, Anirudh Goyal, Bernhard Schölkopf, Stefan Bauer
Comments: 31 pages, 17 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [84] arXiv:2006.08875 (replaced) [pdf, other]

Title: Model-based Adversarial Meta-Reinforcement Learning
Comments: Accepted by NeurIPS 2020. Code at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [85] arXiv:2006.09255 (replaced) [pdf, other]

Title: Corralling Stochastic Bandit Algorithms
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [86] arXiv:2006.12369 (replaced) [pdf, other]

Title: Controlling for sparsity in sparse factor analysis models: adaptive latent feature sharing for piecewise linear dimensionality reduction
Comments: Interactive demo available at this https URL
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [87] arXiv:2006.13026 (replaced) [pdf, other]

Title: Deep Polynomial Neural Networks
Authors: Grigorios Chrysos, Stylianos Moschoglou, Giorgos Bouritsas, Jiankang Deng, Yannis Panagakis, Stefanos Zafeiriou
Comments: Published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Code: this https URL. arXiv admin note: substantial text overlap with arXiv:2003.03828
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [88] arXiv:2006.13636 (replaced) [pdf, other]

Title: Approximating a Target Distribution using Weight Queries
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [89] arXiv:2007.00339 (replaced) [pdf, other]

Title: Multi-Task Variational Information Bottleneck
Comments: 10 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [90] arXiv:2007.04169 (replaced) [pdf, other]

Title: An exploration of the influence of path choice in game-theoretic attribution algorithms
Comments: 21 pages, 12 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [91] arXiv:2007.04640 (replaced) [pdf, other]

Title: Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate
Comments: In 35th AAAI Conference on Artificial Intelligence (AAAI 2021)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [92] arXiv:2007.07177 (replaced) [pdf, other]

Title: MosAIc: Finding Artistic Connections across Culture with Conditional Image Retrieval
Authors: Mark Hamilton, Stephanie Fu, Mindren Lu, Johnny Bui, Darius Bopp, Zhenbang Chen, Felix Tran, Margaret Wang, Marina Rogers, Lei Zhang, Chris Hoder, William T. Freeman
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Information Retrieval (cs.IR); Machine Learning (stat.ML)
 [93] arXiv:2008.02839 (replaced) [pdf, other]

Title: Learned convex regularizers for inverse problems
Authors: Subhadip Mukherjee, Sören Dittmer, Zakhar Shumaylov, Sebastian Lunz, Ozan Öktem, Carola-Bibiane Schönlieb
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [94] arXiv:2008.06081 (replaced) [pdf, other]

Title: Adversarial Training and Provable Robustness: A Tale of Two Objectives
Comments: Accepted at AAAI 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [95] arXiv:2008.07283 (replaced) [pdf, other]

Title: Estimating Causal Effects with the Neural Autoregressive Density Estimator
Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [96] arXiv:2008.10183 (replaced) [pdf, other]

Title: HALO: Learning to Prune Neural Networks with Shrinkage
Comments: Accepted at SDM 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [97] arXiv:2009.00387 (replaced) [pdf, other]

Title: Boosting Share Routing for Multitask Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [98] arXiv:2009.07888 (replaced) [pdf, other]

Title: Transfer Learning in Deep Reinforcement Learning: A Survey
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [99] arXiv:2009.07938 (replaced) [pdf, ps, other]

Title: Type-augmented Relation Prediction in Knowledge Graphs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
 [100] arXiv:2009.10780 (replaced) [pdf, other]

Title: Independent finite approximations for Bayesian nonparametric inference
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [101] arXiv:2009.10990 (replaced) [pdf, other]

Title: Accurate and Interpretable Machine Learning for Transparent Pricing of Health Insurance Plans
Authors: Rohun Kshirsagar, Li-Yen Hsu, Vatshank Chaturvedi, Charles H. Greenberg, Matthew McClelland, Anushadevi Mohan, Wideet Shende, Nicolas P. Tilmans, Renzo Frigato, Min Guo, Ankit Chheda, Meredith Trotter, Shonket Ray, Arnold Lee, Miguel Alvarado
Comments: Accepted for publication in The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), in the Innovative Applications of Artificial Intelligence track. This is the extended version with some stylistic fixes from the first posting and complete author list
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [102] arXiv:2010.00381 (replaced) [pdf, other]

Title: Student-Initiated Action Advising via Advice Novelty
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [103] arXiv:2010.00743 (replaced) [pdf, other]

Title: Cell Complex Neural Networks
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Computer Vision and Pattern Recognition (cs.CV); Algebraic Topology (math.AT); Machine Learning (stat.ML)
 [104] arXiv:2010.00904 (replaced) [pdf, other]

Title: Autoregressive Entity Retrieval
Comments: Accepted (spotlight) at International Conference on Learning Representations (ICLR) 2021. Code at this https URL. 20 pages, 9 figures, 8 tables
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [105] arXiv:2010.02459 (replaced) [pdf, other]

Title: Usable Information and Evolution of Optimal Representations During Training
Comments: ICLR 2021
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
 [106] arXiv:2010.04050 (replaced) [pdf, other]

Title: A survey of algorithmic recourse: definitions, formulations, solutions, and prospects
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [107] arXiv:2010.05820 (replaced) [pdf, other]

Title: Permutation invariant networks to learn Wasserstein metrics
Comments: Fix typos, Accepted as a spotlight at Topological Data Analysis and Beyond Workshop at Neurips 2020. Added more experiments and results. Comments welcome
Subjects: Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
 [108] arXiv:2010.10346 (replaced) [pdf, other]

Title: Deep Importance Sampling based on Regression for Model Inversion and Emulation
Subjects: Computation (stat.CO); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [109] arXiv:2010.14763 (replaced) [pdf, other]

Title: Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes
Authors: Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen, Lam M. Nguyen, Quoc Tran-Dinh, Phuong Ha Nguyen
Comments: arXiv admin note: substantial text overlap with arXiv:2007.09208 AISTATS 2021
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [110] arXiv:2010.15285 (replaced) [pdf, other]

Title: Intrinsic Sliced Wasserstein Distances for Comparing Collections of Probability Distributions on Manifolds and Graphs
Comments: Improved exposition, add resampling based test, source code
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
 [111] arXiv:2011.00364 (replaced) [pdf, ps, other]

Title: Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization
Comments: in Proc. AISTATS'21
Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [112] arXiv:2011.00576 (replaced) [pdf, other]

Title: Experimental Design for Regret Minimization in Linear Bandits
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [113] arXiv:2011.03178 (replaced) [pdf, other]

Title: Beyond Marginal Uncertainty: How Accurately can Bayesian Regression Models Estimate Posterior Predictive Correlations?
Comments: AISTATS 2021 (Oral)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [114] arXiv:2011.05824 (replaced) [pdf, other]

Title: Semi-Structured Deep Piecewise Exponential Models
Authors: Philipp Kopper, Sebastian Pölsterl, Christian Wachinger, Bernd Bischl, Andreas Bender, David Rügamer
Comments: 8 pages, 3 figures, Accepted at the AAAI spring symposium: Survival Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [115] arXiv:2012.01205 (replaced) [pdf, other]

Title: VisEvol: Visual Analytics to Support Hyperparameter Search through Evolutionary Optimization
Comments: This manuscript is currently under review
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
 [116] arXiv:2101.05201 (replaced) [pdf, other]

Title: Optimisation of Spectral Wavelets for Persistence-based Graph Classification
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [117] arXiv:2101.09436 (replaced) [pdf, other]

Title: Hierarchical Variational Auto-Encoding for Unsupervised Domain Generalization
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [118] arXiv:2101.09763 (replaced) [pdf, other]

Title: Analysing the Noise Model Error for Realistic Noisy Label Data
Comments: Accepted at AAAI 2021, additional material at this https URL
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
 [119] arXiv:2102.07206 (replaced) [pdf, other]

Title: Sample Efficient Subspace-based Representations for Nonlinear Meta-Learning
Comments: To appear in ICASSP 21'
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [120] arXiv:2102.08248 (replaced) [pdf, other]

Title: Hierarchical VAEs Know What They Don't Know
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [121] arXiv:2102.09972 (replaced) [pdf, other]

Title: Implicit Regularization in Tensor Factorization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
 [122] arXiv:2102.11860 (replaced) [pdf, other]

Title: Automated Discovery of Adaptive Attacks on Adversarial Defenses
Comments: 16 pages, 4 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [123] arXiv:2102.11866 (replaced) [pdf, ps, other]

Title: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Comments: Submitted for publication
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)