New submissions


New submissions for Tue, 14 Jul 20

[1]  arXiv:2007.05554 [pdf, other]
Title: Bayesian Optimization of Risk Measures
Comments: The paper is 12 pages and includes 3 figures. The supplement is an additional 11 pages with 2 figures. The paper is currently under review for NeurIPS 2020
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

We consider Bayesian optimization of objective functions of the form $\rho[ F(x, W) ]$, where $F$ is a black-box expensive-to-evaluate function and $\rho$ denotes either the VaR or CVaR risk measure, computed with respect to the randomness induced by the environmental random variable $W$. Such problems arise in decision making under uncertainty, such as in portfolio optimization and robust systems design. We propose a family of novel Bayesian optimization algorithms that exploit the structure of the objective function to substantially improve sampling efficiency. Instead of modeling the objective function directly as is typical in Bayesian optimization, these algorithms model $F$ as a Gaussian process, and use the implied posterior on the objective function to decide which points to evaluate. We demonstrate the effectiveness of our approach in a variety of numerical experiments.
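
To make the objective $\rho[ F(x, W) ]$ concrete, the following is a minimal sketch of how VaR and CVaR could be estimated from samples of $W$ at a fixed $x$. It is not the algorithm proposed in the paper: the function names, the toy function `toy_F`, and the sample values are all assumptions made for illustration, and the empirical estimator shown (the quantile of the sorted losses, and the mean of the samples at or beyond it) is only one of several common conventions.

```python
import math


def empirical_var_cvar(losses, alpha):
    """Return (VaR, CVaR) at level alpha for a list of losses (larger is worse)."""
    if not losses:
        raise ValueError("need at least one sample")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(losses)
    # VaR: smallest sample whose empirical CDF value is at least alpha.
    k = math.ceil(alpha * len(ordered)) - 1
    var = ordered[k]
    # CVaR (one common empirical convention): mean of the samples from VaR upward.
    tail = ordered[k:]
    cvar = sum(tail) / len(tail)
    return var, cvar


def risk_objective(F, x, w_samples, alpha, measure="cvar"):
    """Evaluate rho[F(x, W)] empirically for rho in {"var", "cvar"}."""
    losses = [F(x, w) for w in w_samples]
    var, cvar = empirical_var_cvar(losses, alpha)
    return var if measure == "var" else cvar


# Hypothetical toy problem: neither F nor these samples come from the paper.
def toy_F(x, w):
    return (x - w) ** 2


w_samples = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
```

Calling `risk_objective(toy_F, 0.0, w_samples, 0.5)` evaluates the CVaR objective at `x = 0.0`. In the setting described in the abstract each evaluation of $F$ is expensive, which is why such a brute-force estimate would be replaced by a Gaussian process model of $F$.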

[2]  arXiv:2007.05610 [pdf, other]
Title: Batch-Incremental Triplet Sampling for Training Triplet Networks Using Bayesian Updating Theorem
Comments: The first two authors contributed equally to this work
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Variants of Triplet networks are robust entities for learning a discriminative embedding subspace. There exist different triplet mining approaches for selecting the most suitable training triplets. Some of these mining methods rely on the extreme distances between instances, and some others make use of sampling. However, sampling from stochastic distributions of data rather than sampling merely from the existing embedding instances can provide more discriminative information. In this work, we sample triplets from distributions of data rather than from existing instances. We consider a multivariate normal distribution for the embedding of each class. Using Bayesian updating and conjugate priors, we update the distributions of classes dynamically by receiving the new mini-batches of training data. The proposed triplet mining with Bayesian updating can be used with any triplet-based loss function, e.g., triplet-loss or Neighborhood Component Analysis (NCA) loss. Accordingly, our triplet mining approaches are called Bayesian Updating Triplet (BUT) and Bayesian Updating NCA (BUNCA), depending on which loss function is being used. Experimental results on two public datasets, namely MNIST and histopathology colorectal cancer (CRC), substantiate the effectiveness of the proposed triplet mining method.

[3]  arXiv:2007.05627 [pdf, other]
Title: A Performance Guarantee for Spectral Clustering
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The two-step spectral clustering method, which consists of the Laplacian eigenmap and a rounding step, is a widely used method for graph partitioning. It can be seen as a natural relaxation to the NP-hard minimum ratio cut problem. In this paper we study the central question: when is spectral clustering able to find the global solution to the minimum ratio cut problem? First, we provide a condition that naturally depends on the intra- and inter-cluster connectivities of a given partition under which we may certify that this partition is the solution to the minimum ratio cut problem. Then we develop a deterministic two-to-infinity norm perturbation bound for the invariant subspace of the graph Laplacian that corresponds to the $k$ smallest eigenvalues. Finally, by combining these two results we give a condition under which spectral clustering is guaranteed to output the global solution to the minimum ratio cut problem, which serves as a performance guarantee for spectral clustering.

[4]  arXiv:2007.05670 [pdf, other]
Title: An Asymptotically Optimal Multi-Armed Bandit Algorithm and Hyperparameter Optimization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

The evaluation of hyperparameters, neural architectures, or data augmentation policies becomes a critical model selection problem in advanced deep learning with a large hyperparameter search space. In this paper, we propose an efficient and robust bandit-based algorithm called Sub-Sampling (SS) in the scenario of hyperparameter search evaluation. It evaluates the potential of hyperparameters by the sub-samples of observations and is theoretically proved to be optimal under the criterion of cumulative regret. We further combine SS with Bayesian Optimization and develop a novel hyperparameter optimization algorithm called BOSS. Empirical studies validate our theoretical arguments of SS and demonstrate the superior performance of BOSS on a number of applications, including Neural Architecture Search (NAS), Data Augmentation (DA), Object Detection (OD), and Reinforcement Learning (RL).

[5]  arXiv:2007.05692 [pdf, other]
Title: How Does GAN-based Semi-supervised Learning Work?
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Generative adversarial networks (GANs) have been widely used and have achieved competitive results in semi-supervised learning. This paper theoretically analyzes how GAN-based semi-supervised learning (GAN-SSL) works. We first prove that, given a fixed generator, optimizing the discriminator of GAN-SSL is equivalent to optimizing that of supervised learning. Thus, the optimal discriminator in GAN-SSL is expected to be perfect on labeled data. Then, if the perfect discriminator can further cause the optimization objective to reach its theoretical maximum, the optimal generator will match the true data distribution. Since it is impossible to reach the theoretical maximum in practice, one cannot expect to obtain a perfect generator for generating data, which is apparently different from the objective of GANs. Furthermore, if the labeled data can traverse all connected subdomains of the data manifold, which is reasonable in semi-supervised classification, we additionally expect the optimal discriminator in GAN-SSL to also be perfect on unlabeled data. In conclusion, the minimax optimization in GAN-SSL will theoretically output a perfect discriminator on both labeled and unlabeled data by unexpectedly learning an imperfect generator, i.e., GAN-SSL can effectively improve the generalization ability of the discriminator by leveraging unlabeled information.

[6]  arXiv:2007.05709 [pdf, other]
Title: Scoring Interval Forecasts: Equal-Tailed, Shortest, and Modal Interval
Comments: 22 pages
Subjects: Statistics Theory (math.ST)

We consider different types of predictive intervals and ask whether they are elicitable, i.e. are unique minimizers of a loss or scoring function in expectation. The equal-tailed interval is elicitable, with a rich class of suitable loss functions, though subject to either translation invariance, or positive homogeneity and differentiability, the Winkler interval score becomes a unique choice. The modal interval also is elicitable, with a sole consistent scoring function, up to equivalence. However, the shortest interval fails to be elicitable relative to practically relevant classes of distributions. These results provide guidance in interval forecast evaluation and support recent choices of performance measures in forecast competitions.
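
As an illustration of the Winkler interval score mentioned above, here is a minimal sketch using its standard form for a central $(1-\alpha)$ prediction interval $[l, u]$ and outcome $y$: the interval width plus a penalty of $2/\alpha$ times the distance by which $y$ falls outside the interval. The function name and argument names are assumptions made for illustration, not notation from the paper.

```python
def winkler_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) interval; lower scores are better."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Base cost: the width of the interval (narrow intervals are rewarded).
    score = upper - lower
    # Penalty when the outcome misses the interval, scaled by 2 / alpha.
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score
```

In a forecast evaluation one would typically average this score over many interval/outcome pairs and compare forecasters on the mean.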

[7]  arXiv:2007.05721 [pdf, other]
Title: Towards Robust Classification with Deep Generative Forests
Comments: Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Decision Trees and Random Forests are among the most widely used machine learning models, and often achieve state-of-the-art performance in tabular, domain-agnostic datasets. Nonetheless, being primarily discriminative models they lack principled methods to manipulate the uncertainty of predictions. In this paper, we exploit Generative Forests (GeFs), a recent class of deep probabilistic models that addresses these issues by extending Random Forests to generative models representing the full joint distribution over the feature space. We demonstrate that GeFs are uncertainty-aware classifiers, capable of measuring the robustness of each prediction as well as detecting out-of-distribution samples.

[8]  arXiv:2007.05724 [pdf, other]
Title: Learning Randomly Perturbed Structured Predictors for Direct Loss Minimization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Direct loss minimization is a popular approach for learning predictors over structured label spaces. This approach is computationally appealing as it replaces integration with optimization and allows gradients to be propagated in a deep net using loss-perturbed prediction. Recently, this technique was extended to generative models, while introducing a randomized predictor that samples a structure from a randomly perturbed score function. In this work, we learn the variance of these randomized structured predictors and show that it balances better between the learned score function and the randomized noise in structured prediction. We demonstrate empirically the effectiveness of learning the balance between the signal and the random noise in structured discrete spaces.

[9]  arXiv:2007.05737 [pdf, ps, other]
Title: Empirical process theory for locally stationary processes
Subjects: Statistics Theory (math.ST)

We provide a framework for empirical process theory of locally stationary processes using the functional dependence measure. Our results extend known results for stationary mixing sequences by another common possibility to measure dependence and allow for additional time dependence. We develop maximal inequalities for expectations and provide functional limit theorems and Bernstein-type inequalities. We show their applicability to a variety of situations, for instance we prove the weak functional convergence of the empirical distribution function and uniform convergence rates for kernel density and regression estimation if the observations are locally stationary processes.

[10]  arXiv:2007.05748 [pdf, ps, other]
Title: Frequentism-as-model
Authors: Christian Hennig
Comments: 34 pages no figures
Subjects: Other Statistics (stat.OT); Methodology (stat.ME)

Most statisticians are aware that probability models interpreted in a frequentist manner are not really true in objective reality, but only idealisations. I argue that this is often ignored when actually applying frequentist methods and interpreting the results, and that keeping up the awareness for the essential difference between reality and models can lead to a more appropriate use and interpretation of frequentist models and methods, called frequentism-as-model. This is elaborated showing connections to existing work, appreciating the special role of i.i.d. models and subject matter knowledge, giving an account of how and under what conditions models that are not true can be useful, giving detailed interpretations of tests and confidence intervals, confronting their implicit compatibility logic with the inverse probability logic of Bayesian inference, re-interpreting the role of model assumptions, appreciating robustness, and the role of ``interpretative equivalence'' of models. Epistemic (often referred to as Bayesian) probability shares the issue that its models are only idealisations and not really true for modelling reasoning about uncertainty, meaning that it does not have an essential advantage over frequentism, as is often claimed. Bayesian statistics can be combined with frequentism-as-model, leading to what Gelman and Hennig (2017) call ``falsificationist Bayes''.

[11]  arXiv:2007.05812 [pdf, ps, other]
Title: Exact Bayesian inference for diffusion driven Cox processes
Subjects: Methodology (stat.ME)

In this paper we present a novel methodology to perform Bayesian inference for Cox processes in which the intensity function is driven by a diffusion process. The novelty lies in the fact that no discretisation error is involved, despite the non-tractability of both the likelihood function and the transition density of the diffusion. The methodology is based on an MCMC algorithm and its exactness is built on retrospective sampling techniques. The efficiency of the methodology is investigated in some simulated examples and its applicability is illustrated in some real data analyses.

[12]  arXiv:2007.05857 [pdf, other]
Title: Reliability of decisions based on tests: Fourier analysis of Boolean decision functions
Comments: 41 pages, 4 figures
Subjects: Methodology (stat.ME); Other Statistics (stat.OT)

Items in a test are often used as a basis for making decisions and such tests are therefore required to have good psychometric properties, like unidimensionality. In many cases the sum score is used in combination with a threshold to decide between pass or fail, for instance. Here we consider whether such a decision function is appropriate, without a latent variable model, and which properties of a decision function are desirable. We consider reliability (stability) of the decision function, i.e., does the decision change upon perturbations, or changes in a fraction of the outcomes of the items (measurement error). We are concerned with questions of whether the sum score is the best way to aggregate the items, and if so why. We use ideas from test theory, social choice theory, graphical models, computer science and probability theory to answer these questions. We conclude that a weighted sum score has desirable properties that (i) fit with test theory and is observable (similar to a condition like conditional association), (ii) has the property that a decision is stable (reliable), and (iii) satisfies Rousseau's criterion that the input should match the decision. We use Fourier analysis of Boolean functions to investigate whether a decision function is stable and to figure out which (set of) items has proportionally too large an influence on the decision. To apply these techniques we invoke ideas from graphical models and use a pseudo-likelihood factorisation of the probability distribution.
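
One quantity from the Fourier analysis of Boolean functions that relates to the stability question above is the influence of an item: the probability that flipping that item's outcome flips the decision. The sketch below computes it by brute-force enumeration for a weighted sum score with a threshold. It assumes a uniform distribution over response patterns, which is a simplification made here for illustration (the abstract describes a pseudo-likelihood factorisation instead), and all function names are hypothetical.

```python
from itertools import product


def threshold_decision(weights, threshold):
    """Build a pass/fail rule: pass (1) when the weighted sum score reaches threshold."""
    def decide(items):
        score = sum(w * x for w, x in zip(weights, items))
        return 1 if score >= threshold else 0
    return decide


def influences(decide, n_items):
    """Influence of each item under a uniform distribution over 0/1 patterns."""
    flips = [0] * n_items
    total = 0
    for pattern in product((0, 1), repeat=n_items):
        total += 1
        base = decide(pattern)
        for i in range(n_items):
            flipped = list(pattern)
            flipped[i] = 1 - flipped[i]  # change the outcome of item i only
            if decide(flipped) != base:
                flips[i] += 1
    return [count / total for count in flips]
```

For example, `influences(threshold_decision([3, 1, 1], 3), 3)` shows a rule in which the first item alone determines the decision, the kind of disproportionate influence the abstract refers to. The enumeration is exponential in the number of items, so this is only practical for short tests.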

[13]  arXiv:2007.05864 [pdf, other]
Title: Bayesian Deep Ensembles via the Neural Tangent Kernel
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK): a recent development in understanding the training dynamics of wide neural networks (NNs). Previous work has shown that even in the infinite width limit, when NNs become GPs, there is no GP posterior interpretation to a deep ensemble trained with squared error loss. We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member, that enables a posterior interpretation in the infinite width limit. When ensembled together, our trained NNs give an approximation to a posterior predictive distribution, and we prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit. Finally, using finite width NNs we demonstrate that our Bayesian deep ensembles faithfully emulate the analytic posterior predictive when available, and can outperform standard deep ensembles in various out-of-distribution settings, for both regression and classification tasks.

[14]  arXiv:2007.05894 [pdf, other]
Title: A Probabilistic Approach to Identifying Run Scoring Advantage in the Order of Playing Cricket
Subjects: Applications (stat.AP)

In the game of cricket, the result of the coin toss is assumed to be one of the determinants of match outcome. The decision to bat first after winning the toss is often taken to make the best use of superior pitch conditions and set a big target for the opponent. However, the opponent may fail to show their natural batting performance in the second innings due to a number of factors, including deteriorated pitch conditions and excessive pressure of chasing a high target score. The advantage of batting first has been highlighted in the literature and expert opinions; however, the effect of batting and bowling order on match outcome has not been investigated well enough to recommend a solution to any potential bias. This study proposes a probability theory-based model to study venue-specific scoring and chasing characteristics of teams under different match outcomes. A total of 1117 one-day international matches held in ten popular venues are analyzed to show substantially high scoring advantage and likelihood when the winning team bats in the first innings. Results suggest that the same 'bat-first' winning team is very unlikely to score or chase such a high score if they were to bat in the second innings. Therefore, the coin toss decision may favor one team over the other. A Bayesian model is proposed to revise the target score for each venue such that the winning and scoring likelihood is equal regardless of the toss decision. The data and source codes have been shared publicly for future research in creating competitive match outcomes by eliminating the advantage of batting order in run scoring.

[15]  arXiv:2007.05940 [pdf, other]
Title: Perfect Sampling of Multivariate Hawkes Process
Subjects: Applications (stat.AP)

As an extension of self-exciting Hawkes process, the multivariate Hawkes process models counting processes of different types of random events with mutual excitement. In this paper, we present a perfect sampling algorithm that can generate i.i.d. stationary sample paths of multivariate Hawkes process without any transient bias. In addition, we provide an explicit expression of algorithm complexity in model and algorithm parameters and provide numerical schemes to find the optimal parameter set that minimizes the complexity of the perfect sampling algorithm.

[16]  arXiv:2007.05974 [pdf, other]
Title: Point and interval estimation of the target dose using weighted regression modelling
Subjects: Methodology (stat.ME)

In a Phase II dose-finding study with a placebo control, a new drug with several dose levels is compared with a placebo to test for the effectiveness of the new drug. The main focus of such studies often lies in the characterization of the dose-response relationship followed by the estimation of a target dose that leads to a clinically relevant effect over the placebo. This target dose is known as the minimum effective dose (MED) in a drug development study. Several approaches exist that combine multiple comparison procedures with modeling techniques to efficiently estimate the dose-response model and thereafter select the target dose. Despite the flexibility of the existing approaches, they cannot completely address the model uncertainty in the model-selection step and may lead to target dose estimates that are biased. In this article, we propose two new MED estimation approaches based on weighted regression modeling that are robust against deviations from the dose-response model assumptions. These approaches are compared with existing approaches with regard to their accuracy in point and interval estimation of the MED. We illustrate by a simulation study that by integrating one of the new dose estimation approaches with the existing dose-response profile estimation approaches one can take into account the uncertainty of the model selection step.

[17]  arXiv:2007.05994 [pdf, other]
Title: State Space Expectation Propagation: Efficient Inference Schemes for Temporal Gaussian Processes
Comments: Accepted to International Conference on Machine Learning (ICML) 2020
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We formulate approximate Bayesian inference in non-conjugate temporal and spatio-temporal Gaussian process models as a simple parameter update rule applied during Kalman smoothing. This viewpoint encompasses most inference schemes, including expectation propagation (EP), the classical (Extended, Unscented, etc.) Kalman smoothers, and variational inference. We provide a unifying perspective on these algorithms, showing how replacing the power EP moment matching step with linearisation recovers the classical smoothers. EP provides some benefits over the traditional methods via introduction of the so-called cavity distribution, and we combine these benefits with the computational efficiency of linearisation, providing extensive empirical analysis demonstrating the efficacy of various algorithms under this unifying framework. We provide a fast implementation of all methods in JAX.

[18]  arXiv:2007.06011 [pdf, other]
Title: Explaining the data or explaining a model? Shapley values that uncover non-linear dependencies
Comments: 23 pages, 6 figures, 2 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Shapley values have become increasingly popular in the machine learning literature thanks to their attractive axiomatisation, flexibility, and uniqueness in satisfying certain notions of `fairness'. The flexibility arises from the myriad potential forms of the Shapley value \textit{game formulation}. Amongst the consequences of this flexibility is that there are now many types of Shapley values being discussed, with such variety being a source of potential misunderstanding.
To the best of our knowledge, all existing game formulations in the machine learning and statistics literature fall into a category which we name the model-dependent category of game formulations. In this work, we consider an alternative and novel formulation which leads to the first instance of what we call model-independent Shapley values. These Shapley values use a (non-parametric) measure of non-linear dependence as the characteristic function. The strength of these Shapley values is in their ability to uncover and attribute non-linear dependencies amongst features.
We introduce and demonstrate the use of the energy distance correlations, affine-invariant distance correlation, and Hilbert-Schmidt independence criterion as Shapley value characteristic functions. In particular, we demonstrate their potential value for exploratory data analysis and model diagnostics. We conclude with an interesting expository application to a classical medical survey data set.

[19]  arXiv:2007.06018 [pdf, other]
Title: Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation
Comments: Accepted to International Conference on Artificial Intelligence and Statistics 2020
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Machine Learning (cs.LG)

Auto-regressive sequence generative models trained by Maximum Likelihood Estimation suffer from the exposure bias problem in practical finite sample scenarios. The crux is that the number of training samples for Maximum Likelihood Estimation is usually limited and the input data distributions are different at training and inference stages. Many methods have been proposed to solve the above problem (Yu et al., 2017; Lu et al., 2018), which rely on sampling from the non-stationary model distribution and suffer from high variance or biased estimations. In this paper, we propose $\psi$-MLE, a new training scheme for auto-regressive sequence generative models, which is effective and stable when operating at large sample space encountered in text generation. We derive our algorithm from a new perspective of self-augmentation and introduce bias correction with density ratio estimation. Extensive experimental results on synthetic data and real-world text generation tasks demonstrate that our method stably outperforms Maximum Likelihood Estimation and other state-of-the-art sequence generative models in terms of both quality and diversity.

[20]  arXiv:2007.06037 [pdf, other]
Title: Estimating Stochastic Poisson Intensities Using Deep Latent Models
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We present methodology for estimating the stochastic intensity of a doubly stochastic Poisson process. Statistical and theoretical analyses of traffic traces show that these processes are appropriate models of high intensity traffic arriving at an array of service systems. The statistical estimation of the underlying latent stochastic intensity process driving the traffic model involves a rather complicated nonlinear filtering problem. We develop a novel simulation methodology, using deep neural networks to approximate the path measures induced by the stochastic intensity process, for solving this nonlinear filtering problem. Our simulation studies demonstrate that the method is quite accurate on both in-sample estimation and on an out-of-sample performance prediction task for an infinite server queue.

[21]  arXiv:2007.06038 [pdf, other]
Title: Fiducial Matching for the Approximate Posterior: F-ABC
Subjects: Methodology (stat.ME)

F-ABC is introduced, using universal sufficient statistics, unlike previous ABC papers, e.g. Bernton et al. (2019), and avoiding in the approximate posterior artifacts due to a Kernel. The nature of matching tolerance is examined and indications for determining its values are presented. F-ABC does not face concerns associated with ABC. Asymptotics and simulation results are also presented.

[22]  arXiv:2007.06054 [pdf, other]
Title: Robust and flexible inference for the covariate-specific ROC curve
Subjects: Methodology (stat.ME)

Diagnostic tests are of critical importance in health care and medical research. Motivated by the impact that atypical and outlying test outcomes might have on the assessment of the discriminatory ability of a diagnostic test, we develop a flexible and robust model for conducting inference about the covariate-specific receiver operating characteristic (ROC) curve that safeguards against outlying test results while also accommodating for possible nonlinear effects of the covariates. Specifically, we postulate a location-scale additive regression model for the test outcomes in both the diseased and nondiseased populations, combining additive cubic B-splines and M-estimation for the regression function, while the residuals are estimated via a weighted empirical distribution function. The results of the simulation study show that our approach successfully recovers the true covariate-specific ROC curve and corresponding area under the curve on a variety of conceivable test outcomes contamination scenarios. Our method is applied to a dataset derived from a prostate cancer study where we seek to assess the ability of the Prostate Health Index to discriminate between men with and without Gleason 7 or above prostate cancer, and if and how such discriminatory capacity changes with age.

[23]  arXiv:2007.06065 [pdf, other]
Title: The Effects of Vacant Lot Greening and the Impact of Land Use and Business Vibrancy
Subjects: Other Statistics (stat.OT)

We examine the ongoing Philadelphia LandCare (PLC) vacant lot greening initiative and evaluate the association between this built environment intervention and changes in crime incidence. We develop a propensity score matching analysis that estimates the effect of vacant lot greening on different types of crime while accounting for substantial differences between greened and ungreened lots in terms of their surrounding demographic, economic, land use and business vibrancy characteristics. Within these matched pairs of greened vs. ungreened vacant lots, we estimate larger and more significant beneficial effects of greening for reducing violent, non-violent and total crime compared to comparisons of greened vs. ungreened lots without matching. We also investigate the impact of land use zoning and business vibrancy and find that the effect of vacant lot greening on total crime is substantially affected by particular types of surrounding land use zoning and the presence of certain business types.

[24]  arXiv:2007.06072 [pdf, other]
Title: A spectral algorithm for robust regression with subgaussian rates
Authors: Jules Depersin
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)

We study a new linear up to quadratic time algorithm for linear regression in the absence of strong assumptions on the underlying distributions of samples, and in the presence of outliers. The goal is to design a procedure which comes with actual working code that attains the optimal sub-gaussian error bound even though the data have only finite moments (up to $L_4$) and in the presence of possibly adversarial outliers. A polynomial-time solution to this problem has been recently discovered but has high runtime due to its use of Sum-of-Square hierarchy programming. At the core of our algorithm is an adaptation of the spectral method introduced for the mean estimation problem to the linear regression problem. As a by-product we established a connection between the linear regression problem and the furthest hyperplane problem. From a stochastic point of view, in addition to the study of the classical quadratic and multiplier processes we introduce a third empirical process that comes naturally in the study of the statistical properties of the algorithm.

[25]  arXiv:2007.06075 [pdf, other]
Title: Learning latent stochastic differential equations with variational auto-encoders
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We present a method for learning latent stochastic differential equations (SDEs) from high dimensional time series data. Given a time series generated from a lower dimensional Itô process, the proposed method uncovers the relevant parameters of the SDE through a self-supervised learning approach. Using the framework of variational autoencoders (VAEs), we consider a conditional generative model for the data based on the Euler-Maruyama approximation of SDE solutions. Furthermore, we use recent results on identifiability of semi-supervised learning to show that our model can recover not only the underlying SDE parameters, but also the original latent space, up to an isometry, in the limit of infinite data. We validate the model through a series of different simulated video processing tasks where the underlying SDE is known. Our results suggest that the proposed method effectively learns the underlying SDE, as predicted by the theory.
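
The Euler-Maruyama approximation referred to above discretises a scalar SDE $dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t$ as $X_{t+\Delta t} \approx X_t + \mu(X_t, t)\,\Delta t + \sigma(X_t, t)\,\sqrt{\Delta t}\,Z$ with $Z \sim N(0, 1)$. The following is a minimal one-dimensional sketch of that scheme only, not the paper's latent-variable model; the function and argument names are assumptions made for illustration.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, t_end, n_steps, rng):
    """Simulate one path of dX = drift(X, t) dt + diffusion(X, t) dW on [0, t_end]."""
    dt = t_end / n_steps
    x = x0
    path = [x0]
    for i in range(n_steps):
        t = i * dt
        # Brownian increment over a step of length dt: Normal(0, dt).
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x, t) * dt + diffusion(x, t) * dw
        path.append(x)
    return path


# Example: an Ornstein-Uhlenbeck process with illustrative (made-up) parameters.
rng = random.Random(0)
ou_path = euler_maruyama(lambda x, t: -x, lambda x, t: 0.3, 1.0, 1.0, 100, rng)
```

The returned list holds `n_steps + 1` values including the initial state. Smaller step sizes reduce the discretisation error at the cost of more steps.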

[26]  arXiv:2007.06076 [pdf, other]
Title: svReg: Structural Varying-coefficient regression to differentiate how regional brain atrophy affects motor impairment for Huntington disease severity groups
Subjects: Methodology (stat.ME)

For Huntington disease, identification of brain regions related to motor impairment can be useful for developing interventions to alleviate the motor symptom, the major symptom of the disease. However, the effects from the brain regions to motor impairment may vary for different groups of patients. Hence, our interest is not only to identify the brain regions but also to understand how their effects on motor impairment differ by patient groups. This can be cast as a model selection problem for a varying-coefficient regression. However, this is challenging when there is a pre-specified group structure among variables. We propose a novel variable selection method for a varying-coefficient regression with such structured variables. Our method is empirically shown to select relevant variables consistently. Also, our method screens irrelevant variables better than existing methods. Hence, our method leads to a model with higher sensitivity, lower false discovery rate and higher prediction accuracy than the existing methods. Finally, we found that the effects from the brain regions to motor impairment differ by disease severity of the patients. To the best of our knowledge, our study is the first to identify such interaction effects between the disease severity and brain regions, which indicates the need for customized intervention by disease severity.

[27]  arXiv:2007.06084 [pdf, other]
Title: Bayesian probabilistic models for corporate context, with an application to internal audit activities
Comments: 34 pages, 8 figures, 10 tables
Subjects: Applications (stat.AP)

In this paper we present a business case carried out in Poste Italiane, in the context of fair performance evaluations of human resources engaged in internal audit activities. In addition to the development of a Bayesian network supporting the goal of the Internal Audit unit of Poste Italiane, the work has led to the development of a methodological approach to advanced analytics in a corporate context, whose usefulness goes well beyond the specific use case described here. We thus present the different stages of this analytical strategy, from feature selection, to model structure inference and model selection, as a general toolbox that allows a completely transparent and explainable process to support data-driven decisions in business environments.

[28]  arXiv:2007.06096 [pdf, other]
Title: BaCOUn: Bayesian Classifers with Out-of-Distribution Uncertainty
Comments: ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Traditional training of deep classifiers yields overconfident models that are not reliable under dataset shift. We propose a Bayesian framework to obtain reliable uncertainty estimates for deep classifiers. Our approach consists of a plug-in "generator" used to augment the data with an additional class of points that lie on the boundary of the training data, followed by Bayesian inference on top of features that are trained to distinguish these "out-of-distribution" points.

[29]  arXiv:2007.06101 [pdf, ps, other]
Title: Multiple Imputation and Synthetic Data Generation with the R package NPBayesImputeCat
Subjects: Computation (stat.CO); Applications (stat.AP)

In many contexts, missing data and disclosure control are ubiquitous and difficult issues. In particular at statistical agencies, the respondent-level data they collect from surveys and censuses can suffer from high rates of missingness. Furthermore, agencies are obliged to protect respondents' privacy when publishing the collected data for public use. The NPBayesImputeCat R package, introduced in this paper, provides routines to i) create multiple imputations for missing data, and ii) create synthetic data for disclosure control, for multivariate categorical data, with or without structural zeros. We describe the Dirichlet Process mixtures of products of multinomials (DPMPM) models used in the package, and illustrate various uses of the package using data samples from the American Community Survey (ACS).

[30]  arXiv:2007.06114 [pdf, ps, other]
Title: Simultaneous Feature Selection and Outlier Detection with Optimality Guarantees
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)

Sparse estimation methods capable of tolerating outliers have been broadly investigated in the last decade. We contribute to this research considering high-dimensional regression problems contaminated by multiple mean-shift outliers which affect both the response and the design matrix. We develop a general framework for this class of problems and propose the use of mixed-integer programming to simultaneously perform feature selection and outlier detection with provably optimal guarantees. We characterize the theoretical properties of our approach, i.e. a necessary and sufficient condition for the robustly strong oracle property, which allows the number of features to exponentially increase with the sample size; the optimal estimation of the parameters; and the breakdown point of the resulting estimates. Moreover, we provide computationally efficient procedures to tune integer constraints and to warm-start the algorithm. We show the superior performance of our proposal compared to existing heuristic methods through numerical simulations and an application investigating the relationships between the human microbiome and childhood obesity.

[31]  arXiv:2007.06120 [pdf, other]
Title: Fisher Auto-Encoders
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

It has been conjectured that the Fisher divergence is more robust to model uncertainty than the conventional Kullback-Leibler (KL) divergence. This motivates the design of a new class of robust generative auto-encoders (AE) referred to as Fisher auto-encoders. Our approach is to design Fisher AEs by minimizing the Fisher divergence between the intractable joint distribution of observed data and latent variables and the postulated/modeled joint distribution. In contrast to KL-based variational AEs (VAEs), the Fisher AE can exactly quantify the distance between the true and the model-based posterior distributions. Qualitative and quantitative results are provided on both MNIST and celebA datasets demonstrating the competitive performance of Fisher AEs in terms of robustness compared to other AEs such as VAEs and Wasserstein AEs.

[32]  arXiv:2007.06129 [pdf, other]
Title: The Dependent Dirichlet Process and Related Models
Subjects: Methodology (stat.ME)

Standard regression approaches assume that some finite number of the response distribution characteristics, such as location and scale, change as a (parametric or nonparametric) function of predictors. However, it is not always appropriate to assume a location/scale representation, where the error distribution has unchanging shape over the predictor space. In fact, it often happens in applied research that the distribution of responses under study changes with predictors in ways that cannot be reasonably represented by a finite dimensional functional form. This can seriously affect the answers to the scientific questions of interest, and therefore more general approaches are indeed needed. This gives rise to the study of fully nonparametric regression models. We review some of the main Bayesian approaches that have been employed to define probability models where the complete response distribution may vary flexibly with predictors. We focus on developments based on modifications of the Dirichlet process, historically termed dependent Dirichlet processes, and some of the extensions that have been proposed to tackle this general problem using nonparametric approaches.

[33]  arXiv:2007.06136 [pdf, other]
Title: Bayesian Bi-clustering Methods with Applications in Computational Biology
Subjects: Applications (stat.AP)

Bi-clustering is a useful approach in analyzing biological data when observations come from heterogeneous groups and have a large number of features. We outline a general Bayesian approach in tackling bi-clustering problems in high dimensions, and propose three Bayesian bi-clustering models on categorical data, which increase in complexity in terms of modeling the distributions of features across bi-clusters. Our proposed methods apply to a wide range of scenarios: from situations where data are distinguished only among a small subset of features but masked by a large amount of noise, to situations where different groups of data are identified by different sets of features, to situations where data exhibit hierarchical structures. Through simulation studies, we show that our methods outperform existing (bi-)clustering methods in both identifying clusters and recovering feature distributional patterns across bi-clusters. We apply our methods to two genetic datasets, though the area of application of our methods is even broader. Our methods show satisfactory performance in real data analysis, and reveal cluster-level relationships.

[34]  arXiv:2007.06154 [pdf, other]
Title: A comprehensive empirical power comparison of univariate goodness-of-fit tests for the Laplace distribution
Comments: 37 pages, 1 figure, 20 tables
Subjects: Methodology (stat.ME)

In this paper, we do a comprehensive survey of all univariate goodness-of-fit tests that we could find in the literature for the Laplace distribution, which amounts to a total of 45 different test statistics. After eliminating duplicates and considering parameters that yield the best power for each test, we obtain a total of 38 different test statistics. An empirical power comparison study of unmatched size is then conducted using Monte Carlo simulations, with 400 alternatives spanning over 20 families of distributions, for various sample sizes and confidence levels. A discussion of the results follows, where the best tests are selected for different classes of alternatives. A similar study was conducted for the normal distribution in Romão et al. (2010), although on a smaller scale. Our work improves significantly on Puig & Stephens (2000), which was previously the best-known reference of this kind for the Laplace distribution. All test statistics and alternatives considered here are integrated within the PoweR package for the R software.

[35]  arXiv:2007.06160 [pdf, other]
Title: Nested Dirichlet Process For Population Size Estimation From Multi-list Recapture Data
Comments: 24 pages, 9 figures, submitted to Biometrics for review
Subjects: Applications (stat.AP); Methodology (stat.ME)

Heterogeneity of response patterns is important in estimating the size of a closed population from multiple recapture data when capture patterns are different over time and location. In this paper, we extend the non-parametric one layer latent class model for multiple recapture data proposed by Manrique-Vallier (2016) to a nested latent class model with the first layer modeling individual heterogeneity and the second layer modeling location-time differences. Location-time groups with similar recording patterns are in the same top layer latent class and individuals within each top layer class are dependent. The nested latent class model incorporates hierarchical heterogeneity into the modeling to estimate population size from multi-list recapture data. This approach leads to more accurate population size estimation and reduced uncertainty. We apply the method to estimating casualties from the Syrian conflict.

[36]  arXiv:2007.06283 [pdf, ps, other]
Title: Functions with average smoothness: structure, algorithms, and learning
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Probability (math.PR)

We initiate a program of average-smoothness analysis for efficiently learning real-valued functions on metric spaces. Rather than using the (global) Lipschitz constant as the regularizer, we define a local slope at each point and gauge the function complexity as the average of these values. Since the average is often much smaller than the maximum, this complexity measure can yield considerably sharper generalization bounds --- assuming that these admit a refinement where the global Lipschitz constant is replaced by our average of local slopes. Our first major contribution is to obtain just such distribution-sensitive bounds. This required overcoming a number of technical challenges, perhaps the most significant of which was bounding the empirical covering numbers, which can be much worse-behaved than the ambient ones. This in turn is based on a novel Lipschitz-type extension, which is a pointwise minimizer of the local slope, and may be of independent interest. Our combinatorial results are accompanied by efficient algorithms for denoising the random sample, as well as guarantees that the extension from the sample to the whole space will continue to be, with high probability, smooth on average. Along the way we discover a surprisingly rich combinatorial and analytic structure in the function class we define.

[37]  arXiv:2007.06298 [pdf, other]
Title: Imputation procedures in surveys using nonparametric and machine learning methods: an empirical comparison
Subjects: Methodology (stat.ME); Computation (stat.CO)

Nonparametric and machine learning methods are flexible methods for obtaining accurate predictions. Nowadays, data sets with a large number of predictors and complex structures are fairly common. In the presence of item nonresponse, nonparametric and machine learning procedures may thus provide a useful alternative to traditional imputation procedures for deriving a set of imputed values. In this paper, we conduct an extensive empirical investigation that compares a number of imputation procedures in terms of bias and efficiency in a wide variety of settings, including high-dimensional data sets. The results suggest that a number of machine learning procedures perform very well in terms of bias and efficiency.

[38]  arXiv:2007.06299 [pdf, other]
Title: Monitoring and explainability of models in production
Comments: Workshop on Challenges in Deploying and Monitoring Machine Learning Systems (ICML 2020)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The machine learning lifecycle extends beyond the deployment stage. Monitoring deployed models is crucial for continued provision of high quality machine learning enabled services. Key areas include model performance and data monitoring, detecting outliers and data drift using statistical techniques, and providing explanations of historic predictions. We discuss the challenges to successful implementation of solutions in each of these areas with some recent examples of production ready solutions using open source tools.

[39]  arXiv:2007.06352 [pdf, other]
Title: Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)

In this paper, we investigate the limiting behavior of a continuous-time counterpart of the Stochastic Gradient Descent (SGD) algorithm applied to two-layer overparameterized neural networks, as the number of neurons (i.e., the size of the hidden layer) $N \to +\infty$. Following a probabilistic approach, we show 'propagation of chaos' for the particle system defined by this continuous-time dynamics under different scenarios, indicating that the statistical interaction between the particles asymptotically vanishes. In particular, we establish quantitative convergence with respect to $N$ of any particle to a solution of a mean-field McKean-Vlasov equation in the metric space endowed with the Wasserstein distance. In comparison to previous works on the subject, we consider settings in which the sequence of stepsizes in SGD can potentially depend on the number of neurons and the iterations. We then identify two regimes under which different mean-field limits are obtained, one of them corresponding to an implicitly regularized version of the minimization problem at hand. We perform various experiments on real datasets to validate our theoretical results, assessing the existence of these two regimes on classification problems and illustrating our convergence results.

[40]  arXiv:2007.06357 [pdf, other]
Title: Feasible Inference for Stochastic Volatility in Brownian Semistationary Processes
Comments: 21 pages, 7 figures
Subjects: Statistics Theory (math.ST); Applications (stat.AP); Methodology (stat.ME)

This article studies the finite sample behaviour of a number of estimators for the integrated power volatility process of a Brownian semistationary process in the non semi-martingale setting. We establish three consistent feasible estimators for the integrated volatility, two derived from parametric methods and one non-parametrically. We then use a simulation study to compare the convergence properties of the estimators to one another, and to a benchmark of an infeasible estimator. We further establish bounds for the asymptotic variance of the infeasible estimator and assess whether a central limit theorem which holds for the infeasible estimator can be translated into a feasible limit theorem for the non-parametric estimator.

[41]  arXiv:2007.06363 [pdf, other]
Title: Orthogonally Decoupled Variational Fourier Features
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Sparse inducing points have long been a standard method to fit Gaussian processes to big data. In the last few years, spectral methods that exploit approximations of the covariance kernel have been shown to be competitive. In this work we exploit a recently introduced orthogonally decoupled variational basis to combine spectral methods and sparse inducing points methods. We show that the method is competitive with the state-of-the-art on synthetic and on real-world data.

[42]  arXiv:2007.06380 [pdf, other]
Title: Synthetic Aperture Radar Image Formation with Uncertainty Quantification
Subjects: Applications (stat.AP); Image and Video Processing (eess.IV)

Synthetic aperture radar (SAR) is a day or night any-weather imaging modality that is an important tool in remote sensing. Most existing SAR image formation methods result in a maximum a posteriori image which approximates the reflectivity of an unknown ground scene. This single image provides no quantification of the certainty with which the features in the estimate should be trusted. In addition, finding the mode is generally not the best way to interrogate a posterior. This paper addresses these issues by introducing a sampling framework to SAR image formation. A hierarchical Bayesian model is constructed using conjugate priors that directly incorporate coherent imaging and the problematic speckle phenomenon which is known to degrade image quality. Samples of the resulting posterior as well as parameters governing speckle and noise are obtained using a Gibbs sampler. These samples may then be used to compute estimates, and also to derive other statistics like variance which aid in uncertainty quantification. The latter information is particularly important in SAR, where ground truth images even for synthetically-created examples are typically unknown. An example result using real-world data shows that the sampling-based approach introduced here to SAR image formation provides parameter-free estimates with improved contrast and significantly reduced speckle, as well as unprecedented uncertainty quantification information.

[43]  arXiv:2007.06382 [pdf, ps, other]
Title: A class of ie-merging functions
Comments: 9 pages
Subjects: Statistics Theory (math.ST)

We describe a general class of ie-merging functions and pose the problem of finding ie-merging functions outside this class.

[44]  arXiv:2007.06388 [pdf, ps, other]
Title: Adaptive minimax testing for circular convolution
Subjects: Statistics Theory (math.ST)

Given observations from a circular random variable contaminated by an additive measurement error, we consider the problem of minimax optimal goodness-of-fit testing in a non-asymptotic framework. We propose direct and indirect testing procedures using a projection approach. The structure of the optimal tests depends on regularity and ill-posedness parameters of the model, which are unknown in practice. Therefore, adaptive testing strategies that perform optimally over a wide range of regularity and ill-posedness classes simultaneously are investigated. Considering a multiple testing procedure, we obtain adaptive, i.e. assumption-free, procedures and analyse their performance. Compared with the non-adaptive tests, their radii of testing face a deterioration by a log-factor. We show that for testing of uniformity this loss is unavoidable by providing a lower bound. The results are illustrated considering Sobolev spaces and ordinary or super smooth error densities.

[45]  arXiv:2007.06408 [pdf, ps, other]
Title: Strong Uniform Consistency with Rates for Kernel Density Estimators with General Kernels on Manifolds
Authors: Hau-Tieng Wu, Nan Wu
Comments: 44 pages
Subjects: Statistics Theory (math.ST); Probability (math.PR); Machine Learning (stat.ML)

We provide a strong uniform consistency result with the convergence rate for the kernel density estimation on Riemannian manifolds with Riemann integrable kernels (in the ambient Euclidean space). We also provide a strong uniform consistency result for the kernel density estimation on Riemannian manifolds with Lebesgue integrable kernels. The kernels considered in this paper are different from the kernels in the Vapnik-Chervonenkis class that are frequently considered in the statistics community. We illustrate the difference when we apply them to estimate a probability density function. We also provide the necessary and sufficient condition for a kernel to be Riemann integrable on a submanifold in the Euclidean space.

[46]  arXiv:2007.06461 [pdf, ps, other]
Title: Minimum Relative Entropy Inference for Normal and Monte Carlo Distributions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We represent affine sub-manifolds of exponential family distributions as minimum relative entropy sub-manifolds. With such representation we derive analytical formulas for the inference from partial information on expectations and covariances of multivariate normal distributions; and we improve the numerical implementation via Monte Carlo simulations for the inference from partial information of generalized expectation type.

[47]  arXiv:2007.06476 [pdf, other]
Title: A Latent Mixture Model for Heterogeneous Causal Mechanisms in Mendelian Randomization
Comments: 38 pages, 9 figures, 2 tables
Subjects: Applications (stat.AP); Methodology (stat.ME)

Mendelian Randomization (MR) is a popular method in epidemiology and genetics that uses genetic variation as instrumental variables for causal inference. Existing MR methods usually assume most genetic variants are valid instrumental variables that identify a common causal effect. There is a general lack of awareness that this effect homogeneity assumption can be violated when there are multiple causal pathways involved, even if all the instrumental variables are valid. In this article, we introduce a latent mixture model MR-PATH that groups instruments that yield similar causal effect estimates together. We develop a Monte-Carlo EM algorithm to fit this mixture model, derive approximate confidence intervals for uncertainty quantification, and adopt a modified Bayesian Information Criterion (BIC) for model selection. We verify the efficacy of the Monte-Carlo EM algorithm, confidence intervals, and model selection criterion using numerical simulations. We identify potential mechanistic heterogeneity when applying our method to estimate the effect of high-density lipoprotein cholesterol on coronary heart disease and the effect of adiposity on type II diabetes.

[48]  arXiv:2007.06482 [pdf, other]
Title: Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting. Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of OFU-LQ and cast it into a constrained extended LQR problem, where an additional control variable implicitly selects the system dynamics within a confidence interval. We then move to the corresponding Lagrangian formulation for which we prove strong duality. As a result, we show that an $\epsilon$-optimistic controller can be computed efficiently by solving at most $O\big(\log(1/\epsilon)\big)$ Riccati equations. Finally, we prove that relaxing the original OFU problem does not impact the learning performance, thus recovering the $\tilde{O}(\sqrt{T})$ regret of OFU-LQ. To the best of our knowledge, this is the first computationally efficient confidence-based algorithm for LQR with worst-case optimal regret guarantees.

[49]  arXiv:2007.06541 [pdf, ps, other]
Title: Bayesian Modeling of COVID-19 Positivity Rate -- the Indiana experience
Comments: 13 pages, 7 figures and 2 tables. The numerical results provided were obtained via an updatable R Markdown document
Subjects: Methodology (stat.ME); Populations and Evolution (q-bio.PE)

In this short technical report we model, within the Bayesian framework, the rate of positive tests reported by the State of Indiana, accounting also for the substantial variability (and overdispersion) in the daily count of the tests performed. The approach we take results in a simple procedure for prediction, a posteriori, of this rate of 'positivity' and allows for easy and straightforward adaptation by any agency tracking daily results of COVID-19 tests. The numerical results provided herein were obtained via an updatable R Markdown document.

[50]  arXiv:2007.06543 [pdf, ps, other]
Title: Dynamics of ternary statistical experiments with equilibrium state
Comments: 7 pages, 2 figures
Journal-ref: Journal of Computational & Applied Mathematics, Kiev, 2015, No.2 (119), 3-7
Subjects: Other Statistics (stat.OT)

We study the scenarios of the dynamics of ternary statistical experiments, modeled employing difference equations. The important features are a balance condition and the existence of a steady-state (equilibrium). We give a classification of the scenarios of the model evolution, which differ significantly from one another depending on the domain of the values of the model's basic parameters.

[51]  arXiv:2007.06552 [pdf, ps, other]
Title: Relaxing the I.I.D. Assumption: Adaptive Minimax Optimal Sequential Prediction with Expert Advice
Comments: 60 pages. Blair Bilodeau and Jeffrey Negrea are equal-contribution authors; order was determined randomly
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We consider sequential prediction with expert advice when the data are generated stochastically, but the distributions generating the data may vary arbitrarily among some constraint set. We quantify relaxations of the classical I.I.D. assumption in terms of possible constraint sets, with I.I.D. at one extreme, and an adversarial mechanism at the other. The Hedge algorithm, long known to be minimax optimal in the adversarial regime, has recently been shown to also be minimax optimal in the I.I.D. setting. We show that Hedge is suboptimal between these extremes, and present a new algorithm that is adaptively minimax optimal with respect to our relaxations of the I.I.D. assumption, without knowledge of which setting prevails.
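
As background for this abstract, the classical Hedge (exponential weights) algorithm it refers to can be sketched as follows. This is only the textbook baseline, not the new adaptive algorithm the paper proposes; the function name `hedge`, the two-expert loss sequence and the learning-rate choice are assumptions made for illustration.

```python
import math


def hedge(loss_rounds, eta):
    """Run Hedge on a sequence of per-expert loss vectors with losses in [0, 1].

    Illustrative helper (not from the paper). Returns the weight vectors
    played in each round and the learner's cumulative expected loss.
    """
    n_experts = len(loss_rounds[0])
    cumulative = [0.0] * n_experts
    total_loss = 0.0
    history = []
    for losses in loss_rounds:
        # Weights are proportional to exp(-eta * cumulative loss); subtracting
        # the minimum first avoids underflow without changing the ratios.
        best = min(cumulative)
        raw = [math.exp(-eta * (c - best)) for c in cumulative]
        norm = sum(raw)
        weights = [r / norm for r in raw]
        history.append(weights)
        total_loss += sum(w * l for w, l in zip(weights, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return history, total_loss


# Hypothetical data: expert 0 never errs, expert 1 always does.
rounds = [[0.0, 1.0]] * 20
eta = math.sqrt(8 * math.log(2) / len(rounds))  # standard tuning for a known horizon
history, total_loss = hedge(rounds, eta)
print(history[0], history[-1], total_loss)
```

With this tuning the regret against the best expert is at most sqrt(T ln(N) / 2) for T rounds and N experts, which is the adversarial guarantee the abstract alludes to.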

[52]  arXiv:2007.06558 [pdf, ps, other]
Title: Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Optimization and Control (math.OC)

Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms in contemporary reinforcement learning. This class of methods is often applied in conjunction with entropy regularization -- an algorithmic scheme that helps encourage exploration -- and is closely related to soft policy iteration and trust region policy optimization. Despite the empirical success, the theoretical underpinnings for NPG methods remain severely limited even for the tabular setting. This paper develops non-asymptotic convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on discounted Markov decision processes (MDPs). Assuming access to exact policy evaluation, we demonstrate that the algorithm converges linearly -- or even quadratically once it enters a local region around the optimal policy -- when computing optimal value functions of the regularized MDP. Moreover, the algorithm is provably stable vis-à-vis inexactness of policy evaluation, and is able to find an $\epsilon$-optimal policy for the original MDP when applied to a slightly perturbed MDP. Our convergence results outperform the ones established for unregularized NPG methods (arXiv:1908.00261), and shed light upon the role of entropy regularization in accelerating convergence.

Cross-lists for Tue, 14 Jul 20

[53]  arXiv:1801.00718 (cross-list from cs.CE) [pdf, other]
Title: Selective review of offline change point detection methods
Journal-ref: Signal Processing, 167:107299, 2020
Subjects: Computational Engineering, Finance, and Science (cs.CE); Computation (stat.CO); Methodology (stat.ME)

This article presents a selective survey of algorithms for the offline detection of multiple change points in multivariate time series. A general yet structuring methodological strategy is adopted to organize this vast body of work. More precisely, detection algorithms considered in this review are characterized by three elements: a cost function, a search method and a constraint on the number of changes. Each of those elements is described, reviewed and discussed separately. Implementations of the main algorithms described in this article are provided within a Python package called ruptures.
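
The three elements named in this abstract (a cost function, a search method and a constraint on the number of changes) can be illustrated with a small self-contained sketch. It does not use or reproduce the ruptures package; the function names, the squared-error cost and the toy signal are assumptions chosen for illustration, with exhaustive dynamic programming as the search method and a fixed number of changes as the constraint.

```python
def l2_cost(prefix, prefix_sq, start, end):
    """Cost function: sum of squared deviations from the mean on signal[start:end]."""
    n = end - start
    s = prefix[end] - prefix[start]
    sq = prefix_sq[end] - prefix_sq[start]
    return sq - s * s / n


def detect_changes(signal, n_changes):
    """Search method: dynamic programming over all segmentations.

    Constraint: exactly n_changes change points. Returns their indices,
    i.e. the positions where a new segment starts.
    """
    n = len(signal)
    prefix, prefix_sq = [0.0], [0.0]
    for v in signal:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    n_segments = n_changes + 1
    inf = float("inf")
    # best[k][t]: minimal total cost of splitting signal[:t] into k segments.
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    start_of_last = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for t in range(k, n + 1):
            for s in range(k - 1, t):
                cost = best[k - 1][s] + l2_cost(prefix, prefix_sq, s, t)
                if cost < best[k][t]:
                    best[k][t] = cost
                    start_of_last[k][t] = s

    # Backtrack from the full signal to recover the segment boundaries.
    changes, t = [], n
    for k in range(n_segments, 1, -1):
        t = start_of_last[k][t]
        changes.append(t)
    return sorted(changes)


# Hypothetical piecewise-constant signal with mean shifts at indices 10 and 20.
signal = [0.0] * 10 + [5.0] * 10 + [1.0] * 10
print(detect_changes(signal, n_changes=2))  # -> [10, 20]
```

The triple loop costs O(K n^2) cost evaluations for K segments and n samples, which is fine for a toy signal but is the kind of cost that faster search methods are designed to reduce.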

[54]  arXiv:2007.05535 (cross-list from astro-ph.CO) [pdf, other]
Title: Flow-Based Likelihoods for Non-Gaussian Inference
Comments: 14 pages, 6 figures + appendices
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

We investigate the use of data-driven likelihoods to bypass a key assumption made in many scientific analyses, which is that the true likelihood of the data is Gaussian. In particular, we suggest using the optimization targets of flow-based generative models, a class of models that can capture complex distributions by transforming a simple base distribution through layers of nonlinearities. We call these flow-based likelihoods (FBL). We analyze the accuracy and precision of the reconstructed likelihoods on mock Gaussian data, and show that simply gauging the quality of samples drawn from the trained model is not a sufficient indicator that the true likelihood has been learned. We nevertheless demonstrate that the likelihood can be reconstructed to a precision equal to that of sampling error due to a finite sample size. We then apply FBLs to mock weak lensing convergence power spectra, a cosmological observable that is significantly non-Gaussian (NG). We find that the FBL captures the NG signatures in the data extremely well, while other commonly-used data-driven likelihoods, such as Gaussian mixture models and independent component analysis, fail to do so. This suggests that works that have found small posterior shifts in NG data with data-driven likelihoods such as these could be underestimating the impact of non-Gaussianity in parameter constraints. By introducing a suite of tests that can capture different levels of NG in the data, we show that the success or failure of traditional data-driven likelihoods can be tied back to the structure of the NG in the data. Unlike other methods, the flexibility of the FBL makes it successful at tackling different types of NG simultaneously. Because of this, and consequently their likely applicability across datasets and domains, we encourage their use for inference when sufficient mock data are available for training.

[55]  arXiv:2007.05542 (cross-list from q-bio.PE) [pdf, other]
Title: Climate & BCG: Effects on COVID-19 Death Growth Rates
Comments: 17 pages, 10 figures, 6 tables
Subjects: Populations and Evolution (q-bio.PE); Applications (stat.AP)

Multiple studies have suggested the spread of COVID-19 is affected by factors such as climate, BCG vaccinations, pollution and blood type. We perform a joint study of these factors using the death growth rates of 40 regions worldwide with both machine learning and Bayesian methods. We find weak, non-significant (< 3$\sigma$) evidence for temperature and relative humidity as factors in the spread of COVID-19 but little or no evidence for BCG vaccination prevalence or $\text{PM}_{2.5}$ pollution. The only variable detected at a statistically significant level (>3$\sigma$) is the rate of positive COVID-19 tests, with higher positive rates correlating with higher daily growth of deaths.

[56]  arXiv:2007.05549 (cross-list from cs.LG) [pdf, other]
Title: Meta-Learning Requires Meta-Augmentation
Comments: 14 pages, 8 figures. Code at this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Meta-learning algorithms aim to learn two components: a model that predicts targets for a task, and a base learner that quickly updates that model when given examples from a new task. This additional level of learning can be powerful, but it also creates another potential source for overfitting, since we can now overfit in either the model or the base learner. We describe both of these forms of meta-learning overfitting, and demonstrate that they appear experimentally in common meta-learning benchmarks. We then use an information-theoretic framework to discuss meta-augmentation, a way to add randomness that discourages the base learner and model from learning trivial solutions that do not generalize to new tasks. We demonstrate that meta-augmentation produces large complementary benefits to recently proposed meta-regularization techniques.

[57]  arXiv:2007.05553 (cross-list from cs.CR) [pdf, other]
Title: Differentially private cross-silo federated learning
Comments: 14 pages, 5 figures
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Machine Learning (stat.ML)

Strict privacy is of paramount importance in distributed machine learning. Federated learning, with the main idea of communicating only what is needed for learning, has recently been introduced as a general approach for distributed learning to enhance learning and improve security. However, federated learning by itself does not guarantee any privacy for data subjects. To quantify and control how much privacy is compromised in the worst case, we can use differential privacy.
In this paper we combine additively homomorphic secure summation protocols with differential privacy in the so-called cross-silo federated learning setting. The goal is to learn complex models like neural networks while guaranteeing strict privacy for the individual data subjects. We demonstrate that our proposed solutions give prediction accuracy that is comparable to the non-distributed setting, and are fast enough to enable learning models with millions of parameters in a reasonable time.
To enable learning under strict privacy guarantees that need privacy amplification by subsampling, we present a general algorithm for oblivious distributed subsampling. However, we also argue that when malicious parties are present, a simple approach using distributed Poisson subsampling gives better privacy.
Finally, we show that by leveraging random projections we can further scale up our approach to larger models while suffering only a modest performance loss.
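As a rough, single-party illustration of the Poisson subsampling mentioned above (this is not the paper's distributed or oblivious protocol), each record is included in a minibatch independently with probability `q`. The function name and parameters are hypothetical.

```python
import random

def poisson_subsample(records, q, rng):
    """Include each record independently with probability q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    # Each record is kept by its own independent coin flip,
    # so the batch size is random rather than fixed.
    return [r for r in records if rng.random() < q]

rng = random.Random(0)  # seeded only to make the sketch reproducible
batch = poisson_subsample(list(range(1000)), q=0.05, rng=rng)
```

The batch size varies from draw to draw, which is the property that distinguishes this scheme from sampling a fixed-size batch.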

[58]  arXiv:2007.05557 (cross-list from cs.LG) [pdf, other]
Title: Learning Entangled Single-Sample Gaussians in the Subset-of-Signals Model
Comments: Appear in COLT'2020. Updates: corrected comments on existing works; added comparison to median estimator
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

In the setting of entangled single-sample distributions, the goal is to estimate some common parameter shared by a family of $n$ distributions, given one single sample from each distribution. This paper studies mean estimation for entangled single-sample Gaussians that have a common mean but different unknown variances. We propose the subset-of-signals model where an unknown subset of $m$ variances are bounded by 1 while there are no assumptions on the other variances. In this model, we analyze a simple and natural method based on iteratively averaging the truncated samples, and show that the method achieves error $O \left(\frac{\sqrt{n\ln n}}{m}\right)$ with high probability when $m=\Omega(\sqrt{n\ln n})$, matching existing bounds for this range of $m$. We further prove lower bounds, showing that the error is $\Omega\left(\left(\frac{n}{m^4}\right)^{1/2}\right)$ when $m$ is between $\Omega(\ln n)$ and $O(n^{1/4})$, and the error is $\Omega\left(\left(\frac{n}{m^4}\right)^{1/6}\right)$ when $m$ is between $\Omega(n^{1/4})$ and $O(n^{1 - \epsilon})$ for an arbitrarily small $\epsilon>0$, improving existing lower bounds and extending to a wider range of $m$.
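The idea of iteratively averaging truncated samples can be sketched as follows. This is a simplified illustration, not the paper's exact procedure: the median starting point, the shrink factor, and the radius floor of 1 (loosely motivated by the variance bound of 1 on the good subset) are all assumptions, and the function name is hypothetical.

```python
import statistics

def iterative_truncated_mean(samples, radius, rounds=10, shrink=0.5):
    """Repeatedly clip samples to a window around the current estimate and average them."""
    estimate = statistics.median(samples)  # assumed starting point
    for _ in range(rounds):
        lo, hi = estimate - radius, estimate + radius
        # Truncate every sample to the current window, then average.
        clipped = [min(max(x, lo), hi) for x in samples]
        estimate = sum(clipped) / len(clipped)
        # Assumed schedule: shrink the window, but never below 1.
        radius = max(radius * shrink, 1.0)
    return estimate

# Three low-variance samples near 1.0 plus two high-variance outliers.
samples = [0.9, 1.0, 1.1, 1000.0, -500.0]
estimate = iterative_truncated_mean(samples, radius=4.0)
```

On this toy input the plain mean is dominated by the two outliers, whereas the truncated estimate stays close to the common mean of the low-variance samples.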

[59]  arXiv:2007.05558 (cross-list from cs.LG) [pdf, other]
Title: The Computational Limits of Deep Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Deep learning's recent history has been one of achievement: from triumphing over humans in the game of Go to world-leading performance in image recognition, voice recognition, translation, and other tasks. But this progress has come with a voracious appetite for computing power. This article reports on the computational demands of Deep Learning applications in five prominent application areas and shows that progress in all five is strongly reliant on increases in computing power. Extrapolating forward this reliance reveals that progress along current lines is rapidly becoming economically, technically, and environmentally unsustainable. Thus, continued progress in these applications will require dramatically more computationally-efficient methods, which will either have to come from changes to deep learning or from moving to other machine learning methods.

[60]  arXiv:2007.05565 (cross-list from cs.LG) [pdf, other]
Title: Reverse Annealing for Nonnegative/Binary Matrix Factorization
Comments: 9 pages, 5 figures
Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET); Quantum Physics (quant-ph); Machine Learning (stat.ML)

It was recently shown that quantum annealing can be used as an effective, fast subroutine in certain types of matrix factorization algorithms. The quantum annealing algorithm performed best for quick, approximate answers, but performance rapidly plateaued. In this paper, we utilize reverse annealing instead of forward annealing in the quantum annealing subroutine for nonnegative/binary matrix factorization problems. After an initial global search with forward annealing, reverse annealing performs a series of local searches that refine existing solutions. The combination of forward and reverse annealing significantly improves performance compared to forward annealing alone for all but the shortest run times.

[61]  arXiv:2007.05566 (cross-list from cs.LG) [pdf, other]
Title: Contrastive Training for Improved Out-of-Distribution Detection
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Reliable detection of out-of-distribution (OOD) inputs is increasingly understood to be a precondition for deployment of machine learning systems. This paper proposes and investigates the use of contrastive training to boost OOD detection performance. Unlike leading methods for OOD detection, our approach does not require access to examples labeled explicitly as OOD, which can be difficult to collect in practice. We show in extensive experiments that contrastive training significantly helps OOD detection performance on a number of common benchmarks. By introducing and employing the Confusion Log Probability (CLP) score, which quantifies the difficulty of the OOD detection task by capturing the similarity of inlier and outlier datasets, we show that our method especially improves performance in the `near OOD' classes -- a particularly challenging setting for previous methods.

[62]  arXiv:2007.05572 (cross-list from cs.LG) [pdf, other]
Title: Variable Skipping for Autoregressive Range Density Estimation
Comments: ICML 2020. Code released at: this https URL
Subjects: Machine Learning (cs.LG); Databases (cs.DB); Machine Learning (stat.ML)

Deep autoregressive models compute point likelihood estimates of individual data points. However, many applications (e.g., database cardinality estimation) require estimating range densities, a capability that is under-explored by current neural density estimation literature. In these applications, fast and accurate range density estimates over high-dimensional data directly impact user-perceived performance. In this paper, we explore a technique, variable skipping, for accelerating range density estimation over deep autoregressive models. This technique exploits the sparse structure of range density queries to avoid sampling unnecessary variables during approximate inference. We show that variable skipping provides 10-100$\times$ efficiency improvements when targeting challenging high-quantile error metrics, enables complex applications such as text pattern matching, and can be realized via a simple data augmentation procedure without changing the usual maximum likelihood objective.

[63]  arXiv:2007.05577 (cross-list from cs.LG) [pdf, other]
Title: Vizarel: A System to Help Better Understand RL Agents
Comments: Accepted to ICML 2020 Workshop on Human Interpretability in Machine Learning (Spotlight)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)

Visualization tools for supervised learning have allowed users to interpret, introspect, and gain intuition for the successes and failures of their models. While reinforcement learning practitioners ask many of the same questions, existing tools are not applicable to the RL setting. In this work, we describe our initial attempt at constructing a prototype of these ideas, through identifying possible features that such a system should encapsulate. Our design is motivated by envisioning the system to be a platform on which to experiment with interpretable reinforcement learning.

[64]  arXiv:2007.05611 (cross-list from cs.LG) [pdf, other]
Title: Deep Contextual Clinical Prediction with Reverse Distillation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Healthcare providers are increasingly using learned methods to predict and understand long-term patient outcomes in order to make meaningful interventions. However, despite innovations in this area, deep learning models often struggle to match performance of shallow linear models in predicting these outcomes, making it difficult to leverage such techniques in practice. In this work, motivated by the task of clinical prediction from insurance claims, we present a new technique called reverse distillation which pretrains deep models by using high-performing linear models for initialization. We make use of the longitudinal structure of insurance claims datasets to develop Self Attention with Reverse Distillation, or SARD, an architecture that utilizes a combination of contextual embedding, temporal embedding and self-attention mechanisms and most critically is trained via reverse distillation. SARD outperforms state-of-the-art methods on multiple clinical prediction outcomes, with ablation studies revealing that reverse distillation is a primary driver of these improvements.

[65]  arXiv:2007.05646 (cross-list from cs.LG) [pdf, other]
Title: Transformations between deep neural networks
Comments: 14 pages, 10 figures, submitted to Neural Information Processing Systems 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose to test, and when possible establish, an equivalence between two different artificial neural networks by attempting to construct a data-driven transformation between them, using manifold-learning techniques. In particular, we employ diffusion maps with a Mahalanobis-like metric. If the construction succeeds, the two networks can be thought of as belonging to the same equivalence class.
We first discuss transformation functions between only the outputs of the two networks; we then also consider transformations that take into account outputs (activations) of a number of internal neurons from each network. In general, Whitney's theorem dictates the number of measurements from one of the networks required to reconstruct each and every feature of the second network. The construction of the transformation function relies on a consistent, intrinsic representation of the network input space.
We illustrate our algorithm by matching neural network pairs trained to learn (a) observations of scalar functions; (b) observations of two-dimensional vector fields; and (c) representations of images of a moving three-dimensional object (a rotating horse). The construction of such equivalence classes across different network instantiations clearly relates to transfer learning. We also expect that it will be valuable in establishing equivalence between different Machine Learning-based models of the same phenomenon observed through different instruments and by different research groups.

[66]  arXiv:2007.05665 (cross-list from cs.LG) [pdf, ps, other]
Title: A Computational Separation between Private Learning and Online Learning
Authors: Mark Bun
Comments: 15 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A recent line of work has shown a qualitative equivalence between differentially private PAC learning and online learning: A concept class is privately learnable if and only if it is online learnable with a finite mistake bound. However, both directions of this equivalence incur significant losses in both sample and computational efficiency. Studying a special case of this connection, Gonen, Hazan, and Moran (NeurIPS 2019) showed that uniform or highly sample-efficient pure-private learners can be time-efficiently compiled into online learners. We show that, assuming the existence of one-way functions, such an efficient conversion is impossible even for general pure-private learners with polynomial sample complexity. This resolves a question of Neel, Roth, and Wu (FOCS 2019).

[67]  arXiv:2007.05675 (cross-list from cs.CV) [pdf, other]
Title: Coarse-to-Fine Pseudo-Labeling Guided Meta-Learning for Few-Shot Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

To endow neural networks with the potential to learn rapidly from a handful of samples, meta-learning blazes a trail to acquire across-task knowledge from a variety of few-shot learning tasks. However, most existing meta-learning algorithms retain the requirement of fine-grained supervision, which is expensive in many applications. In this paper, we show that meta-learning models can extract transferable knowledge from coarse-grained supervision for few-shot classification. We propose a weakly-supervised framework, namely Coarse-to-fine Pseudo-labeling Guided Meta-Learning (CPGML), to alleviate the need for data annotation. In our framework, the coarse categories are grouped into pseudo sub-categories to construct a task distribution for meta-training, following the cosine distance between the corresponding embedding vectors of images. For better feature representation in this process, we develop Dual-level Discriminative Embedding (DDE), which aims to keep the distance between learned embeddings consistent with both the visual similarity and the semantic relation of input images. Moreover, we propose a task-attention mechanism to reduce the weight of the training tasks with potentially higher label noise, based on the observation of task-nonequivalence. Extensive experiments conducted on two hierarchical meta-learning benchmarks demonstrate that, under the proposed framework, meta-learning models can effectively extract task-independent knowledge from the roughly generated tasks and generalize well to unseen tasks.

[68]  arXiv:2007.05683 (cross-list from cs.LG) [pdf, other]
Title: Batch-level Experience Replay with Review for Continual Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Continual learning is a branch of deep learning that seeks to strike a balance between learning stability and plasticity. The CVPR 2020 CLVision Continual Learning for Computer Vision challenge is dedicated to evaluating and advancing the current state-of-the-art continual learning methods using the CORe50 dataset with three different continual learning scenarios. This paper presents our approach, called Batch-level Experience Replay with Review, to this challenge. Our team achieved 1st place in all three scenarios out of 79 participating teams. The codebase of our implementation is publicly available at https://github.com/RaptorMai/CVPR20_CLVision_challenge

[69]  arXiv:2007.05690 (cross-list from cs.LG) [pdf, other]
Title: Federated Learning's Blessing: FedAvg has Linear Speedup
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Federated learning (FL) learns a model jointly from a set of participating devices without sharing each other's privately held data. The characteristics of non-iid data across the network, low device participation, and the mandate that data remain private bring challenges in understanding the convergence of FL algorithms, particularly with regard to how convergence scales with the number of participating devices. In this paper, we focus on Federated Averaging (FedAvg)--the most widely used and effective FL algorithm in use today--and provide a comprehensive study of its convergence rate. Although FedAvg has recently been studied by an emerging line of literature, it remains open as to how FedAvg's convergence scales with the number of participating devices in the FL setting--a crucial question whose answer would shed light on the performance of FedAvg in large FL systems. We fill this gap by establishing convergence guarantees for FedAvg under three classes of problems: strongly convex smooth, convex smooth, and overparameterized strongly convex smooth problems. We show that FedAvg enjoys linear speedup in each case, although with different convergence rates. For each class, we also characterize the corresponding convergence rates for the Nesterov accelerated FedAvg algorithm in the FL setting: to the best of our knowledge, these are the first linear speedup guarantees for FedAvg when Nesterov acceleration is used. To accelerate FedAvg, we also design a new momentum-based FL algorithm that further improves the convergence rate in overparameterized linear regression problems. Empirical studies of the algorithms in various settings support our theoretical results.
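For orientation, the basic Federated Averaging loop can be sketched on a toy one-dimensional least-squares problem. This is a minimal illustration under strong simplifying assumptions that the abstract does not make: full device participation, full-batch local gradients, and a scalar model. It also omits the Nesterov and momentum variants. All names are hypothetical.

```python
def local_update(w, data, lr, local_steps):
    """Run a few gradient steps on one device for the loss 0.5 * (w * x - y) ** 2."""
    for _ in range(local_steps):
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fedavg(devices, rounds=50, lr=0.1, local_steps=5):
    """Server loop: broadcast w, collect local models, average by local data size."""
    w = 0.0
    total = sum(len(d) for d in devices)
    for _ in range(rounds):
        # Each device starts from the current global model; raw data never leaves it.
        local_models = [local_update(w, d, lr, local_steps) for d in devices]
        w = sum(len(d) * wk for d, wk in zip(devices, local_models)) / total
    return w

# Three devices holding different (x, y) pairs that all satisfy y = 2 * x.
devices = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]
w_hat = fedavg(devices)
```

Only model parameters are exchanged between the devices and the server; in this toy setting the averaged model converges to the shared slope of 2.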

[70]  arXiv:2007.05694 (cross-list from cs.LG) [pdf, other]
Title: Long-Term Planning with Deep Reinforcement Learning on Autonomous Drones
Authors: Ugurkan Ates
Comments: Submitted to Association for the Advancement of Artificial Intelligence (AAAI) 2020 Fall Symposium Series
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

In this paper, we study a long-term planning scenario that is based on drone racing competitions held in real life. We conducted this experiment on a framework created for "Game of Drones: Drone Racing Competition" at NeurIPS 2019. The racing environment was created using Microsoft's AirSim Drone Racing Lab. A reinforcement learning agent, a simulated quadrotor in our case, was trained with the Proximal Policy Optimization (PPO) algorithm and was able to successfully compete against another simulated quadrotor that was running a classical path planning algorithm. Agent observations consist of data from IMU sensors, GPS coordinates of the drone obtained through simulation, and the opponent drone's GPS information. Using the opponent drone's GPS information during training helps in dealing with complex state spaces; serving as expert guidance, it allows for an efficient and stable training process. All experiments performed in this paper can be found and reproduced with code at our GitHub repository.

[71]  arXiv:2007.05700 (cross-list from cs.LG) [pdf, other]
Title: M-Evolve: Structural-Mapping-Based Data Augmentation for Graph Classification
Comments: 11 pages, 9 figures
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

Graph classification, which aims to identify the category labels of graphs, plays a significant role in drug classification, toxicity detection, protein analysis, etc. However, the limited scale of the benchmark datasets makes it easy for graph classification models to fall into over-fitting and undergeneralization. To improve this, we introduce data augmentation on graphs (i.e. graph augmentation) and present four methods: random mapping, vertex-similarity mapping, motif-random mapping and motif-similarity mapping, to generate more weakly labeled data for small-scale benchmark datasets via heuristic transformation of graph structures. Furthermore, we propose a generic model evolution framework, named M-Evolve, which combines graph augmentation, data filtration and model retraining to optimize pre-trained graph classifiers. Experiments on six benchmark datasets demonstrate that the proposed framework helps existing graph classification models alleviate over-fitting and undergeneralization when training on small-scale benchmark datasets, yielding an average accuracy improvement of 3-13% on graph classification tasks.

[72]  arXiv:2007.05732 (cross-list from cs.LG) [pdf, other]
Title: Online Parameter-Free Learning of Multiple Low Variance Tasks
Journal-ref: Conference on Uncertainty in Artificial Intelligence (UAI) 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose a method to learn a common bias vector for a growing sequence of low-variance tasks. Unlike state-of-the-art approaches, our method does not require tuning any hyper-parameter. Our approach is presented in the non-statistical setting and comes in two variants: the "aggressive" one updates the bias after each datapoint, while the "lazy" one updates the bias only at the end of each task. We derive an across-tasks regret bound for the method. When compared to state-of-the-art approaches, the aggressive variant attains faster rates, while the lazy one recovers standard rates but with no need to tune hyper-parameters. We then adapt the methods to the statistical setting: the aggressive variant becomes a multi-task learning method, the lazy one a meta-learning method. Experiments confirm the effectiveness of our methods in practice.

[73]  arXiv:2007.05742 (cross-list from cs.LG) [pdf, other]
Title: Relation-Guided Representation Learning
Comments: Appear in Neural Networks
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Deep auto-encoders (DAEs) have achieved great success in learning data representations via the powerful representability of neural networks. But most DAEs focus only on the most dominant structures, which are able to reconstruct the data from a latent space, and neglect rich latent structural information. In this work, we propose a new representation learning method that explicitly models and leverages sample relations, which in turn are used as supervision to guide the representation learning. Unlike previous work, our framework preserves the relations between samples well. Since the prediction of pairwise relations is itself a fundamental problem, our model adaptively learns them from data. This provides much flexibility to encode the real data manifold. The important role of relation and representation learning is evaluated on the clustering task. Extensive experiments on benchmark data sets demonstrate the superiority of our approach. By seeking to embed samples into a subspace, we further show that our method can address the large-scale and out-of-sample problem.

[74]  arXiv:2007.05756 (cross-list from cs.CV) [pdf, other]
Title: Generative Graph Perturbations for Scene Graph Prediction
Comments: this https URL, ICML Workshop 2020 on "Object-Oriented Learning (OOL): Perception, Representation, and Reasoning"
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Inferring objects and their relationships from an image is useful in many applications at the intersection of vision and language. Due to a long tail data distribution, the task is challenging, with the inevitable appearance of zero-shot compositions of objects and relationships at test time. Current models often fail to properly understand a scene in such cases, as during training they only observe a tiny fraction of the distribution corresponding to the most frequent compositions. This motivates us to study whether increasing the diversity of the training distribution, by generating replacements for parts of real scene graphs, can lead to better generalization. We employ generative adversarial networks (GANs) conditioned on scene graphs to generate augmented visual features. To increase their diversity, we propose several strategies to perturb the conditioning. One of them is to use a language model, such as BERT, to synthesize plausible yet still unlikely scene graphs. By evaluating our model on Visual Genome, we obtain both positive and negative results. This prompts us to make several observations that can potentially lead to further improvements.

[75]  arXiv:2007.05758 (cross-list from cs.LG) [pdf]
Title: Feature Interactions in XGBoost
Comments: 7 pages, 2 Figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we investigate how feature interactions can be identified and used as constraints in gradient boosting tree models using XGBoost's implementation. Our results show that accurate identification of these constraints can help improve the performance of the baseline XGBoost model significantly. Further, the improvement in the model structure can also lead to better interpretability.

[76]  arXiv:2007.05783 (cross-list from cs.LG) [pdf]
Title: Simulating multi-exit evacuation using deep reinforcement learning
Comments: 25 pages, 5 figures, submitted to Transactions in GIS
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Conventional simulations of multi-exit indoor evacuation focus primarily on how to determine a reasonable exit based on numerous factors in a changing environment. Results commonly include some congested and other under-utilized exits, especially with large numbers of pedestrians. We propose a multi-exit evacuation simulation based on Deep Reinforcement Learning (DRL), referred to as MultiExit-DRL, which involves a Deep Neural Network (DNN) framework to facilitate state-to-action mapping. The DNN framework applies Rainbow Deep Q-Network (DQN), a DRL algorithm that integrates several advanced DQN methods, to improve data utilization and algorithm stability, and further divides the action space into eight isometric directions for possible pedestrian choices. We compare MultiExit-DRL with two conventional multi-exit evacuation simulation models in three separate scenarios: 1) varying pedestrian distribution ratios, 2) varying exit width ratios, and 3) varying open schedules for an exit. The results show that MultiExit-DRL achieves high learning efficiency while reducing the total number of evacuation frames in all designed experiments. In addition, the integration of DRL allows pedestrians to explore other potential exits and helps determine optimal directions, leading to highly efficient exit utilization.

[77]  arXiv:2007.05817 (cross-list from cs.CR) [pdf, other]
Title: ManiGen: A Manifold Aided Black-box Generator of Adversarial Examples
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)

Machine learning models, especially neural network (NN) classifiers, have acceptable performance and accuracy, which has led to their wide adoption in different aspects of our daily lives. The underlying assumption is that these models are generated and used in attack-free scenarios. However, it has been shown that neural network based classifiers are vulnerable to adversarial examples. Adversarial examples are inputs with special perturbations that go unnoticed by human eyes yet can mislead NN classifiers. Most of the existing methods for generating such perturbations require a certain level of knowledge about the target classifier, which makes them not very practical. For example, some generators require knowledge of pre-softmax logits while others utilize prediction scores.
In this paper, we design a practical black-box adversarial example generator, dubbed ManiGen. ManiGen does not require any knowledge of the inner state of the target classifier. It generates adversarial examples by searching along the manifold, which is a concise representation of input data. Through an extensive set of experiments on different datasets, we show that (1) adversarial examples generated by ManiGen can mislead standalone classifiers by being as successful as the state-of-the-art white-box generator, Carlini, and (2) adversarial examples generated by ManiGen can more effectively attack classifiers with state-of-the-art defenses.

[78]  arXiv:2007.05824 (cross-list from cs.LG) [pdf, ps, other]
Title: Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics
Authors: Taiji Suzuki
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce a new theoretical framework to analyze deep learning optimization in connection with its generalization error. Existing frameworks for neural network optimization analysis, such as mean field theory and neural tangent kernel theory, typically require taking the limit of infinite network width to show global convergence. This potentially makes it difficult to deal directly with finite-width networks; especially in the neural tangent kernel regime, we cannot reveal favorable properties of neural networks beyond kernel methods. To realize a more natural analysis, we consider a completely different approach in which we formulate the parameter training as a transportation map estimation and show its global convergence via the theory of infinite dimensional Langevin dynamics. This enables us to analyze narrow and wide networks in a unifying manner. Moreover, we give generalization gap and excess risk bounds for the solution obtained by the dynamics. The excess risk bound achieves the so-called fast learning rate. In particular, we show an exponential convergence for a classification problem and a minimax optimal rate for a regression problem.

[79]  arXiv:2007.05825 (cross-list from physics.soc-ph) [pdf, other]
Title: Safer working spaces at coronavirus time: A novel use of antibody tests
Comments: 24 pages, 11 figures, preliminary preprint
Subjects: Physics and Society (physics.soc-ph); Populations and Evolution (q-bio.PE); Applications (stat.AP)

As SARS-CoV-2 spreads worldwide, governments struggle to keep people safe without collapsing the economy. Social distancing and quarantines have proven to be effective measures to save lives, yet their impact on the economy is becoming apparent. The major challenge faced by many countries at this point of the pandemic is to find a way to keep their critical industries, such as health, telecommunications, national security, transportation, food and energy, functioning while providing a safe environment for their workers. In this paper we propose a novel approach based on periodic SARS-CoV-2 antibody testing to reduce the risk of contagion within the working space, and evaluate it using stochastic simulations of the health evolution of the workforce. Our simulations indicate that the proper use of testing and quarantine of workers suspected of being infected can greatly reduce the number of infections while improving the productivity of the company in the long run.

[80]  arXiv:2007.05830 (cross-list from cs.LG) [pdf, other]
Title: AutoEmbedder: A semi-supervised DNN embedding system for clustering
Comments: The manuscript is accepted and published in Knowledge-Based Systems
Journal-ref: Knowledge-Based Systems, p.106190 (2020)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Clustering is a widely used unsupervised learning method that deals with unlabeled data. Deep clustering has become a popular study area that relates clustering to Deep Neural Network (DNN) architectures. Deep clustering methods downsample high-dimensional data, which may also involve a clustering loss. Deep clustering has also been introduced in semi-supervised learning (SSL). Most SSL methods depend on pairwise constraint information, which is a matrix encoding whether data pairs can be in the same cluster or not. This paper introduces a novel embedding system named AutoEmbedder, which downsamples higher-dimensional data to clusterable embedding points. To the best of our knowledge, this is the first research endeavor that relates a traditional classifier DNN architecture to a pairwise loss reduction technique. The training process is semi-supervised and uses a Siamese network architecture to compute the pairwise constraint loss in the feature learning phase. The AutoEmbedder outperforms most of the existing DNN-based semi-supervised methods tested on well-known datasets.

[81]  arXiv:2007.05838 (cross-list from cs.LG) [pdf, other]
Title: Control as Hybrid Inference
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

The field of reinforcement learning can be split into model-based and model-free methods. Here, we unify these approaches by casting model-free policy optimisation as amortised variational inference, and model-based planning as iterative variational inference, within a `control as hybrid inference' (CHI) framework. We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference. Using a didactic experiment, we demonstrate that the proposed algorithm operates in a model-based manner at the onset of learning, before converging to a model-free algorithm once sufficient data have been collected. We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines. CHI thus provides a principled framework for harnessing the sample efficiency of model-based planning while retaining the asymptotic performance of model-free policy optimisation.

[82]  arXiv:2007.05840 (cross-list from cs.LG) [pdf, other]
Title: Representation Learning via Adversarially-Contrastive Optimal Transport
Comments: Accepted at ICML 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

In this paper, we study the problem of learning compact (low-dimensional) representations for sequential data that capture its implicit spatio-temporal cues. To maximize extraction of such informative cues from the data, we set the problem within the context of contrastive representation learning and to that end propose a novel objective via optimal transport. Specifically, our formulation seeks a low-dimensional subspace representation of the data that jointly (i) maximizes the distance of the data (embedded in this subspace) from an adversarial data distribution under the optimal transport, a.k.a. the Wasserstein distance, (ii) captures the temporal order, and (iii) minimizes the data distortion. To generate the adversarial distribution, we propose a novel framework connecting Wasserstein GANs with a classifier, allowing a principled mechanism for producing good negative distributions for contrastive learning, which is currently a challenging problem. Our full objective is cast as a subspace learning problem on the Grassmann manifold and solved via Riemannian optimization. To empirically study our formulation, we provide experiments on the task of human action recognition in video sequences. Our results demonstrate competitive performance against challenging baselines.

[83]  arXiv:2007.05852 (cross-list from cs.LG) [pdf, other]
Title: Submodular Meta-Learning
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Machine Learning (stat.ML)

In this paper, we introduce a discrete variant of the meta-learning framework. Meta-learning aims at exploiting prior experience and data to improve performance on future tasks. By now, there exist numerous formulations for meta-learning in the continuous domain. Notably, the Model-Agnostic Meta-Learning (MAML) formulation views each task as a continuous optimization problem and based on prior data learns a suitable initialization that can be adapted to new, unseen tasks after a few simple gradient updates. Motivated by this terminology, we propose a novel meta-learning framework in the discrete domain where each task is equivalent to maximizing a set function under a cardinality constraint. Our approach aims at using prior data, i.e., previously visited tasks, to train a proper initial solution set that can be quickly adapted to a new task at a relatively low computational cost. This approach leads to (i) a personalized solution for each individual task, and (ii) significantly reduced computational cost at test time compared to the case where the solution is fully optimized once the new task is revealed. The training procedure is performed by solving a challenging discrete optimization problem for which we present deterministic and randomized algorithms. In the case where the tasks are monotone and submodular, we show strong theoretical guarantees for our proposed methods even though the training objective may not be submodular. We also demonstrate the effectiveness of our framework on two real-world problem instances where we observe that our methods lead to a significant reduction in computational complexity in solving the new tasks while incurring a small performance loss compared to when the tasks are fully optimized.

[84]  arXiv:2007.05860 (cross-list from math.OC) [pdf, other]
Title: Solving Bayesian Risk Optimization via Nested Stochastic Gradient Estimation
Comments: The paper is 21 pages with 3 figures. The supplement is an additional 16 pages. The paper is currently under review at IISE Transactions
Subjects: Optimization and Control (math.OC); Computation (stat.CO)

In this paper, we aim to solve Bayesian Risk Optimization (BRO), which is a recently proposed framework that formulates simulation optimization under input uncertainty. In order to efficiently solve the BRO problem, we derive nested stochastic gradient estimators and propose corresponding stochastic approximation algorithms. We show that our gradient estimators are asymptotically unbiased and consistent, and that the algorithms converge asymptotically. We demonstrate the empirical performance of the algorithms on a two-sided market model. Our estimators are of independent interest in extending the literature of stochastic gradient estimation to the case of nested risk functions.

[85]  arXiv:2007.05869 (cross-list from cs.LG) [pdf, other]
Title: Adversarially-Trained Deep Nets Transfer Better
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Transfer learning has emerged as a powerful methodology for adapting pre-trained deep neural networks to new domains. This process consists of taking a neural network pre-trained on a large feature-rich source dataset, freezing the early layers that encode essential generic image properties, and then fine-tuning the last few layers in order to capture specific information related to the target situation. This approach is particularly useful when only limited or weakly labelled data are available for the new task. In this work, we demonstrate that adversarially-trained models transfer better across new domains than naturally-trained models, even though it is known that these models do not generalize as well as naturally-trained models on the source domain. We show that this behavior results from a bias, introduced by the adversarial training, that pushes the learned inner layers to more natural image representations, which in turn enables better transfer.

[86]  arXiv:2007.05879 (cross-list from cs.LG) [pdf, other]
Title: On Improving Hotspot Detection Through Synthetic Pattern-Based Database Enhancement
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Continuous technology scaling and the introduction of advanced technology nodes in Integrated Circuit (IC) fabrication are constantly exposing new manufacturability issues. One such issue, stemming from complex interaction between design and process, is the problem of design hotspots. Such hotspots are known to vary from design to design and, ideally, should be predicted early and corrected in the design stage itself, as opposed to relying on the foundry to develop process fixes for every hotspot, which would be intractable. In the past, various efforts have been made to address this issue by using a known database of hotspots as the source of information. The majority of these efforts use either Machine Learning (ML) or Pattern Matching (PM) techniques to identify and predict hotspots in new incoming designs. However, almost all of them suffer from high false-alarm rates, mainly because they are oblivious to the root causes of hotspots. In this work, we seek to address this limitation by using a novel database enhancement approach through synthetic pattern generation based on carefully crafted Design of Experiments (DOEs). The effectiveness of the proposed method against the state-of-the-art is evaluated on a 45nm process using industry-standard tools and designs.

[87]  arXiv:2007.05881 (cross-list from cs.LG) [pdf, other]
Title: Applying recent advances in Visual Question Answering to Record Linkage
Authors: Marko Smilevski
Comments: 48 pages, 15 figures, 6 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (stat.ML)

Multi-modal Record Linkage is the process of matching multi-modal records from multiple sources that represent the same entity. This field has not been explored in research and we propose two solutions based on Deep Learning architectures that are inspired by recent work in Visual Question Answering. The neural networks we propose use two different fusion modules, the Recurrent Neural Network + Convolutional Neural Network fusion module and the Stacked Attention Network fusion module, that jointly combine the visual and the textual data of the records. The output of these fusion modules is the input of a Siamese Neural Network that computes the similarity of the records. Using data from the Avito Duplicate Advertisements Detection dataset, we train these solutions, and from the experiments we conclude that the Recurrent Neural Network + Convolutional Neural Network fusion module outperforms a simple model that uses hand-crafted features. We also find that the Recurrent Neural Network + Convolutional Neural Network fusion module classifies dissimilar advertisements as similar more frequently if their average description is longer than 40 words. We conclude that the reason for this is that the longer advertisements have a different distribution than the shorter advertisements, which are more prevalent in the dataset. Finally, we conclude that further research needs to be done with the Stacked Attention Network to further explore the effects of the visual data on the performance of the fusion modules.

[88]  arXiv:2007.05896 (cross-list from cs.LG) [pdf, other]
Title: Learning Abstract Models for Strategic Exploration and Fast Reward Transfer
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Model-based reinforcement learning (RL) is appealing because (i) it enables planning and thus more strategic exploration, and (ii) by decoupling dynamics from rewards, it enables fast transfer to new reward functions. However, learning an accurate Markov Decision Process (MDP) over high-dimensional states (e.g., raw pixels) is extremely challenging because it requires function approximation, which leads to compounding errors. Instead, to avoid compounding errors, we propose learning an abstract MDP over abstract states: low-dimensional coarse representations of the state (e.g., capturing agent position, ignoring other objects). We assume access to an abstraction function that maps the concrete states to abstract states. In our approach, we construct an abstract MDP, which grows through strategic exploration via planning. Similar to hierarchical RL approaches, the abstract actions of the abstract MDP are backed by learned subpolicies that navigate between abstract states. Our approach achieves strong results on three of the hardest Arcade Learning Environment games (Montezuma's Revenge, Pitfall!, and Private Eye), including superhuman performance on Pitfall! without demonstrations. After training on one task, we can reuse the learned abstract MDP for new reward functions, achieving higher reward in 1000x fewer samples than model-free methods trained from scratch.

[89]  arXiv:2007.05912 (cross-list from cs.DS) [pdf, ps, other]
Title: Robust Learning of Mixtures of Gaussians
Authors: Daniel M. Kane
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Statistics Theory (math.ST)

We resolve one of the major outstanding problems in robust statistics. In particular, if $X$ is an evenly weighted mixture of two arbitrary $d$-dimensional Gaussians, we devise a polynomial-time algorithm that, given access to samples from $X$, an $\epsilon$-fraction of which have been adversarially corrupted, learns $X$ to error $\mathrm{poly}(\epsilon)$ in total variation distance.

[90]  arXiv:2007.05929 (cross-list from cs.LG) [pdf, other]
Title: Data-Efficient Reinforcement Learning with Momentum Predictive Representations
Comments: The first two authors contributed equally to this work
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interaction with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential interaction with the environment. Our method, Momentum Predictive Representations (MPR), trains an agent to predict its own latent state representations multiple steps into the future. We compute target representations for future states using an encoder which is an exponential moving average of the agent's parameters, and we make predictions using a learned transition model. On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels. We further improve performance by adding data augmentation to the future prediction loss, which forces the agent's representations to be consistent across multiple views of an observation. Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.444 on Atari in a setting limited to 100K steps of environment interaction, which is a 66% relative improvement over the previous state-of-the-art. Moreover, even in this limited data regime, MPR exceeds expert human scores on 6 out of 26 games.
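
The exponential-moving-average target encoder mentioned in the abstract can be illustrated with a minimal sketch. This is a generic EMA parameter update, not the authors' implementation; the function name `ema_update`, the flat list-of-floats parameter representation, and the decay value `tau` are assumptions made for illustration.

```python
def ema_update(target, online, tau=0.99):
    """Return tau * target + (1 - tau) * online, element by element.

    `target` and `online` are flat lists of floats standing in for the
    parameters of a target encoder and an online encoder (hypothetical
    representation; real encoders hold tensors).
    """
    if len(target) != len(online):
        raise ValueError("parameter lists must have the same length")
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


# The target drifts slowly towards the online parameters.
target = [0.0, 1.0]
online = [1.0, 1.0]
for _ in range(3):
    target = ema_update(target, online, tau=0.9)
```

A value of `tau` close to 1 makes the target change slowly, so the prediction targets stay stable while the online parameters are trained.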

[91]  arXiv:2007.05943 (cross-list from cs.LG) [pdf, other]
Title: On the generalization of Tanimoto-type kernels to real valued functions
Authors: Sandor Szedmak (1) Eric Bach (1) ((1) Department of Computer Science, Aalto University)
Comments: Pages 12, 3 PDF figures, uses arxiv.sty
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The Tanimoto kernel (Jaccard index) is a well-known tool to describe the similarity between sets of binary attributes. It has been extended to the case when the attributes are nonnegative real values. This paper introduces a more general Tanimoto kernel formulation which allows measuring the similarity of arbitrary real-valued functions. This extension is constructed by unifying the representation of the attributes via properly chosen sets. After deriving the general form of the kernel, an explicit feature representation is extracted from the kernel function, and a simple way of including general kernels into the Tanimoto kernel is shown. Finally, the kernel is also expressed as a quotient of piecewise linear functions, and a smooth approximation is provided.
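
For orientation, the two classical forms referred to in the abstract can be sketched as follows. The binary Tanimoto kernel and the min-max form for nonnegative attributes shown here are the commonly used textbook definitions; they are not the generalized kernel proposed in the paper, and the function names are chosen for illustration.

```python
def tanimoto(x, y):
    """Tanimoto kernel <x,y> / (<x,x> + <y,y> - <x,y>) for binary vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # Convention (an assumption here): two all-zero vectors are identical.
    return dot / denom if denom else 1.0


def minmax_kernel(x, y):
    """Min-max form for nonnegative real attributes: sum(min) / sum(max)."""
    if any(v < 0 for v in list(x) + list(y)):
        raise ValueError("attributes must be nonnegative")
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den else 1.0


# On binary vectors the two forms agree: |A ∩ B| / |A ∪ B|.
a = [1, 1, 0]
b = [1, 0, 1]
binary_similarity = tanimoto(a, b)
real_similarity = minmax_kernel([1.0, 2.0, 0.0], [2.0, 1.0, 1.0])
```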

[92]  arXiv:2007.05970 (cross-list from cs.LG) [pdf, other]
Title: Inverse Graph Identification: Can We Identify Node Labels Given Graph Labels?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Graph Identification (GI) has long been researched in graph learning and is essential in certain applications (e.g. social community detection). Specifically, GI requires predicting the label/score of a target graph given its collection of node features and edge connections. While this task is common, more complex cases arise in practice: we may need to do the inverse by, for example, grouping similar users in a social network given the labels of different communities. This raises an interesting question: can we identify nodes given the labels of the graphs they belong to? This paper therefore defines a novel problem dubbed Inverse Graph Identification (IGI), as opposed to GI. After a formal discussion of the variants of IGI, we choose a particular case study of node clustering that makes use of the graph labels and node features, with the assistance of a hierarchical graph that further characterizes the connections between different graphs. To address this task, we propose the Gaussian Mixture Graph Convolutional Network (GMGCN), a simple yet effective method that performs node-level message passing using a Graph Attention Network (GAT) under the protocol of GI and then infers the category of each node via a Gaussian Mixture Layer (GML). The training of GMGCN is further boosted by a proposed consensus loss that takes advantage of the structure of the hierarchical graph. Extensive experiments are conducted to test the rationality of the formulation of IGI. We verify the superiority of the proposed method over other baselines on several benchmarks we have built. We will release our code along with the benchmark data to facilitate further research on the IGI problem.

[93]  arXiv:2007.05975 (cross-list from cs.IT) [pdf, other]
Title: A Graph Symmetrisation Bound on Channel Information Leakage under Blowfish Privacy
Comments: 11 pages, 3 figures
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)

Blowfish privacy is a recent generalisation of differential privacy that enables improved utility while maintaining privacy policies with semantic guarantees, a factor that has driven the popularity of differential privacy in computer science. This paper relates Blowfish privacy to an important measure of privacy loss of information channels from the communications theory community: min-entropy leakage. Symmetry in an input data neighbouring relation is central to known connections between differential privacy and min-entropy leakage. But while differential privacy exhibits strong symmetry, Blowfish neighbouring relations correspond to arbitrary simple graphs owing to the framework's flexible privacy policies. To bound the min-entropy leakage of Blowfish-private mechanisms we organise our analysis over symmetrical partitions corresponding to orbits of graph automorphism groups. A construction meeting our bound with asymptotic equality demonstrates sharpness.

[94]  arXiv:2007.05986 (cross-list from math.PR) [pdf, ps, other]
Title: Technical Note -- Exact simulation of the first passage time of Brownian motion to a symmetric linear boundary
Comments: 6 pages
Subjects: Probability (math.PR); Statistics Theory (math.ST)

We state an exact simulation scheme for the first passage time of a Brownian motion to a symmetric linear boundary.

[95]  arXiv:2007.06007 (cross-list from cs.LG) [pdf, ps, other]
Title: Universal Approximation Power of Deep Neural Networks via Nonlinear Control Theory
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)

In this paper, we explain the universal approximation capabilities of deep neural networks through geometric nonlinear control. Inspired by recent work establishing links between residual networks and control systems, we provide a general sufficient condition for a residual network to have the power of universal approximation by asking the activation function, or one of its derivatives, to satisfy a quadratic differential equation. Many activation functions used in practice satisfy this assumption, exactly or approximately, and we show this property to be sufficient for an adequately deep neural network with $n$ states to approximate arbitrarily well any continuous function defined on a compact subset of $\mathbb{R}^n$. We further show this result to hold for very simple architectures, where the weights only need to assume two values. The key technical contribution consists of relating the universal approximation problem to controllability of an ensemble of control systems corresponding to a residual network, and leveraging classical Lie algebraic techniques to characterize controllability.

[96]  arXiv:2007.06024 (cross-list from cs.LG) [pdf, other]
Title: The Impossibility Theorem of Machine Fairness -- A Causal Perspective
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

With the increasingly pervasive use of machine learning in social and economic settings, there has been growing interest in the notion of machine bias in the AI community. Models trained on historic data reflect the biases that exist in society and propagate them to the future through their decisions. A recent study conducted by ProPublica revealed that the COMPAS recidivism prediction tool was biased against the African-American community. There are three prominent metrics of fairness used in the community, and it has been statistically proved that it is impossible to satisfy them at the same time -- which has led to ambiguity about the definition of fairness. In this report, a causal perspective on the impossibility theorem of fairness is presented along with a causal goal for machine fairness.

[97]  arXiv:2007.06029 (cross-list from cs.LG) [pdf, other]
Title: Ensuring Fairness Beyond the Training Data
Comments: 18 pages, 3 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We initiate the study of fair classifiers that are robust to perturbations in the training distribution. Despite recent progress, the literature on fairness has largely ignored the design of fair and robust classifiers. In this work, we develop classifiers that are fair not only with respect to the training distribution, but also for a class of distributions that are weighted perturbations of the training samples. We formulate a min-max objective function whose goal is to minimize a distributionally robust training loss, and at the same time, find a classifier that is fair with respect to a class of distributions. We first reduce this problem to finding a fair classifier that is robust with respect to the class of distributions. Based on an online learning algorithm, we develop an iterative algorithm that provably converges to such a fair and robust solution. Experiments on standard machine learning fairness datasets suggest that, compared to the state-of-the-art fair classifiers, our classifier retains fairness guarantees and test accuracy for a large class of perturbations on the test set. Furthermore, our experiments show that there is an inherent trade-off between fairness robustness and accuracy of such classifiers.

[98]  arXiv:2007.06049 (cross-list from cs.LG) [pdf, other]
Title: An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Prioritized Experience Replay (PER) is a deep reinforcement learning technique in which agents learn from transitions sampled with non-uniform probability proportionate to their temporal-difference error. We show that any loss function evaluated with non-uniformly sampled data can be transformed into another uniformly sampled loss function with the same expected gradient. Surprisingly, we find in some environments PER can be replaced entirely by this new loss function without impact to empirical performance. Furthermore, this relationship suggests a new branch of improvements to PER by correcting its uniformly sampled loss function equivalent. We demonstrate the effectiveness of our proposed modifications to PER and the equivalent loss function in several MuJoCo and Atari environments.
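
The general identity underlying such an equivalence can be checked numerically: the expected gradient under sampling probabilities $p_i$ equals the expected gradient under uniform sampling when each term is reweighted by $N p_i$. The sketch below computes both expectations exactly for a squared TD-error loss. It illustrates only this identity, not the specific loss functions derived in the paper; the function names and the exponent `alpha` are assumptions.

```python
def priorities(td_errors, alpha=1.0):
    """Sampling probabilities proportional to |td_error| ** alpha.

    Assumes at least one TD error is non-zero.
    """
    scaled = [abs(d) ** alpha for d in td_errors]
    total = sum(scaled)
    return [s / total for s in scaled]


def expected_grad_nonuniform(td_errors, probs):
    """E_{i ~ p}[d_i]: the gradient of 0.5 * d**2 with respect to d is d."""
    return sum(p * d for p, d in zip(probs, td_errors))


def expected_grad_uniform_reweighted(td_errors, probs):
    """E_{i ~ Uniform}[N * p_i * d_i], with the weight N * p_i held constant."""
    n = len(td_errors)
    return sum((n * p) * d for p, d in zip(probs, td_errors)) / n


td_errors = [0.5, -2.0, 1.0, 0.1]
probs = priorities(td_errors)
g_per = expected_grad_nonuniform(td_errors, probs)
g_uniform = expected_grad_uniform_reweighted(td_errors, probs)
```

The two quantities `g_per` and `g_uniform` agree up to floating-point rounding, which is the sense in which a prioritized sampling scheme can be traded for a modified loss under uniform sampling.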

[99]  arXiv:2007.06059 (cross-list from cs.LG) [pdf, other]
Title: It Is Likely That Your Loss Should be a Likelihood
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We recall that certain common losses are simplified likelihoods and instead argue for optimizing full likelihoods that include their parameters, such as the variance of the normal distribution and the temperature of the softmax distribution. Joint optimization of likelihood and model parameters can adaptively tune the scales and shapes of losses and the weights of regularizers. We survey and systematically evaluate how to parameterize and apply likelihood parameters for robust modeling and re-calibration. Additionally, we propose adaptively tuning $L_2$ and $L_1$ weights by fitting the scale parameters of normal and Laplace priors and introduce more flexible element-wise regularizers.
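
As a small illustration of the first point, the squared-error loss is the Gaussian negative log-likelihood with the standard deviation fixed to 1 (up to an additive constant), and fitting the standard deviation can only lower the likelihood-based loss. The sketch below uses hypothetical function names and treats the scale as a single scalar; it is not the parameterization studied in the paper.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under a normal N(mu, sigma**2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)


def squared_error(y, mu):
    """The usual 0.5 * (y - mu)**2 loss."""
    return 0.5 * (y - mu) ** 2


def mle_sigma(residuals):
    """Scale that minimizes the summed Gaussian NLL for fixed residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


residuals = [3.0, -3.0]
sigma_hat = mle_sigma(residuals)
nll_fixed = sum(gaussian_nll(r, 0.0, 1.0) for r in residuals)
nll_fitted = sum(gaussian_nll(r, 0.0, sigma_hat) for r in residuals)
```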

[100]  arXiv:2007.06062 (cross-list from cs.LG) [pdf, other]
Title: Transfer Learning for Activity Recognition in Mobile Health
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

While activity recognition from inertial sensors holds potential for mobile health, differences in sensing platforms and user movement patterns cause performance degradation. Aiming to address these challenges, we propose a transfer learning framework, TransFall, for sensor-based activity recognition. TransFall's design contains a two-tier data transformation, a label estimation layer, and a model generation layer to recognize activities for the new scenario. We validate TransFall analytically and empirically.

[101]  arXiv:2007.06063 (cross-list from cs.LG) [pdf, other]
Title: Exploiting Uncertainties from Ensemble Learners to Improve Decision-Making in Healthcare AI
Comments: Preprint of submission to NeurIPS 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Ensemble learning is widely applied in Machine Learning (ML) to improve model performance and to mitigate decision risks. In this approach, predictions from a diverse set of learners are combined to obtain a joint decision. Recently, various methods have been explored in the literature for estimating decision uncertainties using ensemble learning; however, determining which metrics are a better fit for certain decision-making applications remains a challenging task. In this paper, we study the following key research question in the selection of uncertainty metrics: when does an uncertainty metric outperform another? We answer this question via a rigorous analysis of two commonly used uncertainty metrics in ensemble learning, namely ensemble mean and ensemble variance. We show that, under mild assumptions on the ensemble learners, ensemble mean is preferable with respect to ensemble variance as an uncertainty metric for decision making. We empirically validate our assumptions and theoretical results via an extensive case study: the diagnosis of referable diabetic retinopathy.
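
The two quantities compared in the abstract can be computed from per-learner predictions as in the following minimal sketch. It assumes each learner outputs a probability for the positive class and uses the population variance; how each quantity is turned into a decision rule is defined in the paper and is not reproduced here.

```python
from statistics import fmean, pvariance


def ensemble_mean_and_variance(probabilities):
    """Return (mean, population variance) of per-learner probabilities."""
    if not probabilities:
        raise ValueError("need at least one learner prediction")
    return fmean(probabilities), pvariance(probabilities)


# Three hypothetical learners scoring the same input.
mean, variance = ensemble_mean_and_variance([0.6, 0.7, 0.8])
```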

[102]  arXiv:2007.06068 (cross-list from cs.CV) [pdf, other]
Title: Visualizing Classification Structure in Deep Neural Networks
Comments: 2020 ICML Workshop on Human Interpretability in Machine Learning (WHI 2020)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose a measure to compute class similarity in large-scale classification based on prediction scores. Such a measure has not been formally proposed in the literature. We show how visualizing the class similarity matrix can reveal hierarchical structures and relationships that govern the classes. Through examples with various classifiers, we demonstrate how such structures can help in analyzing the classification behavior and in inferring potential corner cases. The source code for one example is available as a notebook at https://github.com/bilalsal/blocks

[103]  arXiv:2007.06081 (cross-list from cs.LG) [pdf, other]
Title: VAFL: a Method of Vertical Asynchronous Federated Learning
Comments: FL-ICML'20: Proc. of ICML Workshop on Federated Learning for User Privacy and Data Confidentiality, July 2020
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)

Horizontal federated learning (FL) handles multi-client data that share the same set of features, and vertical FL trains a better predictor that combines all the features from different clients. This paper targets solving vertical FL in an asynchronous fashion, and develops a simple FL method. The new method allows each client to run stochastic gradient algorithms without coordination with other clients, so it is suitable for intermittent connectivity of clients. This method further uses a new technique of perturbed local embedding to ensure data privacy and improve communication efficiency. Theoretically, we present the convergence rate and privacy level of our method for strongly convex, nonconvex and even nonsmooth objectives separately. Empirically, we apply our method to FL on various image and healthcare datasets. The results compare favorably to centralized and synchronous FL methods.

[104]  arXiv:2007.06082 (cross-list from quant-ph) [pdf, other]
Title: Entanglement and Tensor Networks for Supervised Image Classification
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)

Tensor networks, originally designed to address computational problems in quantum many-body physics, have recently been applied to machine learning tasks. However, compared to quantum physics, where the reasons for the success of tensor network approaches over the last 30 years are well understood, very little is yet known about why these techniques work for machine learning. The goal of this paper is to investigate entanglement properties of tensor network models in a current machine learning application, in order to uncover general principles that may guide future developments. We revisit the use of tensor networks for supervised image classification using the MNIST data set of handwritten digits, as pioneered by Stoudenmire and Schwab [Adv. in Neur. Inform. Proc. Sys. 29, 4799 (2016)]. Firstly, we hypothesize about which state the tensor network might be learning during training. For that purpose, we propose a plausible candidate state $|\Sigma_{\ell}\rangle$ (built as a superposition of product states corresponding to images in the training set) and investigate its entanglement properties. We conclude that $|\Sigma_{\ell}\rangle$ is so robustly entangled that it cannot be approximated by the tensor network used in that work, which must therefore be representing a very different state. Secondly, we use tensor networks with a block product structure, in which entanglement is restricted within small blocks of $n \times n$ pixels/qubits. We find that these states are extremely expressive (e.g. training accuracy of $99.97 \%$ already for $n=2$), suggesting that long-range entanglement may not be essential for image classification. However, in our current implementation, optimization leads to over-fitting, resulting in test accuracies that are not competitive with other current approaches.

[105]  arXiv:2007.06083 (cross-list from math.PR) [pdf, ps, other]
Title: On almost sure limit theorems for long-range dependent, heavy-tailed processes
Authors: Michael A. Kouritzin (1), Sounak Paul (2) ((1) University of Alberta, (2) University of Chicago)
Subjects: Probability (math.PR); Statistics Theory (math.ST)

Classical methods of inference are often rendered inapplicable when dealing with data exhibiting heavy tails, which give rise to infinite variance and frequent extremes, and long memory, which induces inertia in the data. In this paper, we develop the Marcinkiewicz strong law of large numbers, $n^{-\frac1p}\sum_{k=1}^{n} (d_{k}- d)\rightarrow 0$ almost surely with $p\in(1,2)$, for products $d_k=\prod_{r=1}^s x_k^{(r)}$, where each $x_k^{(r)} = \sum_{l=-\infty}^{\infty}c_{k-l}^{(r)}\xi_l^{(r)}$ is a two-sided univariate linear process with coefficients $\{c_l^{(r)}\}_{l\in \mathbb{Z}}$ and i.i.d. zero-mean innovations $\{\xi_l^{(r)}\}_{l\in \mathbb{Z}}$. The decay of the coefficients $c_l^{(r)}$ as $|l|\to\infty$ can be slow enough that $\{x_k^{(r)}\}$ can have long memory while $\{d_k\}$ can have heavy tails. The aim of this paper is to handle the long-range dependence and heavy tails for $\{d_k\}$ simultaneously, and to prove a decoupling property that shows the convergence rate is dictated by the worst of long-range dependence and heavy tails, but not their combination. The multivariate linear process case is also considered.

[106]  arXiv:2007.06093 (cross-list from cs.LG) [pdf, other]
Title: Abstract Universal Approximation for Neural Networks
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL); Machine Learning (stat.ML)

With growing concerns about the safety and robustness of neural networks, a number of researchers have successfully applied abstract interpretation with numerical domains to verify properties of neural networks. Why do numerical domains work for neural-network verification? We present a theoretical result that demonstrates the power of numerical domains, namely, the simple interval domain, for analysis of neural networks. Our main theorem, which we call the abstract universal approximation (AUA) theorem, generalizes the recent result by Baader et al. [2020] for ReLU networks to a rich class of neural networks. The classical universal approximation theorem says that, given function $f$, for any desired precision, there is a neural network that can approximate $f$. The AUA theorem states that for any function $f$, there exists a neural network whose abstract interpretation is an arbitrarily close approximation of the collecting semantics of $f$. Further, the network may be constructed using any well-behaved activation function---sigmoid, tanh, parametric ReLU, ELU, and more---making our result quite general.
The implication of the AUA theorem is that there exist provably correct neural networks: Suppose, for instance, that there is an ideal robust image classifier represented as function $f$. The AUA theorem tells us that there exists a neural network that approximates $f$ and for which we can automatically construct proofs of robustness using the interval abstract domain. Our work sheds light on the existence of provably correct neural networks, using arbitrary activation functions, and establishes intriguing connections between well-known theoretical properties of neural networks and abstract interpretation using numerical domains.
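As a rough illustration of what "abstract interpretation with the interval domain" means for a network, the following minimal sketch pushes an input box through a small ReLU network and returns a sound output interval. It is not the construction from the paper; the function names (`affine_interval`, `relu_interval`, `network_interval`, `forward`) and the toy weights are assumptions made for this example only.

```python
def affine_interval(lo, hi, W, b):
    """Sound bounds for x -> W x + b when each x[i] lies in [lo[i], hi[i]]."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = h = bias
        for w, x_lo, x_hi in zip(row, lo, hi):
            if w >= 0:
                l += w * x_lo
                h += w * x_hi
            else:
                # A negative weight swaps which endpoint gives the extreme.
                l += w * x_hi
                h += w * x_lo
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi


def relu_interval(lo, hi):
    """ReLU is monotone, so it can be applied to each endpoint."""
    return [max(v, 0.0) for v in lo], [max(v, 0.0) for v in hi]


def network_interval(lo, hi, layers):
    """Interval semantics of a ReLU network (no ReLU after the last layer)."""
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_interval(lo, hi, W, b)
        if i < len(layers) - 1:
            lo, hi = relu_interval(lo, hi)
    return lo, hi


def forward(x, layers):
    """Concrete semantics of the same network on a single input."""
    for i, (W, b) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
        if i < len(layers) - 1:
            x = [max(v, 0.0) for v in x]
    return x


# Hypothetical toy network: 2 inputs -> 2 hidden ReLU units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -0.25]),
    ([[1.0, 1.0]], [0.0]),
]
out_lo, out_hi = network_interval([0.0, 0.0], [1.0, 1.0], layers)
print(out_lo, out_hi)  # [0.0] [1.75]
```

On this toy network the true output range over the unit box is $[0, 1.25]$, so the interval result $[0, 1.75]$ is sound but loose. The AUA theorem concerns the existence of networks for which this gap can be made arbitrarily small.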

[107]  arXiv:2007.06106 (cross-list from cs.LG) [pdf, other]
Title: Unsupervised Feature Selection for Tumor Profiles using Autoencoders and Kernel Methods
Subjects: Machine Learning (cs.LG); Genomics (q-bio.GN); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

Molecular data from tumor profiles is high dimensional: tumor profiles can be characterized by tens of thousands of gene expression features. Due to the size of the gene expression feature set, machine learning methods are exposed to noisy variables and complexity. Tumor types present heterogeneity and can be subdivided into tumor subtypes. In many cases, tumor data does not include tumor subtype labeling; thus, unsupervised learning methods are necessary for tumor subtype discovery. This work aims to learn meaningful, low-dimensional representations of tumor samples and to find tumor subtype clusters while keeping biological signatures, without using tumor labels. The proposed method, named Latent Kernel Feature Selection (LKFS), is an unsupervised approach for gene selection in tumor gene expression profiles. Using autoencoders, a low-dimensional and denoised latent space is learned as a target representation to guide a Multiple Kernel Learning model that selects a subset of genes. A clustering method is then applied to the selected genes to group samples. To evaluate the performance of the proposed unsupervised feature selection method, the obtained features and clusters are analyzed for clinical significance. The proposed method has been applied to three tumor datasets (Brain, Renal and Lung), each composed of two tumor subtypes. Compared with benchmark unsupervised feature selection methods, the proposed method yields lower redundancy in the selected features and better clustering performance.

[108]  arXiv:2007.06123 (cross-list from cs.SD) [pdf, other]
Title: OtoWorld: Towards Learning to Separate by Learning to Move
Comments: Published in Self Supervision in Audio and Speech Workshop, 37th International Conference on Machine Learning, Vienna, Austria (ICML 2020)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)

We present OtoWorld, an interactive environment in which agents must learn to listen in order to solve navigational tasks. The purpose of OtoWorld is to facilitate reinforcement learning research in computer audition, where agents must learn to listen to the world around them to navigate. OtoWorld is built on three open source libraries: OpenAI Gym for environment and agent interaction, PyRoomAcoustics for ray-tracing and acoustics simulation, and nussl for training deep computer audition models. OtoWorld is the audio analogue of GridWorld, a simple navigation game. OtoWorld can be easily extended to more complex environments and games. To solve one episode of OtoWorld, an agent must move towards each sounding source in the auditory scene and "turn it off". The agent receives no other input than the current sound of the room. The sources are placed randomly within the room and can vary in number. The agent receives a reward for turning off a source. We present preliminary results on the ability of agents to win at OtoWorld. OtoWorld is open-source and available.

[109]  arXiv:2007.06126 (cross-list from cs.LG) [pdf, other]
Title: Disentangled Variational Autoencoder based Multi-Label Classification with Covariance-Aware Multivariate Probit Model
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Multi-label classification is the challenging task of predicting the presence and absence of multiple targets, involving representation learning and label correlation modeling. We propose a novel framework for multi-label classification, Multivariate Probit Variational AutoEncoder (MPVAE), that effectively learns latent embedding spaces as well as label correlations. MPVAE learns and aligns two probabilistic embedding spaces for labels and features respectively. The decoder of MPVAE takes in the samples from the embedding spaces and models the joint distribution of output targets under a Multivariate Probit model by learning a shared covariance matrix. We show that MPVAE outperforms the existing state-of-the-art methods on a variety of application domains, using public real-world datasets. MPVAE is further shown to remain robust under noisy settings. Lastly, we demonstrate the interpretability of the learned covariance by a case study on a bird observation dataset.

[110]  arXiv:2007.06133 (cross-list from cs.LG) [pdf, other]
Title: Explainable Recommendation via Interpretable Feature Mapping and Evaluation of Explainability
Comments: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI)
Journal-ref: IJCAI 2020, pages 2690-2696
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Latent factor collaborative filtering (CF) has been a widely used technique for recommender systems that works by learning semantic representations of users and items. Recently, explainable recommendation has attracted much attention from the research community. However, a trade-off exists between the explainability and the performance of the recommendation, and metadata is often needed to alleviate the dilemma. We present a novel feature mapping approach that maps the uninterpretable general features onto the interpretable aspect features, achieving both satisfactory accuracy and explainability in the recommendations by simultaneous minimization of rating prediction loss and interpretation loss. To evaluate the explainability, we propose two new evaluation metrics specifically designed for aspect-level explanation using surrogate ground truth. Experimental results demonstrate strong performance in both recommendation and explanation, eliminating the need for metadata. Code is available from https://github.com/pd90506/AMCF.

[111]  arXiv:2007.06134 (cross-list from cs.LG) [pdf, other]
Title: Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

Stochastic Gradient Descent (SGD) is the key learning algorithm for many machine learning tasks. Because of its computational costs, there is a growing interest in accelerating SGD on HPC resources like GPU clusters. However, the performance of parallel SGD is still bottlenecked by the high communication costs even with a fast connection among the machines. A simple approach to alleviating this problem, used in many existing efforts, is to perform communication every few iterations, using a constant averaging period. In this paper, we show that the optimal averaging period in terms of convergence and communication cost is not a constant, but instead varies over the course of the execution. Specifically, we observe that reducing the variance of model parameters among the computing nodes is critical to the convergence of periodic parameter averaging SGD. Given a fixed communication budget, we show that it is more beneficial to synchronize more frequently in early iterations to reduce the initial large variance and synchronize less frequently in the later phase of the training process. We propose a practical algorithm, named ADaptive Periodic parameter averaging SGD (ADPSGD), to achieve a smaller overall variance of model parameters, and thus better convergence compared with the Constant Periodic parameter averaging SGD (CPSGD). We evaluate our method with several image classification benchmarks and show that our ADPSGD indeed achieves smaller training losses and higher test accuracies with smaller communication compared with CPSGD. Compared with gradient-quantization SGD, we show that our algorithm achieves faster convergence with only half of the communication. Compared with full-communication SGD, our ADPSGD achieves 1.14x to 1.27x speedups with a 100Gbps connection among computing nodes, and the speedups increase to 1.46x to 1.95x with a 10Gbps connection.
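The idea of synchronizing often at first and rarely later can be sketched as follows. This is a toy illustration only: the schedule rule (multiply the averaging period by `growth` after every `grow_every` synchronizations), the function names, and the quadratic objective are all assumptions for this example, not the adaptive rule that ADPSGD actually uses.

```python
import random


def sync_steps(total_iters, initial_period=1, growth=2, grow_every=4):
    """Hypothetical schedule: iterations at which workers average parameters.

    The period starts at `initial_period` and is multiplied by `growth`
    after every `grow_every` synchronizations, so early syncs are frequent.
    """
    steps, period, t, count = [], initial_period, 0, 0
    while True:
        t += period
        if t > total_iters:
            break
        steps.append(t)
        count += 1
        if count % grow_every == 0:
            period *= growth
    return steps


def periodic_averaging_sgd(num_workers, total_iters, schedule,
                           lr=0.1, noise=0.5, seed=0):
    """Each worker runs noisy SGD on f(w) = 0.5 * w**2 from w = 5.0;
    at every step listed in `schedule`, all workers take the mean value."""
    rng = random.Random(seed)
    weights = [5.0] * num_workers
    sync = set(schedule)
    for t in range(1, total_iters + 1):
        # Local step: the gradient of f is w, perturbed by Gaussian noise.
        weights = [w - lr * (w + rng.gauss(0.0, noise)) for w in weights]
        if t in sync:
            mean = sum(weights) / num_workers
            weights = [mean] * num_workers
    return weights


schedule = sync_steps(32)
print(schedule)  # [1, 2, 3, 4, 6, 8, 10, 12, 16, 20, 24, 28]
final = periodic_averaging_sgd(num_workers=4, total_iters=28, schedule=schedule)
```

With these settings the 32-iteration budget is covered by 12 synchronizations instead of 32, and the gaps between synchronizations never shrink.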

[112]  arXiv:2007.06140 (cross-list from cs.LG) [pdf, other]
Title: Projected Latent Markov Chain Monte Carlo: Conditional Inference with Normalizing Flows
Comments: 21 pages, 12 figures, 4 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce Projected Latent Markov Chain Monte Carlo (PL-MCMC), a technique for sampling from the high-dimensional conditional distributions learned by a normalizing flow. We prove that PL-MCMC asymptotically samples from the exact conditional distributions associated with a normalizing flow. As a conditional sampling method, PL-MCMC enables Monte Carlo Expectation Maximization (MC-EM) training of normalizing flows from incomplete data. By providing experimental results for a variety of data sets, we demonstrate the practicality and effectiveness of PL-MCMC for missing data inference using normalizing flows.

[113]  arXiv:2007.06157 (cross-list from cs.LG) [pdf, other]
Title: Implementing the ICE Estimator in Multilayer Perceptron Classifiers
Authors: Tyler Ward
Subjects: Machine Learning (cs.LG); Computation (stat.CO)

This paper describes the techniques used to implement the ICE estimator for a multilayer perceptron model, and reviews the performance of the resulting models. The ICE estimator is implemented in the Apache Spark MultilayerPerceptronClassifier, and shown in cross-validation to outperform the stock MultilayerPerceptronClassifier that uses unadjusted MLE (cross-entropy) loss. The resulting models have identical runtime performance, and similar fitting performance to the stock MLP implementations. Additionally, this approach requires no hyper-parameters, and is therefore viable as a drop-in replacement for cross-entropy optimizing multilayer perceptron classifiers wherever overfitting may be a concern.

[114]  arXiv:2007.06159 (cross-list from cs.LG) [pdf, other]
Title: Implicit Distributional Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

To improve the sample efficiency of policy-gradient based reinforcement learning algorithms, we propose implicit distributional actor critic (IDAC) that consists of a distributional critic, built on two deep generator networks (DGNs), and a semi-implicit actor (SIA), powered by a flexible policy distribution. We adopt a distributional perspective on the discounted cumulative return and model it with a state-action-dependent implicit distribution, which is approximated by the DGNs that take state-action pairs and random noise as their input. Moreover, we use the SIA to provide a semi-implicit policy distribution, which mixes the policy parameters with a reparameterizable distribution that is not constrained by an analytic density function. In this way, the policy's marginal distribution is implicit, providing the potential to model complex properties such as covariance structure and skewness, but its parameter and entropy can still be estimated. We incorporate these features with an off-policy algorithm framework to solve problems with continuous action space, and compare IDAC with state-of-the-art algorithms on representative OpenAI Gym environments. We observe that IDAC outperforms these baselines for most tasks.

[115]  arXiv:2007.06168 (cross-list from cs.LG) [pdf, other]
Title: Model Fusion with Kullback--Leibler Divergence
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose a method to fuse posterior distributions learned from heterogeneous datasets. Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors and proceeds using a simple assign-and-average approach. The components of the dataset posteriors are assigned to the proposed global model components by solving a regularized variant of the assignment problem. The global components are then updated based on these assignments by their mean under a KL divergence. For exponential family variational distributions, our formulation leads to an efficient non-parametric algorithm for computing the fused model. Our algorithm is easy to describe and implement, efficient, and competitive with state-of-the-art on motion capture analysis, topic modeling, and federated learning of Bayesian neural networks.

[116]  arXiv:2007.06169 (cross-list from econ.EM) [pdf, other]
Title: An Adversarial Approach to Structural Estimation
Comments: 58 pages, 3 tables, 4 figures
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)

We propose a new simulation-based estimation method, adversarial estimation, for structural models. The estimator is formulated as the solution to a minimax problem between a generator (which generates synthetic observations using the structural model) and a discriminator (which classifies if an observation is synthetic). The discriminator maximizes the accuracy of its classification while the generator minimizes it. We show that, with a sufficiently rich discriminator, the adversarial estimator attains parametric efficiency under correct specification and the parametric rate under misspecification. We advocate the use of a neural network as a discriminator that can exploit adaptivity properties and attain fast rates of convergence. We apply our method to the elderly's saving decision model and show that including gender and health profiles in the discriminator uncovers the bequest motive as an important source of saving across the wealth distribution, not only for the rich.
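For orientation, a minimax problem of this kind is often written with a GAN-style cross-entropy objective. The display below is a sketch under that assumption; the abstract does not state the loss, and the symbols ($\Theta$ for the parameter space, $\mathcal{D}$ for the discriminator class, $X_i$ for the $n$ real observations, $X_j^{\theta}$ for the $m$ synthetic observations simulated at parameter $\theta$) are notation introduced here.

```latex
\hat{\theta}
  = \arg\min_{\theta \in \Theta} \; \max_{D \in \mathcal{D}} \;
    \frac{1}{n} \sum_{i=1}^{n} \log D(X_i)
    + \frac{1}{m} \sum_{j=1}^{m} \log\bigl(1 - D(X_j^{\theta})\bigr)
```

Here $D(x)$ is read as the discriminator's probability that $x$ is a real observation, so the inner maximization rewards correct classification and the outer minimization picks the $\theta$ whose simulated data is hardest to tell apart from the real data.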

[117]  arXiv:2007.06184 (cross-list from cs.LG) [pdf, other]
Title: Efficient Planning in Large MDPs with Weak Linear Function Approximation
Comments: 12 pages and appendix (10 pages). Submitted to the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of "core" states whose features span those of other states. In particular, we make no assumptions about the representability of policies or value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, while its computation time scales polynomially with the number of features, core states, and actions and the effective horizon.

[118]  arXiv:2007.06192 (cross-list from cs.LG) [pdf, other]
Title: Probabilistic bounds on data sensitivity in deep rectifier networks
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Neuron death is a complex phenomenon with implications for model trainability, but until recently it was measured only empirically. Recent articles have claimed that, as the depth of a rectifier neural network grows to infinity, the probability of finding a valid initialization decreases to zero. In this work, we provide a simple and rigorous proof of that result. Then, we show what happens when the width of each layer grows simultaneously with the depth. We derive both upper and lower bounds on the probability that a ReLU network is initialized to a trainable point, as a function of model hyperparameters. Contrary to previous claims, we show that it is possible to increase the depth of a network indefinitely, so long as the width increases as well. Furthermore, our bounds are asymptotically tight under reasonable assumptions: first, the upper bound coincides with the true probability for a single-layer network with the largest possible input set. Second, the true probability converges to our lower bound when the network width and depth both grow without limit. Our proof is based on the striking observation that very deep rectifier networks concentrate all outputs towards a single eigenvalue, in the sense that their normalized output variance goes to zero regardless of the network width. Finally, we develop a practical sign flipping scheme which guarantees with probability one that for a $k$-layer network, the ratio of living training data points is at least $2^{-k}$. We confirm our results with numerical simulations, suggesting that the actual improvement far exceeds the theoretical minimum. We also discuss how neuron death provides a theoretical interpretation for various network design choices such as batch normalization, residual layers and skip connections, and could inform the design of very deep neural networks.

[119]  arXiv:2007.06207 (cross-list from cs.LG) [pdf, other]
Title: DinerDash Gym: A Benchmark for Policy Learning in High-Dimensional Action Space
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

It has been arduous to assess the progress of policy learning algorithms in the domain of hierarchical tasks with high-dimensional action spaces due to the lack of a commonly accepted benchmark. In this work, we propose a new light-weight benchmark task called Diner Dash for evaluating performance in a complicated task with a high-dimensional action space. In contrast to the traditional Atari games, which have only a flat structure of goals and very few actions, the proposed benchmark task has a hierarchical task structure and an action space of size 57, and hence can facilitate the development of policy learning in complicated tasks. On top of that, we introduce Decomposed Policy Graph Modelling (DPGM), an algorithm that combines graph modelling and deep learning to allow explicit domain knowledge embedding and that achieves significant improvement compared to the baseline. In the experiments, we show the effectiveness of the domain knowledge injection via a specially designed imitation algorithm, as well as the results of other popular algorithms.

[120]  arXiv:2007.06225 (cross-list from cs.LG) [pdf]
Title: ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)

Motivation: NLP continues improving substantially through auto-regressive and auto-encoding Language Models. These LMs require expensive computing resources for self-supervised or un-supervised learning from huge unlabelled text corpora. The information learned is transferred through so-called embeddings to downstream prediction tasks. Bioinformatics provides vast gold-mines of structured and sequentially ordered text data leading to extraordinarily successful protein sequence LMs that promise new frontiers for generative and predictive tasks at low inference cost. Here, we addressed two questions: (1) To what extent can HPC up-scale protein LMs to larger databases and larger models? (2) To what extent can LMs extract features from single proteins to get closer to the performance of methods using evolutionary information?
Methodology: Here, we trained two auto-regressive language models (Transformer-XL and XLNet) and two auto-encoder models (BERT and Albert) using 80 billion amino acids from 200 million protein sequences (UniRef100) and 393 billion amino acids from 2.1 billion protein sequences (BFD). The LMs were trained on the Summit supercomputer, using 5616 GPUs and one TPU Pod, using V3-512 cores.
Results: The results of training these LMs on proteins were assessed by predicting secondary structure in three and eight states (Q3=75-83, Q8=63-72), localization for 10 cellular compartments (Q10=74) and whether a protein is membrane-bound or water-soluble (Q2=89). Dimensionality reduction revealed that the LM-embeddings from unlabelled data (only protein sequences) captured important biophysical properties of the protein alphabet, namely the amino acids, and their well orchestrated interplay in governing the shape of proteins. In the analogy of NLP, this implied having learned some of the grammar of the language of life realized in protein sequences.

[121]  arXiv:2007.06226 (cross-list from cs.LG) [pdf, other]
Title: Neural Network Verification through Replication
Comments: 13 pages, 13 figures
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)

A system identification based approach to neural network model replication is presented and the application of model replication to verification of fundamental, single hidden layer, neural network systems is demonstrated. The presented approach serves as a means to partially address the problem of verifying that a neural network implementation meets a provided specification given only grey-box access to the implemented network. The procedure developed involves stimulating a neural network with a chosen signal, extracting a replicated model from the response, and systematically checking that the replicated model is output-equivalent to a specified model in order to verify that the grey-box system under test is implemented to specification without direct access to its hidden parameters. The replication step is introduced to provide an inherent guarantee that the stimulus signals employed yield sufficient test coverage. This method is investigated as a neural network focused nonlinear counterpart to the traditional verification of circuits through system identification. A strategy for choosing the stimulus is provided and an algorithm for verifying that the resulting response is indicative of a specification-compliant neural network system under test is derived. We find that the method can reliably detect defects in small neural networks or in small sub-circuits within larger neural networks.

[122]  arXiv:2007.06229 (cross-list from cs.LG) [pdf, other]
Title: Deep Claim: Payer Response Prediction from Claims Data with Deep Learning
Comments: To be presented at the Healthcare Systems, Population Health, and the Role of Health-Tech (HSYS) Workshop at the 37th International Conference on Machine Learning, Vienna, Austria, July 13-18, 2020
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)

Each year, almost 10% of claims are denied by payers (i.e., health insurance plans). With the cost to recover these denials and underpayments, predicting payer response (likelihood of payment) from claims data with a high degree of accuracy and precision is anticipated to improve healthcare staffs' performance productivity and drive better patient financial experience and satisfaction in the revenue cycle (Barkholz, 2017). However, constructing advanced predictive analytics models has been considered challenging in the last twenty years. That said, we propose a (low-level) context-dependent compact representation of patients' historical claim records by effectively learning complicated dependencies in the (high-level) claim inputs. Built on this new latent representation, we demonstrate that a deep learning-based framework, Deep Claim, can accurately predict various responses from multiple payers using 2,905,026 de-identified claims data from two US health systems. Deep Claim's improvements over carefully chosen baselines in predicting claim denials are most pronounced as 22.21% relative recall gain (at 95% precision) on Health System A, which implies Deep Claim can find 22.21% more denials than the best baseline system.

[123]  arXiv:2007.06230 (cross-list from cs.LG) [pdf, other]
Title: Using LSTM for the Prediction of Disruption in ADITYA Tokamak
Comments: 7 pages, 4 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Major disruptions in tokamaks pose a serious threat to the vessel and its surrounding equipment. The ability to detect any behavior that can lead to a disruption can help in alerting the system beforehand and preventing its harmful effects. Many machine learning techniques are already in use at large tokamaks like JET and ASDEX, but they are not suitable for ADITYA, which is comparatively small. In this work, we discuss a new real-time approach to predict the time of disruption in the ADITYA tokamak and validate the results on an experimental dataset. The system uses selected diagnostics from the tokamak and, after some pre-processing steps, sends them to a time-sequence Long Short-Term Memory (LSTM) network. The model can make predictions 12 ms in advance at a low computational cost, making it quick enough to be deployed in real-time applications.

[124]  arXiv:2007.06236 (cross-list from cs.LG) [pdf, other]
Title: The Good, The Bad, and The Ugly: Quality Inference in Federated Learning
Authors: Balázs Pejó
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

Collaborative machine learning algorithms are developed both for efficiency reasons and to ensure the privacy protection of sensitive data used for processing. Federated learning (FL) is the most popular of these methods, where 1) learning is done locally, and 2) only a subset of the participants contribute in each training round. Although no data is shared explicitly, recent studies showed that models trained with FL could potentially still leak some information. In this paper we focus on the quality property of the datasets and investigate whether the leaked information could be connected to specific participants. Via a differential attack we analyze the information leakage using a few simple metrics, and show that reconstruction of the quality ordering among the training participants' datasets is possible. Our scoring rules use only oracle access to a test dataset and no further background information or computational power. We demonstrate two implications of such a quality ordering leakage: 1) we use it to increase the accuracy of the model by weighting the participants' updates, and 2) we use it to detect misbehaving participants.

[125]  arXiv:2007.06240 (cross-list from cs.CV) [pdf, other]
Title: Expert Training: Task Hardness Aware Meta-Learning for Few-Shot Classification
Comments: 9 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Deep neural networks are highly effective when a large number of labeled samples are available but fail with few-shot classification tasks. Recently, meta-learning methods, which train a meta-learner on massive additional tasks to gain the knowledge to instruct the few-shot classification, have received much attention. Usually, the training tasks are randomly sampled and performed indiscriminately, often causing the meta-learner to get stuck in a bad local optimum. Some works in the optimization of deep neural networks have shown that a better arrangement of training data can make the classifier converge faster and perform better. Inspired by this idea, we propose an easy-to-hard expert meta-training strategy to arrange the training tasks properly, where easy tasks are preferred in the first phase and hard tasks are emphasized in the second phase. A task hardness aware module is designed and integrated into the training procedure to estimate the hardness of a task based on the distinguishability of its categories. In addition, we explore multiple hardness measurements including the semantic relation, the pairwise Euclidean distance, the Hausdorff distance, and the Hilbert-Schmidt independence criterion. Experimental results on the miniImageNet and tieredImageNetSketch datasets show that the meta-learners can obtain better results with our expert training strategy.
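Of the hardness measurements listed, the Hausdorff distance is easy to state concretely. The sketch below computes it between two small point sets standing in for the embedded samples of two categories; how the paper turns such distances into a task hardness score is not given in the abstract, so the interpretation in the comments (a smaller distance suggests harder-to-separate categories) is an assumption.

```python
import math


def directed_hausdorff(A, B):
    """Largest distance from a point of A to its nearest neighbour in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


# Hypothetical 2-D embeddings of samples from two categories.
class_a = [(0.0, 0.0), (1.0, 0.0)]
class_b = [(0.0, 1.0), (3.0, 0.0)]

# Assumed reading: a small value means the categories overlap heavily,
# so a task built from them would count as hard.
print(hausdorff(class_a, class_b))  # 2.0
```

The two directed distances differ here (about 1.41 from `class_a` to `class_b`, and 2.0 in the other direction), which is why the symmetric version takes the maximum of both.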

[126]  arXiv:2007.06245 (cross-list from cs.LG) [pdf, other]
Title: Reconstruction Bottlenecks in Object-Centric Generative Models
Comments: 10 pages, 7 Figures, Workshop on Object-Oriented Learning at ICML 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A range of methods with suitable inductive biases exist to learn interpretable object-centric representations of images without supervision. However, these are largely restricted to visually simple images; robust object discovery in real-world sensory datasets remains elusive. To increase the understanding of such inductive biases, we empirically investigate the role of "reconstruction bottlenecks" for scene decomposition in GENESIS, a recent VAE-based model. We show such bottlenecks determine reconstruction and segmentation quality and critically influence model behaviour.

[127]  arXiv:2007.06252 (cross-list from cs.LG) [pdf, other]
Title: ProteiNN: Intrinsic-Extrinsic Convolution and Pooling for Scalable Deep Protein Analysis
Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM); Machine Learning (stat.ML)

Proteins perform a large variety of functions in living organisms, thus playing a key role in biology. As of now, available learning algorithms to process protein data do not consider several particularities of such data and/or do not scale well for large protein conformations. To fill this gap, we propose two new learning operations enabling deep 3D analysis of large-scale protein data. First, we introduce a novel convolution operator which considers both the intrinsic (invariant under protein folding) and the extrinsic (invariant under bonding) structure, by using $n$-D convolutions defined on both the Euclidean distance and multiple geodesic distances between atoms in a multi-graph. Second, we enable a multi-scale protein analysis by introducing hierarchical pooling operators, exploiting the fact that proteins are a recombination of a finite set of amino acids, which can be pooled using shared pooling matrices. Lastly, we evaluate the accuracy of our algorithms on several large-scale data sets for common protein analysis tasks, where we outperform state-of-the-art methods.

[128]  arXiv:2007.06281 (cross-list from cs.LG) [pdf, other]
Title: Distributed Graph Convolutional Networks
Comments: Preprint submitted to IEEE TSIPN
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

The aim of this work is to develop a fully-distributed algorithmic framework for training graph convolutional networks (GCNs). The proposed method is able to exploit the meaningful relational structure of the input data, which are collected by a set of agents that communicate over a sparse network topology. After formulating the centralized GCN training problem, we first show how to make inference in a distributed scenario where the underlying data graph is split among different agents. Then, we propose a distributed gradient descent procedure to solve the GCN training problem. The resulting model distributes computation along three lines: during inference, during back-propagation, and during optimization. Convergence to stationary solutions of the GCN training problem is also established under mild conditions. Finally, we propose an optimization criterion to design the communication topology between agents in order to match with the graph describing data relationships. A wide set of numerical results validate our proposal. To the best of our knowledge, this is the first work combining graph convolutional neural networks with distributed optimization.

[129]  arXiv:2007.06324 (cross-list from cs.LG) [pdf, other]
Title: TrustNet: Learning from Trusted Data Against (A)symmetric Label Noise
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Robustness to label noise is a critical property for weakly-supervised classifiers trained on massive datasets. In this paper, we first derive an analytical bound for any given noise pattern. Based on these insights, we design TrustNet, which first adversely learns the pattern of noise corruption, be it symmetric or asymmetric, from a small set of trusted data. Then, TrustNet is trained via a robust loss function, which weights the given labels against the inferred labels from the learned noise pattern. The weight is adjusted based on model uncertainty across training epochs. We evaluate TrustNet on synthetic label noise for CIFAR-10 and CIFAR-100, and real-world data with label noise, i.e., Clothing1M. We compare against state-of-the-art methods, demonstrating the strong robustness of TrustNet under a diverse set of noise patterns.

[130]  arXiv:2007.06346 (cross-list from cs.LG) [pdf, other]
Title: Whitening for Self-Supervised Representation Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Recent literature on self-supervised learning is based on the contrastive loss, where image instances which share the same semantic content ("positives") are contrasted with instances extracted from other images ("negatives"). However, in order for the learning to be effective, a lot of negatives should be compared with a positive pair. This is not only computationally demanding, but it also requires that the positive and the negative representations are kept consistent with each other over a long training period. In this paper we propose a different direction and a new loss function for self-supervised learning which is based on the whitening of the latent-space features. The whitening operation has a "scattering" effect on the batch samples, which compensates for the lack of a large number of negatives, avoiding degenerate solutions where all the sample representations collapse to a single point. We empirically show that our loss accelerates self-supervised training and the learned representations are much more effective for downstream tasks than previously published work.
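
As a rough illustration of the whitening operation mentioned in the abstract above, the following sketch centers a small batch of 2-D features and whitens it with a Cholesky factor, so that the batch covariance becomes the identity. It is not the authors' loss or implementation: the function names (`covariance`, `cholesky`, `whiten`), the toy batch, and the choice of Cholesky whitening are all assumptions made for illustration.

```python
import math

def covariance(rows):
    """Return (mean, covariance) of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
            for j in range(d)] for i in range(d)]
    return mean, cov

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def whiten(rows):
    """Center the batch, then solve L z = (x - mean) for every sample x."""
    mean, cov = covariance(rows)
    low = cholesky(cov)
    out = []
    for r in rows:
        x = [r[j] - mean[j] for j in range(len(r))]
        z = []
        for i in range(len(x)):  # forward substitution
            z.append((x[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
        out.append(z)
    return out

# Hypothetical batch of four 2-D feature vectors with correlated coordinates.
batch = [[2.0, 1.0], [4.0, 3.5], [6.0, 4.0], [8.0, 7.5]]
mean_white, cov_white = covariance(whiten(batch))
```

After whitening, `cov_white` is the identity matrix up to floating-point error, so the samples cannot all sit at a single point; this is the "scattering" effect in its simplest form.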

[131]  arXiv:2007.06368 (cross-list from cs.LG) [pdf, other]
Title: Contextual Bandit with Missing Rewards
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side-information, or context, available to a decision-maker) where the reward associated with each context-based decision may not always be observed ("missing rewards"). This new problem is motivated by certain online settings including clinical trial and ad recommendation applications. In order to address the missing rewards setting, we propose to combine the standard contextual bandit approach with an unsupervised learning mechanism such as clustering. Unlike standard contextual bandit methods, by leveraging clustering to estimate missing rewards, we are able to learn from each incoming event, even those with missing rewards. Promising empirical results are obtained on several real-life datasets.
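
One way to picture the idea of using clustering to estimate a missing reward is sketched below. It is only a toy under stated assumptions, not the method of the paper: the class name `ClusterImputer`, the fixed centroids, and the use of a per-cluster, per-arm running mean are all hypothetical choices.

```python
def nearest(centroids, x):
    """Index of the centroid closest to context x (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))

class ClusterImputer:
    """Tracks the mean observed reward for every (cluster, arm) pair."""

    def __init__(self, centroids, n_arms):
        self.centroids = centroids  # assumed fixed here; a real system would learn them
        self.sums = [[0.0] * n_arms for _ in centroids]
        self.counts = [[0] * n_arms for _ in centroids]

    def reward(self, context, arm, observed=None, default=0.0):
        """Return the observed reward, or a cluster-based estimate if it is missing."""
        c = nearest(self.centroids, context)
        if observed is not None:
            self.sums[c][arm] += observed
            self.counts[c][arm] += 1
            return observed
        if self.counts[c][arm] == 0:
            return default  # nothing observed yet for this cluster and arm
        return self.sums[c][arm] / self.counts[c][arm]

imputer = ClusterImputer(centroids=[[0.0, 0.0], [10.0, 10.0]], n_arms=2)
imputer.reward([0.5, -0.2], arm=1, observed=1.0)
imputer.reward([0.1, 0.3], arm=1, observed=0.0)
imputer.reward([9.0, 11.0], arm=1, observed=1.0)

# The reward for this event is missing, so it is estimated from its cluster.
estimate = imputer.reward([0.2, 0.2], arm=1)
```

The returned value can then be fed to any standard contextual bandit update in place of the unobserved reward, which is how every incoming event can still contribute to learning.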

[132]  arXiv:2007.06379 (cross-list from cs.LG) [pdf, other]
Title: Rule Covering for Interpretation and Boosting
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose two algorithms for interpretation and boosting of tree-based ensemble methods. Both algorithms make use of mathematical programming models that are constructed with a set of rules extracted from an ensemble of decision trees. The objective is to obtain the minimum total impurity with the least number of rules that cover all the samples. The first algorithm uses the collection of decision trees obtained from a trained random forest model. Our numerical results show that the proposed rule covering approach selects only a few rules that could be used for interpreting the random forest model. Moreover, the resulting set of rules closely matches the accuracy level of the random forest model. Inspired by the column generation algorithm in linear programming, our second algorithm uses a rule generation scheme for boosting decision trees. We use the dual optimal solutions of the linear programming models as sample weights to obtain only those rules that would improve the accuracy. With a computational study, we observe that our second algorithm performs competitively with the other well-known boosting methods. Our implementations also demonstrate that both algorithms can be trivially coupled with the existing random forest and decision tree packages.

[133]  arXiv:2007.06381 (cross-list from cs.LG) [pdf, other]
Title: A simple defense against adversarial attacks on heatmap explanations
Comments: Accepted at 2020 Workshop on Human Interpretability in Machine Learning (WHI)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

With machine learning models being used for more sensitive applications, we rely on interpretability methods to prove that no discriminating attributes were used for classification. A potential concern is the so-called "fair-washing" - manipulating a model such that the features used in reality are hidden and more innocuous features are shown to be important instead.
In our work we present an effective defence against such adversarial attacks on neural networks. By a simple aggregation of multiple explanation methods, the network becomes robust against manipulation. This holds even when the attacker has exact knowledge of the model weights and the explanation methods used.
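
A simple aggregation of explanation methods could look like the sketch below, which normalizes each heatmap and averages them. The abstract does not specify the aggregation rule, so the L1 normalization, the plain mean, the function names, and the toy scores are assumptions for illustration only.

```python
def normalize(heatmap):
    """Scale a flat heatmap so its absolute values sum to 1 (all-zero maps stay zero)."""
    total = sum(abs(v) for v in heatmap)
    return [v / total for v in heatmap] if total > 0 else list(heatmap)

def aggregate_explanations(heatmaps):
    """Average several normalized heatmaps of the same length."""
    normed = [normalize(h) for h in heatmaps]
    return [sum(vals) / len(normed) for vals in zip(*normed)]

# Three hypothetical explanation methods scoring four input features.
# The second map stands in for a manipulated explanation that highlights feature 3.
maps = [
    [8.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 5.0],
    [6.0, 2.0, 1.0, 1.0],
]
combined = aggregate_explanations(maps)
top_feature = max(range(len(combined)), key=combined.__getitem__)
```

In this toy case feature 0 still receives the largest aggregated score even though one of the three maps points elsewhere. The numbers are made up and do not reproduce any result from the paper.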

[134]  arXiv:2007.06402 (cross-list from cs.CV) [pdf, other]
Title: Nested Learning For Multi-Granular Tasks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Standard deep neural networks (DNNs) are commonly trained in an end-to-end fashion for specific tasks such as object recognition, face identification, or character recognition, among many examples. This specificity often leads to overconfident models that generalize poorly to samples that are not from the original training distribution. Moreover, such standard DNNs do not allow leveraging information from heterogeneously annotated training data, where, for example, labels may be provided with different levels of granularity. Furthermore, DNNs do not produce results with simultaneous different levels of confidence for different levels of detail; they are most commonly an all-or-nothing approach. To address these challenges, we introduce the concept of nested learning: how to obtain a hierarchical representation of the input such that a coarse label can be extracted first, and sequentially refine this representation, if the sample permits, to obtain successively refined predictions, all of them with the corresponding confidence. We explicitly enforce this behavior by creating a sequence of nested information bottlenecks. Looking at the problem of nested learning from an information theory perspective, we design a network topology with two important properties. First, a sequence of low dimensional (nested) feature embeddings is enforced. Then we show how the explicit combination of nested outputs can improve both the robustness and the accuracy of finer predictions. Experimental results on Cifar-10, Cifar-100, MNIST, Fashion-MNIST, Dbpedia, and Plantvillage demonstrate that nested learning outperforms the same network trained in the standard end-to-end fashion.

[135]  arXiv:2007.06414 (cross-list from q-bio.PE) [pdf, other]
Title: Epidemic modelling of bovine tuberculosis in cattle herds and badgers in Ireland
Comments: 32 pages, 2 figures
Subjects: Populations and Evolution (q-bio.PE); Applications (stat.AP)

Bovine tuberculosis, a disease that affects cattle and badgers in Ireland, was studied via stochastic epidemic modeling using incidence data from the Four Area Project (Griffin et al., 2005). The Four Area Project was a large scale field trial conducted in four diverse farming regions of Ireland over a five-year period (1997-2002) to evaluate the impact of badger culling on bovine tuberculosis incidence in cattle herds.
Based on the comparison of several models, the model with no between-herd transmission and badger-to-herd transmission proportional to the total number of infected badgers culled was best supported by the data.
Detailed model validation was conducted via model prediction, identifiability checks and sensitivity analysis.
The results suggest that badger-to-cattle transmission is of more importance than between-herd transmission and that if there was no badger-to-herd transmission, levels of bovine tuberculosis in cattle herds in Ireland could decrease considerably.

[136]  arXiv:2007.06418 (cross-list from cs.LG) [pdf, other]
Title: Lessons Learned from the Training of GANs on Artificial Datasets
Authors: Shichang Tang
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Generative Adversarial Networks (GANs) have made great progress in synthesizing realistic images in recent years. However, they are often trained on image datasets with either too few samples or too many classes belonging to different data distributions. Consequently, GANs are prone to underfitting or overfitting, making the analysis of them difficult and constrained. Therefore, in order to conduct a thorough study on GANs while obviating unnecessary interferences introduced by the datasets, we train them on artificial datasets where there are infinitely many samples and the real data distributions are simple, high-dimensional and have structured manifolds. Moreover, the generators are designed such that optimal sets of parameters exist. Empirically, we find that under various distance measures, the generator fails to learn such parameters with the GAN training procedure. We also find that training mixtures of GANs leads to more performance gain compared to increasing the network depth or width when the model complexity is high enough. Our experimental results demonstrate that a mixture of generators can discover different modes or different classes automatically in an unsupervised setting, which we attribute to the distribution of the generation and discrimination tasks across multiple generators and discriminators. As an example of the generalizability of our conclusions to realistic datasets, we train a mixture of GANs on the CIFAR-10 dataset and our method significantly outperforms the state-of-the-art in terms of popular metrics, i.e., Inception Score (IS) and Fréchet Inception Distance (FID).

[137]  arXiv:2007.06437 (cross-list from cs.LG) [pdf, other]
Title: A Provably Efficient Sample Collection Strategy for Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A common assumption in reinforcement learning (RL) is to have access to a generative model (i.e., a simulator of the environment), which allows generating samples from any desired state-action pair. Nonetheless, in many settings a generative model may not be available and an adaptive exploration strategy is needed to efficiently collect samples from an unknown environment by direct interaction. In this paper, we study the scenario where an algorithm based on the generative model assumption defines the (possibly time-varying) number of samples $b(s,a)$ required at each state-action pair $(s,a)$ and an exploration strategy has to learn how to generate $b(s,a)$ samples as fast as possible. Building on recent results for regret minimization in the stochastic shortest path (SSP) setting (Cohen et al., 2020; Tarbouriech et al., 2020), we derive an algorithm that requires $\tilde{O}( B D + D^{3/2} S^2 A)$ time steps to collect the $B = \sum_{s,a} b(s,a)$ desired samples, in any unknown and communicating MDP with $S$ states, $A$ actions and diameter $D$. Leveraging the generality of our strategy, we readily apply it to a variety of existing settings (e.g., model estimation, pure exploration in MDPs) for which we obtain improved sample-complexity guarantees, and to a set of new problems such as best-state identification and sparse reward discovery.

[138]  arXiv:2007.06503 (cross-list from cs.LG) [pdf, other]
Title: PRI-VAE: Principle-of-Relevant-Information Variational Autoencoders
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Although substantial efforts have been made to learn disentangled representations under the variational autoencoder (VAE) framework, the fundamental properties of the learning dynamics of most VAE models remain unknown and under-investigated. In this work, we first propose a novel learning objective, termed the principle-of-relevant-information variational autoencoder (PRI-VAE), to learn disentangled representations. We then present an information-theoretic perspective to analyze existing VAE models by inspecting the evolution of some critical information-theoretic quantities across training epochs. Our observations unveil some fundamental properties associated with VAEs. Empirical results also demonstrate the effectiveness of PRI-VAE on four benchmark data sets.

[139]  arXiv:2007.06528 (cross-list from math.OC) [pdf, other]
Title: Random extrapolation for primal-dual coordinate descent
Comments: To appear in ICML 2020
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce a randomly extrapolated primal-dual coordinate descent method that adapts to sparsity of the data matrix and the favorable structures of the objective function. Our method updates only a subset of primal and dual variables with sparse data, and it uses large step sizes with dense data, retaining the benefits of the specific methods designed for each case. In addition to adapting to sparsity, our method attains fast convergence guarantees in favorable cases \textit{without any modifications}. In particular, we prove linear convergence under metric subregularity, which applies to strongly convex-strongly concave problems and piecewise linear quadratic functions. We show almost sure convergence of the sequence and optimal sublinear convergence rates for the primal-dual gap and objective values, in the general convex-concave case. Numerical evidence demonstrates the state-of-the-art empirical performance of our method in sparse and dense settings, matching and improving the existing methods.

[140]  arXiv:2007.06533 (cross-list from cs.LG) [pdf, other]
Title: S2RMs: Spatially Structured Recurrent Modules
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalize well and are robust to changes in the input distribution. While methods that harness spatial and temporal structures find broad application, recent work has demonstrated the potential of models that leverage sparse and modular structure using an ensemble of sparingly interacting modules. In this work, we take a step towards dynamic models that are capable of simultaneously exploiting both modular and spatiotemporal structures. We accomplish this by abstracting the modeled dynamical system as a collection of autonomous but sparsely interacting sub-systems. The sub-systems interact according to a topology that is learned, but also informed by the spatial structure of the underlying real-world system. This results in a class of models that are well suited for modeling the dynamics of systems that only offer local views into their state, along with corresponding spatial locations of those views. On the tasks of video prediction from cropped frames and multi-agent world modeling from partial observations in the challenging Starcraft2 domain, we find our models to be more robust to the number of available views and better capable of generalization to novel tasks without additional training, even when compared against strong baselines that perform equally well or better on the training distribution.

[141]  arXiv:2007.06555 (cross-list from cs.LG) [pdf, other]
Title: Adversarial robustness via robust low rank representations
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)

Adversarial robustness measures the susceptibility of a classifier to imperceptible perturbations made to the inputs at test time. In this work we highlight the benefits of natural low rank representations that often exist for real data such as images, for training neural networks with certified robustness guarantees.
Our first contribution is for certified robustness to perturbations measured in $\ell_2$ norm. We exploit low rank data representations to provide improved guarantees over state-of-the-art randomized smoothing-based approaches on standard benchmark datasets such as CIFAR-10 and CIFAR-100.
Our second contribution is for the more challenging setting of certified robustness to perturbations measured in $\ell_\infty$ norm. We demonstrate empirically that natural low rank representations have inherent robustness properties, that can be leveraged to provide significantly better guarantees for certified robustness to $\ell_\infty$ perturbations in those representations. Our certificate of $\ell_\infty$ robustness relies on a natural quantity involving the $\infty \to 2$ matrix operator norm associated with the representation, to translate robustness guarantees from $\ell_2$ to $\ell_\infty$ perturbations.
A key technical ingredient for our certification guarantees is a fast algorithm with provable guarantees based on the multiplicative weights update method to provide upper bounds on the above matrix norm. Our algorithmic guarantees improve upon the state of the art for this problem, and may be of independent interest.

[142]  arXiv:2007.06557 (cross-list from cs.SI) [pdf, other]
Title: Scalable Learning of Independent Cascade Dynamics from Partial Observations
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)

Spreading processes play an increasingly important role in modeling for diffusion networks, information propagation, marketing, and opinion setting. Recent real-world spreading events further highlight the need for prediction, optimization, and control of diffusion dynamics. To tackle these tasks, it is essential to learn the effective spreading model and transmission probabilities across the network of interactions. However, in most cases the transmission rates are unknown and need to be inferred from the spreading data. Additionally, full observation of the dynamics is rarely available. As a result, standard approaches such as maximum likelihood quickly become intractable for large network instances. In this work, we study the popular Independent Cascade model of stochastic diffusion dynamics. We introduce a computationally efficient algorithm, based on a scalable dynamic message-passing approach, which is able to learn parameters of the effective spreading model given only limited information on the activation times of nodes in the network. Importantly, we show that the resulting model approximates the marginal activation probabilities that can be used for prediction of the spread.

[143]  arXiv:2007.06559 (cross-list from cs.LG) [pdf, other]
Title: Graph Structure of Neural Networks
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

Neural networks are often represented as graphs of connections between neurons. However, despite their wide use, there is currently little understanding of the relationship between the graph structure of the neural network and its predictive performance. Here we systematically investigate how the graph structure of neural networks affects their predictive performance. To this end, we develop a novel graph-based representation of neural networks called a relational graph, where layers of neural network computation correspond to rounds of message exchange along the graph structure. Using this representation we show that: (1) a "sweet spot" of relational graphs leads to neural networks with significantly improved predictive performance; (2) a neural network's performance is approximately a smooth function of the clustering coefficient and average path length of its relational graph; (3) our findings are consistent across many different tasks and datasets; (4) the sweet spot can be identified efficiently; (5) top-performing neural networks have graph structures surprisingly similar to those of real biological neural networks. Our work opens new directions for the design of neural architectures and the understanding of neural networks in general.

Replacements for Tue, 14 Jul 20

[144]  arXiv:1712.07248 (replaced) [pdf, ps, other]
Title: Towards a General Large Sample Theory for Regularized Estimators
Subjects: Statistics Theory (math.ST); Econometrics (econ.EM)
[145]  arXiv:1802.08667 (replaced) [pdf, ps, other]
Title: De-Biased Machine Learning of Global and Local Parameters Using Regularized Riesz Representers
Comments: 41 pages; submitted version
Subjects: Machine Learning (stat.ML); Econometrics (econ.EM); Statistics Theory (math.ST)
[146]  arXiv:1807.04010 (replaced) [pdf, ps, other]
Title: Causal Discovery in the Presence of Missing Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[147]  arXiv:1808.08558 (replaced) [pdf, other]
Title: Spectral Pruning: Compressing Deep Neural Networks via Spectral Analysis and its Generalization Error
Comments: 17 pages, 4 figures. Accepted in IJCAI-PRICAI 2020. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, pages 2839--2846
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[148]  arXiv:1809.05224 (replaced) [pdf, ps, other]
Title: Automatic Debiased Machine Learning of Causal and Structural Effects
Subjects: Statistics Theory (math.ST); Econometrics (econ.EM)
[149]  arXiv:1811.00401 (replaced) [pdf, other]
Title: Excessive Invariance Causes Adversarial Vulnerability
Journal-ref: Proceedings of the 7th International Conference on Learning Representations (ICLR), 2019
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[150]  arXiv:1811.03064 (replaced) [pdf, other]
Title: Towards a Near Universal Time Series Data Mining Tool: Introducing the Matrix Profile
Comments: PhD dissertation (2018)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[151]  arXiv:1901.03904 (replaced) [pdf]
Title: A Speech Act Classifier for Persian Texts and its Application in Identifying Rumors
Comments: Published Link: this http URL
Journal-ref: Journal of Soft Computing and Information Technology, 9, 1, 1399 (2020), 18-27
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
[152]  arXiv:1902.01542 (replaced) [pdf, other]
Title: Learning Hierarchical Interactions at Scale: A Convex Optimization Approach
Comments: AISTATS 2020
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Computation (stat.CO)
[153]  arXiv:1902.10459 (replaced) [pdf, other]
Title: Data segmentation based on the local intrinsic dimension
Comments: 11 pages, 6 figures + 9 pages Supplementary Information
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[154]  arXiv:1903.02050 (replaced) [pdf, other]
Title: Revisiting the Evaluation of Uncertainty Estimation and Its Application to Explore Model Complexity-Uncertainty Trade-Off
Comments: CVPR 2020 - Fair, Data Efficient and Trusted Computer Vision Workshop
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[155]  arXiv:1904.04276 (replaced) [pdf, other]
Title: On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning
Comments: Significant updates from the previous version. In press in Statistical Science
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)
[156]  arXiv:1905.12813 (replaced) [pdf, other]
Title: Data-Dependent Differentially Private Parameter Learning for Directed Graphical Models
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[157]  arXiv:1906.00042 (replaced) [pdf, other]
Title: Bayesian Profiling Multiple Imputation for Missing Electronic Health Records
Subjects: Methodology (stat.ME)
[158]  arXiv:1906.04538 (replaced) [pdf, other]
Title: Identification of taxon through fuzzy classification
Comments: About half are appendices, which contain mathematical details
Subjects: Applications (stat.AP); Methodology (stat.ME)
[159]  arXiv:1906.05363 (replaced) [pdf, other]
Title: Competing Bandits in Matching Markets
Comments: 15 pages, 3 figures. A version appears in the Proceedings of The 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
[160]  arXiv:1907.00502 (replaced) [pdf, other]
Title: Wave-shape oscillatory model for nonstationary periodic time series analysis
Comments: 40 pages, 15 figures
Subjects: Applications (stat.AP)
[161]  arXiv:1907.04147 (replaced) [pdf, ps, other]
Title: Adaptive inference for a semiparametric generalized autoregressive conditional heteroskedasticity model
Subjects: Methodology (stat.ME); Econometrics (econ.EM)
[162]  arXiv:1907.06734 (replaced) [pdf]
Title: Mediation effects that emulate a target randomised trial: Simulation-based evaluation of ill-defined interventions on multiple mediators
Subjects: Methodology (stat.ME)
[163]  arXiv:1907.11546 (replaced) [pdf, other]
Title: Compressing deep quaternion neural networks with targeted regularization
Comments: Published on CAAI Transactions on Intelligence Technology, this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[164]  arXiv:1908.03620 (replaced) [pdf, other]
Title: Learning physics-based reduced-order models for a single-injector combustion process
Journal-ref: AIAA Journal 58:6, 2658-2672, 2020
Subjects: Computational Physics (physics.comp-ph); Machine Learning (cs.LG); Systems and Control (eess.SY); Dynamical Systems (math.DS); Machine Learning (stat.ML)
[165]  arXiv:1909.02496 (replaced) [pdf, ps, other]
Title: The Benefits of Diversity: Permutation Recovery in Unlabeled Sensing from Multiple Measurement Vectors
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
[166]  arXiv:1909.06039 (replaced) [pdf, other]
Title: d-blink: Distributed End-to-End Bayesian Entity Resolution
Comments: 30 pages, 6 figures, 4 tables. Includes 21 pages of supplementary material. This revision includes: updates to the related work, improvements to the clarity of writing and minor updates to the experimental results
Subjects: Computation (stat.CO); Databases (cs.DB); Machine Learning (cs.LG); Machine Learning (stat.ML)
[167]  arXiv:1909.06389 (replaced) [pdf, other]
Title: Spectral Analysis Of Weighted Laplacians Arising In Data Clustering
Subjects: Spectral Theory (math.SP); Analysis of PDEs (math.AP); Machine Learning (stat.ML)
[168]  arXiv:1909.06677 (replaced) [pdf, other]
Title: Predictive Multiplicity in Classification
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
[169]  arXiv:1909.11062 (replaced) [pdf, other]
Title: Wavelet invariants for statistically robust multi-reference alignment
Comments: 59 pages, 8 figures. v3 replaces v2 and is an extensive revision. Revisions include additional background and motivation, additional context relating the approach to other methods, a discussion of stability, and improved presentation. Code reproducing all numerical results is available at this https URL
Subjects: Signal Processing (eess.SP); Statistics Theory (math.ST)
[170]  arXiv:1910.00270 (replaced) [pdf, other]
Title: Robust Learning with the Hilbert-Schmidt Independence Criterion
Comments: Proceedings of the 37th International Conference on Machine Learning (ICML 2020)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[171]  arXiv:1910.02919 (replaced) [pdf, other]
Title: Multi-step Greedy Reinforcement Learning Algorithms
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[172]  arXiv:1910.08442 (replaced) [pdf, ps, other]
Title: Center-Outward R-Estimation for Semiparametric VARMA Models
Comments: 55 pages, 16 figures, 3 tables
Subjects: Statistics Theory (math.ST)
[173]  arXiv:1910.12327 (replaced) [pdf, ps, other]
Title: A simple measure of conditional dependence
Comments: 35 pages, 2 tables. A section on interpreting the coefficient as a generalization of partial R^2 has been added. R package available at this https URL
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Probability (math.PR); Methodology (stat.ME)
[174]  arXiv:1911.00115 (replaced) [pdf, other]
Title: The consequences of checking for zero-inflation and overdispersion in the analysis of count data
Authors: Harlan Campbell
Comments: 30 pages, 17 figures
Subjects: Methodology (stat.ME)
[175]  arXiv:1911.02109 (replaced) [pdf, other]
Title: Deep least-squares methods: an unsupervised learning-based numerical method for solving elliptic PDEs
Comments: 15 pages, 6 figures, 5 tables, accepted by Journal of Computational Physics
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)
[176]  arXiv:1911.02768 (replaced) [pdf, other]
Title: Confidence Intervals for Policy Evaluation in Adaptive Experiments
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[177]  arXiv:1911.09721 (replaced) [pdf, other]
Title: Communication-Efficient and Byzantine-Robust Distributed Learning with Error Feedback
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
[178]  arXiv:1912.02279 (replaced) [pdf, other]
Title: Angular Visual Hardness
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[179]  arXiv:1912.07458 (replaced) [pdf, other]
Title: On-manifold Adversarial Data Augmentation Improves Uncertainty Calibration
Comments: changes in appendix
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[180]  arXiv:1912.08521 (replaced) [pdf, other]
Title: Semantically Plausible and Diverse 3D Human Motion Prediction
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[181]  arXiv:1912.13053 (replaced) [pdf, other]
Title: Disentangling Trainability and Generalization in Deep Neural Networks
Comments: 22 pages, 3 figures, ICML 2020. Associated Colab notebook at this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[182]  arXiv:1912.13119 (replaced) [pdf, other]
Title: Clustering and Prediction with Variable Dimension Covariates
Subjects: Methodology (stat.ME)
[183]  arXiv:2001.00102 (replaced) [pdf, other]
Title: The Gambler's Problem and Beyond
Comments: International Conference on Learning Representations (ICLR) 2020
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[184]  arXiv:2001.03955 (replaced) [pdf, other]
Title: Aggregated Learning: A Vector-Quantization Approach to Learning Neural Network Classifiers
Comments: Accepted to AAAI 2020. arXiv admin note: text overlap with arXiv:1807.10251
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[185]  arXiv:2001.06485 (replaced) [pdf, ps, other]
Title: K-NN active learning under local smoothness assumption
Comments: arXiv admin note: substantial text overlap with arXiv:1902.03055
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[186]  arXiv:2001.08950 (replaced) [pdf, other]
Title: PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination
Comments: 11 pages, 8 figures, 4 tables
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
[187]  arXiv:2002.01328 (replaced) [pdf, other]
Title: TRAP: A Predictive Framework for Trail Running Assessment of Performance
Subjects: Applications (stat.AP)
[188]  arXiv:2002.03328 (replaced) [pdf, other]
Title: Out-of-Distribution Detection with Distance Guarantee in Deep Generative Models
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[189]  arXiv:2002.04108 (replaced) [pdf, other]
Title: Adversarial Filters of Dataset Biases
Comments: Accepted to ICML 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
[190]  arXiv:2002.04518 (replaced) [pdf, other]
Title: Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[191]  arXiv:2002.04788 (replaced) [pdf, other]
Title: To Split or Not to Split: The Impact of Disparate Treatment in Classification
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Information Theory (cs.IT); Machine Learning (stat.ML)
[192]  arXiv:2002.05551 (replaced) [pdf, other]
Title: PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[193]  arXiv:2002.06836 (replaced) [pdf, other]
Title: Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning
Journal-ref: Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[194]  arXiv:2002.07598 (replaced) [pdf, ps, other]
Title: A confidence interval robust to publication bias for random-effects meta-analysis of few studies
Subjects: Methodology (stat.ME)
[195]  arXiv:2002.07772 (replaced) [pdf, other]
Title: The Tree Ensemble Layer: Differentiability meets Conditional Computation
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[196]  arXiv:2002.07836 (replaced) [pdf, ps, other]
Title: Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning
Comments: 40 pages
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[197]  arXiv:2002.08958 (replaced) [pdf, other]
Title: Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor
Comments: 22 pages, 6 figures, 2 tables
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)
[198]  arXiv:2002.11151 (replaced) [pdf, other]
Title: TxSim: Modeling Training of Deep Neural Networks on Resistive Crossbar Systems
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[199]  arXiv:2002.11651 (replaced) [pdf, other]
Title: Fair Learning with Private Demographic Data
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
[200]  arXiv:2002.11815 (replaced) [pdf, other]
Title: Uncertainty Quantification for Sparse Deep Learning
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
[201]  arXiv:2003.00295 (replaced) [pdf, other]
Title: Adaptive Federated Optimization
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
[202]  arXiv:2003.02460 (replaced) [pdf, other]
Title: A Closer Look at Accuracy vs. Robustness
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[203]  arXiv:2003.03241 (replaced) [pdf, other]
Title: Automated detection of corrosion in used nuclear fuel dry storage canisters using residual neural networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP); Machine Learning (stat.ML)
[204]  arXiv:2003.03919 (replaced) [pdf, other]
Title: Temporal Attribute Prediction via Joint Modeling of Multi-Relational Structure Evolution
Comments: In Proceedings of IJCAI 2020. Code can be found at this https URL . The sole copyright holder is IJCAI (International Joint Conferences on Artificial Intelligence), all rights reserved. Original Publication available at this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[205]  arXiv:2003.07070 (replaced) [pdf, other]
Title: Merge-split Markov chain Monte Carlo for community detection
Authors: Tiago P. Peixoto
Comments: 13 pages, 6 figures. Code available at this https URL
Journal-ref: Phys. Rev. E 102, 012305 (2020)
Subjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
[206]  arXiv:2003.11194 (replaced) [pdf, ps, other]
Title: A Poisson Kalman filter for disease surveillance
Comments: 19 Pages, 8 Figures
Subjects: Methodology (stat.ME); Quantitative Methods (q-bio.QM)
[207]  arXiv:2003.11542 (replaced) [pdf, other]
Title: Partial least squares for sparsely observed curves with measurement errors
Comments: 42 pages and 3 figures
Subjects: Methodology (stat.ME)
[208]  arXiv:2003.11941 (replaced) [pdf, other]
Title: Validation Set Evaluation can be Wrong: An Evaluator-Generator Approach for Maximizing Online Performance of Ranking in E-commerce
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[209]  arXiv:2003.12699 (replaced) [pdf, ps, other]
Title: Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[210]  arXiv:2004.00353 (replaced) [pdf, other]
Title: SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models
Comments: ICLR 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[211]  arXiv:2004.03391 (replaced) [pdf, other]
Title: Exploiting context dependence for image compression with upsampling
Authors: Jarek Duda
Comments: 6 pages, 4 figures
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Multimedia (cs.MM); Machine Learning (stat.ML)
[212]  arXiv:2004.05912 (replaced) [pdf, other]
Title: Towards GANs' Approximation Ability
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[213]  arXiv:2004.05944 (replaced) [pdf, ps, other]
Title: Exact recovery and sharp thresholds of Stochastic Ising Block Model
Authors: Min Ye
Comments: Corrected some typos. Submitted to IEEE Transactions on Information Theory
Subjects: Probability (math.PR); Information Theory (cs.IT); Machine Learning (stat.ML)
[214]  arXiv:2004.06448 (replaced) [pdf, other]
Title: Measurement Error in Nutritional Epidemiology: A Survey
Authors: Huimin Peng
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
[215]  arXiv:2004.06633 (replaced) [pdf, other]
Title: Occupant Plugload Management for Demand Response in Commercial Buildings: Field Experimentation and Statistical Characterization
Comments: 20 pages, 15 figures, 4 tables, preprint
Subjects: Applications (stat.AP); Machine Learning (cs.LG); Machine Learning (stat.ML)
[216]  arXiv:2004.08919 (replaced) [pdf, other]
Title: DeepPurpose: a Deep Learning Library for Drug-Target Interaction Prediction and Applications to Repurposing and Screening
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
[217]  arXiv:2004.10181 (replaced) [pdf, other]
Title: Knowing what you know: valid and validated confidence sets in multiclass and multilabel prediction
Comments: Updated section on multilabel settings addressing the cases when labels may repel each other
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[218]  arXiv:2005.01814 (replaced) [pdf, other]
Title: Cross-validation based adaptive sampling for Gaussian process models
Subjects: Computation (stat.CO)
[219]  arXiv:2005.02532 (replaced) [src]
Title: Statistical errors in Monte Carlo-based inference for random elements
Authors: Yasutaka Shimizu
Comments: We need to change the discussion drastically
Subjects: Statistics Theory (math.ST)
[220]  arXiv:2005.02979 (replaced) [pdf, ps, other]
Title: A Survey of Algorithms for Black-Box Safety Validation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Machine Learning (stat.ML)
[221]  arXiv:2005.03899 (replaced) [pdf, other]
Title: Amortized Bayesian Inference for Models of Cognition
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[222]  arXiv:2005.05080 (replaced) [pdf, other]
Title: Continual Learning Using Multi-view Task Conditional Neural Networks
Comments: 10 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[223]  arXiv:2005.10779 (replaced) [pdf, other]
Title: Using the "Hidden" Genome to Improve Classification of Cancer Types
Comments: 24 pages, 4 figures, 2 tables
Subjects: Methodology (stat.ME)
[224]  arXiv:2005.11736 (replaced) [pdf, other]
Title: Efficient Intervention Design for Causal Discovery with Latents
Comments: International Conference on Machine Learning 2020
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
[225]  arXiv:2005.12620 (replaced) [pdf, other]
Title: On the Likelihood of Local Projection Models
Authors: Masahiro Tanaka
Subjects: Methodology (stat.ME)
[226]  arXiv:2006.01225 (replaced) [pdf, ps, other]
Title: Streaming Coresets for Symmetric Tensor Factorization
Comments: Accepted at ICML 2020. Included algorithm with improved update time and fixed minor bugs
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
[227]  arXiv:2006.03227 (replaced) [pdf, other]
Title: Population-Based Black-Box Optimization for Biological Sequence Design
Journal-ref: Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[228]  arXiv:2006.03745 (replaced) [pdf, other]
Title: Understanding Finite-State Representations of Recurrent Policy Networks
Comments: ICML 2020 XXAI
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[229]  arXiv:2006.03968 (replaced) [pdf, other]
Title: Generative Design of Hardware-aware DNNs
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[230]  arXiv:2006.03980 (replaced) [pdf, other]
Title: Fast and Powerful Conditional Randomization Testing via Distillation
Comments: This paper has been merged with a parallel work arXiv:2006.08482 by Eugene Katsevich and Aaditya Ramdas
Subjects: Methodology (stat.ME)
[231]  arXiv:2006.04131 (replaced) [pdf, other]
Title: Deep Graph Contrastive Representation Learning
Comments: Work in progress; updated experiments
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[232]  arXiv:2006.04588 (replaced) [pdf, ps, other]
Title: EDCompress: Energy-Aware Model Compression for Dataflows
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[233]  arXiv:2006.05301 (replaced) [pdf, other]
Title: VAEs in the Presence of Missing Data
Comments: Accepted to ICML Workshop on the Art of Learning with Missing Values (Artemiss), 17 July 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[234]  arXiv:2006.06599 (replaced) [pdf, other]
Title: Robust model training and generalisation with Studentising flows
Comments: 9 pages, 8 figures, accepted for publication at INNF+ 2020 (Second ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[235]  arXiv:2006.07314 (replaced) [pdf, other]
Title: Zeroth-order Deterministic Policy Gradient
Comments: 18 pages, 5 figures. Fixed some minor oversights in the theoretical development present in the previous version of the manuscript and significantly revised and expanded the simulations sections, both in the main body and supplementary material
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[236]  arXiv:2006.08482 (replaced) [src]
Title: The leave-one-covariate-out conditional randomization test
Comments: This paper has been withdrawn by the authors, because it has now been merged with (and superseded by) a parallel work arXiv:2006.03980 by Molei Liu and Lucas Janson
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
[237]  arXiv:2006.08684 (replaced) [pdf, other]
Title: Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY); Machine Learning (stat.ML)
[238]  arXiv:2006.09396 (replaced) [pdf, other]
Title: Density Deconvolution with Normalizing Flows
Comments: Appearing at the second workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (ICML 2020), Virtual Conference. 8 pages, 6 figures, 5 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[239]  arXiv:2006.09635 (replaced) [pdf, other]
Title: Solving Constrained CASH Problems with ADMM
Comments: 7th ICML Workshop on Automated Machine Learning (2020)
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[240]  arXiv:2006.13975 (replaced) [pdf, other]
Title: Estimation and Comparison of Correlation-based Measures of Concordance
Comments: 35 pages, 1 figure
Subjects: Statistics Theory (math.ST)
[241]  arXiv:2006.14217 (replaced) [pdf, other]
Title: Stratified stochastic variational inference for high-dimensional network factor model
Comments: 25 pages, 1 figure. Corrected compilation issues and minor typos
Subjects: Computation (stat.CO); Methodology (stat.ME)
[242]  arXiv:2006.14937 (replaced) [pdf, ps, other]
Title: Joints in Random Forests
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[243]  arXiv:2006.15107 (replaced) [pdf, other]
Title: Building powerful and equivariant graph neural networks with structural message-passing
Comments: Submitted to NeurIPS 2020. 18 pages, 5 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[244]  arXiv:2006.15785 (replaced) [pdf, other]
Title: A No-Free-Lunch Theorem for MultiTask Learning
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[245]  arXiv:2006.15799 (replaced) [src]
Title: Cluster-Based Partitioning of Convolutional Neural Networks, A Solution for Computational Energy and Complexity Reduction
Comments: The paper needs major revision
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[246]  arXiv:2006.16193 (replaced) [pdf, other]
Title: Spectral Gap of Replica Exchange Langevin Diffusion on Mixture Distributions
Subjects: Probability (math.PR); Statistics Theory (math.ST)
[247]  arXiv:2007.01231 (replaced) [pdf, other]
Title: Software Engineering Event Modeling using Relative Time in Temporal Knowledge Graphs
Comments: 11 pages, 1 figure. 37th International Conference on Machine Learning (ICML 2020) - Workshop on Graph Representation Learning and Beyond
Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE); Machine Learning (stat.ML)
[248]  arXiv:2007.01285 (replaced) [pdf, other]
Title: Deep Learning for Neuroimaging-based Diagnosis and Rehabilitation of Autism Spectrum Disorder: A Review
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[249]  arXiv:2007.01888 (replaced) [pdf, other]
Title: Inference on the change point in high dimensional time series models via plug in least squares
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
[250]  arXiv:2007.03016 (replaced) [pdf]
Title: Multiple Imputation with Massive Data: an Application to the Panel Study of Income Dynamics
Subjects: Methodology (stat.ME); Applications (stat.AP)
[251]  arXiv:2007.03383 (replaced) [pdf, other]
Title: RGCF: Refined Graph Convolution Collaborative Filtering with concise and expressive embedding
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
[252]  arXiv:2007.04439 (replaced) [pdf, other]
Title: Combining Differentiable PDE Solvers and Graph Neural Networks for Fluid Flow Prediction
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)
[253]  arXiv:2007.04728 (replaced) [pdf, other]
Title: Let the Data Choose its Features: Differentiable Unsupervised Feature Selection
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[254]  arXiv:2007.05305 (replaced) [pdf, other]
Title: ExpertNet: Adversarial Learning and Recovery Against Noisy Labels
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[255]  arXiv:2007.05424 (replaced) [pdf, other]
Title: High heritability does not imply accurate prediction under the small additive effects hypothesis
Authors: Arthur Frouin (1), Claire Dandine-Roulland (1), Morgane Pierre-Jean (1), Jean-François Deleuze (1), Christophe Ambroise (2), Edith Le Floch (1) ((1) CNRGH, Institut Jacob, CEA - Université Paris-Saclay, (2) LaMME, Université Paris-Saclay, CNRS, Université d'Évry val d'Essonne)
Subjects: Methodology (stat.ME); Genomics (q-bio.GN)