Statistics
New submissions
New submissions for Fri, 10 Jul 20
 [1] arXiv:2007.04358 [pdf, other]

Title: Generalised Bayes Updates with $f$-divergences through Probabilistic Classifiers
Subjects: Methodology (stat.ME); Computation (stat.CO)
A stream of algorithmic advances has steadily increased the popularity of the Bayesian approach as an inference paradigm, both from the theoretical and applied perspective. Even with apparent successes in numerous application fields, a rising concern is the robustness of Bayesian inference in the presence of model misspecification, which may lead to undesirable extreme behavior of the posterior distributions for large sample sizes. Generalized belief updating with a loss function represents a central principle to making Bayesian inference more robust and less vulnerable to deviations from the assumed model. Here we consider such updates with $f$-divergences to quantify a discrepancy between the assumed statistical model and the probability distribution which generated the observed data. Since the latter is generally unknown, estimation of the divergence may be viewed as an intractable problem. We show that the divergence becomes accessible through the use of probabilistic classifiers that can leverage an estimate of the ratio of two probability distributions even when one or both of them is unknown. We demonstrate the behavior of generalized belief updates for various specific choices under the $f$-divergence family. We show that for specific divergence functions such an approach can even improve on methods evaluating the correct model likelihood function analytically.
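The classifier-based ratio idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes balanced classes and uses a hypothetical oracle classifier for two known Gaussians, p = N(0,1) and q = N(1,1), so that the result can be checked; the function names are illustrative and this is not the authors' implementation. With class probability c(x) = p(x)/(p(x)+q(x)), the density ratio is c/(1-c), and the KL divergence (one member of the $f$-divergence family) is the average log ratio over samples from p.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def classifier_prob(x):
    # Assumed oracle classifier: P(x was drawn from p | x) for p = N(0,1),
    # q = N(1,1) and equal class sizes. Here log p(x)/q(x) = 0.5 - x.
    return sigmoid(0.5 - x)


def ratio_from_classifier(c):
    # Density ratio p(x)/q(x) implied by a class probability c (balanced classes).
    return c / (1.0 - c)


def kl_estimate(samples_from_p):
    # Monte Carlo estimate of KL(p || q) = E_p[log p(x)/q(x)].
    logs = [math.log(ratio_from_classifier(classifier_prob(x))) for x in samples_from_p]
    return sum(logs) / len(logs)


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
kl_hat = kl_estimate(xs)  # the exact value for these two Gaussians is 0.5
```

In practice the oracle would be replaced by a fitted probabilistic classifier, and the quality of the divergence estimate then depends on how well that classifier is calibrated.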
 [2] arXiv:2007.04386 [pdf, other]

Title: Contour Models for Boundaries Enclosing Star-Shaped and Approximately Star-Shaped Polygons
Subjects: Methodology (stat.ME)
Boundaries on spatial fields divide regions with particular features from surrounding background areas. These boundaries are often described with contour lines. To measure and record these boundaries, contours are often represented as ordered sequences of spatial points that connect to form a line. Methods to identify boundary lines from interpolated spatial fields are well-established. Less attention has been paid to how to model sequences of connected spatial points. For data of the latter form, we introduce the Gaussian Star-shaped Contour Model (GSCM). GSCMs generate sequences of spatial points via generating sets of distances in various directions from a fixed starting point. The GSCM is designed for modeling contours that enclose regions that are star-shaped polygons or approximately star-shaped polygons. Metrics are introduced to assess the extent to which a polygon deviates from star-shaped. Simulation studies illustrate the performance of the GSCM in various scenarios and an analysis of Arctic sea ice edge contour data highlights how GSCMs can be applied to observational data.
 [3] arXiv:2007.04387 [pdf, other]

Title: Double spike Dirichlet priors for structured weighting
Subjects: Methodology (stat.ME)
Assigning weights to a large pool of objects is a fundamental task in a wide variety of applications. In this article, we introduce a concept of structured high-dimensional probability simplexes, most of whose components are zero or near zero and the remaining ones are close to each other. Such structure is well motivated by 1) high-dimensional weights that are common in modern applications, and 2) ubiquitous examples in which equal weights, despite their simplicity, often achieve favorable or even state-of-the-art predictive performances. This particular structure, however, presents unique challenges both computationally and statistically. To address these challenges, we propose a new class of double spike Dirichlet priors to shrink a probability simplex to one with the desired structure. When applied to ensemble learning, such priors lead to a Bayesian method for structured high-dimensional ensembles that is useful for forecast combination and improving random forests, while enabling uncertainty quantification. We design efficient Markov chain Monte Carlo algorithms for easy implementation. Posterior contraction rates are established to provide theoretical support. We demonstrate the wide applicability and competitive performance of the proposed methods through simulations and two real data applications using the European Central Bank Survey of Professional Forecasters dataset and a UCI dataset.
 [4] arXiv:2007.04441 [pdf, other]

Title: Sparse Regression for Extreme Values
Comments: 3 figures
Subjects: Methodology (stat.ME)
We study the problem of selecting features associated with extreme values in high-dimensional linear regression. Normally, in linear modeling problems, the presence of abnormal extreme values or outliers is considered an anomaly which should either be removed from the data or remedied using robust regression methods. In many situations, however, the extreme values in regression modeling are not outliers but rather the signals of interest; consider traces from spiking neurons, volatility in finance, or extreme events in climate science, for example. In this paper, we propose a new method for sparse high-dimensional linear regression for extreme values which is motivated by the Subbotin, or generalized normal distribution. This leads us to utilize an $\ell_p$ norm loss where $p$ is an even integer greater than two; we demonstrate that this loss increases the weight on extreme values. We prove consistency and variable selection consistency for the $\ell_p$ norm regression with a Lasso penalty, which we term the Extreme Lasso. Through simulation studies and real-world data examples, we show that this method outperforms other methods currently used in the literature for selecting features of interest associated with extreme values in high-dimensional regression.
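The claim that an $\ell_p$ loss with even $p > 2$ puts more weight on extreme values can be checked with a small sketch. It assumes the loss is the sum of $|r_i|^p$ over residuals $r_i$ (the exact scaling used in the paper is not given in the abstract), and the residual values and function names are made up for illustration.

```python
def lp_loss(residuals, p):
    # Sum of |r|^p over all residuals (assumed form of the l_p loss).
    return sum(abs(r) ** p for r in residuals)


def extreme_share(residuals, p):
    # Fraction of the total loss contributed by the largest-magnitude residual.
    largest = max(abs(r) for r in residuals)
    return largest ** p / lp_loss(residuals, p)


# Hypothetical residuals: three ordinary ones and one extreme value.
residuals = [0.5, -1.0, 0.8, 4.0]
share_l2 = extreme_share(residuals, 2)  # squared-error loss
share_l4 = extreme_share(residuals, 4)  # even p greater than two
```

Raising $p$ from 2 to 4 moves almost all of the loss onto the extreme residual, which is the behavior the abstract describes as increasing the weight on extreme values.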
 [5] arXiv:2007.04443 [pdf, other]

Title: Minimax Efficient Finite-Difference Stochastic Gradient Estimators Using Black-Box Function Evaluations
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
We consider stochastic gradient estimation using noisy black-box function evaluations. A standard approach is to use the finite-difference method or its variants. While natural, it is open to our knowledge whether its statistical accuracy is the best possible. This paper argues so by showing that central finite-difference is a nearly minimax optimal zeroth-order gradient estimator, among both the class of linear estimators and the much larger class of all (nonlinear) estimators.
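For readers unfamiliar with the estimator under study, a central finite-difference gradient estimate from noisy function evaluations can be sketched as follows. The test function, step size `h`, noise level `sigma`, and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
import random


def noisy_eval(f, x, sigma, rng):
    # Black-box evaluation of f at x, corrupted by additive Gaussian noise.
    return f(x) + rng.gauss(0.0, sigma)


def central_fd_gradient(f, x, h=0.1, sigma=0.0, rng=None):
    # Central finite difference per coordinate:
    # (F(x + h e_i) - F(x - h e_i)) / (2 h), with F the noisy evaluation.
    rng = rng or random.Random(0)
    grad = []
    for i in range(len(x)):
        plus = list(x)
        plus[i] += h
        minus = list(x)
        minus[i] -= h
        diff = noisy_eval(f, plus, sigma, rng) - noisy_eval(f, minus, sigma, rng)
        grad.append(diff / (2.0 * h))
    return grad


def quadratic(x):
    # Illustrative objective with known gradient 2 * x.
    return sum(v * v for v in x)


point = [1.0, -2.0, 0.5]
exact_grad = central_fd_gradient(quadratic, point, h=0.1, sigma=0.0)
noisy_grad = central_fd_gradient(quadratic, point, h=0.1, sigma=0.01, rng=random.Random(1))
```

The step size trades bias against variance: a smaller `h` reduces the truncation error but amplifies the noise term, which is divided by `2 * h`.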
 [6] arXiv:2007.04445 [pdf, ps, other]

Title: Estimation and inference on high-dimensional individualized treatment rule in observational data using split-and-pooled decorrelated score
Comments: 10 pages, 6 figures, 2 tables
Subjects: Methodology (stat.ME)
With the increasing adoption of electronic health records, there is an increasing interest in developing individualized treatment rules (ITRs), which recommend treatments according to patients' characteristics, from large observational data. However, there is a lack of valid inference procedures for ITRs developed from this type of data in the presence of high-dimensional covariates. In this work, we develop a penalized doubly robust method to estimate the optimal ITRs from high-dimensional data. We propose a split-and-pooled decorrelated score to construct hypothesis tests and confidence intervals. Our proposal utilizes the data splitting to conquer the slow convergence rate of nuisance parameter estimations, such as nonparametric methods for outcome regression or propensity models. We establish the limiting distributions of the split-and-pooled decorrelated score test and the corresponding one-step estimator in high-dimensional setting. Simulation and real data analysis are conducted to demonstrate the superiority of the proposed method.
 [7] arXiv:2007.04446 [pdf, other]

Title: StructureBoost: Efficient Gradient Boosting for Structured Categorical Variables
Authors: Brian Lucena
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP); Computation (stat.CO)
Gradient boosting methods based on Structured Categorical Decision Trees (SCDT) have been demonstrated to outperform numerical and one-hot encodings on problems where the categorical variable has a known underlying structure. However, the enumeration procedure in the SCDT is infeasible except for categorical variables with low or moderate cardinality. We propose and implement two methods to overcome the computational obstacles and efficiently perform Gradient Boosting on complex structured categorical variables. The resulting package, called StructureBoost, is shown to outperform established packages such as CatBoost and LightGBM on problems with categorical predictors that contain sophisticated structure. Moreover, we demonstrate that StructureBoost can make accurate predictions on unseen categorical values due to its knowledge of the underlying structure.
 [8] arXiv:2007.04470 [pdf, other]

Title: Finite mixture models are typically inconsistent for the number of components
Comments: 16 pages, 1 figure
Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
Scientists and engineers are often interested in learning the number of subpopulations (or components) present in a data set. Practitioners commonly use a Dirichlet process mixture model (DPMM) for this purpose; in particular, they count the number of clusters, i.e. components containing at least one data point, in the DPMM posterior. But Miller and Harrison (2013) warn that the DPMM cluster-count posterior is severely inconsistent for the number of latent components when the data are truly generated from a finite mixture; that is, the cluster-count posterior probability on the true generating number of components goes to zero in the limit of infinite data. A potential alternative is to use a finite mixture model (FMM) with a prior on the number of components. Past work has shown the resulting FMM component-count posterior is consistent. But existing results crucially depend on the assumption that the component likelihoods are perfectly specified. In practice, this assumption is unrealistic, and empirical evidence (Miller and Dunson, 2019) suggests that the FMM posterior on the number of components is sensitive to the likelihood choice. In this paper, we add rigor to data-analysis folk wisdom by proving that under even the slightest model misspecification, the FMM posterior on the number of components is ultra-severely inconsistent: for any finite $k \in \mathbb{N}$, the posterior probability that the number of components is $k$ converges to 0 in the limit of infinite data. We illustrate practical consequences of our theory on simulated and real data sets.
 [9] arXiv:2007.04486 [pdf, other]

Title: Making learning more transparent using conformalized performance prediction
Authors: Matthew J. Holland
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
In this work, we study some novel applications of conformal inference techniques to the problem of providing machine learning procedures with more transparent, accurate, and practical performance guarantees. We provide a natural extension of the traditional conformal prediction framework, done in such a way that we can make valid and well-calibrated predictive statements about the future performance of arbitrary learning algorithms, when passed an as-yet unseen training set. In addition, we include some nascent empirical examples to illustrate potential applications.
 [10] arXiv:2007.04509 [pdf, other]

Title: Supervised Robust Profile Clustering
Comments: 30 pages, 3 figures, Supplementary materials (5 figures, 1 table)
Subjects: Applications (stat.AP)
In many studies, dimension reduction methods are used to profile participant characteristics. For example, nutrition epidemiologists often use latent class models to characterize dietary patterns. One challenge with such approaches is understanding subtle variations in patterns across subpopulations. Robust Profile Clustering (RPC) provides a dual flexible clustering model, where participants may cluster at two levels: (1) globally, where participants are clustered according to behaviors shared across an overall population, and (2) locally, where individual behaviors can deviate and cluster in subpopulations. We link clusters to a health outcome using a joint model. This model is used to derive dietary patterns in the United States and evaluate case proportion of orofacial clefts. Using dietary consumption data from the 1997-2009 National Birth Defects Prevention Study, a population-based case-control study, we determine how maternal dietary profiles are associated with an orofacial cleft among offspring. Results indicated that mothers who consumed a high proportion of fruits and vegetables compared to meats, such as chicken and beef, had lower odds of delivering a child with an orofacial cleft defect.
 [11] arXiv:2007.04511 [pdf, ps, other]

Title: Causal Effects in Twin Studies: the Role of Interference
Subjects: Methodology (stat.ME)
The use of twin designs to address causal questions is becoming increasingly popular. A standard assumption is that there is no interference between twins, that is, no twin's exposure has a causal impact on their co-twin's outcome. However, there may be settings in which this assumption would not hold, and this would (1) impact the causal interpretation of parameters obtained by commonly used existing methods; (2) change which effects are of greatest interest; and (3) impact the conditions under which we may estimate these effects. We explore these issues, and we derive semi-parametric efficient estimators for causal effects in the presence of interference between twins. Using data from the Minnesota Twin Family Study, we apply our estimators to assess whether twins' consumption of alcohol in early adolescence may have a causal impact on their co-twins' substance use later in life.
 [12] arXiv:2007.04518 [pdf, other]

Title: Robust Geodesic Regression
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
This paper studies robust regression for data on Riemannian manifolds. Geodesic regression is the generalization of linear regression to a setting with a manifold-valued dependent variable and one or more real-valued independent variables. The existing work on geodesic regression uses the sum-of-squared errors to find the solution, but as in the classical Euclidean case, the least-squares method is highly sensitive to outliers. In this paper, we use M-type estimators, including the $L_1$, Huber and Tukey biweight estimators, to perform robust geodesic regression, and describe how to calculate the tuning parameters for the latter two. We also show that, on compact symmetric spaces, all M-type estimators are maximum likelihood estimators, and argue for the overall superiority of the $L_1$ estimator over the $L_2$ and Huber estimators on high-dimensional manifolds and over the Tukey biweight estimator on compact high-dimensional manifolds. Results from numerical examples, including analysis of real neuroimaging data, demonstrate the promising empirical properties of the proposed approach.
 [13] arXiv:2007.04558 [pdf, other]

Title: Beyond Scalar Treatment: A Causal Analysis of Hippocampal Atrophy on Behavioral Deficits in Alzheimer's Studies
Subjects: Applications (stat.AP); Methodology (stat.ME)
Alzheimer's disease is a progressive form of dementia that results in problems with memory, thinking and behavior. It often starts with abnormal aggregation and deposition of beta-amyloid and tau, followed by neuronal damage such as atrophy of the hippocampi, and finally leads to behavioral deficits. Despite significant progress in finding biomarkers associated with behavioral deficits, the underlying causal mechanism remains largely unknown. Here we investigate whether and how hippocampal atrophy contributes to behavioral deficits based on a large-scale observational study conducted by the Alzheimer's Disease Neuroimaging Initiative (ADNI). As a key novelty, we use 2D representations of the hippocampi, which allows us to better understand atrophy associated with different subregions. It, however, introduces methodological challenges as existing causal inference methods are not well suited for exploiting structural information embedded in the 2D exposures. Moreover, our data contain more than 6 million clinical and genetic covariates, necessitating appropriate confounder selection methods. We hence develop a novel two-step causal inference approach tailored for our ADNI data application. Analysis results suggest that atrophy of CA1 and subiculum subregions may cause more severe behavioral deficits compared to CA2 and CA3 subregions. We further evaluate our method using simulations and provide theoretical guarantees.
 [14] arXiv:2007.04586 [pdf, other]

Title: $K$-Means and Gaussian Mixture Modeling with a Separation Constraint
Comments: 16 pages, 6 tables, 1 figure with 3 subfigures
Subjects: Computation (stat.CO)
We consider the problem of clustering with $K$-means and Gaussian mixture models with a constraint on the separation between the centers in the context of real-valued data. We first propose a dynamic programming approach to solving the $K$-means problem with a separation constraint on the centers, building on (Wang and Song, 2011). In the context of fitting a Gaussian mixture model, we then propose an EM algorithm that incorporates such a constraint. A separation constraint can help regularize the output of a clustering algorithm, and we provide both simulated and real data examples to illustrate this point.
 [15] arXiv:2007.04727 [pdf, other]

Title: Supplemental Studies for Simultaneous Goodness-of-Fit Testing
Authors: Wolfgang Rolke
Subjects: Applications (stat.AP); High Energy Physics - Experiment (hep-ex)
Testing to see whether a given data set comes from some specified distribution is among the oldest types of problems in Statistics. Many such tests have been developed and their performance studied. The general result has been that while a certain test might perform well, aka have good power, in one situation it will fail badly in others. This is not a surprise given the great many ways in which a distribution can differ from the one specified in the null hypothesis. It is therefore very difficult to decide a priori which test to use. The obvious solution is not to rely on any one test but to run several of them. This however leads to the problem of simultaneous inference, that is, if several tests are done even if the null hypothesis were true, one of them is likely to reject it anyway just by random chance. In this paper we present a method that yields a p value that is uniform under the null hypothesis no matter how many tests are run. This is achieved by adjusting the p value via simulation. We present a number of simulation studies that show the uniformity of the p value and others that show that this test is superior to any one test if the power is averaged over a large number of cases.
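The idea of adjusting a combined p value by simulation can be sketched as follows. This is an illustrative min-p calibration under an assumed Uniform(0,1) null with two hypothetical test statistics; it is not the author's exact procedure, and all names are made up for the example.

```python
import random


def ks_stat(xs):
    # Kolmogorov-Smirnov distance between the sample and Uniform(0,1).
    s = sorted(xs)
    n = len(s)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(s))


def mean_stat(xs):
    # Absolute deviation of the sample mean from the null mean 0.5.
    return abs(sum(xs) / len(xs) - 0.5)


def simulate_null(n, rng):
    return [rng.random() for _ in range(n)]


def adjusted_p_value(data, statistics, n_sim=300, rng=None):
    rng = rng or random.Random(0)
    # Null distribution of every statistic, estimated from simulated data sets.
    null_rows = [[stat(simulate_null(len(data), rng)) for stat in statistics]
                 for _ in range(n_sim)]

    def min_p(values):
        # Smallest per-test p value, each estimated from the null simulations.
        return min(sum(1 for row in null_rows if row[j] >= values[j]) / n_sim
                   for j in range(len(statistics)))

    observed = min_p([stat(data) for stat in statistics])
    # Calibrate the minimum against its own simulated null distribution.
    null_min_ps = [min_p(row) for row in null_rows]
    return sum(1 for m in null_min_ps if m <= observed) / n_sim


tests = [ks_stat, mean_stat]
p_null = adjusted_p_value(simulate_null(30, random.Random(42)), tests)
p_far = adjusted_p_value([0.99] * 30, tests)
```

Because the minimum p value is compared with its own simulated null distribution, the reported value does not shrink merely because more tests were added to `tests`.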
 [16] arXiv:2007.04767 [pdf, other]

Title: Non-proportional hazards in immuno-oncology: is an old perspective needed?
Authors: Dominic Magirr
Subjects: Applications (stat.AP)
A fundamental concept in two-arm nonparametric survival analysis is the comparison of observed versus expected numbers of events on one of the treatment arms (the choice of which arm is arbitrary), where the expectation is taken assuming that the true survival curves in the two arms are identical. This concept is at the heart of the counting-process theory that provides a rigorous basis for methods such as the logrank test. It is natural, therefore, to maintain this perspective when extending the logrank test to deal with non-proportional hazards, for example by considering a weighted sum of the "observed - expected" terms, where larger weights are given to time periods where the hazard ratio is expected to favour the experimental treatment. In doing so, however, one may stumble across some rather subtle issues, related to the difficulty in ascribing a causal interpretation to hazard ratios, that may lead to strange conclusions. An alternative approach is to view nonparametric survival comparisons as permutation tests. With this perspective, one can easily improve on the efficiency of the logrank test, whilst thoroughly controlling the false positive rate. In particular, for the field of immuno-oncology, where researchers often anticipate a delayed treatment effect, sample sizes could be substantially reduced without loss of power.
 [17] arXiv:2007.04791 [pdf, other]

Title: varTestnlme: Variance Components Testing in Linear and Nonlinear Mixed-effects Models
Subjects: Methodology (stat.ME); Computation (stat.CO)
The issue of variance components testing arises naturally when building mixed-effects models, to decide which effects should be modeled as fixed or random. While tests for fixed effects are available in R for models fitted with lme4, tools are missing when it comes to random effects. The varTestnlme package for R aims at filling this gap. It allows testing whether any subset of the variances and covariances is equal to zero using likelihood ratio tests. It also offers the possibility to test simultaneously for fixed effects and variance components. It can be used for linear, generalized linear or nonlinear mixed-effects models fitted via lme4, nlme or saemix. Theoretical properties of the likelihood ratio test used are recalled, and examples based on different real datasets using different mixed models are provided.
 [18] arXiv:2007.04799 [pdf, other]

Title: Dissimilarity functions for rank-based hierarchical clustering of continuous variables
Comments: 36 pages, 10 figures, 7 tables
Subjects: Methodology (stat.ME)
We present a theoretical framework for a (copula-based) notion of dissimilarity between subsets of continuous random variables and study its main properties. Special attention is paid to those properties that are prone to the hierarchical agglomerative methods, such as reducibility. We hence provide insights for the use of such a measure in clustering algorithms, which allows us to cluster random variables according to the association/dependence among them, and present a simulation study. Real case studies illustrate the whole methodology.
 [19] arXiv:2007.04803 [pdf, other]

Title: Online Approximate Bayesian learning
Comments: 76 pages (including an Appendix of 43 pages), 3 figures, 2 tables
Subjects: Machine Learning (stat.ML); Statistics Theory (math.ST); Computation (stat.CO)
We introduce in this work a new approach for online approximate Bayesian learning. The main idea of the proposed method is to approximate the sequence $(\pi_t)_{t\geq 1}$ of posterior distributions by a sequence $(\tilde{\pi}_t)_{t\geq 1}$ which (i) can be estimated in an online fashion using sequential Monte Carlo methods and (ii) is shown to converge to the same distribution as the sequence $(\pi_t)_{t\geq 1}$, under weak assumptions on the statistical model at hand. In its simplest version, $(\tilde{\pi}_t)_{t\geq 1}$ is the sequence of filtering distributions associated to a particular state-space model, which can therefore be approximated using a standard particle filter algorithm. We illustrate on several challenging examples the benefits of this approach for approximate Bayesian parameter inference, and with one real data example we show that its online predictive performance can significantly outperform that of stochastic gradient descent and streaming variational Bayes.
 [20] arXiv:2007.04813 [pdf, other]

Title: Graph-Based Continual Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Despite significant advances, continual learning models still suffer from catastrophic forgetting when exposed to incrementally available data from non-stationary distributions. Rehearsal approaches alleviate the problem by maintaining and replaying a small episodic memory of previous samples, often implemented as an array of independent memory slots. In this work, we propose to augment such an array with a learnable random graph that captures pairwise similarities between its samples, and use it not only to learn new tasks but also to guard against forgetting. Empirical results on several benchmark datasets show that our model consistently outperforms recently proposed baselines for task-free continual learning.
 [21] arXiv:2007.04951 [pdf, other]

Title: Adding experimental treatment arms to Multi-Arm Multi-Stage platform trials in progress
Comments: 22 pages, 2 figures
Subjects: Applications (stat.AP)
Multi-Arm Multi-Stage (MAMS) platform trials are an efficient tool for the comparison of several treatments with a control. Suppose a new treatment becomes available at some stage of a trial already in progress. There are clear benefits to adding the treatment to the current trial for comparison, but how?
As flexible as the MAMS framework is, it requires pre-planned options for how the trial proceeds at each stage in order to control the familywise error rate. Thus, as with many adaptive designs, it is difficult to make unplanned design modifications. The conditional error approach is a tool that allows unplanned design modifications while maintaining the overall error rate. In this work we use the conditional error approach to allow adding new arms to a MAMS trial in progress.
Using a single-stage two-arm trial, we demonstrate the principles of incorporating additional hypotheses into the testing structure. With this framework for adding treatments and hypotheses in place, we show how to update the testing procedure for a MAMS trial in progress to incorporate additional treatment arms. Through simulation, we illustrate the operating characteristics of such procedures.
 [22] arXiv:2007.04956 [pdf, other]

Title: Bayesian Computation in Dynamic Latent Factor Models
Comments: 21 pages, 7 figures
Subjects: Methodology (stat.ME); Computation (stat.CO)
Bayesian computation for filtering and forecasting analysis is developed for a broad class of dynamic models. The ability to scale up such analyses in non-Gaussian, nonlinear multivariate time series models is advanced through the introduction of a novel copula construction in sequential filtering of coupled sets of dynamic generalized linear models. The new copula approach is integrated into recently introduced multiscale models in which univariate time series are coupled via nonlinear forms involving dynamic latent factors representing cross-series relationships. The resulting methodology offers dramatic speed-up in online Bayesian computations for sequential filtering and forecasting in this broad, flexible class of multivariate models. Two examples in nonlinear models for very heterogeneous time series of non-negative counts demonstrate massive computational efficiencies relative to existing simulation-based methods, while defining similar filtering and forecasting outcomes.
Cross-lists for Fri, 10 Jul 20
 [23] arXiv:2007.04393 (cross-list from cs.LG) [pdf, other]

Title: Adaptive Regret for Control of Time-Varying Dynamics
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
We consider regret minimization for online control with time-varying linear dynamical systems. The metric of performance we study is adaptive policy regret, or regret compared to the best policy on any interval in time. We give an efficient algorithm that attains first-order adaptive regret guarantees for the setting of online convex optimization with memory. We also show that these first-order bounds are nearly tight.
This algorithm is then used to derive a controller with adaptive regret guarantees that provably competes with the best linear controller on any interval in time. We validate these theoretical findings experimentally on simulations of time-varying dynamics and disturbances.
 [24] arXiv:2007.04395 (cross-list from cs.LG) [pdf, other]

Title: Hierarchical Graph Matching Networks for Deep Graph Similarity Learning
Authors: Xiang Ling, Lingfei Wu, Saizhuo Wang, Tengfei Ma, Fangli Xu, Alex X. Liu, Chunming Wu, Shouling Ji
Comments: 17 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
While the celebrated graph neural networks yield effective representations for individual nodes of a graph, there has been relatively less success in extending to deep graph similarity learning. Recent work has considered either global-level graph-graph interactions or low-level node-node interactions, ignoring the rich cross-level interactions (e.g., between nodes and a whole graph). In this paper, we propose a Hierarchical Graph Matching Network (HGMN) for computing the graph similarity between any pair of graph-structured objects. Our model jointly learns graph representations and a graph matching metric function for computing graph similarities in an end-to-end fashion. The proposed HGMN model consists of a node-graph matching network for effectively learning cross-level interactions between nodes of a graph and a whole graph, and a siamese graph neural network for learning global-level interactions between two graphs. Our comprehensive experiments demonstrate that HGMN consistently outperforms state-of-the-art graph matching network baselines for both classification and regression tasks.
 [25] arXiv:2007.04410 (cross-list from cs.SI) [pdf, other]

Title: Network Modelling of Criminal Collaborations with Dynamic Bayesian Steady Evolutions
Subjects: Social and Information Networks (cs.SI); Applications (stat.AP); Machine Learning (stat.ML)
The threat status and criminal collaborations of potential terrorists are hidden but give rise to observable behaviours and communications. Terrorists, when acting in concert, need to communicate to organise their plots. The authorities utilise such observable behaviour and communication data to inform their investigations and policing. We present a dynamic latent network model that integrates real-time communications data with prior knowledge on individuals. This model estimates and predicts the latent strength of criminal collaboration between individuals to assist in the identification of potential cells and the measurement of their threat levels. We demonstrate how, by assuming certain plausible conditional independences across the measurements associated with this population, the network model can be combined with models of individual suspects to provide fast transparent algorithms to predict group attacks. The methods are illustrated using a simulated example involving the threat posed by a cell suspected of plotting an attack.
 [26] arXiv:2007.04431 (cross-list from cs.LG) [pdf]

Title: Understanding the effect of hyperparameter optimization on machine learning models for structure design problems
Comments: 41 pages, 15 figures, 7 tables, under revision in Computer-Aided Design
Subjects: Machine Learning (cs.LG); Applications (stat.AP)
To relieve the computational cost of design evaluations using expensive finite element simulations, surrogate models have been widely applied in computer-aided engineering design. Machine learning algorithms (MLAs) have been implemented as surrogate models due to their capability of learning the complex interrelations between the design variables and the response from big datasets. Typically, an MLA regression model contains model parameters and hyperparameters. The model parameters are obtained by fitting the training data. Hyperparameters, which govern the model structures and the training processes, are assigned by users before training. There is a lack of systematic studies on the effect of hyperparameters on the accuracy and robustness of the surrogate model. In this work, we propose to establish a hyperparameter optimization (HOpt) framework to deepen our understanding of the effect. Four frequently used MLAs, namely Gaussian Process Regression (GPR), Support Vector Machine (SVM), Random Forest Regression (RFR), and Artificial Neural Network (ANN), are tested on four benchmark examples. For each MLA model, the model accuracy and robustness before and after the HOpt are compared. The results show that HOpt can generally improve the performance of the MLA models. HOpt leads to few improvements in the MLAs' accuracy and robustness for complex problems, which are characterized by high-dimensional mixed-variable design spaces. The HOpt is recommended for design problems with intermediate complexity. We also investigated the additional computational costs incurred by HOpt. The training cost is closely related to the MLA architecture. After HOpt, the training cost of ANN and RFR is increased more than that of the GPR and SVM. To sum up, this study benefits the selection of HOpt method for the different types of design problems based on their complexity.
 [27] arXiv:2007.04432 (crosslist from cs.LG) [pdf, other]

Title: Collapsing Bandits and Their Application to Public Health InterventionsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus "collapsing" any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the "good" state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are natural models for many healthcare domains in which workers must simultaneously monitor patients and deliver interventions in a way that maximizes the health of their patient cohort. Our main contributions are as follows: (i) Building on the Whittle index technique for RMABs, we derive conditions under which the Collapsing Bandits problem is indexable. Our derivation hinges on novel conditions that characterize when the optimal policies may take the form of either "forward" or "reverse" threshold policies. (ii) We exploit the optimality of threshold policies to build fast algorithms for computing the Whittle index, including a closed form. (iii) We evaluate our algorithm on several data distributions including data from a real-world healthcare task in which a worker must monitor and deliver interventions to maximize their patients' adherence to tuberculosis medication. Our algorithm achieves a 3-order-of-magnitude speedup compared to state-of-the-art RMAB techniques while achieving similar performance.
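The "collapsing" structure described in this abstract can be illustrated with a minimal sketch of the belief that a single arm is in the "good" state. The function name, argument names, and transition probabilities below are hypothetical and chosen only for illustration; the sketch does not compute a Whittle index and is not the authors' algorithm.

```python
def update_belief(belief, played, observed_good, p_good_to_good, p_bad_to_good):
    """Update P(arm is in the good state) for a two-state Markov arm.

    played=True  : the state is observed, so the belief collapses to 0 or 1.
    played=False : no observation, so the belief is pushed through the chain.
    """
    if played:
        return 1.0 if observed_good else 0.0
    return belief * p_good_to_good + (1.0 - belief) * p_bad_to_good


# Hypothetical transition probabilities (assumptions, not taken from the paper).
P_GG, P_BG = 0.9, 0.2

belief = update_belief(0.5, True, True, P_GG, P_BG)  # played: collapses to 1.0
for _ in range(2):
    # passive rounds: uncertainty evolves with no observation
    belief = update_belief(belief, False, None, P_GG, P_BG)
print(belief)
```

Left passive for many rounds, the belief in this toy chain drifts toward the stationary value `P_BG / (1 - P_GG + P_BG)`, which is one way to see why an arm eventually becomes worth observing again.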
 [28] arXiv:2007.04439 (crosslist from cs.LG) [pdf, other]

Title: Combining Differentiable PDE Solvers and Graph Neural Networks for Fluid Flow PredictionComments: ICML 2020Subjects: Machine Learning (cs.LG); Computational Physics (physics.compph); Machine Learning (stat.ML)
Solving large complex partial differential equations (PDEs), such as those that arise in computational fluid dynamics (CFD), is a computationally expensive process. This has motivated the use of deep learning approaches to approximate the PDE solutions, yet the simulation results predicted from these approaches typically do not generalize well to truly novel scenarios. In this work, we develop a hybrid (graph) neural network that combines a traditional graph convolutional network with an embedded differentiable fluid dynamics simulator inside the network itself. By combining an actual CFD simulator (run on a much coarser resolution representation of the problem) with the graph network, we show that we can both generalize well to new situations and benefit from the substantial speedup of neural network CFD predictions, while also substantially outperforming the coarse CFD simulation alone.
 [29] arXiv:2007.04440 (crosslist from cs.LG) [pdf, other]

Title: On the relationship between class selectivity, dimensionality, and robustnessSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
While the relative tradeoffs between sparse and distributed representations in deep neural networks (DNNs) are well-studied, less is known about how these tradeoffs apply to representations of semantically meaningful information. Class selectivity, the variability of a unit's responses across data classes or dimensions, is one way of quantifying the sparsity of semantic representations. Given recent evidence showing that class selectivity can impair generalization, we sought to investigate whether it also confers robustness (or vulnerability) to perturbations of input data. We found that mean class selectivity predicts vulnerability to naturalistic corruptions; networks regularized to have lower levels of class selectivity are more robust to corruption, while networks with higher class selectivity are more vulnerable to corruption, as measured using Tiny ImageNet-C and CIFAR10-C. In contrast, we found that class selectivity increases robustness to multiple types of gradient-based adversarial attacks. To examine this difference, we studied the dimensionality of the change in the representation due to perturbation, finding that decreasing class selectivity increases the dimensionality of this change for both corruption types, but with a notably larger increase for adversarial attacks. These results demonstrate the causal relationship between selectivity and robustness and provide new insights into the mechanisms of this relationship.
 [30] arXiv:2007.04451 (crosslist from cs.LG) [pdf, ps, other]

Title: Online probabilistic label treesSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We introduce online probabilistic label trees (OPLTs), an algorithm that trains a label tree classifier in a fully online manner, without any prior knowledge about the number of training instances, their features and labels. OPLTs are characterized by low time and space complexity as well as strong theoretical guarantees. They can be used for online multi-label and multi-class classification, including the very challenging scenarios of one- or few-shot learning. We demonstrate the attractiveness of OPLTs in a wide empirical study on several instances of the tasks mentioned above.
 [31] arXiv:2007.04458 (crosslist from cs.LG) [pdf, other]

Title: Robust Bayesian Classification Using an Optimistic Score RatioSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We build a Bayesian contextual classification model using an optimistic score ratio for robust binary classification when there is limited information on the class-conditional, or contextual, distribution. The optimistic score searches for the distribution that is most plausible to explain the observed outcomes in the testing sample among all distributions belonging to the contextual ambiguity set, which is prescribed using a limited structural constraint on the mean vector and the covariance matrix of the underlying contextual distribution. We show that the Bayesian classifier using the optimistic score ratio is conceptually attractive, delivers solid statistical guarantees and is computationally tractable. We showcase the power of the proposed optimistic score ratio classifier on both synthetic and empirical data.
 [32] arXiv:2007.04459 (crosslist from cs.LG) [pdf, other]

Title: MetaLearning OneClass Classification with DeepSets: Application in the Milky WayAuthors: Ademola Oladosu, Tony Xu, Philip Ekfeldt, Brian A. Kelly, Miles Cranmer, Shirley Ho, Adrian M. PriceWhelan, Gabriella ContardoSubjects: Machine Learning (cs.LG); Astrophysics of Galaxies (astroph.GA); Machine Learning (stat.ML)
We explore in this paper the use of neural networks designed for point-clouds and sets on a new meta-learning task. We present experiments on the astronomical challenge of characterizing the stellar population of stellar streams. Stellar streams are elongated structures of stars in the outskirts of the Milky Way that form when a (small) galaxy breaks up under the Milky Way's gravitational force. We consider that we obtain, for each stream, a small "support set" of stars that belongs to this stream. We aim to predict if the other stars in that region of the sky are from that stream or not, similar to one-class classification. Each "stream task" could also be transformed into a binary classification problem in a highly imbalanced regime (or supervised anomaly detection) by using the much bigger set of "other" stars and considering them as noisy negative examples. We propose to study the problem in the meta-learning regime: we expect that we can learn general information on characterizing a stream's stellar population by meta-learning across several streams in a fully supervised regime, and transfer it to new streams using only positive supervision. We present a novel use of Deep Sets, a model developed for point-clouds and sets, trained in a meta-learning fully supervised regime, and evaluated in a one-class classification setting. We compare it against Random Forests (with and without self-labeling) in the classic setting of binary classification, retrained for each task. We show that our method outperforms the Random Forests even though the Deep Sets model is not retrained on the new tasks, and accesses only a small part of the data compared to the Random Forest. We also show that the model performs well on a real-life stream when including additional fine-tuning.
 [33] arXiv:2007.04462 (crosslist from cs.LG) [pdf, other]

Title: Scalable Computations of Wasserstein Barycenter via Input Convex Neural NetworksComments: 16 pages,12 figuresSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
The Wasserstein Barycenter is a principled approach to represent the weighted mean of a given set of probability distributions, utilizing the geometry induced by optimal transport. In this work, we present a novel scalable algorithm to approximate Wasserstein Barycenters, aiming at high-dimensional applications in machine learning. Our proposed algorithm is based on the Kantorovich dual formulation of the 2-Wasserstein distance as well as a recent neural network architecture, the input convex neural network, that is known to parametrize convex functions. The distinguishing features of our method are: i) it only requires samples from the marginal distributions; ii) unlike the existing semi-discrete approaches, it represents the Barycenter with a generative model; iii) it allows computing the barycenter with arbitrary weights after one training session. We demonstrate the efficacy of our algorithm by comparing it with state-of-the-art methods in multiple experiments.
 [34] arXiv:2007.04466 (crosslist from cs.LG) [pdf, ps, other]

Title: URSABench: Comprehensive Benchmarking of Approximate Bayesian Inference Methods for Deep Neural NetworksComments: Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep LearningSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
While deep learning methods continue to improve in predictive accuracy on a wide range of application domains, significant issues remain with other aspects of their performance including their ability to quantify uncertainty and their robustness. Recent advances in approximate Bayesian inference hold significant promise for addressing these concerns, but the computational scalability of these methods can be problematic when applied to large-scale models. In this paper, we describe initial work on the development of URSABench (the Uncertainty, Robustness, Scalability, and Accuracy Benchmark), an open-source suite of benchmarking tools for comprehensive assessment of approximate Bayesian inference methods with a focus on deep learning-based classification tasks.
 [35] arXiv:2007.04472 (crosslist from cs.LG) [pdf, other]

Title: Evaluation of Adversarial Training on Different Types of Neural Networks in Deep Learningbased IDSsSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Network security applications, including intrusion detection systems (IDSs) built on deep neural networks (DNNs), are increasing rapidly to make the detection of anomalous activities more accurate and robust. With the rapid increase in the use of DNNs and in the volume of data traveling through systems, a growing variety of adversarial attacks designed to defeat them creates a severe challenge. In this paper, we focus on investigating the effectiveness of different evasion attacks and how to train a resilient deep learning-based IDS using different neural networks, e.g., convolutional neural networks (CNN) and recurrent neural networks (RNN). We use the min-max approach to formulate the problem of training a robust IDS against adversarial examples using two benchmark datasets. Our experiments on different deep learning algorithms and different benchmark datasets demonstrate that defense using an adversarial training-based min-max approach improves the robustness against five well-known adversarial attack methods.
 [36] arXiv:2007.04480 (crosslist from eess.IV) [pdf, other]

Title: Automatic Probe Movement Guidance for Freehand Obstetric UltrasoundComments: Accepted at the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2020)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
We present the first system that provides real-time probe movement guidance for acquiring standard planes in routine freehand obstetric ultrasound scanning. Such a system can contribute to the worldwide deployment of obstetric ultrasound scanning by lowering the required level of operator expertise. The system employs an artificial neural network that receives the ultrasound video signal and the motion signal of an inertial measurement unit (IMU) that is attached to the probe, and predicts a guidance signal. The network, termed US-GuideNet, predicts either the movement towards the standard plane position (goal prediction), or the next movement that an expert sonographer would perform (action prediction). While existing models for other ultrasound applications are trained with simulations or phantoms, we train our model with real-world ultrasound video and probe motion data from 464 routine clinical scans by 17 accredited sonographers. Evaluations for 3 standard plane types show that the model provides a useful guidance signal with an accuracy of 88.8% for goal prediction and 90.9% for action prediction.
 [37] arXiv:2007.04484 (crosslist from cs.LG) [pdf, other]

Title: Transparency Tools for Fairness in AI (Luskin)Authors: Mingliang Chen, Aria Shahverdi, Sarah Anderson, Se Yong Park, Justin Zhang, Dana DachmanSoled, Kristin Lauter, Min WuSubjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
We propose new tools for policymakers to use when assessing and correcting fairness and bias in AI algorithms. The three tools are:
 A new definition of fairness called "controlled fairness" with respect to choices of protected features and filters. The definition provides a simple test of fairness of an algorithm with respect to a dataset. This notion of fairness is suitable in cases where fairness is prioritized over accuracy, such as in cases where there is no "ground truth" data, only data labeled with past decisions (which may have been biased).
 Algorithms for retraining a given classifier to achieve "controlled fairness" with respect to a choice of features and filters. Two algorithms are presented, implemented and tested. These algorithms require training two different models in two stages. We experiment with combinations of various types of models for the first and second stage and report on which combinations perform best in terms of fairness and accuracy.
 Algorithms for adjusting model parameters to achieve a notion of fairness called "classification parity". This notion of fairness is suitable in cases where accuracy is prioritized. Two algorithms are presented, one which assumes that protected features are accessible to the model during testing, and one which assumes protected features are not accessible during testing.
We evaluate our tools on three different publicly available datasets. We find that the tools are useful for understanding various dimensions of bias, and that in practice the algorithms are effective in starkly reducing a given observed bias when tested on new data.
 [38] arXiv:2007.04504 (crosslist from cs.LG) [pdf, other]

Title: Learning Differential Equations that are Easy to SolveSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Differential equations parameterized by neural networks become expensive to solve numerically as training progresses. We propose a remedy that encourages learned dynamics to be easier to solve. Specifically, we introduce a differentiable surrogate for the time cost of standard numerical solvers, using higher-order derivatives of solution trajectories. These derivatives are efficient to compute with Taylor-mode automatic differentiation. Optimizing this additional objective trades model performance against the time cost of solving the learned dynamics. We demonstrate our approach by training substantially faster, while nearly as accurate, models in supervised classification, density estimation, and time-series modelling tasks.
 [39] arXiv:2007.04528 (crosslist from math.OC) [pdf, ps, other]

Title: Higherorder methods for convexconcave minmax optimization and monotone variational inequalitiesSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
We provide improved convergence rates for constrained convex-concave min-max problems and monotone variational inequalities with higher-order smoothness. In min-max settings where the $p^{th}$-order derivatives are Lipschitz continuous, we give an algorithm HigherOrderMirrorProx that achieves an iteration complexity of $O(1/T^{\frac{p+1}{2}})$ when given access to an oracle for finding a fixed point of a $p^{th}$-order equation. We give analogous rates for the weak monotone variational inequality problem. For $p>2$, our results improve upon the iteration complexity of the first-order Mirror Prox method of Nemirovski [2004] and the second-order method of Monteiro and Svaiter [2012]. We further instantiate our entire algorithm in the unconstrained $p=2$ case.
 [40] arXiv:2007.04532 (crosslist from cs.LG) [pdf, other]

Title: A Study of Gradient Variance in Deep LearningSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The impact of gradient noise on training deep models is widely acknowledged but not well understood. In this context, we study the distribution of gradients during training. We introduce a method, Gradient Clustering, to minimize the variance of average minibatch gradient with stratified sampling. We prove that the variance of average minibatch gradient is minimized if the elements are sampled from a weighted clustering in the gradient space. We measure the gradient variance on common deep learning benchmarks and observe that, contrary to common assumptions, gradient variance increases during training, and smaller learning rates coincide with higher variance. In addition, we introduce normalized gradient variance as a statistic that better correlates with the speed of convergence compared to gradient variance.
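The variance-reduction idea behind stratified sampling mentioned in this abstract can be sketched on toy data. The example below uses scalar stand-ins for per-example gradients, two hand-made clusters, and proportional allocation; all names and numbers are assumptions for illustration, and this is not the authors' Gradient Clustering method.

```python
import numpy as np


def uniform_batch_mean(values, batch_size, rng):
    """Average of a minibatch drawn uniformly without replacement."""
    idx = rng.choice(len(values), size=batch_size, replace=False)
    return values[idx].mean()


def stratified_batch_mean(values, labels, batch_size, rng):
    """Weighted average of per-cluster sample means (proportional allocation)."""
    n = len(values)
    estimate = 0.0
    for c in np.unique(labels):
        members = values[labels == c]
        k = max(1, round(batch_size * len(members) / n))
        sample = rng.choice(members, size=k, replace=False)
        estimate += (len(members) / n) * sample.mean()
    return estimate


rng = np.random.default_rng(0)
# Toy scalar "gradients" in two well-separated clusters (hypothetical data).
values = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 1, 500)])
labels = np.repeat([0, 1], 500)

uniform = [uniform_batch_mean(values, 20, rng) for _ in range(2000)]
strat = [stratified_batch_mean(values, labels, 20, rng) for _ in range(2000)]
var_uniform, var_strat = np.var(uniform), np.var(strat)
print(var_uniform, var_strat)
```

In this toy setting both estimators target the same full-data mean, but the stratified one removes the variance caused by a random mix of clusters in each batch.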
 [41] arXiv:2007.04540 (crosslist from cs.SI) [pdf, other]

Title: Contrastive Multiple Correspondence Analysis (cMCA): Applying the Contrastive Learning Method to Identify Political SubgroupsComments: Both authors contributed equally to the paper and listed alphabetically. This manuscript is currently under reviewSubjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Ideal point estimation and dimensionality reduction have long been utilized to simplify and cluster complex, high-dimensional political data (e.g., roll-call votes and surveys) for use in analysis and visualization. These methods often work by finding the directions or principal components (PCs) on which either the data varies the most or respondents make the fewest decision errors. However, these PCs, which usually reflect the left-right political spectrum, are sometimes uninformative in explaining significant differences in the distribution of the data (e.g., how to categorize a set of highly moderate voters). To tackle this issue, we adopt an emerging analysis approach, called contrastive learning. Contrastive learning, e.g., contrastive principal component analysis (cPCA), works by first splitting the data by predefined groups, and then deriving PCs on which the target group varies the most but the background group varies the least. As a result, cPCA can often find "hidden" patterns, such as subgroups within the target group, which PCA cannot reveal when some variables are the dominant source of variations across the groups. We contribute to the field of contrastive learning by extending it to multiple correspondence analysis (MCA) to enable an analysis of data often encountered by social scientists, namely binary, ordinal, and nominal variables. We demonstrate the utility of contrastive MCA (cMCA) by analyzing three different surveys: The 2015 Cooperative Congressional Election Study, 2012 UTokyo-Asahi Elite Survey, and 2018 European Social Survey. Our results suggest that, first, for the cases when ordinary MCA depicts differences between groups, cMCA can further identify characteristics that divide the target group; second, for the cases when MCA does not show clear differences, cMCA can successfully identify meaningful directions and subgroups, which traditional methods overlook.
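The cPCA step described in this abstract (directions where the target group varies most and the background group least) is commonly written as an eigendecomposition of the target covariance minus a multiple of the background covariance. The sketch below follows that formulation on synthetic continuous data; the function name, the contrast parameter `alpha`, and the data are assumptions for illustration, and the categorical extension (cMCA) is not shown.

```python
import numpy as np


def contrastive_pca(target, background, alpha=1.0, n_components=2):
    """Top eigenvectors of cov(target) - alpha * cov(background)."""
    t = target - target.mean(axis=0)
    b = background - background.mean(axis=0)
    cov_t = t.T @ t / (len(t) - 1)
    cov_b = b.T @ b / (len(b) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov_t - alpha * cov_b)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]  # columns are contrastive directions


rng = np.random.default_rng(0)
n = 2000
scale = np.array([5.0, 1.0, 1.0])  # axis 0 dominates variance in both groups
background = rng.normal(size=(n, 3)) * scale
target = rng.normal(size=(n, 3)) * scale
# Hidden subgroups in the target only: two clusters along axis 1.
target[:, 1] += np.where(np.arange(n) < n // 2, -3.0, 3.0)

pca_dir = contrastive_pca(target, background, alpha=0.0, n_components=1)[:, 0]
cpca_dir = contrastive_pca(target, background, alpha=2.0, n_components=1)[:, 0]
print(np.round(np.abs(pca_dir), 2), np.round(np.abs(cpca_dir), 2))
```

With `alpha=0.0` the function reduces to ordinary PCA on the target, which picks the shared high-variance axis; a positive `alpha` discounts that axis and surfaces the subgroup axis instead.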
 [42] arXiv:2007.04546 (crosslist from cs.LG) [pdf, other]

Title: Wandering Within a World: Online Contextualized FewShot LearningSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We aim to bridge the gap between typical human and machine-learning environments by extending the standard framework of few-shot learning to an online, continual setting. In this setting, episodes do not have separate training and testing phases, and instead models are evaluated online while learning novel classes. As in the real world, where the presence of spatiotemporal context helps us retrieve skills learned in the past, our online few-shot learning setting also features an underlying context that changes throughout time. Object classes are correlated within a context, and inferring the correct context can lead to better performance. Building upon this setting, we propose a new few-shot learning dataset based on large-scale indoor imagery that mimics the visual experience of an agent wandering within a world. Furthermore, we convert popular few-shot learning approaches into online versions, and we also propose a new model named contextual prototypical memory that can make use of spatiotemporal contextual information from the recent past.
 [43] arXiv:2007.04547 (crosslist from math.PR) [pdf, ps, other]

Title: On Optimal Uniform Concentration Inequalities for Discrete Entropy in the Highdimensional SettingAuthors: Yunpeng ZhaoSubjects: Probability (math.PR); Information Theory (cs.IT); Statistics Theory (math.ST)
We prove an exponential decay concentration inequality to bound the tail probability of the difference between the log-likelihood of discrete random variables and the negative entropy. The concentration bound we derive holds uniformly over all parameter values. The new result improves the convergence rate in an earlier work \cite{zhao2020note}, from $(K^2\log K)/n=o(1)$ to $(\log K)^2/n=o(1)$, where $n$ is the sample size and $K$ is the number of possible values of the discrete variable. We further prove that the rate $(\log K)^2/n=o(1)$ is optimal. The results are extended to misspecified log-likelihoods for grouped random variables.
 [44] arXiv:2007.04553 (crosslist from econ.EM) [pdf, ps, other]

Title: Time Series Analysis of COVID19 Infection Curve: A ChangePoint PerspectiveSubjects: Econometrics (econ.EM); Physics and Society (physics.socph); Applications (stat.AP)
In this paper, we model the trajectory of the cumulative confirmed cases and deaths of COVID-19 (in log scale) via a piecewise linear trend model. The model naturally captures the phase transitions of the epidemic growth rate via change-points and further enjoys great interpretability due to its semiparametric nature. On the methodological front, we advance the nascent self-normalization (SN) technique (Shao, 2010) to testing and estimation of a single change-point in the linear trend of a nonstationary time series. We further combine the SN-based change-point test with the NOT algorithm (Baranowski et al., 2019) to achieve multiple change-point estimation. Using the proposed method, we analyze the trajectory of the cumulative COVID-19 cases and deaths for 30 major countries and discover interesting patterns with potentially relevant implications for effectiveness of the pandemic responses by different countries. Furthermore, based on the change-point detection algorithm and a flexible extrapolation function, we design a simple two-stage forecasting scheme for COVID-19 and demonstrate its promising performance in predicting cumulative deaths in the U.S.
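A piecewise linear trend with one change-point, as described in this abstract, can be illustrated with a plain least-squares grid search over the break location. This is a swapped-in baseline for illustration only: it is not the self-normalization test or the NOT algorithm, and the function name, the `min_segment` parameter, and the synthetic series are assumptions.

```python
import numpy as np


def fit_single_change_point(y, min_segment=5):
    """Grid-search the break index of a continuous piecewise linear trend.

    Returns (break index, [intercept, initial slope, slope change after break]).
    """
    n = len(y)
    t = np.arange(n, dtype=float)
    best_rss, best_k, best_coef = np.inf, None, None
    for k in range(min_segment, n - min_segment):
        # Design matrix: intercept, time, and a hinge that switches on after k.
        X = np.column_stack([np.ones(n), t, np.maximum(t - k, 0.0)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_k, best_coef = rss, k, coef
    return best_k, best_coef


# Synthetic log-scale series: growth rate 0.2 slows to 0.05 after day 40.
rng = np.random.default_rng(0)
t = np.arange(100)
y = 0.2 * t - 0.15 * np.maximum(t - 40, 0) + rng.normal(0, 0.05, 100)

k_hat, coef = fit_single_change_point(y)
print(k_hat, np.round(coef, 3))
```

On a log scale the slope before the break is the early growth rate, and the hinge coefficient is the change in growth rate at the estimated change-point.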
 [45] arXiv:2007.04568 (crosslist from cs.LG) [pdf, ps, other]

Title: Learning to Bid Optimally and Efficiently in Adversarial Firstprice AuctionsSubjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Machine Learning (stat.ML)
First-price auctions have very recently swept the online advertising industry, replacing second-price auctions as the predominant auction mechanism on many platforms. This shift has brought forth important challenges for a bidder: how should one bid in a first-price auction, where unlike in second-price auctions, it is no longer optimal to bid one's private value truthfully and hard to know the others' bidding behaviors? In this paper, we take an online learning angle and address the fundamental problem of learning to bid in repeated first-price auctions, where both the bidder's private valuations and other bidders' bids can be arbitrary. We develop the first minimax optimal online bidding algorithm that achieves an $\widetilde{O}(\sqrt{T})$ regret when competing with the set of all Lipschitz bidding policies, a strong oracle that contains a rich set of bidding strategies. This novel algorithm is built on the insight that the presence of a good expert can be leveraged to improve performance, as well as an original hierarchical expert-chaining structure, both of which could be of independent interest in online learning. Further, by exploiting the product structure that exists in the problem, we modify this algorithm (in its vanilla form statistically optimal but computationally infeasible) to a computationally efficient and space efficient algorithm that also retains the same $\widetilde{O}(\sqrt{T})$ minimax optimal regret guarantee. Additionally, through an impossibility result, we highlight that one is unlikely to compete this favorably with a stronger oracle (than the considered Lipschitz bidding policies). Finally, we test our algorithm on three real-world first-price auction datasets obtained from Verizon Media and demonstrate our algorithm's superior performance compared to several existing bidding algorithms.
 [46] arXiv:2007.04583 (crosslist from cs.LG) [pdf, other]

Title: Graph Convolutional Networks for Graphs Containing Missing FeaturesSubjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
The Graph Convolutional Network (GCN) has experienced great success in graph analysis tasks. It works by smoothing the node features across the graph. Current GCN models overwhelmingly assume that node feature information is complete. However, real-world graph data are often incomplete and contain missing features. Traditionally, people have to estimate and fill in the unknown features based on imputation techniques and then apply GCN. However, the processes of feature filling and graph learning are separated, resulting in degraded and unstable performance. This problem becomes more serious when a large number of features are missing. We propose an approach that adapts GCN to graphs containing missing features. In contrast to the traditional strategy, our approach integrates the processing of missing features and graph learning within the same neural network architecture. Our idea is to represent the missing data by a Gaussian Mixture Model (GMM) and calculate the expected activation of neurons in the first hidden layer of GCN, while keeping the other layers of the network unchanged. This enables us to learn the GMM parameters and network weight parameters in an end-to-end manner. Notably, our approach does not increase the computational complexity of GCN and it is consistent with GCN when the features are complete. We conduct experiments on the node label classification task and demonstrate that our approach significantly outperforms the best imputation-based methods by up to 99.43%, 102.96%, 6.97%, 35.36% in four benchmark graphs when a large portion of features are missing. The performance of our approach for the case with a low level of missing features is even superior to GCN for the case with complete features.
 [47] arXiv:2007.04589 (crosslist from cs.LG) [pdf, other]

Title: InfoMaxGAN: Improved Adversarial Image Generation via Information Maximization and Contrastive LearningComments: Initial version was presented at NeurIPS 2019 Workshop on Information Theory and Machine LearningSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
While Generative Adversarial Networks (GANs) are fundamental to many generative modelling applications, they suffer from numerous issues. In this work, we propose a principled framework to simultaneously address two fundamental issues in GANs: catastrophic forgetting of the discriminator and mode collapse of the generator. We achieve this by employing for GANs a contrastive learning and mutual information maximization approach, and perform extensive analyses to understand sources of improvements. Our approach significantly stabilises GAN training and improves GAN performance for image synthesis across five datasets under the same training and evaluation conditions against state-of-the-art works. Our approach is simple to implement and practical: it involves only one objective, is computationally inexpensive, and is robust across a wide range of hyperparameters without any tuning. For reproducibility, our code is available at https://github.com/kwotsin/mimicry.
 [48] arXiv:2007.04596 (crosslist from cs.LG) [pdf, ps, other]

Title: Learning OverParametrized TwoLayer ReLU Neural Networks beyond NTKComments: Conference on Learning Theory (COLT) 2020Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
We consider the dynamics of gradient descent for learning a two-layer neural network. We assume the input $x\in\mathbb{R}^d$ is drawn from a Gaussian distribution and the label of $x$ satisfies $f^{\star}(x) = a^{\top}W^{\star}x$, where $a\in\mathbb{R}^d$ is a nonnegative vector and $W^{\star} \in\mathbb{R}^{d\times d}$ is an orthonormal matrix. We show that an over-parametrized two-layer neural network with ReLU activation, trained by gradient descent from random initialization, can provably learn the ground truth network with population loss at most $o(1/d)$ in polynomial time with polynomial samples. On the other hand, we prove that any kernel method, including Neural Tangent Kernel, with a polynomial number of samples in $d$, has population loss at least $\Omega(1 / d)$.
 [49] arXiv:2007.04604 (crosslist from cs.HC) [pdf]

Title: Building an Automated Gesture Imitation Game for Teenagers with ASDAuthors: Linda Nanan Vallée (ESATIC), Christophe Lohr, Sao Mai Nguyen (IMT Atlantique), Ioannis Kanellos (IMT Atlantique  INFO), O. Asseu (ESATIC)Journalref: Far East Journal of Electronics and Communications, 2019, 22, pp.19  28Subjects: HumanComputer Interaction (cs.HC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Autism spectrum disorder is a neurodevelopmental condition that includes issues with communication and social interactions. People with ASD also often have restricted interests and repetitive behaviors. In this paper we build preliminary bricks of an automated gesture imitation game that will aim at improving social interactions with teenagers with ASD. The structure of the game is presented, as well as support tools and methods for skeleton detection and imitation learning. The game shall later be implemented using an interactive robot.
 [50] arXiv:2007.04612 (cross-list from cs.LG) [pdf, other]

Title: Concept Bottleneck Models
Authors: Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We seek to learn models that we can interact with using high-level concepts: if the model did not think there was a bone spur in the x-ray, would it still predict severe arthritis? State-of-the-art models today do not typically support the manipulation of concepts like "the existence of bone spurs", as they are trained end-to-end to go directly from raw input (e.g., pixels) to output (e.g., arthritis severity). We revisit the classic idea of first predicting concepts that are provided at training time, and then using these concepts to predict the label. By construction, we can intervene on these concept bottleneck models by editing their predicted concept values and propagating these changes to the final prediction. On x-ray grading and bird identification, concept bottleneck models achieve competitive accuracy with standard end-to-end models, while enabling interpretation in terms of high-level clinical concepts ("bone spurs") or bird attributes ("wing color"). These models also allow for richer human-model interaction: accuracy improves significantly if we can correct model mistakes on concepts at test time.
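The two-stage structure and the test-time intervention described in this abstract can be illustrated with a minimal sketch. Everything below is a hypothetical toy (hand-picked logistic weights, the names `predict_concepts`, `predict_label` and `intervene`), not the authors' architecture or training procedure; it only shows that the label depends on the input solely through the concept values, so editing a concept changes the prediction.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_concepts(x, concept_weights):
    """Stage 1: map raw input features to concept probabilities."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in concept_weights]


def predict_label(concepts, label_weights, bias=0.0):
    """Stage 2: map concept values (and nothing else) to a label probability."""
    return sigmoid(sum(w * c for w, c in zip(label_weights, concepts)) + bias)


def intervene(concepts, corrections):
    """Overwrite selected predicted concepts with human-supplied values."""
    edited = list(concepts)
    for index, value in corrections.items():
        edited[index] = value
    return edited


# Hypothetical toy weights: 3 input features, 2 concepts, 1 binary label.
CONCEPT_WEIGHTS = [[2.0, -1.0, 0.5], [-1.5, 1.0, 1.0]]
LABEL_WEIGHTS = [3.0, -2.0]

x = [1.0, 0.5, -0.5]
concepts = predict_concepts(x, CONCEPT_WEIGHTS)
before = predict_label(concepts, LABEL_WEIGHTS)
# "What if the first concept were absent?": set concept 0 to 0 and re-predict.
after = predict_label(intervene(concepts, {0: 0.0}), LABEL_WEIGHTS)
```

Because the first concept carries a positive label weight in this toy setup, switching it off lowers the predicted label probability.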
 [51] arXiv:2007.04618 (cross-list from cs.LG) [pdf, other]

Title: Federated Learning of User Authentication Models
Authors: Hossein Hosseini, Sungrack Yun, Hyunsin Park, Christos Louizos, Joseph Soriaga, Max Welling
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Machine learning-based User Authentication (UA) models have been widely deployed in smart devices. UA models are trained to map input data of different users to highly separable embedding vectors, which are then used to accept or reject new inputs at test time. Training UA models requires having direct access to the raw inputs and embedding vectors of users, both of which are privacy-sensitive information. In this paper, we propose Federated User Authentication (FedUA), a framework for privacy-preserving training of UA models. FedUA adopts the federated learning framework to enable a group of users to jointly train a model without sharing the raw inputs. It also allows users to generate their embeddings as random binary vectors, so that, unlike the existing approach of constructing the spread-out embeddings by the server, the embedding vectors are kept private as well. We show our method is privacy-preserving, scales with the number of users, and allows new users to be added to training without changing the output layer. Our experimental results on the VoxCeleb dataset for speaker verification show our method reliably rejects data of unseen users at very high true positive rates.
 [52] arXiv:2007.04630 (cross-list from cs.LG) [pdf, other]

Title: Maximum-and-Concatenation Networks
Comments: Accepted by ICML2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
While successful in many fields, deep neural networks (DNNs) still suffer from some open problems such as bad local minima and unsatisfactory generalization performance. In this work, we propose a novel architecture called Maximum-and-Concatenation Networks (MCN) that aims to eliminate bad local minima and improve generalization ability as well. Remarkably, we prove that MCN has a very nice property; that is, every local minimum of an $(l+1)$-layer MCN is at least as good as, and can be better than, the global minima of the network consisting of its first $l$ layers. In other words, by increasing the network depth, MCN can autonomously improve the quality of its local minima. What is more, it is easy to plug MCN into an existing deep model to give it this property as well. Finally, under mild conditions, we show that MCN can approximate certain continuous functions arbitrarily well with high efficiency; that is, the covering number of MCN is much smaller than that of most existing DNNs such as deep ReLU networks. Based on this, we further provide a tight generalization bound to guarantee the inference ability of MCN when dealing with testing samples.
 [53] arXiv:2007.04637 (cross-list from cs.LG) [pdf, other]

Title: IALE: Imitating Active Learner Ensembles
Comments: 14 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Active learning (AL) prioritizes the labeling of the most informative data samples. As the performance of well-known AL heuristics highly depends on the underlying model and data, recent heuristic-independent approaches that are based on reinforcement learning directly learn a policy that makes use of the labeling history to select the next sample. However, those methods typically need a huge number of samples to sufficiently explore the relevant state space. Imitation learning approaches aim to help out but again rely on a given heuristic.
This paper proposes an improved imitation learning scheme that learns a policy for batch-mode pool-based AL. This is similar to previously presented multi-armed bandit approaches, but in contrast to them we train a policy that directly imitates the selection of the best expert heuristic at each stage of the AL cycle. We use DAGGER to train the policy on a dataset and later apply it to similar datasets. With multiple AL heuristics as experts, the policy is able to reflect the choices of the best AL heuristics given the current state of the active learning process. We evaluate our method on well-known image datasets and show that we outperform state-of-the-art imitation learners and heuristics.
 [54] arXiv:2007.04640 (cross-list from cs.LG) [pdf, other]

Title: A Policy Gradient Method for Task-Agnostic Exploration
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy? In this paper, we argue that the entropy of the state distribution induced by limited-horizon trajectories is a sensible target. In particular, we present a novel and practical policy-search algorithm, Maximum Entropy POLicy optimization (MEPOL), to learn a policy that maximizes a non-parametric, $k$-nearest neighbors estimate of the state distribution entropy. In contrast to known methods, MEPOL is completely model-free, as it requires neither estimating the state distribution of any policy nor modeling transition dynamics. Then, we empirically show that MEPOL allows learning a maximum-entropy exploration policy in high-dimensional, continuous-control domains, and how this policy facilitates learning a variety of meaningful reward-based tasks downstream.
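To make the idea of a non-parametric $k$-nearest neighbors entropy estimate concrete, here is a minimal sketch of the classical Kozachenko-Leonenko estimator applied to a batch of visited states. This is an assumption for illustration: the abstract does not specify which estimator MEPOL uses or how it is differentiated with respect to the policy, and the names `knn_entropy`, `states` and `narrow` are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{j<n} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_entropy(points, k=3):
    """Kozachenko-Leonenko k-NN estimate of differential entropy, in nats."""
    n = len(points)
    d = len(points[0])
    # Log-volume of the d-dimensional unit Euclidean ball.
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    log_dist_sum = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_dist_sum += math.log(dists[k - 1])  # distance to the k-th neighbour
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_dist_sum / n


# Hypothetical 2-D "states": one batch spread over the unit square,
# and the same batch shrunk by a factor of 10 (a less exploratory policy).
random.seed(0)
states = [(random.random(), random.random()) for _ in range(200)]
narrow = [(0.1 * x, 0.1 * y) for x, y in states]
gap = knn_entropy(states) - knn_entropy(narrow)
```

Shrinking every coordinate by a factor of 10 in two dimensions lowers the estimate by $2\log 10$, which is the kind of signal an entropy-maximizing exploration objective would push against. The brute-force neighbour search here is quadratic in the number of states and is only meant for small batches.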
 [55] arXiv:2007.04641 (cross-list from cs.LG) [pdf, other]

Title: Probabilistic Value Selection for Space Efficient Model
Comments: Accepted in the 21st IEEE International Conference on Mobile Data Management (July 2020)
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
An alternative to current mainstream preprocessing methods is proposed: Value Selection (VS). Unlike existing methods such as feature selection, which removes features, and instance selection, which eliminates instances, value selection eliminates values (with respect to each feature) in the dataset with two purposes: reducing the model size and preserving its accuracy. Two probabilistic methods based on an information-theoretic metric are proposed: PVS and P + VS. Extensive experiments on benchmark datasets of various sizes are presented. The results are compared with existing preprocessing methods such as feature selection, feature transformation, and instance selection. Experimental results show that value selection can achieve a balance between accuracy and model size reduction.
 [56] arXiv:2007.04649 (cross-list from cs.LG) [pdf, other]

Title: Learning to Teach with Deep Interactions
Authors: Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li, Tie-Yan Liu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Machine teaching uses a meta/teacher model to guide the training of a student model (which will be used in real tasks) through training data selection, loss function design, etc. Previously, the teacher model only takes shallow/surface information as inputs (e.g., training iteration number, loss and accuracy from training/validation sets) while ignoring the internal states of the student model, which limits the potential of learning to teach. In this work, we propose an improved data teaching algorithm, where the teacher model deeply interacts with the student model by accessing its internal states. The teacher model is jointly trained with the student model using meta gradients propagated from a validation set. We conduct experiments on image classification with clean/noisy labels and empirically demonstrate that our algorithm makes significant improvement over previous data teaching methods.
 [57] arXiv:2007.04662 (cross-list from cs.LG) [pdf, other]

Title: Untapped Potential of Data Augmentation: A Domain Generalization Viewpoint
Comments: 6 pages, ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Data augmentation is a popular preprocessing trick to improve generalization accuracy. It is believed that by processing augmented inputs in tandem with the original ones, the model learns a more robust set of features which are shared between the original and augmented counterparts. However, we show that this is not the case even for the best augmentation technique. In this work, we take a Domain Generalization viewpoint of augmentation-based methods. This new perspective allows for probing overfitting and delineating avenues for improvement. Our exploration with the state-of-the-art augmentation method provides evidence that the learned representations are not as robust even towards distortions used during training. This points to the untapped potential of augmented examples.
 [58] arXiv:2007.04674 (cross-list from cs.LG) [pdf, other]

Title: Resource Aware Multifidelity Active Learning for Efficient Optimization
Comments: 21 pages
Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
Traditional methods for black-box optimization require a considerable number of evaluations, which can be time consuming, impractical, and often infeasible for many engineering applications that rely on accurate representations and expensive models to evaluate. Bayesian Optimization (BO) methods search for the global optimum by progressively (actively) learning a surrogate model of the objective function along the search path. Bayesian optimization can be accelerated through multifidelity approaches which leverage multiple black-box approximations of the objective function that can be computationally cheaper to evaluate, but still provide relevant information to the search task. Further computational benefits are offered by the availability of parallel and distributed computing architectures whose optimal usage is an open opportunity within the context of active learning. This paper introduces the Resource Aware Active Learning (RAAL) strategy, a multifidelity Bayesian scheme to accelerate the optimization of black-box functions. At each optimization step, the RAAL procedure computes the set of best sample locations and the associated fidelity sources that maximize the information gain to acquire during the parallel/distributed evaluation of the objective function, while accounting for the limited computational budget. The scheme is demonstrated for a variety of benchmark problems and results are discussed for both single-fidelity and multifidelity settings. In particular, we observe that the RAAL strategy optimally seeds multiple points at each iteration, allowing for a major speed-up of the optimization task.
 [59] arXiv:2007.04676 (cross-list from cs.LG) [pdf, ps, other]

Title: Training Restricted Boltzmann Machines with Binary Synapses using the Bayesian Learning Rule
Authors: Xiangming Meng
Comments: Technical note. Work in progress
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)
Restricted Boltzmann machines (RBMs) with low-precision synapses are appealing for their high energy efficiency. However, training RBMs with binary synapses is challenging due to the discrete nature of synapses. Recently, Huang proposed an efficient method to train RBMs with binary synapses by using a combination of gradient ascent and the message passing algorithm under the variational inference framework. However, an additional heuristic clipping operation is needed. In this technical note, inspired by Huang's work, we propose an alternative optimization method using the Bayesian learning rule, which is a natural gradient variational inference method. As opposed to Huang's method, we update the natural parameters of the variational symmetric Bernoulli distribution rather than the expectation parameters. Since the natural parameters take values in the entire real domain, no additional clipping is needed. Interestingly, Huang's algorithm could be viewed as a first-order approximation of the proposed algorithm, which justifies its efficacy with heuristic clipping.
 [60] arXiv:2007.04697 (cross-list from cs.DB) [pdf]

Title: Open Data Quality Evaluation: A Comparative Analysis of Open Data in Latvia
Authors: Anastasija Nikiforova
Comments: 24 pages, 2 tables, 3 figures, Baltic J. Modern Computing
Journal-ref: Baltic J. Modern Computing, Vol. 6 (2018), No. 4, 363-386
Subjects: Databases (cs.DB); Computers and Society (cs.CY); Information Retrieval (cs.IR); Applications (stat.AP); Computation (stat.CO)
Nowadays open data is entering the mainstream: it is freely available to every stakeholder and is often used in business decision-making. It is important to be sure that data is trustworthy and error-free, as quality problems can lead to huge losses. The research discusses how (open) data quality could be assessed. It also covers the main points that should be considered when developing a data quality management solution. One specific approach is applied to several Latvian open data sets. The research provides a step-by-step guide to analysing open data sets and summarizes its results. It is also shown that data quality can differ depending on the data supplier (centralized and decentralized data releases) and that, unfortunately, even a trustworthy data supplier cannot guarantee the absence of data quality problems. Common data quality problems are also highlighted, detected not only in Latvian open data but also in the open data of 3 European countries.
 [61] arXiv:2007.04713 (cross-list from econ.EM) [pdf, ps, other]

Title: Structural Gaussian mixture vector autoregressive model
Authors: Savi Virolainen
Subjects: Econometrics (econ.EM); Methodology (stat.ME)
A structural version of the Gaussian mixture vector autoregressive model is introduced. The shocks are identified by combining simultaneous diagonalization of the error term covariance matrices with zero and sign constraints. It turns out that this often leads to less restrictive identification conditions than in conventional SVAR models, while some of the constraints are also testable. The accompanying R package gmvarkit provides easy-to-use tools for estimating the models and applying the introduced methods.
 [62] arXiv:2007.04725 (cross-list from cs.LG) [pdf, other]

Title: EVORL: Evolutionary-Driven Reinforcement Learning
Authors: Ahmed Hallawa, Thorsten Born, Anke Schmeink, Guido Dartmann, Arne Peine, Lukas Martin, Giovanni Iacca, Gusz Eiben, Gerd Ascheid
Comments: 9 pages, 7 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
In this work, we propose a novel approach for reinforcement learning driven by evolutionary computation. Our algorithm, dubbed Evolutionary-Driven Reinforcement Learning (evoRL), embeds the reinforcement learning algorithm in an evolutionary cycle, where we distinctly differentiate between purely evolvable (instinctive) behaviour and purely learnable behaviour. Furthermore, we propose that this distinction is decided by the evolutionary process, thus allowing evoRL to be adaptive to different environments. In addition, evoRL facilitates learning on environments with rewardless states, which makes it more suited for real-world problems with incomplete information. To show that evoRL leads to state-of-the-art performance, we present the performance of different state-of-the-art reinforcement learning algorithms when operating within evoRL and compare it with the case when these same algorithms are executed independently. Results show that reinforcement learning algorithms embedded within our evoRL approach significantly outperform the stand-alone versions of the same RL algorithms on OpenAI Gym control problems with rewardless states constrained by the same computational budget.
 [63] arXiv:2007.04728 (cross-list from cs.LG) [pdf, other]

Title: Let the Data Choose its Features: Differentiable Unsupervised Feature Selection
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Scientific observations often consist of a large number of variables (features). Identifying a subset of meaningful features is often ignored in unsupervised learning, despite its potential for unraveling clear patterns hidden in the ambient space. In this paper, we present a method for unsupervised feature selection, tailored for the task of clustering. We propose a differentiable loss function which combines the graph Laplacian with a gating mechanism based on continuous approximation of Bernoulli random variables. The Laplacian is used to define a scoring term that favors low-frequency features, while the parameters of the Bernoulli variables are trained to enable selection of the most informative features. We mathematically motivate the proposed approach and demonstrate that in the high noise regime, it is crucial to compute the Laplacian on the gated inputs, rather than on the full feature set. Experimental demonstration of the efficacy of the proposed approach and its advantage over current baselines is provided using several real-world examples.
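A rough way to see why the graph should be built on gated inputs is the classical Laplacian quadratic form $f^{\top} L f = \tfrac{1}{2}\sum_{ij} W_{ij}(f_i - f_j)^2$, which is small when a feature varies slowly over the graph. The sketch below is an assumption for illustration only: it uses fixed 0/1 gates and a Gaussian affinity, whereas the paper's loss, normalization and continuous Bernoulli relaxation are not given in the abstract. The function name `laplacian_scores` and the toy data are hypothetical.

```python
import math


def laplacian_scores(points, gates, bandwidth=1.0):
    """Score each feature by f^T L f, with the graph built on gated inputs."""
    gated = [[g * x for g, x in zip(gates, p)] for p in points]
    n = len(points)
    # Gaussian affinities between gated points; L = D - W is never formed explicitly.
    weights = [[math.exp(-math.dist(gated[i], gated[j]) ** 2 / bandwidth)
                for j in range(n)] for i in range(n)]
    scores = []
    for f in range(len(gates)):
        column = [p[f] for p in points]
        # f^T L f = 1/2 * sum_ij W_ij (f_i - f_j)^2: small for low-frequency features.
        scores.append(0.5 * sum(weights[i][j] * (column[i] - column[j]) ** 2
                                for i in range(n) for j in range(n)))
    return scores


# Toy data: feature 0 separates two groups, feature 1 alternates between samples.
points = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
keep_first = laplacian_scores(points, gates=[1.0, 0.0])
keep_second = laplacian_scores(points, gates=[0.0, 1.0])
```

In this toy example, whichever feature is let through the gate defines the graph and therefore receives the lower (smoother) score, which shows how strongly the choice of gated inputs shapes the scoring term.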
 [64] arXiv:2007.04731 (cross-list from cs.LG) [pdf, other]

Title: Fast Variational Learning in State-Space Gaussian Process Models
Comments: To appear in MLSP 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Gaussian process (GP) regression with 1D inputs can often be performed in linear time via a stochastic differential equation formulation. However, for non-Gaussian likelihoods, this requires application of approximate inference methods which can make the implementation difficult, e.g., expectation propagation can be numerically unstable and variational inference can be computationally inefficient. In this paper, we propose a new method that removes such difficulties. Building upon an existing method called conjugate-computation variational inference, our approach enables linear-time inference via Kalman recursions while avoiding numerical instabilities and convergence issues. We provide an efficient JAX implementation which exploits just-in-time compilation and allows for fast automatic differentiation through large for-loops. Overall, our approach leads to fast and stable variational inference in state-space GP models that can be scaled to time series with millions of data points.
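The linear-time claim rests on the fact that some GP priors over 1D inputs are equivalent to linear stochastic differential equations, so inference reduces to a single Kalman pass over the sorted time points. The sketch below covers only the simplest case, as an assumption for illustration: an exponential (Ornstein-Uhlenbeck) kernel, whose state is scalar, with a Gaussian likelihood and filtering only. The non-Gaussian, variational part that the paper addresses is not shown, and the name `ou_kalman_filter` is hypothetical.

```python
import math


def ou_kalman_filter(times, observations, lengthscale=1.0, variance=1.0, noise=0.1):
    """Filtering means/variances for a GP with exponential (OU) kernel, in O(n).

    Kernel: k(t, t') = variance * exp(-|t - t'| / lengthscale).
    Likelihood: y_k = f(t_k) + Gaussian noise with variance `noise`.
    `times` must be sorted in increasing order.
    """
    means, variances = [], []
    mean, var = 0.0, variance          # stationary prior at the first time point
    previous_time = times[0]
    for t, y in zip(times, observations):
        # Predict: exact discretisation of the OU stochastic differential equation.
        a = math.exp(-(t - previous_time) / lengthscale)
        mean = a * mean
        var = a * a * var + variance * (1.0 - a * a)
        # Update: scalar Kalman gain for a Gaussian observation.
        gain = var / (var + noise)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        means.append(mean)
        variances.append(var)
        previous_time = t
    return means, variances


filtered_means, filtered_vars = ou_kalman_filter(
    [0.0, 0.5, 1.0, 2.0], [0.3, 0.1, -0.2, 0.4])
```

Each observation costs a constant amount of work, so the pass is linear in the number of data points. A full GP posterior would additionally need a backward (smoothing) pass, which is also linear.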
 [65] arXiv:2007.04743 (cross-list from q-bio.PE) [pdf, other]

Title: Racial Impact on Infections and Deaths due to COVID-19 in New York City
Comments: 6 pages, 7 figures, 1 table
Subjects: Populations and Evolution (q-bio.PE); Physics and Society (physics.soc-ph); Applications (stat.AP)
Redlining is the discriminatory practice whereby institutions avoided investment in certain neighborhoods due to their demographics. Here we explore the lasting impacts of redlining on the spread of COVID-19 in New York City (NYC). Using data available through the Home Mortgage Disclosure Act, we construct a redlining index for each NYC census tract via a multi-level logistical model. We compare this redlining index with the COVID-19 statistics for each NYC Zip Code Tabulation Area. Accurate mappings of the pandemic would aid the identification of the most vulnerable areas and permit the most effective allocation of medical resources, while reducing ethnic health disparities.
 [66] arXiv:2007.04750 (cross-list from cs.LG) [pdf, other]

Title: Recurrent Neural-Linear Posterior Sampling for Non-Stationary Contextual Bandits
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
An agent in a nonstationary contextual bandit problem should balance between exploration and the exploitation of (periodic or structured) patterns present in its previous experiences. Handcrafting an appropriate historical context is an attractive alternative to transform a nonstationary problem into a stationary problem that can be solved efficiently. However, even a carefully designed historical context may introduce spurious relationships or lack a convenient representation of crucial information. In order to address these issues, we propose an approach that learns to represent the relevant context for a decision based solely on the raw history of interactions between the agent and the environment. This approach relies on a combination of features extracted by recurrent neural networks with a contextual linear bandit algorithm based on posterior sampling. Our experiments on a diverse selection of contextual and noncontextual nonstationary problems show that our recurrent approach consistently outperforms its feedforward counterpart, which requires handcrafted historical contexts, while being more widely applicable than conventional nonstationary bandit algorithms.
 [67] arXiv:2007.04758 (cross-list from q-fin.RM) [pdf, ps, other]

Title: A Bivariate Compound Dynamic Contagion Process for Cyber Insurance
Subjects: Risk Management (q-fin.RM); Other Statistics (stat.OT)
As corporates and governments become more digital, they become vulnerable to various forms of cyber attack. Cyber insurance products have been used as risk management tools, yet their pricing does not reflect actual risk, including that of multiple, catastrophic and contagious losses. For the modelling of aggregate losses from cyber events, in this paper we introduce a bivariate compound dynamic contagion process. The bivariate dynamic contagion process is a point process that includes both externally excited joint jumps, which are distributed according to a shot noise Cox process, and two separate self-excited jumps, which are distributed according to the branching structure of a Hawkes process with an exponential fertility rate. We analyse the theoretical distributional properties of these processes systematically, based on the piecewise deterministic Markov process developed by Davis (1984) and the univariate dynamic contagion process theory developed by Dassios and Zhao (2011). The analytic expression of the Laplace transform of the compound process and its moments are presented, which have the potential to be applicable to a variety of problems in credit, insurance, market and other operational risks. As an application of this process, we provide insurance premium calculations based on its moments. Numerical examples show that this compound process can be used for the modelling of aggregate losses from cyber events. We also provide the simulation algorithm for statistical analysis, further business applications and research.
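The self-excited component mentioned in this abstract, a Hawkes process with an exponential fertility rate, can be simulated with Ogata's thinning method. The sketch below is a simplified, univariate stand-in under stated assumptions: it omits the externally excited shot-noise jumps, the bivariate coupling and the compound loss sizes, and it is not the simulation algorithm provided in the paper. The function name and parameter names are hypothetical.

```python
import math
import random


def simulate_hawkes(baseline, jump, decay, horizon, rng):
    """Ogata thinning for intensity  baseline + sum_i jump * exp(-decay * (t - t_i)).

    Requires baseline > 0; the process is stable when jump / decay < 1.
    """
    events = []
    t = 0.0
    excitation = 0.0          # self-excited part of the intensity at time t
    while True:
        # Between events the intensity only decays, so its current value is a bound.
        upper = baseline + excitation
        wait = rng.expovariate(upper)
        t += wait
        if t >= horizon:
            return events
        excitation *= math.exp(-decay * wait)  # decay over the waiting time
        # Accept the candidate with probability intensity(t) / upper.
        if rng.random() * upper <= baseline + excitation:
            events.append(t)
            excitation += jump                 # each event raises the intensity


rng = random.Random(1)
events = simulate_hawkes(baseline=1.0, jump=0.5, decay=1.0, horizon=50.0, rng=rng)
```

Each accepted event raises the intensity by `jump`, which then decays at rate `decay`; this clustering of arrivals is the contagion effect that motivates such models for cyber losses.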
 [68] arXiv:2007.04759 (cross-list from cs.LG) [pdf, other]

Title: Expressivity of Deep Neural Networks
Comments: This review paper will appear as a book chapter in the book "Theory of Deep Learning" by Cambridge University Press
Subjects: Machine Learning (cs.LG); Functional Analysis (math.FA); Machine Learning (stat.ML)
In this review paper, we give a comprehensive overview of the large variety of approximation results for neural networks. Approximation rates for classical function spaces as well as benefits of deep neural networks over shallow ones for specifically structured function classes are discussed. While the main body of existing results is for general feedforward architectures, we also review approximation results for convolutional, residual and recurrent neural networks.
 [69] arXiv:2007.04777 (cross-list from eess.IV) [pdf, other]

Title: Self-supervised edge features for improved Graph Neural Network training
Comments: Comments welcome. arXiv admin note: substantial text overlap with arXiv:2006.12971
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Genomics (q-bio.GN); Machine Learning (stat.ML)
Graph Neural Networks (GNN) have been extensively used to extract meaningful representations from graph structured data and to perform predictive tasks such as node classification and link prediction. In recent years, there has been a lot of work incorporating edge features along with node features for prediction tasks. One of the main difficulties in using edge features is that they are often handcrafted, hard to get, specific to a particular domain, and may contain redundant information. In this work, we present a framework for creating new edge features, applicable to any domain, via a combination of self-supervised and unsupervised learning. In addition to this, we use Forman-Ricci curvature as an additional edge feature to encapsulate the local geometry of the graph. We then encode our edge features via a Set Transformer and combine them with node features extracted from popular GNN architectures for node classification in an end-to-end training scheme. We validate our work on three biological datasets comprising single-cell RNA sequencing data of neurological disease, in vitro SARS-CoV-2 infection, and human COVID-19 patients. We demonstrate that our method achieves better performance on node classification tasks than baseline Graph Attention Network (GAT) and Graph Convolutional Network (GCN) models. Furthermore, given the attention mechanism on edge and node features, we are able to interpret the cell types and genes that determine the course and severity of COVID-19, contributing to a growing list of potential disease biomarkers and therapeutic targets.
 [70] arXiv:2007.04785 (cross-list from cs.LG) [pdf, other]

Title: Neural Architecture Search with GBDT
Comments: Code is available at this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Neural architecture search (NAS) with an accuracy predictor that predicts the accuracy of candidate architectures has drawn increasing interest due to its simplicity and effectiveness. Previous works employ neural network based predictors, which unfortunately cannot fully exploit the tabular data representations of network architectures. As decision tree-based models can better handle tabular data, in this paper, we propose to leverage gradient boosting decision tree (GBDT) as the predictor for NAS and demonstrate that it can improve the prediction accuracy and help to find better architectures than neural network based predictors. Moreover, considering that a better and more compact search space can ease the search process, we propose to prune the search space gradually according to important features derived from GBDT using an interpreting tool named SHAP. In this way, NAS can be performed by first pruning the search space (using GBDT as a pruner) and then searching a neural architecture (using GBDT as a predictor), which is more efficient and effective. Experiments on NAS-Bench-101 and ImageNet demonstrate the effectiveness of GBDT for NAS: (1) NAS with GBDT predictor finds a top-10 architecture (among all the architectures in the search space) with $0.18\%$ test regret on NAS-Bench-101, and achieves $24.2\%$ top-1 error rate on ImageNet; and (2) GBDT based search space pruning and neural architecture search further achieves $23.5\%$ top-1 error rate on ImageNet.
 [71] arXiv:2007.04790 (cross-list from cs.LG) [pdf, other]

Title: MO-PaDGAN: Generating Diverse Designs with Multivariate Performance Enhancement
Comments: arXiv admin note: text overlap with arXiv:2002.11304
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Deep generative models have proven useful for automatic design synthesis and design space exploration. However, they face three challenges when applied to engineering design: 1) generated designs lack diversity, 2) it is difficult to explicitly improve all the performance measures of generated designs, and 3) existing models generally do not generate high-performance novel designs, outside the domain of the training data. To address these challenges, we propose MO-PaDGAN, which contains a new Determinantal Point Processes based loss function for probabilistic modeling of diversity and performances. Through a real-world airfoil design example, we demonstrate that MO-PaDGAN expands the existing boundary of the design space towards high-performance regions and generates new designs with high diversity and performances exceeding training data.
 [72] arXiv:2007.04793 (cross-list from cs.CV) [pdf, other]

Title: Statistical shape analysis of brain arterial networks (BAN)
Comments: arXiv admin note: substantial text overlap with arXiv:2003.00287
Subjects: Computer Vision and Pattern Recognition (cs.CV); Differential Geometry (math.DG); Applications (stat.AP)
Structures of brain arterial networks (BANs), which are complex arrangements of individual arteries, their branching patterns, and interconnectivities, play an important role in characterizing and understanding brain physiology. One would like tools for statistically analyzing the shapes of BANs, i.e., to quantify shape differences, compare populations of subjects, and study the effects of covariates on these shapes. This paper mathematically represents and statistically analyzes BAN shapes as elastic shape graphs. Each elastic shape graph is made up of nodes that are connected by a number of 3D curves, and edges, with arbitrary shapes. We develop a mathematical representation, a Riemannian metric and other geometrical tools, such as computations of geodesics, means and covariances, and PCA for analyzing elastic graphs and BANs. This analysis is applied to BANs after separating them into four components: top, bottom, left, and right. This framework is then used to generate shape summaries of BANs from 92 subjects, and to study the effects of age and gender on shapes of BAN components. We conclude that while gender effects require further investigation, age has a clear, quantifiable effect on BAN shapes. Specifically, we find an increased variance in BAN shapes as age increases.
 [73] arXiv:2007.04800 (cross-list from cs.LG) [pdf, other]

Title: When Humans and Machines Make Joint Decisions: A Non-Symmetric Bandit Model
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
How can humans and machines learn to make joint decisions? This has become an important question in domains such as medicine, law and finance. We approach the question from a theoretical perspective and formalize our intuitions about human-machine decision making in a non-symmetric bandit model. In doing so, we follow the example of a doctor who is assisted by a computer program. We show that in our model, exploration is generally hard. In particular, unless one is willing to make assumptions about how human and machine interact, the machine cannot explore efficiently. We highlight one such assumption, policy space independence, which resolves the coordination problem and allows both players to explore independently. Our results shed light on the fundamental difficulties faced by the interaction of humans and machines. We also discuss practical implications for the design of algorithmic decision systems.
 [74] arXiv:2007.04806 (cross-list from cs.LG) [pdf, other]

Title: Client Adaptation improves Federated Learning with Simulated Non-IID Clients
Comments: 11 pages, 11 figures. To appear at International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We present a federated learning approach for learning a client-adaptable, robust model when data is non-identically and non-independently distributed (non-IID) across clients. By simulating heterogeneous clients, we show that adding learned client-specific conditioning improves model performance, and the approach is shown to work on balanced and imbalanced data sets from both audio and image domains. The client adaptation is implemented by a conditional gated activation unit and is particularly beneficial when there are large differences between the data distribution for each client, a common scenario in federated learning.
 [75] arXiv:2007.04825 (cross-list from cs.LG) [pdf, other]

Title: Fast Transformers with Clustered Attention
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Transformers have been proven a successful model for a variety of tasks in sequence modeling. However, computing the attention matrix, which is their key component, has quadratic complexity with respect to the sequence length, thus making them prohibitively expensive for large sequences. To address this, we propose clustered attention, which instead of computing the attention for every query, groups queries into clusters and computes attention just for the centroids. To further improve this approximation, we use the computed clusters to identify the keys with the highest attention per query and compute the exact key/query dot products. This results in a model with linear complexity with respect to the sequence length for a fixed number of clusters. We evaluate our approach on two automatic speech recognition datasets and show that our model consistently outperforms vanilla transformers for a given computational budget. Finally, we demonstrate that our model can approximate arbitrarily complex attention distributions with a minimal number of clusters by approximating a pretrained BERT model on GLUE and SQuAD benchmarks with only 25 clusters and no loss in performance.
 [76] arXiv:2007.04838 (cross-list from cs.LG) [pdf, other]

Title: Improving the Robustness of Trading Strategy Backtesting with Boltzmann Machines and Generative Adversarial Networks
Comments: 72 pages, 30 figures
Subjects: Machine Learning (cs.LG); Portfolio Management (q-fin.PM); Statistical Finance (q-fin.ST); Machine Learning (stat.ML)
This article explores the use of machine learning models to build a market generator. The underlying idea is to simulate artificial multidimensional financial time series, whose statistical properties are the same as those observed in the financial markets. In particular, these synthetic data must preserve the probability distribution of asset returns, the stochastic dependence between the different assets and the autocorrelation across time. The article then proposes a new approach for estimating the probability distribution of backtest statistics. The final objective is to develop a framework for improving the risk management of quantitative investment strategies, in particular in the space of smart beta, factor investing and alternative risk premia.
 [77] arXiv:2007.04849 (cross-list from quant-ph) [pdf, ps, other]

Title: Physics-inspired forms of the Bayesian Cramér-Rao bound
Authors: Mankei Tsang
Comments: 4 pages
Subjects: Quantum Physics (quant-ph); Statistics Theory (math.ST)
Using the language of differential geometry, I derive a form of the Bayesian Cramér-Rao bound that remains invariant under reparametrization. By assuming that the prior probability density is the square of a wavefunction, I also express the bound in terms of functionals that are quadratic with respect to the wavefunction and its gradient. The problem of finding an unfavorable prior to tighten the bound for minimax estimation is shown, in a special case, to be equivalent to finding the ground-state energy with the Schrödinger equation, with the Fisher information playing the role of the potential.
 [78] arXiv:2007.04871 (cross-list from cs.LG) [pdf, other]

Title: Subject-Aware Contrastive Learning for Biosignals
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
Datasets for biosignals, such as electroencephalogram (EEG) and electrocardiogram (ECG), often have noisy labels and a limited number of subjects (<100). To handle these challenges, we propose a self-supervised approach based on contrastive learning to model biosignals with a reduced reliance on labeled data and with fewer subjects. In this regime of limited labels and subjects, inter-subject variability negatively impacts model performance. Thus, we introduce subject-aware learning through (1) a subject-specific contrastive loss, and (2) adversarial training to promote subject-invariance during the self-supervised learning. We also develop a number of time-series data augmentation techniques to be used with the contrastive loss for biosignals. Our method is evaluated on publicly available datasets of two different biosignals with different tasks: EEG decoding and ECG anomaly detection. The embeddings learned using self-supervision yield competitive classification results compared to entirely supervised methods. We show that subject-invariance improves representation quality for these tasks, and observe that subject-specific loss increases performance when fine-tuning with supervised labels.
 [79] arXiv:2007.04873 (cross-list from cs.LG) [pdf, other]

Title: Invertible Zero-Shot Recognition Flows
Comments: ECCV2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Deep generative models have been successfully applied to Zero-Shot Learning (ZSL) recently. However, the underlying drawbacks of GANs and VAEs (e.g., the hardness of training with ZSL-oriented regularizers and the limited generation quality) hinder the existing generative ZSL models from fully bypassing the seen-unseen bias. To tackle the above limitations, for the first time, this work incorporates a new family of generative models (i.e., flow-based models) into ZSL. The proposed Invertible Zero-shot Flow (IZF) learns factorized data embeddings (i.e., the semantic factors and the non-semantic ones) with the forward pass of an invertible flow network, while the reverse pass generates data samples. This procedure theoretically extends conventional generative flows to a factorized conditional scheme. To explicitly solve the bias problem, our model enlarges the seen-unseen distributional discrepancy based on negative sample-based distance measurement. Notably, IZF works flexibly with either a naive Bayesian classifier or a held-out trainable one for zero-shot recognition. Experiments on widely-adopted ZSL benchmarks demonstrate the significant performance gain of IZF over existing methods, in both classic and generalized settings.
 [80] arXiv:2007.04876 (cross-list from cs.LG) [pdf, ps, other]

Title: Multinomial Logit Bandit with Low Switching Cost
Comments: Accepted for presentation at the International Conference on Machine Learning (ICML) 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study the multinomial logit bandit with limited adaptivity, where the algorithms change their exploration actions as infrequently as possible while achieving almost optimal minimax regret. We propose two measures of adaptivity: the assortment switching cost and the more fine-grained item switching cost. We present an anytime algorithm (AT-DUCB) with $O(N \log T)$ assortment switches, almost matching the lower bound $\Omega(\frac{N \log T}{ \log \log T})$. In the fixed-horizon setting, our algorithm FH-DUCB incurs $O(N \log \log T)$ assortment switches, matching the asymptotic lower bound. We also present the ESUCB algorithm with item switching cost $O(N \log^2 T)$.
 [81] arXiv:2007.04897 (cross-list from q-bio.QM) [pdf, other]

Title: Guiding Deep Molecular Optimization with Genetic Exploration
Subjects: Quantitative Methods (q-bio.QM); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
De novo molecular design attempts to search over the chemical space for molecules with the desired property. Recently, deep learning has gained considerable attention as a promising approach to solve the problem. In this paper, we propose genetic expert-guided learning (GEGL), a simple yet novel framework for training a deep neural network (DNN) to generate highly-rewarding molecules. Our main idea is to design a "genetic expert improvement" procedure, which generates high-quality targets for imitation learning of the DNN. Extensive experiments show that GEGL significantly improves over state-of-the-art methods. For example, GEGL manages to solve the penalized octanol-water partition coefficient optimization with a score of 31.82, while the best-known score in the literature is 26.1. Besides, for the GuacaMol benchmark with 20 tasks, our method achieves the highest score for 19 tasks, in comparison with state-of-the-art methods, and newly obtains the perfect score for three tasks.
 [82] arXiv:2007.04911 (cross-list from cs.LG) [pdf, other]

Title: GAMA: a General Automated Machine learning Assistant
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The General Automated Machine learning Assistant (GAMA) is a modular AutoML system developed to empower users to track and control how AutoML algorithms search for optimal machine learning pipelines, and facilitate AutoML research itself. In contrast to current, often black-box systems, GAMA allows users to plug in different AutoML and post-processing techniques, logs and visualizes the search process, and supports easy benchmarking. It currently features three AutoML search algorithms, two model post-processing steps, and is designed to allow for more components to be added.
 [83] arXiv:2007.04915 (cross-list from cs.LG) [pdf, other]

Title: Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We propose a novel framework for structured bandits, which we call an influence diagram bandit. Our framework captures complex statistical dependencies between actions, latent variables, and observations; and thus unifies and extends many existing models, such as combinatorial semi-bandits, cascading bandits, and low-rank bandits. We develop novel online learning algorithms that learn to act efficiently in our models. The key idea is to track a structured posterior distribution of model parameters, either exactly or approximately. To act, we sample model parameters from their posterior and then use the structure of the influence diagram to find the most optimistic action under the sampled parameters. We empirically evaluate our algorithms in three structured bandit problems, and show that they perform as well as or better than problem-specific state-of-the-art baselines.
 [84] arXiv:2007.04921 (cross-list from q-bio.QM) [pdf, other]

Title: Graph Neural Network Based Coarse-Grained Mapping Prediction
Authors: Zhiheng Li, Geemi P. Wellawatte, Maghesree Chakraborty, Heta A. Gandhi, Chenliang Xu, Andrew D. White
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Machine Learning (stat.ML)
The selection of coarse-grained (CG) mapping operators is a critical step for CG molecular dynamics (MD) simulation. What is optimal for this choice is still an open question, and there is a need for theory. The current state-of-the-art method is mapping operators manually selected by experts. In this work, we demonstrate an automated approach by viewing this problem as supervised learning where we seek to reproduce the mapping operators produced by experts. We present a graph neural network based CG mapping predictor called DEEP SUPERVISED GRAPH PARTITIONING MODEL (DSGPM) that treats mapping operators as a graph segmentation problem. DSGPM is trained on a novel dataset, Human-annotated Mappings (HAM), consisting of 1,206 molecules with expert-annotated mapping operators. HAM can be used to facilitate further research in this area. Our model uses a novel metric learning objective to produce high-quality atomic features that are used in spectral clustering. The results show that the DSGPM outperforms state-of-the-art methods in the field of graph segmentation.
 [85] arXiv:2007.04929 (cross-list from cs.LG) [pdf, other]

Title: Learning Graph Structure With A Finite-State Automaton Layer
Comments: Submitted to NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Graph-based neural network models are producing strong results in a number of domains, in part because graphs provide flexibility to encode domain knowledge in the form of relational structure (edges) between nodes in the graph. In practice, edges are used both to represent intrinsic structure (e.g., abstract syntax trees of programs) and more abstract relations that aid reasoning for a downstream task (e.g., results of relevant program analyses). In this work, we study the problem of learning to derive abstract relations from the intrinsic graph structure. Motivated by their power in program analyses, we consider relations defined by paths on the base graph accepted by a finite-state automaton. We show how to learn these relations end-to-end by relaxing the problem into learning finite-state automata policies on a graph-based POMDP and then training these policies using implicit differentiation. The result is a differentiable Graph Finite-State Automaton (GFSA) layer that adds a new edge type (expressed as a weighted adjacency matrix) to a base graph. We demonstrate that this layer can find shortcuts in gridworld graphs and reproduce simple static analyses on Python programs. Additionally, we combine the GFSA layer with a larger graph-based model trained end-to-end on the variable misuse program understanding task, and find that using the GFSA layer leads to better performance than using hand-engineered semantic edges or other baseline methods for adding learned edge types.
 [86] arXiv:2007.04938 (cross-list from cs.LG) [pdf, other]

Title: SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Model-free deep reinforcement learning (RL) has been successful in a range of challenging domains. However, there are some remaining issues, such as stabilizing the optimization of nonlinear function approximators, preventing error propagation due to the Bellman backup in Q-learning, and efficient exploration. To mitigate these issues, we present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy RL algorithms. SUNRISE integrates three key ingredients: (a) bootstrap with random initialization, which improves the stability of the learning process by training a diverse ensemble of agents, (b) weighted Bellman backups, which prevent error propagation in Q-learning by reweighting sample transitions based on uncertainty estimates from the ensembles, and (c) an inference method that selects actions using highest upper-confidence bounds for efficient exploration. Our experiments show that SUNRISE significantly improves the performance of existing off-policy RL algorithms, such as Soft Actor-Critic and Rainbow DQN, for both continuous and discrete control tasks on both low-dimensional and high-dimensional environments. Our training code is available at https://github.com/pokaxpoka/sunrise.
 [87] arXiv:2007.04965 (cross-list from cs.LG) [pdf, other]

Title: A Study on Encodings for Neural Architecture Search
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Neural architecture search (NAS) has been extensively studied in the past few years. A popular approach is to represent each neural architecture in the search space as a directed acyclic graph (DAG), and then search over all DAGs by encoding the adjacency matrix and list of operations as a set of hyperparameters. Recent work has demonstrated that even small changes to the way each architecture is encoded can have a significant effect on the performance of NAS algorithms.
In this work, we present the first formal study on the effect of architecture encodings for NAS, including a theoretical grounding and an empirical study. First we formally define architecture encodings and give a theoretical characterization on the scalability of the encodings we study. Then we identify the main encoding-dependent subroutines which NAS algorithms employ, running experiments to show which encodings work best with each subroutine for many popular algorithms. The experiments act as an ablation study for prior work, disentangling the algorithmic and encoding-based contributions, as well as a guideline for future work. Our results demonstrate that NAS encodings are an important design decision which can have a significant impact on overall performance. Our code is available at https://github.com/naszilla/nasencodings.
 [88] arXiv:2007.04972 (cross-list from cs.LG) [pdf, other]

Title: Prostate motion modelling using biomechanically-trained deep neural networks on unstructured nodes
Authors: Shaheer U. Saeed, Zeike A. Taylor, Mark A. Pinnock, Mark Emberton, Dean C. Barratt, Yipeng Hu
Comments: Accepted to MICCAI 2020
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
In this paper, we propose to train deep neural networks with biomechanical simulations, to predict the prostate motion encountered during ultrasound-guided interventions. In this application, unstructured points are sampled from segmented preoperative MR images to represent the anatomical regions of interest. The point sets are then assigned point-specific material properties and displacement loads, forming the unordered input feature vectors. An adapted PointNet can be trained to predict the nodal displacements, using finite element (FE) simulations as ground-truth data. Furthermore, a versatile bootstrap aggregating mechanism, comprising training-time bootstrap sampling and model-averaging inference, is validated to accommodate the variable number of feature vectors due to different patient geometries. This results in a fast and accurate approximation to the FE solutions without requiring subject-specific solid meshing. Based on 160,000 nonlinear FE simulations on clinical imaging data from 320 patients, we demonstrate that the trained networks generalise to unstructured point sets sampled directly from holdout patient segmentation, yielding near real-time inference and an expected error of 0.017 mm in predicted nodal displacement.
 [89] arXiv:2007.04973 (cross-list from cs.LG) [pdf, other]

Title: Contrastive Code Representation Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Programming Languages (cs.PL); Software Engineering (cs.SE); Machine Learning (stat.ML)
Machine-aided programming tools such as type predictors and code summarizers are increasingly learning-based. However, most code representation learning approaches rely on supervised learning with task-specific annotated datasets. We propose Contrastive Code Representation Learning (ContraCode), a self-supervised algorithm for learning task-agnostic semantic representations of programs via contrastive learning. Our approach uses no human-provided labels, relying only on the raw text of programs. In particular, we design an unsupervised pretext task by generating textually divergent copies of source functions via automated source-to-source compiler transforms that preserve semantics. We train a neural model to identify variants of an anchor program within a large batch of negatives. To solve this task, the network must extract program features representing the functionality, not form, of the program. This is the first application of instance discrimination to code representation learning to our knowledge. We pretrain models over 1.8M unannotated JavaScript methods mined from GitHub. ContraCode pretraining improves code summarization accuracy by 7.9% over supervised approaches and 4.8% over RoBERTa pretraining. Moreover, our approach is agnostic to model architecture; for a type inference task, contrastive pretraining consistently improves the accuracy of existing baselines.
 [90] arXiv:2007.04976 (cross-list from cs.LG) [pdf, other]

Title: One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control
Comments: Accepted at ICML 2020. Videos and code at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Reinforcement learning is typically concerned with learning control policies tailored to a particular agent. We investigate whether there exists a single global policy that can generalize to control a wide variety of agent morphologies, including ones in which even the dimensionality of state and action spaces changes. We propose to express this global policy as a collection of identical modular neural networks, dubbed Shared Modular Policies (SMP), that correspond to each of the agent's actuators. Every module is only responsible for controlling its corresponding actuator and receives information from only its local sensors. In addition, messages are passed between modules, propagating information between distant modules. We show that a single modular policy can successfully generate locomotion behaviors for several planar agents with different skeletal structures such as monopod hoppers, quadrupeds, bipeds, and generalize to variants not seen during training, a process that would normally require training and manual hyperparameter tuning for each morphology. We observe that a wide variety of drastically diverse locomotion styles across morphologies, as well as centralized coordination, emerge via message passing between decentralized modules purely from the reinforcement learning objective. Videos and code at https://huangwl18.github.io/modularrl/
Replacements for Fri, 10 Jul 20
 [91] arXiv:1510.02753 (replaced) [pdf, ps, other]

Title: Organic direct and indirect effects with post-treatment common causes of mediator and outcome
Authors: Judith J Lok
Comments: 9 pages
Subjects: Methodology (stat.ME)
 [92] arXiv:1708.02166 (replaced) [pdf, other]

Title: Nonlinear spectral analysis: A local Gaussian approach
Comments: Version 4: Major revision from version 3, with new theory/figures. 135 pages (main part 32 + appendices 103), 11 + 16 figures
Subjects: Methodology (stat.ME)
 [93] arXiv:1710.03863 (replaced) [pdf, ps, other]

Title: On Estimation of $L_{r}$-Norms in Gaussian White Noise Models
Comments: To appear in Probability Theory and Related Fields
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG)
 [94] arXiv:1711.04145 (replaced) [pdf, other]

Title: Minimax estimation in linear models with unknown design over finite alphabets
Subjects: Statistics Theory (math.ST)
 [95] arXiv:1803.06675 (replaced) [pdf, other]

Title: Rare Feature Selection in High Dimensions
Comments: 42 pages, 10 figures
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Computation (stat.CO); Machine Learning (stat.ML)
 [96] arXiv:1805.08304 (replaced) [pdf, other]

Title: Anchored Bayesian Gaussian Mixture Models
Comments: 65 pages, 11 figures, 11 tables
Subjects: Methodology (stat.ME)
 [97] arXiv:1806.10120 (replaced) [pdf, other]

Title: Maximum Likelihood Estimation for Totally Positive Log-Concave Densities
Subjects: Statistics Theory (math.ST); Combinatorics (math.CO)
 [98] arXiv:1807.02161 (replaced) [pdf, ps, other]

Title: Minimizing Sensitivity to Model Misspecification
Subjects: Econometrics (econ.EM); Methodology (stat.ME)
 [99] arXiv:1811.00724 (replaced) [pdf, other]

Title: Bayesian Hierarchical Modeling on Covariance Valued Data
Comments: Some key references that were missing in the old version have been added in this version
Subjects: Applications (stat.AP)
 [100] arXiv:1901.03904 (replaced) [pdf]

Title: A Speech Act Classifier for Persian Texts and its Application in Identify Speech Act of Rumors
Comments: Published Link: this http URL
Journal-ref: Journal of Soft Computing and Information Technology, 9, 1, 1399 (2020), 1827
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [101] arXiv:1903.12077 (replaced) [pdf, ps, other]

Title: Time series models for realized covariance matrices based on the matrix-F distribution
Subjects: Statistics Theory (math.ST); Econometrics (econ.EM); Methodology (stat.ME)
 [102] arXiv:1904.07150 (replaced) [pdf, other]

Title: Variational Bayes for high-dimensional linear regression with sparse priors
Comments: 42 pages. We have added oracle contraction rates, removed the mutual coherence assumption, significantly expanded the simulations and generally improved the presentation
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
 [103] arXiv:1905.11497 (replaced) [pdf, other]

Title: Estimating Average Treatment Effects Utilizing Fractional Imputation when Confounders are Subject to Missingness
Subjects: Methodology (stat.ME); Other Statistics (stat.OT)
 [104] arXiv:1909.00721 (replaced) [pdf, other]

Title: Greedy clustering of count data through a mixture of multinomial PCA
Authors: Nicolas Jouvin (1 and 2), Pierre Latouche (2), Charles Bouveyron (3), Guillaume Bataillon (4), Alain Livartowski (4) ((1) Laboratoire SAMM EA 4543, (2) Laboratoire MAP5 UMR 8145, (3) Laboratoire J.A. Dieudonné UMR 7351, (4) Institut Curie)
Comments: 34 pages, 11 figures, published in: Computational Statistics
Subjects: Methodology (stat.ME)
 [105] arXiv:1909.02553 (replaced) [pdf, other]

Title: Smooth Contextual Bandits: Bridging the Parametric and Nondifferentiable Regret Regimes
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
 [106] arXiv:1909.02707 (replaced) [pdf, other]

Title: Restricted Minimum Error Entropy Criterion for Robust Classification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [107] arXiv:1909.07543 (replaced) [pdf, other]

Title: Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
 [108] arXiv:1910.00760 (replaced) [pdf, other]

Title: Efficient Graph Generation with Graph Recurrent Attention Networks
Authors: Renjie Liao, Yujia Li, Yang Song, Shenlong Wang, Charlie Nash, William L. Hamilton, David Duvenaud, Raquel Urtasun, Richard S. Zemel
Comments: Neural Information Processing Systems (NeurIPS) 2019
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [109] arXiv:1910.00780 (replaced) [pdf, other]

Title: How Does Topology of Neural Architectures Impact Gradient Propagation and Model Performance?
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [110] arXiv:1910.01741 (replaced) [pdf, other]

Title: Improving Sample Efficiency in Model-Free Reinforcement Learning from Images
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
 [111] arXiv:1910.03103 (replaced) [pdf, other]

Title: Energy-Aware Neural Architecture Optimization with Fast Splitting Steepest Descent
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [112] arXiv:1910.04817 (replaced) [pdf, other]

Title: Estimation of Bounds on Potential Outcomes For Decision Making
Journal-ref: ICML 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [113] arXiv:1910.09466 (replaced) [pdf, ps, other]

Title: Sparsification as a Remedy for Staleness in Distributed Asynchronous SGD
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [114] arXiv:1911.13211 (replaced) [pdf, other]

Title: Embedding and learning with signatures
Authors: Adeline Fermanian
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [115] arXiv:2002.04017 (replaced) [pdf, ps, other]
 [116] arXiv:2002.10099 (replaced) [pdf, other]

Title: Implicit Geometric Regularization for Learning Shapes
Comments: 37th International Conference on Machine Learning, Vienna, Austria, 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (stat.ML)
 [117] arXiv:2003.02570 (replaced) [pdf, other]

Title: Train by Reconnect: Decoupling Locations of Weights from their Values
Comments: 15 pages, 15 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [118] arXiv:2003.05926 (replaced) [pdf, other]

Title: Learning distributed representations of graphs with Geo2DR
Comments: 9 Pages, Revised version accepted at ICML 2020 GRL+ Workshop
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [119] arXiv:2003.13461 (replaced) [pdf, other]

Title: Adaptive Personalized Federated Learning
Comments: [v2] A new generalization analysis is provided. Also, additional experiments are added
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
 [120] arXiv:2004.03658 (replaced) [pdf, other]

Title: Faithful Embeddings for Knowledge Base Queries
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
 [121] arXiv:2004.06630 (replaced) [pdf]

Title: A logic-based resampling with matching approach to multiple imputation of missing data
Subjects: Methodology (stat.ME)
 [122] arXiv:2004.13962 (replaced) [pdf, other]

Title: Energy Balancing of Covariate Distributions
Subjects: Methodology (stat.ME)
 [123] arXiv:2005.00527 (replaced) [pdf, ps, other]

Title: Is Long Horizon Reinforcement Learning More Difficult Than Short Horizon Reinforcement Learning?
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [124] arXiv:2005.05080 (replaced) [pdf, other]

Title: Continual Learning Using Task Conditional Neural Networks
Comments: 10 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [125] arXiv:2005.05587 (replaced) [pdf, ps, other]

Title: Robustness Verification for Classifier Ensembles
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [126] arXiv:2006.00701 (replaced) [pdf, ps, other]

Title: Locally Differentially Private (Contextual) Bandits Learning
Comments: 19 pages (including appendix)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [127] arXiv:2006.11419 (replaced) [pdf, other]

Title: Set-Invariant Constrained Reinforcement Learning with a Meta-Optimizer
Comments: Accepted to ICML 2020 Workshop Theoretical Foundations of RL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [128] arXiv:2006.11695 (replaced) [pdf, other]

Title: Learned Uncertainty-Aware (LUNA) Bases for Bayesian Regression using Multi-Headed Auxiliary Networks
Comments: ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [129] arXiv:2006.13681 (replaced) [pdf, other]

Title: Multiview Drone-based Geolocalization via Style and Spatial Alignment
Comments: 9 pages, 9 figures. arXiv admin note: text overlap with arXiv:2002.12186 by other authors
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [130] arXiv:2006.15061 (replaced) [pdf, other]

Title: Intrinsic Reward Driven Imitation Learning via Generative Model
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [131] arXiv:2006.15935 (replaced) [pdf]

Title: Is Japanese gendered language used on Twitter ? A large scale study
Subjects: Computation and Language (cs.CL); Applications (stat.AP)
 [132] arXiv:2007.02153 (replaced) [pdf, other]

Title: An Empirical Bayes Approach to Shrinkage Estimation on the Manifold of Symmetric Positive-Definite Matrices
Comments: 55 pages, 5 figures, journal submission
Subjects: Statistics Theory (math.ST)
 [133] arXiv:2007.02196 (replaced) [pdf, other]

Title: Deep Active Learning via Open Set Recognition
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [134] arXiv:2007.02523 (replaced) [pdf, other]

Title: Covariate Distribution Aware Metalearning
Journal-ref: Published in ICML 2020 Lifelong Learning Workshop
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [135] arXiv:2007.02725 (replaced) [pdf]

Title: The FMRIB Variational Bayesian Inference Tutorial II: Stochastic Variational Bayes
Comments: Example code and exercises associated with this tutorial can be found here: this https URL
Subjects: Methodology (stat.ME); Machine Learning (cs.LG)
 [136] arXiv:2007.03114 (replaced) [pdf, other]

Title: Relaxed Conformal Prediction Cascades for Efficient Inference Over Many Labels
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [137] arXiv:2007.03167 (replaced) [pdf, other]

Title: Are Ensemble Classifiers Powerful Enough for the Detection and Diagnosis of Intermediate-Severity Faults?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [138] arXiv:2007.03506 (replaced) [pdf, other]

Title: Hierarchical nucleation in deep neural networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [139] arXiv:2007.03533 (replaced) [pdf, other]

Title: A Federated F-score Based Ensemble Model for Automatic Rule Extraction
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [140] arXiv:2007.03641 (replaced) [pdf, ps, other]

Title: One-Bit Compressed Sensing via One-Shot Hard Thresholding
Authors: Jie Shen
Comments: Accepted to The Conference on Uncertainty in Artificial Intelligence (UAI) 2020
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
 [141] arXiv:2007.03762 (replaced) [pdf, other]

Title: Transfer Learning for Electricity Price Forecasting
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [142] arXiv:2007.04002 (replaced) [pdf, other]

Title: Unbiased Lift-based Bidding System
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
 [143] arXiv:2007.04275 (replaced) [pdf, other]

Title: Graph Neural Networks for the Prediction of Substrate-Specific Organic Reaction Conditions
Authors: Serim Ryou, Michael R. Maser, Alexander Y. Cui, Travis J. DeLano, Yisong Yue, Sarah E. Reisman
Comments: 23 pages, 10 tables, 13 figures, to appear in the ICML 2020 Workshop on Graph Representation Learning and Beyond (GRLB)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)