Statistics
New submissions
New submissions for Wed, 16 Jun 21
 [1] arXiv:2106.07695 [pdf, other]

Title: Adaptive normalization for IPW estimation
Comments: 32 pages, 7 figures
Subjects: Methodology (stat.ME)
Inverse probability weighting (IPW) is a general tool in survey sampling and causal inference, used both in Horvitz-Thompson estimators, which normalize by the sample size, and Hájek/self-normalized estimators, which normalize by the sum of the inverse probability weights. In this work we study a family of IPW estimators, first proposed by Trotter and Tukey in the context of Monte Carlo problems, that are normalized by an affine combination of these two terms. We show how selecting an estimator from this family in a data-dependent way to minimize asymptotic variance leads to an iterative procedure that converges to an estimator with connections to regression control methods. We refer to this estimator as an adaptively normalized estimator. For mean estimation in survey sampling, this estimator has asymptotic variance that is never worse than the Horvitz-Thompson or Hájek estimators, and is smaller except in edge cases. Going further, we show that adaptive normalization can be used to propose improvements of the augmented IPW (AIPW) estimator, average treatment effect (ATE) estimators, and policy learning objectives. Appealingly, these proposals preserve both the asymptotic efficiency of AIPW and the regret bounds for policy learning with IPW objectives, and deliver consistent finite sample improvements in simulations for all three of mean estimation, ATE estimation, and policy learning.
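The estimator family described in this abstract can be illustrated with a minimal Python sketch. It assumes independent (Poisson-style) sampling with known inclusion probabilities; the function name `ipw_mean` and the mixing parameter `lam` are hypothetical names chosen here, not taken from the paper. Setting `lam = 0` normalizes by the number of units (Horvitz-Thompson), and `lam = 1` normalizes by the sum of the inverse probability weights (Hájek). The data-dependent choice of `lam` that the abstract describes is not implemented.

```python
import random


def ipw_mean(y, p, sampled, lam):
    """IPW estimate of a mean with an affine normalizer (illustrative sketch).

    y       : outcomes for all n units (only sampled ones are used)
    p       : known inclusion probabilities, each in (0, 1]
    sampled : booleans, True where the unit was observed
    lam     : 0 gives Horvitz-Thompson, 1 gives Hajek, other values interpolate
    """
    n = len(y)
    # Sum of inverse-probability-weighted outcomes over the observed units.
    weighted_total = sum(yi / pi for yi, pi, s in zip(y, p, sampled) if s)
    # Sum of the inverse probability weights over the observed units.
    weight_sum = sum(1.0 / pi for pi, s in zip(p, sampled) if s)
    # Affine combination of the two normalizers named in the abstract.
    return weighted_total / ((1.0 - lam) * n + lam * weight_sum)


# Small simulated example; the seed and distributions are arbitrary choices.
rng = random.Random(0)
y = [rng.gauss(10.0, 2.0) for _ in range(1000)]
p = [0.2 + 0.6 * rng.random() for _ in y]
sampled = [rng.random() < pi for pi in p]
for lam in (0.0, 0.5, 1.0):
    print(lam, ipw_mean(y, p, sampled, lam))
```

All three estimates target the same population mean; they differ only in how the weighted total is normalized, which is where the variance differences discussed in the abstract arise.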
 [2] arXiv:2106.07717 [pdf, ps, other]

Title: Robust Inference for High-Dimensional Linear Models via Residual Randomization
Journal-ref: International Conference on Machine Learning 2021
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
We propose a residual randomization procedure designed for robust Lasso-based inference in the high-dimensional setting. Compared to earlier work that focuses on sub-Gaussian errors, the proposed procedure is designed to work robustly in settings that also include heavy-tailed covariates and errors. Moreover, our procedure can be valid under clustered errors, which is important in practice, but has been largely overlooked by earlier work. Through extensive simulations, we illustrate our method's wider range of applicability as suggested by theory. In particular, we show that our method outperforms state-of-the-art methods in challenging, yet more realistic, settings where the distribution of covariates is heavy-tailed or the sample size is small, while it remains competitive in standard, "well behaved" settings previously studied in the literature.
 [3] arXiv:2106.07725 [pdf, other]

Title: Generalized kernel distance covariance in high dimensions: non-null CLTs and power universality
Subjects: Statistics Theory (math.ST)
Distance covariance is a popular dependence measure for two random vectors $X$ and $Y$ of possibly different dimensions and types. Recent years have witnessed concentrated efforts in the literature to understand the distributional properties of the sample distance covariance in a high-dimensional setting, with an exclusive emphasis on the null case that $X$ and $Y$ are independent. This paper derives the first non-null central limit theorem for the sample distance covariance, and the more general sample (Hilbert-Schmidt) kernel distance covariance in high dimensions, primarily in the Gaussian case. The new non-null central limit theorem yields an asymptotically exact first-order power formula for the widely used generalized kernel distance correlation test of independence between $X$ and $Y$. The power formula in particular unveils an interesting universality phenomenon: the power of the generalized kernel distance correlation test is completely determined by $n\cdot \text{dcor}^2(X,Y)/\sqrt{2}$ in the high dimensional limit, regardless of a wide range of choices of the kernels and bandwidth parameters. Furthermore, this separation rate is also shown to be optimal in a minimax sense. The key step in the proof of the non-null central limit theorem is a precise expansion of the mean and variance of the sample distance covariance in high dimensions, which shows, among other things, that the non-null Gaussian approximation of the sample distance covariance involves a rather subtle interplay between the dimension-to-sample ratio and the dependence between $X$ and $Y$.
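For readers unfamiliar with the statistic, the following Python sketch computes a sample squared distance correlation from double-centered Euclidean distance matrices, along with a plug-in version of the quantity $n\cdot \text{dcor}^2(X,Y)/\sqrt{2}$ from the abstract. It uses the simple biased (V-statistic) form, which may differ from the sample version analyzed in the paper, and the function names `dcov2`, `dcor2`, and `power_index` are hypothetical.

```python
import math


def _centered_distances(points):
    """Double-centered matrix of pairwise Euclidean distances."""
    n = len(points)
    d = [[math.dist(a, b) for b in points] for a in points]
    row = [sum(r) / n for r in d]  # row means (equal to column means by symmetry)
    grand = sum(row) / n           # grand mean
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def dcov2(x, y):
    """Biased sample squared distance covariance of paired samples x and y."""
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2


def dcor2(x, y):
    """Sample squared distance correlation; returns 0.0 for a degenerate sample."""
    denom = math.sqrt(dcov2(x, x) * dcov2(y, y))
    return dcov2(x, y) / denom if denom > 0 else 0.0


def power_index(x, y):
    """Plug-in version of n * dcor^2(X, Y) / sqrt(2)."""
    return len(x) * dcor2(x, y) / math.sqrt(2)


# x is 1-dimensional and y is 2-dimensional: the dimensions may differ.
x = [(0.0,), (1.0,), (3.0,), (7.0,)]
y = [(2.0 * a[0] + 1.0, 0.0) for a in x]
print(dcor2(x, y), power_index(x, y))
```

Each sample is a list of coordinate tuples, so vectors of different dimensions can be compared as long as the two lists are paired and of equal length.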
 [4] arXiv:2106.07761 [pdf, other]

Title: Linear-Time Probabilistic Solutions of Boundary Value Problems
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA)
We propose a fast algorithm for the probabilistic solution of boundary value problems (BVPs), which are ordinary differential equations subject to boundary conditions. In contrast to previous work, we introduce a Gauss-Markov prior and tailor it specifically to BVPs, which allows computing a posterior distribution over the solution in linear time, at a quality and cost comparable to that of well-established, non-probabilistic methods. Our model further delivers uncertainty quantification, mesh refinement, and hyperparameter adaptation. We demonstrate how these practical considerations positively impact the efficiency of the scheme. Altogether, this results in a practically usable probabilistic BVP solver that is (in contrast to non-probabilistic algorithms) natively compatible with other parts of the statistical modelling tool-chain.
 [5] arXiv:2106.07797 [pdf, other]

Title: Embracing Uncertainty in "Small Data" Problems: Estimating Earthquakes from Historical Anecdotes
Authors: Justin A. Krometis, Hayden Ringer, Jared P. Whitehead, Nathan E. Glatt-Holtz, Ronald A. Harris
Subjects: Applications (stat.AP); Geophysics (physics.geo-ph)
We apply the Bayesian inversion process to make principled estimates of the magnitude and location of a pre-instrumental earthquake in Eastern Indonesia in the mid-19th century, by combining anecdotal historical accounts of the resultant tsunami with our modern understanding of the geology of the region. Quantifying the seismic record prior to modern instrumentation is critical to a more thorough understanding of the current risks in Eastern Indonesia. In particular, the occurrence of such a major earthquake in the 1850s provides evidence that this region is susceptible to future seismic hazards on the same order of magnitude. More importantly, the approach taken here gives evidence that even "small data" that is limited in scope and extremely uncertain can still be used to yield information on past seismic events, which is key to an increased understanding of the current seismic state. Moreover, sensitivity bounds indicate that the results obtained here are robust despite the inherent uncertainty in the observations.
 [6] arXiv:2106.07816 [pdf, other]

Title: Tree-Values: selective inference for regression trees
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
We consider conducting inference on the output of the Classification and Regression Tree (CART) [Breiman et al., 1984] algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.
 [7] arXiv:2106.07834 [pdf, other]

Title: A Non-ergodic Effective Amplitude Ground-Motion Model for California
Comments: 34 pages, 18 figures
Subjects: Applications (stat.AP)
A new non-ergodic ground-motion model (GMM) for effective amplitude spectral ($EAS$) values for California is presented in this study. $EAS$, which is defined in Goulet et al. (2018), is a smoothed rotation-independent Fourier amplitude spectrum of the two horizontal components of an acceleration time history. The main motivation for developing a non-ergodic $EAS$ GMM, rather than a spectral acceleration GMM, is that the scaling of $EAS$ does not depend on spectral shape, and therefore, the more frequent small magnitude events can be used in the estimation of the non-ergodic terms.
The model is developed using the California subset of the NGA-West2 dataset (Ancheta et al., 2013). The Bayless and Abrahamson (2019b) (BA18) ergodic $EAS$ GMM was used as backbone to constrain the average source, path, and site scaling. The non-ergodic GMM is formulated as a Bayesian hierarchical model: the non-ergodic source and site terms are modeled as spatially varying coefficients following the approach of Landwehr et al. (2016), and the non-ergodic path effects are captured by the cell-specific anelastic attenuation following the approach of Dawood and Rodriguez-Marek (2013). Close to stations and past events, the mean values of the non-ergodic terms deviate from zero to capture the systematic effects and their epistemic uncertainty is small. In areas with sparse data, the epistemic uncertainty of the non-ergodic terms is large, as the systematic effects cannot be determined.
The non-ergodic total aleatory standard deviation is approximately $30$ to $40\%$ smaller than the total aleatory standard deviation of BA18. This reduction in the aleatory variability has a significant impact on hazard calculations at large return periods. The epistemic uncertainty of the ground motion predictions is small in areas close to stations and past events.
 [8] arXiv:2106.07875 [pdf, other]

Title: S-LIME: Stabilized-LIME for Model Explanation
Comments: In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), August 14-18, 2021, Virtual Event, Singapore
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
An increasing number of machine learning models have been deployed in domains with high stakes such as finance and healthcare. Despite their superior performances, many models are black boxes in nature which are hard to explain. There are growing efforts for researchers to develop methods to interpret these black-box models. Post hoc explanations based on perturbations, such as LIME, are widely used approaches to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, posing serious challenges to the effectiveness of the method itself and harming user trust. In this paper, we propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem for determining the number of perturbation points needed to guarantee stability of the resulting explanation. Experiments on both simulated and real world data sets are provided to demonstrate the effectiveness of our method.
 [9] arXiv:2106.07898 [pdf, other]

Title: Divergence Frontiers for Generative Models: Sample Complexity, Quantization Level, and Frontier Integral
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The spectacular success of deep generative models calls for quantitative tools to measure their statistical performance. Divergence frontiers have recently been proposed as an evaluation framework for generative models, due to their ability to measure the quality-diversity trade-off inherent to deep generative modeling. However, the statistical behavior of divergence frontiers estimated from data remains unknown to this day. In this paper, we establish non-asymptotic bounds on the sample complexity of the plug-in estimator of divergence frontiers. Along the way, we introduce a novel integral summary of divergence frontiers. We derive the corresponding non-asymptotic bounds and discuss the choice of the quantization level by balancing the two types of approximation errors arising from its computation. We also augment the divergence frontier framework by investigating the statistical performance of smoothed distribution estimators such as the Good-Turing estimator. We illustrate the theoretical results with numerical examples from natural language processing and computer vision.
 [10] arXiv:2106.08086 [pdf, other]

Title: Decomposition of Global Feature Importance into Direct and Associative Components (DEDACT)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Global model-agnostic feature importance measures either quantify whether features are directly used for a model's predictions (direct importance) or whether they contain prediction-relevant information (associative importance). Direct importance provides causal insight into the model's mechanism, yet it fails to expose the leakage of information from associated but not directly used variables. In contrast, associative importance exposes information leakage but does not provide causal insight into the model's mechanism. We introduce DEDACT, a framework to decompose well-established direct and associative importance measures into their respective associative and direct components. DEDACT provides insight into both the sources of prediction-relevant information in the data and the direct and indirect feature pathways by which the information enters the model. We demonstrate the method's usefulness on simulated examples.
 [11] arXiv:2106.08105 [pdf, other]

Title: Employing an Adjusted Stability Measure for Multi-Criteria Model Fitting on Data Sets with Similar Features
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Fitting models with high predictive accuracy that include all relevant but no irrelevant or redundant features is a challenging task on data sets with similar (e.g. highly correlated) features. We propose the approach of tuning the hyperparameters of a predictive model in a multi-criteria fashion with respect to predictive accuracy and feature selection stability. We evaluate this approach based on both simulated and real data sets and we compare it to the standard approach of single-criteria tuning of the hyperparameters as well as to the state-of-the-art technique "stability selection". We conclude that our approach achieves the same or better predictive performance compared to the two established approaches. Considering the stability during tuning does not decrease the predictive accuracy of the resulting models. Our approach succeeds at selecting the relevant features while avoiding irrelevant or redundant features. The single-criteria approach fails at avoiding irrelevant or redundant features and the stability selection approach fails at selecting enough relevant features for achieving acceptable predictive accuracy. For our approach, for data sets with many similar features, the feature selection stability must be evaluated with an adjusted stability measure, that is, a measure that considers similarities between features. For data sets with only a few similar features, an unadjusted stability measure suffices and is faster to compute.
 [12] arXiv:2106.08161 [pdf, other]

Title: Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Genomics (q-bio.GN)
Learning meaningful representations of data that can address challenges such as batch effect correction, data integration and counterfactual inference is a central problem in many domains including computational biology. Adopting a Conditional VAE framework, we identify the mathematical principle that unites these challenges: learning a representation that is marginally independent of a condition variable. We therefore propose the Contrastive Mixture of Posteriors (CoMP) method that uses a novel misalignment penalty to enforce this independence. This penalty is defined in terms of mixtures of the variational posteriors themselves, unlike prior work which uses external discrepancy measures such as MMD to ensure independence in latent space. We show that CoMP has attractive theoretical properties compared to previous approaches, especially when there is complex global structure in latent space. We further demonstrate state-of-the-art performance on a number of real-world problems, including the challenging tasks of aligning human tumour samples with cancer cell-lines and performing counterfactual inference on single-cell RNA sequencing data. Incidentally, we find parallels with the fair representation learning literature, and demonstrate CoMP has competitive performance in learning fair yet expressive latent representations.
 [13] arXiv:2106.08185 [pdf, other]

Title: Kernel Identification Through Transformers
Authors: Fergus Simpson, Ian Davies, Vidhi Lalchand, Alessandro Vullo, Nicolas Durrande, Carl Rasmussen
Comments: 12 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models, as the chosen kernel determines both the inductive biases and prior support of functions under the GP prior. This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models. Drawing inspiration from recent progress in deep learning, we introduce a novel approach named KITT: Kernel Identification Through Transformers. KITT exploits a transformer-based architecture to generate kernel recommendations in under 0.1 seconds, which is several orders of magnitude faster than conventional kernel search algorithms. We train our model using synthetic data generated from priors over a vocabulary of known kernels. By exploiting the nature of the self-attention mechanism, KITT is able to process datasets with inputs of arbitrary dimension. We demonstrate that kernels chosen by KITT yield strong performance over a diverse collection of regression benchmarks.
 [14] arXiv:2106.08217 [pdf, other]

Title: RFpredInterval: An R Package for Prediction Intervals with Random Forests and Boosted Forests
Comments: 32 pages, 14 figures, 5 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Like many predictive models, random forests provide a point prediction for a new observation. Besides the point prediction, it is important to quantify the uncertainty in the prediction. Prediction intervals provide information about the reliability of the point predictions. We have developed a comprehensive R package, RFpredInterval, that integrates 16 methods to build prediction intervals with random forests and boosted forests. The methods implemented in the package are a new method to build prediction intervals with boosted forests (PIBF) and 15 different variants to produce prediction intervals with random forests proposed by Roy and Larocque (2020). We perform an extensive simulation study and apply real data analyses to compare the performance of the proposed method to ten existing methods to build prediction intervals with random forests. The results show that the proposed method is very competitive and, globally, it outperforms the competing methods.
 [15] arXiv:2106.08247 [pdf, ps, other]

Title: Canonical-Correlation-Based Fast Feature Selection
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
This paper proposes a canonical-correlation-based filter method for feature selection. The sum of squared canonical correlation coefficients is adopted as the feature ranking criterion. The proposed method boosts the computational speed of the ranking criterion in greedy search. The supporting theorems developed for the feature selection method are fundamental to the understanding of the canonical correlation analysis. In empirical studies, a synthetic dataset is used to demonstrate the speed advantage of the proposed method, and eight real datasets are applied to show the effectiveness of the proposed feature ranking criterion in both classification and regression. The results show that the proposed method is considerably faster than the definition-based method, and the proposed ranking criterion is competitive compared with the seven mutual-information-based criteria.
 [16] arXiv:2106.08277 [pdf, other]

Title: A Bayesian adaptive design for dual-agent phase I-II cancer clinical trials combining efficacy data across stages
Subjects: Methodology (stat.ME)
Integrated phase I-II clinical trial designs are efficient approaches to accelerate drug development. In cases where efficacy cannot be ascertained in a short period of time, two-stage approaches are usually employed. When different patient populations are involved across stages, the use of efficacy data collected from both stages is worth discussing. In this paper, we focus on a two-stage design that aims to estimate safe dose combinations with a certain level of efficacy. In stage I, conditional escalation with overdose control (EWOC) is used to allocate successive cohorts of patients. The maximum tolerated dose (MTD) curve is estimated based on a Bayesian dose-toxicity model. In stage II, we consider an adaptive allocation of patients to drug combinations that have a high probability of being efficacious along the obtained MTD curve. A robust Bayesian hierarchical model is proposed to allow sharing of information on the efficacy parameters across stages assuming the related parameters are either exchangeable or non-exchangeable. Under the assumption of exchangeability, a random-effects distribution is specified for the main effects parameters to capture uncertainty about the between-stage differences. The proposed methodology is assessed with extensive simulations motivated by a real phase I-II drug combination trial using continuous doses.
 [17] arXiv:2106.08281 [pdf, other]

Title: A Horseshoe Pit mixture model for Bayesian screening with an application to light sheet fluorescence microscopy in brain imaging
Authors: Francesco Denti, Ricardo Azevedo, Chelsie Lo, Damian Wheeler, Sunil P. Gandhi, Michele Guindani, Babak Shahbaba
Subjects: Methodology (stat.ME)
Finding parsimonious models through variable selection is a fundamental problem in many areas of statistical inference. Here, we focus on Bayesian regression models, where variable selection can be implemented through a regularizing prior imposed on the distribution of the regression coefficients. In the Bayesian literature, there are two main types of priors used to accomplish this goal: the spike-and-slab and the continuous scale mixtures of Gaussians. The former is a discrete mixture of two distributions characterized by low and high variance. In the latter, a continuous prior is elicited on the scale of a zero-mean Gaussian distribution. In contrast to these existing methods, we propose a new class of priors based on a discrete mixture of continuous scale mixtures, providing a more general framework for Bayesian variable selection. To this end, we substitute the observation-specific local shrinkage parameters (typical of continuous mixtures) with mixture component shrinkage parameters. Our approach drastically reduces the number of parameters needed and allows sharing information across the coefficients, improving the shrinkage effect. By using half-Cauchy distributions, this approach leads to a cluster-shrinkage version of the Horseshoe prior. We present the properties of our model and showcase its estimation and prediction performance in a simulation study. We then recast the model in a multiple hypothesis testing framework and apply it to a neurological dataset obtained using a novel whole-brain imaging technique.
 [18] arXiv:2106.08305 [pdf, ps, other]

Title: Markov Equivalence of Max-Linear Bayesian Networks
Comments: 19 pages, 5 figures, accepted for the 37th conference on Uncertainty in Artificial Intelligence (UAI 2021)
Subjects: Statistics Theory (math.ST); Algebraic Geometry (math.AG); Combinatorics (math.CO)
Max-linear Bayesian networks have emerged as highly applicable models for causal inference via extreme value data. However, conditional independence (CI) for max-linear Bayesian networks behaves differently than for classical Gaussian Bayesian networks. We establish the parallel between the two theories via tropicalization, and establish the surprising result that the Markov equivalence classes for max-linear Bayesian networks coincide with the ones obtained by regular CI. Our paper opens up many problems at the intersection of extreme value statistics, causal inference and tropical geometry.
 [19] arXiv:2106.08320 [pdf, other]

Title: Self-Supervised Learning with Kernel Dependence Maximization
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We approach self-supervised learning of image representations from a statistical dependence perspective, proposing Self-Supervised Learning with the Hilbert-Schmidt Independence Criterion (SSL-HSIC). SSL-HSIC maximizes dependence between representations of transformed versions of an image and the image identity, while minimizing the kernelized variance of those features. This self-supervised learning framework yields a new understanding of InfoNCE, a variational lower bound on the mutual information (MI) between different transformations. While the MI itself is known to have pathologies which can result in meaningless representations being learned, its bound is much better behaved: we show that it implicitly approximates SSL-HSIC (with a slightly different regularizer). Our approach also gives us insight into BYOL, since SSL-HSIC similarly learns local neighborhoods of samples. SSL-HSIC allows us to directly optimize statistical dependence in time linear in the batch size, without restrictive data assumptions or indirect mutual information estimators. Trained with or without a target network, SSL-HSIC matches the current state-of-the-art for standard linear evaluation on ImageNet, semi-supervised learning and transfer to other classification and vision tasks such as semantic segmentation, depth estimation and object recognition.
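As background for the dependence measure named in this abstract, here is a minimal Python sketch of the standard biased HSIC estimator, trace(KHLH)/(n-1)^2, applied to a kernel on features and an indicator kernel on image identity. This is a quadratic-time illustration only, not the linear-time SSL-HSIC objective the abstract refers to; the Gaussian kernel, the bandwidth, and the helper names `gaussian_gram`, `identity_gram`, and `hsic_biased` are assumptions made here.

```python
import math


def gaussian_gram(points, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on a list of coordinate tuples."""
    return [[math.exp(-math.dist(a, b) ** 2 / (2.0 * bandwidth ** 2))
             for b in points] for a in points]


def identity_gram(ids):
    """Indicator kernel: 1.0 when two views come from the same image id."""
    return [[1.0 if a == b else 0.0 for b in ids] for a in ids]


def hsic_biased(k, l):
    """Biased HSIC estimate trace(K H L H) / (n - 1)^2 for Gram matrices k, l."""
    n = len(k)
    row = [sum(r) / n for r in k]
    grand = sum(row) / n
    # Double-centering k yields H K H; its elementwise product with the
    # symmetric matrix l sums to trace(K H L H).
    kc = [[k[i][j] - row[i] - row[j] + grand for j in range(n)]
          for i in range(n)]
    return sum(kc[i][j] * l[i][j]
               for i in range(n) for j in range(n)) / (n - 1) ** 2


# Four feature vectors: two augmented views for each of two images.
features = [(0.0,), (0.1,), (2.0,), (2.1,)]
image_ids = [0, 0, 1, 1]
print(hsic_biased(gaussian_gram(features), identity_gram(image_ids)))
```

The estimate is larger when views of the same image have similar features and views of different images do not, which is the kind of dependence between representation and image identity that the abstract describes maximizing.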
Cross-lists for Wed, 16 Jun 21
 [20] arXiv:2106.07644 (cross-list from math.OC) [pdf, other]

Title: A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip
Authors: Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor
Comments: arXiv admin note: substantial text overlap with arXiv:2102.06035
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Probability (math.PR); Machine Learning (stat.ML)
We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; and a discretization of the continuized process can be computed exactly with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters. We provide continuized Nesterov acceleration under deterministic as well as stochastic gradients, with either additive or multiplicative noise. Finally, using our continuized framework and expressing the gossip averaging problem as the stochastic minimization of a certain energy function, we provide the first rigorous acceleration of asynchronous gossip algorithms.
 [21] arXiv:2106.07682 (cross-list from cs.LG) [pdf, other]

Title: Revisiting Model Stitching to Compare Neural Representations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We revisit and extend model stitching (Lenc & Vedaldi 2015) as a methodology to study the internal representations of neural networks. Given two trained and frozen models $A$ and $B$, we consider a "stitched model" formed by connecting the bottom-layers of $A$ to the top-layers of $B$, with a simple trainable layer between them. We argue that model stitching is a powerful and perhaps under-appreciated tool, which reveals aspects of representations that measures such as centered kernel alignment (CKA) cannot. Through extensive experiments, we use model stitching to obtain quantitative verifications for intuitive statements such as "good networks learn similar representations", by demonstrating that good networks of the same architecture, but trained in very different ways (e.g.: supervised vs. self-supervised learning), can be stitched to each other without drop in performance. We also give evidence for the intuition that "more is better" by showing that representations learnt with (1) more data, (2) bigger width, or (3) more training time can be "plugged in" to weaker models to improve performance. Finally, our experiments reveal a new structural property of SGD which we call "stitching connectivity", akin to mode-connectivity: typical minima reached by SGD can all be stitched to each other with minimal change in accuracy.
 [22] arXiv:2106.07724 (cross-list from cs.LG) [pdf, other]

Title: An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
It is well known that modern deep neural networks are powerful enough to memorize datasets even when the labels have been randomized. Recently, Vershynin (2020) settled a long-standing question by Baum (1988), proving that deep threshold networks can memorize $n$ points in $d$ dimensions using $\widetilde{\mathcal{O}}(e^{1/\delta^2}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(e^{1/\delta^2}(d+\sqrt{n})+n)$ weights, where $\delta$ is the minimum distance between the points. In this work, we improve the dependence on $\delta$ from exponential to almost linear, proving that $\widetilde{\mathcal{O}}(\frac{1}{\delta}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(\frac{d}{\delta}+n)$ weights are sufficient. Our construction uses Gaussian random weights only in the first layer, while all the subsequent layers use binary or integer weights. We also prove new lower bounds by connecting memorization in neural networks to the purely geometric problem of separating $n$ points on a sphere using hyperplanes.
 [23] arXiv:2106.07754 (cross-list from cs.AI) [pdf, other]

Title: Counterfactual Explanations as Interventions in Latent Space
Comments: 34 pages, 4 figures, 4 tables
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Machine Learning (stat.ML)
Explainable Artificial Intelligence (XAI) is a set of techniques that allows the understanding of both technical and non-technical aspects of Artificial Intelligence (AI) systems. XAI is crucial to help satisfy the increasingly important demand for trustworthy Artificial Intelligence, characterized by fundamental characteristics such as respect for human autonomy, prevention of harm, transparency, accountability, etc. Within XAI techniques, counterfactual explanations aim to provide to end users a set of features (and their corresponding values) that need to be changed in order to achieve a desired outcome. Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations, and in particular they fall short of considering the causal impact of such actions. In this paper, we present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations capturing by design the underlying causal relations from the data, and at the same time to provide feasible recommendations to reach the proposed profile. Moreover, our methodology has the advantage that it can be set on top of existing counterfactuals generator algorithms, thus minimising the complexity of imposing additional causal constraints. We demonstrate the effectiveness of our approach with a set of different experiments using synthetic and real datasets (including a proprietary dataset of the financial domain).
 [24] arXiv:2106.07767 (cross-list from cs.LG) [pdf, other]

Title: Improving Robustness of Graph Neural Networks with Heterophily-Inspired Designs
Comments: preprint with appendix; 30 pages, 1 figure
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Recent studies have shown that many graph neural networks (GNNs) are sensitive to adversarial attacks, and can suffer from performance loss if the graph structure is intentionally perturbed. A different line of research has shown that many GNN architectures implicitly assume that the underlying graph displays homophily, i.e., connected nodes are more likely to have similar features and class labels, and perform poorly if this assumption is not fulfilled. In this work, we formalize the relation between these two seemingly different issues. We theoretically show that in the standard scenario in which node features exhibit homophily, impactful structural attacks always lead to increased levels of heterophily. Then, inspired by GNN architectures that target heterophily, we present two designs, (i) separate aggregators for ego- and neighbor-embeddings, and (ii) a reduced scope of aggregation, that can significantly improve the robustness of GNNs. Our extensive empirical evaluations show that GNNs featuring merely these two designs can achieve significantly improved robustness compared to the best-performing unvaccinated model with 24.99% gain in average performance under targeted attacks, while having smaller computational overhead than existing defense mechanisms. Furthermore, these designs can be readily combined with explicit defense mechanisms to yield state-of-the-art robustness with up to 18.33% increase in performance under attacks compared to the best-performing vaccinated model.
 [25] arXiv:2106.07769 (cross-list from cs.LG) [pdf, other]

Title: The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization
Comments: 19 pages, 2 figures. Submitted to NeurIPS 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Among the most successful methods for sparsifying deep (neural) networks are those that adaptively mask the network weights throughout training. By examining this masking, or dropout, in the linear case, we uncover a duality between such adaptive methods and regularization through the so-called "$\eta$-trick" that casts both as iteratively reweighted optimizations. We show that any dropout strategy that adapts to the weights in a monotonic way corresponds to an effective subquadratic regularization penalty, and therefore leads to sparse solutions. We obtain the effective penalties for several popular sparsification strategies, which are remarkably similar to classical penalties commonly used in sparse optimization. Considering variational dropout as a case study, we demonstrate similar empirical behavior between the adaptive dropout method and classical methods on the task of deep network sparsification, validating our theory.
 [26] arXiv:2106.07779 (cross-list from cs.LG) [pdf, ps, other]

Title: Boosting in the Presence of Massart Noise
Authors: Ilias Diakonikolas, Russell Impagliazzo, Daniel Kane, Rex Lei, Jessica Sorrell, Christos Tzamos
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study the problem of boosting the accuracy of a weak learner in the (distribution-independent) PAC model with Massart noise. In the Massart noise model, the label of each example $x$ is independently misclassified with probability $\eta(x) \leq \eta$, where $\eta<1/2$. The Massart model lies between the random classification noise model and the agnostic model. Our main positive result is the first computationally efficient boosting algorithm in the presence of Massart noise that achieves misclassification error arbitrarily close to $\eta$. Prior to our work, no non-trivial booster was known in this setting. Moreover, we show that this error upper bound is best possible for polynomial-time black-box boosters, under standard cryptographic assumptions. Our upper and lower bounds characterize the complexity of boosting in the distribution-independent PAC model with Massart noise. As a simple application of our positive result, we give the first efficient Massart learner for unions of high-dimensional rectangles.
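To make the noise model in the abstract above concrete, the following is a minimal sketch of how Massart-noisy labels could be generated. It only illustrates the definition (each label is flipped independently with probability $\eta(x) \leq \eta < 1/2$) and is not the boosting algorithm of the paper. The function name `massart_labels` and its parameters (`true_label`, `eta_of_x`, `eta_max`) are hypothetical names chosen for this illustration.

```python
import random


def massart_labels(xs, true_label, eta_of_x, eta_max, rng):
    """Return noisy +/-1 labels under a Massart-style noise model.

    Illustrative assumption: true_label(x) returns +1 or -1, and
    eta_of_x(x) is the example-dependent flip probability, which must
    satisfy 0 <= eta_of_x(x) <= eta_max < 1/2.
    """
    noisy = []
    for x in xs:
        flip_prob = eta_of_x(x)
        if not 0.0 <= flip_prob <= eta_max < 0.5:
            raise ValueError("flip probability violates the Massart bound")
        y = true_label(x)
        # Flip the clean label independently with probability eta(x).
        noisy.append(-y if rng.random() < flip_prob else y)
    return noisy


def sign_label(x):
    """Toy clean classifier used for the demo: the sign of x."""
    return 1 if x >= 0 else -1


if __name__ == "__main__":
    rng = random.Random(0)
    points = [rng.uniform(-1.0, 1.0) for _ in range(10)]
    # Noise is heavier near the decision boundary, but never above 0.3.
    labels = massart_labels(points, sign_label,
                            lambda x: 0.3 * (1.0 - abs(x)), 0.3, rng)
    print(list(zip(points, labels)))
```

When `eta_of_x` is the constant `eta_max`, this reduces to random classification noise, which is one way to see why the abstract places the Massart model between that model and the agnostic one.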
 [27] arXiv:2106.07804 (cross-list from cs.LG) [pdf, other]

Title: Controlling Neural Networks with Rule Representations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We propose a novel training method to integrate rules into deep learning, in a way that their strengths are controllable at inference. Deep Neural Networks with Controllable Rule Representations (DeepCTRL) incorporates a rule encoder into the model coupled with a rule-based objective, enabling a shared representation for decision making. DeepCTRL is agnostic to data type and model architecture. It can be applied to any kind of rule defined for inputs and outputs. The key aspect of DeepCTRL is that it does not require retraining to adapt the rule strength: at inference, the user can adjust it based on the desired operation point on accuracy vs. rule verification ratio. In real-world domains where incorporating rules is critical, such as Physics, Retail and Healthcare, we show the effectiveness of DeepCTRL in teaching rules for deep learning. DeepCTRL improves the trust and reliability of the trained models by significantly increasing their rule verification ratio, while also providing accuracy gains at downstream tasks. Additionally, DeepCTRL enables novel use cases such as hypothesis testing of the rules on data samples, and unsupervised adaptation based on shared rules between datasets.
 [28] arXiv:2106.07814 (cross-list from cs.LG) [pdf, other]

Title: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
Comments: ICML 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces. By contrast, the majority of theoretical RL literature requires the MDP to satisfy some form of linear structure, in order to guarantee sample efficient RL. Such efforts typically assume the transition dynamics or value function of the MDP are described by linear functions of the state features. To resolve this discrepancy between theory and practice, we introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions. We demonstrate that the EPW condition permits sample efficient RL, by providing an algorithm which provably solves MDPs satisfying this condition. Our algorithm requires minimal assumptions on the policy class, which can include multilayer neural networks with nonlinear activation functions. Notably, the EPW condition is directly motivated by popular gaming benchmarks, and we show that many classic Atari games satisfy this condition. We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.
 [29] arXiv:2106.07830 (cross-list from cs.LG) [pdf, other]

Title: On the Convergence of Deep Learning with Differential Privacy
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In deep learning with differential privacy (DP), the neural network usually achieves privacy at the cost of slower convergence (and thus lower performance) than its non-private counterpart. This work gives the first convergence analysis of DP deep learning, through the lens of training dynamics and the neural tangent kernel (NTK). Our convergence theory successfully characterizes the effects of two key components in DP training: the per-sample clipping (flat or layerwise) and the noise addition. Our analysis not only initiates a general principled framework to understand DP deep learning with any network architecture and loss function, but also motivates a new clipping method, global clipping, that significantly improves the convergence while preserving the same privacy guarantee as the existing local clipping.
In terms of theoretical results, we establish the precise connection between the per-sample clipping and the NTK matrix. We show that in the gradient flow, i.e., with infinitesimal learning rate, the noise level of DP optimizers does not affect the convergence. We prove that DP gradient descent (GD) with global clipping guarantees the monotone convergence to zero loss, which can be violated by the existing DP-GD with local clipping. Notably, our analysis framework easily extends to other optimizers, e.g., DP-Adam. Empirically speaking, DP optimizers equipped with global clipping perform strongly on a wide range of classification and regression tasks. In particular, our global clipping is surprisingly effective at learning calibrated classifiers, in contrast to the existing DP classifiers which are oftentimes over-confident and unreliable. Implementation-wise, the new clipping can be realized by adding one line of code into the Opacus library.
 [30] arXiv:2106.07832 (cross-list from cs.LG) [pdf, other]

Title: Learning Equivariant Energy Based Models with Equivariant Stein Variational Gradient Descent
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We focus on the problem of efficient sampling and learning of probability densities by incorporating symmetries in probabilistic models. We first introduce the Equivariant Stein Variational Gradient Descent algorithm, an equivariant sampling method based on Stein's identity for sampling from densities with symmetries. Equivariant SVGD explicitly incorporates symmetry information in a density through equivariant kernels, which makes the resultant sampler efficient both in terms of sample complexity and the quality of generated samples. Subsequently, we define equivariant energy based models to model invariant densities that are learned using contrastive divergence. By utilizing our equivariant SVGD for training equivariant EBMs, we propose new ways of improving and scaling up training of energy based models. We apply these equivariant energy models for modelling joint densities in regression and classification tasks for image datasets, many-body particle systems and molecular structure generation.
 [31] arXiv:2106.07836 (cross-list from cs.LG) [pdf, other]

Title: Improved Regret Bounds for Online Submodular Maximization
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
In this paper, we consider an online optimization problem over $T$ rounds where at each step $t\in[T]$, the algorithm chooses an action $x_t$ from the fixed convex and compact domain set $\mathcal{K}$. A utility function $f_t(\cdot)$ is then revealed and the algorithm receives the payoff $f_t(x_t)$. This problem has been previously studied under the assumption that the utilities are adversarially chosen monotone DR-submodular functions and $\mathcal{O}(\sqrt{T})$ regret bounds have been derived. We first characterize the class of strongly DR-submodular functions and then, we derive regret bounds for the following new online settings: $(1)$ $\{f_t\}_{t=1}^T$ are monotone strongly DR-submodular and chosen adversarially, $(2)$ $\{f_t\}_{t=1}^T$ are monotone submodular (while the average $\frac{1}{T}\sum_{t=1}^T f_t$ is strongly DR-submodular) and chosen by an adversary but they arrive in a uniformly random order, $(3)$ $\{f_t\}_{t=1}^T$ are drawn i.i.d. from some unknown distribution $f_t\sim \mathcal{D}$ where the expected function $f(\cdot)=\mathbb{E}_{f_t\sim\mathcal{D}}[f_t(\cdot)]$ is monotone DR-submodular. For $(1)$, we obtain the first logarithmic regret bounds. In terms of the second framework, we show that it is possible to obtain similar logarithmic bounds with high probability. Finally, for the i.i.d. model, we provide algorithms with $\tilde{\mathcal{O}}(\sqrt{T})$ stochastic regret bound, both in expectation and with high probability. Experimental results demonstrate that our algorithms outperform the previous techniques in the aforementioned three settings.
 [32] arXiv:2106.07841 (cross-list from cs.LG) [pdf, other]

Title: Randomized Exploration for Reinforcement Learning with General Value Function Approximation
Authors: Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang
Comments: 32 pages, 5 figures, in Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle. Unlike existing upper-confidence-bound (UCB) based approaches, which are often computationally intractable, our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises. To attain optimistic value function estimation without resorting to a UCB-style bonus, we introduce an optimistic reward sampling procedure. When the value functions can be represented by a function class $\mathcal{F}$, our algorithm achieves a worst-case regret bound of $\widetilde{O}(\mathrm{poly}(d_EH)\sqrt{T})$ where $T$ is the time elapsed, $H$ is the planning horizon and $d_E$ is the $\textit{eluder dimension}$ of $\mathcal{F}$. In the linear setting, our algorithm reduces to LSVI-PHE, a variant of RLSVI, that enjoys an $\widetilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret. We complement the theory with an empirical evaluation across known difficult exploration tasks.
 [33] arXiv:2106.07847 (cross-list from cs.LG) [pdf, other]

Title: Learning Stable Classifiers by Transferring Unstable Features
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We study transfer learning in the presence of spurious correlations. We experimentally demonstrate that directly transferring the stable feature extractor learned on the source task may not eliminate these biases for the target task. However, we hypothesize that the unstable features in the source task and those in the target task are directly related. By explicitly informing the target classifier of the source task's unstable features, we can regularize the biases in the target task. Specifically, we derive a representation that encodes the unstable features by contrasting different data environments in the source task. On the target task, we cluster data from this representation, and achieve robustness by minimizing the worst-case risk across all clusters. We evaluate our method on both text and image classifications. Empirical results demonstrate that our algorithm is able to maintain robustness on the target task, outperforming the best baseline by 22.9% in absolute accuracy across 12 transfer settings. Our code is available at https://github.com/YujiaBao/Tofu.
 [34] arXiv:2106.07908 (cross-list from cs.LG) [pdf, ps, other]

Title: Machine learning-based conditional mean filter: a generalization of the ensemble Kalman filter for nonlinear data assimilation
Authors: Truong-Vinh Hoang (1), Sebastian Krumscheid (1), Hermann G. Matthies (2), Raúl Tempone (1 and 3) ((1) Chair of Mathematics for Uncertainty Quantification, RWTH Aachen University, (2) Technische Universität Braunschweig, (3) Computer, Electrical and Mathematical Sciences and Engineering, KAUST, and Alexander von Humboldt professor in Mathematics of Uncertainty Quantification, RWTH Aachen University)
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Computation (stat.CO); Machine Learning (stat.ML)
Filtering is a data assimilation technique that performs the sequential inference of dynamical systems states from noisy observations. Herein, we propose a machine learning-based ensemble conditional mean filter (ML-EnCMF) for tracking possibly high-dimensional non-Gaussian state models with nonlinear dynamics based on sparse observations. The proposed filtering method is developed based on the conditional expectation and numerically implemented using machine learning (ML) techniques combined with the ensemble method. The contribution of this work is twofold. First, we demonstrate that the ensembles assimilated using the ensemble conditional mean filter (EnCMF) provide an unbiased estimator of the Bayesian posterior mean, and their variance matches the expected conditional variance. Second, we implement the EnCMF using artificial neural networks, which have a significant advantage in representing nonlinear functions over high-dimensional domains such as the conditional mean. Finally, we demonstrate the effectiveness of the ML-EnCMF for tracking the states of Lorenz-63 and Lorenz-96 systems under the chaotic regime. Numerical results show that the ML-EnCMF outperforms the ensemble Kalman filter.
 [35] arXiv:2106.07909 (cross-list from cs.SI) [pdf, other]

Title: Evaluating the Effect of the Financial Status to the Mobility Customs
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Applications (stat.AP)
In this article, we explore the relationship between cellular phone data and housing prices in Budapest, Hungary. We determine mobility indicators from one month of Call Detail Records (CDR) data, while the property price data are used to characterize the socioeconomic status in the capital of Hungary. First, we validated the proposed methodology by comparing the Home and Work location estimates and the commuting patterns derived from the cellular network dataset with reports of the national mini census. We investigated the statistical relationships between mobile phone indicators, such as Radius of Gyration, the distance between Home and Work locations or the Entropy of visited cells, and measures of economic status based on housing prices. Our findings show that mobility correlates significantly with socioeconomic status. We performed Principal Component Analysis (PCA) on combined vectors of mobility indicators in order to characterize the dependence of mobility habits on socioeconomic status. The results of the PCA investigation showed a remarkable correlation between housing prices and mobility customs.
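As an illustration of two of the mobility indicators named in the abstract above, the sketch below computes a radius of gyration and a visit entropy for one subscriber using their common textbook definitions. The abstract does not state the exact formulas used in the paper, so these definitions, the planar coordinates (assumed to be in kilometres), and the function names `radius_of_gyration` and `visit_entropy` are assumptions made for this example.

```python
import math
from collections import Counter


def radius_of_gyration(points):
    """Root-mean-square distance of visited locations from their centroid.

    Assumption: points is a non-empty list of planar (x, y) coordinates,
    one entry per recorded visit, so repeated visits carry more weight.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)


def visit_entropy(cell_ids):
    """Shannon entropy (in bits) of the distribution of visited cells."""
    counts = Counter(cell_ids)
    total = len(cell_ids)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical subscriber who alternates between two cells 2 km apart.
    visits = [(0.0, 0.0), (2.0, 0.0), (0.0, 0.0), (2.0, 0.0)]
    print(radius_of_gyration(visits))
    print(visit_entropy(["home", "work", "home", "work"]))
```

A subscriber who only ever appears in one cell gets a radius of gyration and an entropy of zero, while wider and more evenly spread movement raises both values.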
 [36] arXiv:2106.07911 (cross-list from math.OC) [pdf, other]

Title: Nonasymptotic convergence bounds for Wasserstein approximation using point clouds
Subjects: Optimization and Control (math.OC); Machine Learning (stat.ML)
Several issues in machine learning and inverse problems require generating discrete data, as if sampled from a model probability distribution. A common way to do so relies on the construction of a uniform probability distribution over a set of $N$ points which minimizes the Wasserstein distance to the model distribution. This minimization problem, where the unknowns are the positions of the atoms, is nonconvex. Yet, in most cases, a suitably adjusted version of Lloyd's algorithm (in which Voronoi cells are replaced by Power cells) leads to configurations with small Wasserstein error. This is surprising given the nonconvex nature of the problem, as well as the existence of spurious critical points. We provide explicit upper bounds for the convergence speed of this Lloyd-type algorithm, starting from a cloud of points sufficiently far from each other. This already works after one step of the iteration procedure, and similar bounds can be deduced for the corresponding gradient descent. These bounds naturally lead to a modified Poliak-Lojasiewicz inequality for the Wasserstein distance cost, with an error term depending on the distances between Dirac masses in the discrete distribution.
 [37] arXiv:2106.07914 (cross-list from cs.LG) [pdf, other]

Title: Control Variates for Slate Off-Policy Evaluation
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions, often termed slates. The problem is common to recommender systems and user-interface optimization, and it is particularly challenging because of the combinatorially-sized action space. Swaminathan et al. (2017) have proposed the pseudoinverse (PI) estimator under the assumption that the conditional mean rewards are additive in actions. Using control variates, we consider a large class of unbiased estimators that includes as specific cases the PI estimator and (asymptotically) its self-normalized variant. By optimizing over this class, we obtain new estimators with risk improvement guarantees over both the PI and self-normalized PI estimators. Experiments with real-world recommender data as well as synthetic data validate these improvements in practice.
 [38] arXiv:2106.07992 (cross-list from cs.LG) [pdf, other]

Title: Time Series Anomaly Detection for Cyber-physical Systems via Neural System Identification and Bayesian Filtering
Comments: Accepted to appear in KDD 2021
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Recent advances in AIoT technologies have led to an increasing popularity of utilizing machine learning algorithms to detect operational failures for cyber-physical systems (CPS). In its basic form, an anomaly detection module monitors the sensor measurements and actuator states from the physical plant, and detects anomalies in these measurements to identify abnormal operation status. Nevertheless, building effective anomaly detection models for CPS is rather challenging as the model has to accurately detect anomalies in the presence of highly complicated system dynamics and an unknown amount of sensor noise. In this work, we propose a novel time series anomaly detection method called Neural System Identification and Bayesian Filtering (NSIBF) in which a specially crafted neural network architecture is posed for system identification, i.e., capturing the dynamics of CPS in a dynamical state-space model; then a Bayesian filtering algorithm is naturally applied on top of the "identified" state-space model for robust anomaly detection by tracking the uncertainty of the hidden state of the system recursively over time. We provide qualitative as well as quantitative experiments with the proposed method on a synthetic and three real-world CPS datasets, showing that NSIBF compares favorably to the state-of-the-art methods with considerable improvements on anomaly detection in CPS.
 [39] arXiv:2106.08027 (cross-list from cs.LG) [pdf, other]

Title: Multivariate Business Process Representation Learning utilizing Gramian Angular Fields and Convolutional Neural Networks
Comments: Accepted at the Business Process Management Conference 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Learning meaningful representations of data is an important aspect of machine learning and has recently been successfully applied to many domains like language understanding or computer vision. Instead of training a model for one specific task, representation learning is about training a model to capture all useful information in the underlying data and make it accessible for a predictor. For predictive process analytics, it is essential to have all explanatory characteristics of a process instance available when making predictions about the future, as well as for clustering and anomaly detection. Due to the large variety of perspectives and types within business process data, generating a good representation is a challenging task. In this paper, we propose a novel approach for representation learning of business process instances which can process and combine most perspectives in an event log. In conjunction with a self-supervised pre-training method, we show the capabilities of the approach through a visualization of the representation space and case retrieval. Furthermore, the pre-trained model is fine-tuned to multiple process prediction tasks and demonstrates its effectiveness in comparison with existing approaches.
 [40] arXiv:2106.08048 (cross-list from q-bio.PE) [pdf, other]

Title: Epidemic modelling of multiple virus strains: a case study of SARS-CoV-2 B.1.1.7 in Moscow
Subjects: Populations and Evolution (q-bio.PE); Machine Learning (cs.LG); Applications (stat.AP)
During a long-running pandemic a pathogen can mutate, producing new strains with different epidemiological parameters. Existing approaches to epidemic modelling only consider one virus strain. We have developed a modified SEIR model to simulate multiple virus strains within the same population. As a case study, we investigate the potential effects of SARS-CoV-2 strain B.1.1.7 on the city of Moscow. Our analysis indicates a high risk of a new wave of infections in September-October 2021 with up to 35 000 daily infections at peak. We open-source our code and data.
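The idea of running several strains through one SEIR population can be sketched as follows. This is not the authors' model: it is a generic textbook-style extension, written under the assumptions that strains differ only in transmission rate, that they share the incubation and recovery rates, and that recovery from any strain gives immunity to all of them. The function name `simulate_multi_strain_seir` and every parameter value are hypothetical.

```python
def simulate_multi_strain_seir(beta, sigma, gamma, s0, e0, i0, days, dt=0.1):
    """Forward-Euler simulation of an SEIR model with several strains.

    Assumptions (illustrative only): beta[k] is the transmission rate of
    strain k, sigma is the shared rate of leaving the exposed state,
    gamma is the shared recovery rate, and all strains compete for one
    susceptible pool. Returns one (S, E, I, R) snapshot per simulated day,
    where E and I are per-strain tuples.
    """
    n_strains = len(beta)
    s = float(s0)
    e = [float(v) for v in e0]
    i = [float(v) for v in i0]
    r = 0.0
    n = s + sum(e) + sum(i)  # total population, constant over time
    steps_per_day = round(1.0 / dt)
    history = []
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = [beta[k] * s * i[k] / n * dt for k in range(n_strains)]
            new_infectious = [sigma * e[k] * dt for k in range(n_strains)]
            new_recovered = [gamma * i[k] * dt for k in range(n_strains)]
            s -= sum(new_exposed)
            for k in range(n_strains):
                e[k] += new_exposed[k] - new_infectious[k]
                i[k] += new_infectious[k] - new_recovered[k]
            r += sum(new_recovered)
        history.append((s, tuple(e), tuple(i), r))
    return history


if __name__ == "__main__":
    # Two strains seeded equally; the second is 50% more transmissible.
    run = simulate_multi_strain_seir(beta=(0.3, 0.45), sigma=0.2, gamma=0.1,
                                     s0=1_000_000, e0=(0, 0), i0=(10, 10),
                                     days=30)
    print(run[-1])
```

Even with identical seeding, the more transmissible strain comes to dominate the infectious compartment, which is the qualitative effect a multi-strain model is meant to capture.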
 [41] arXiv:2106.08056 (cross-list from cs.LG) [pdf, other]

Title: Coupled Gradient Estimators for Discrete Latent Variables
Comments: Under Review
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Training models with discrete latent variables is challenging due to the high variance of unbiased gradient estimators. While low-variance reparameterization gradients of a continuous relaxation can provide an effective solution, a continuous relaxation is not always available or tractable. Dong et al. (2020) and Yin et al. (2020) introduced a performant estimator that does not rely on continuous relaxations; however, it is limited to binary random variables. We introduce a novel derivation of their estimator based on importance sampling and statistical couplings, which we extend to the categorical setting. Motivated by the construction of a stick-breaking coupling, we introduce gradient estimators based on reparameterizing categorical variables as sequences of binary variables and Rao-Blackwellization. In systematic experiments, we show that our proposed categorical gradient estimators provide state-of-the-art performance, whereas even with additional Rao-Blackwellization, previous estimators (Yin et al., 2019) underperform a simpler REINFORCE with a leave-one-out-baseline estimator (Kool et al., 2019).
 [42] arXiv:2106.08068 (cross-list from cs.LG) [pdf, other]

Title: An Analytical Theory of Curriculum Learning in Teacher-Student Networks
Comments: 10 pages + appendix
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)
In humans and animals, curriculum learning, i.e., presenting data in a curated order, is critical to rapid learning and effective pedagogy. Yet in machine learning, curricula are not widely used and empirically often yield only moderate benefits. This stark difference in the importance of curriculum raises a fundamental theoretical question: when and why does curriculum learning help?
In this work, we analyse a prototypical neural network model of curriculum learning in the high-dimensional limit, employing statistical physics methods. Curricula could in principle change both the learning speed and asymptotic performance of a model. To study the former, we provide an exact description of the online learning setting, confirming the long-standing experimental observation that curricula can modestly speed up learning. To study the latter, we derive performance in a batch learning setting, in which a network trains to convergence in successive phases of learning on dataset slices of varying difficulty. With standard training losses, curriculum does not provide generalisation benefit, in line with empirical observations. However, we show that by connecting different learning phases through simple Gaussian priors, curriculum can yield a large improvement in test performance. Taken together, our reduced analytical descriptions help reconcile apparently conflicting empirical results and trace regimes where curriculum learning yields the largest gains. More broadly, our results suggest that fully exploiting a curriculum may require explicit changes to the loss function at curriculum boundaries.
 [43] arXiv:2106.08077 (cross-list from cs.CV) [pdf, other]

Title: Computer-aided Interpretable Features for Leaf Image Classification
Comments: 31 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)
Plant species identification is time-consuming and costly, and requires considerable effort and expert knowledge. Recently, many researchers have used deep learning methods to classify plants directly from plant images. While deep learning models have achieved great success, their lack of interpretability limits their widespread application. To overcome this, we explore the use of interpretable, measurable and computer-aided features extracted from plant leaf images. Image processing is one of the most challenging and crucial steps in feature extraction. The purpose of image processing is to improve the leaf image by removing undesired distortion. The main image processing steps of our algorithm are: i) converting the original image to an RGB (Red-Green-Blue) image, ii) gray scaling, iii) Gaussian smoothing, iv) binary thresholding, v) removing the stalk, vi) closing holes, and vii) resizing the image. The next step after image processing is to extract features from plant leaf images. We introduce 52 computationally efficient features to classify plant species. These features fall into four groups: i) shape-based features, ii) color-based features, iii) texture-based features, and iv) scagnostic features. Length, width, area, texture correlation, monotonicity and scagnostics are a few of them. We explore the ability of the features to discriminate the classes of interest under supervised and unsupervised learning settings. To this end, a supervised dimensionality reduction technique, Linear Discriminant Analysis (LDA), and an unsupervised dimensionality reduction technique, Principal Component Analysis (PCA), are used to convert and visualize the images from digital-image space to feature space. The results show that the features are sufficient to discriminate the classes of interest under both supervised and unsupervised learning settings.
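Three of the preprocessing steps listed in the abstract above (gray scaling, Gaussian smoothing, binary thresholding) and one shape feature (area) can be sketched in plain Python as follows. This is a toy illustration rather than the authors' pipeline: the luminance weights, the 3x3 kernel, the threshold of 128, the assumption that the leaf is darker than the background, and all function names are choices made for this example.

```python
# 3x3 binomial approximation of a Gaussian kernel; weights sum to 16.
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]


def to_gray(rgb):
    """Convert rows of (r, g, b) pixels to luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]


def gaussian_smooth(gray):
    """Smooth with the 3x3 kernel, replicating pixels at the border."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * gray[yy][xx]
            out[y][x] = acc / 16.0
    return out


def binarize(gray, threshold=128):
    """Mark pixels darker than the threshold as leaf (1), others as 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def leaf_area(mask):
    """Area feature: the number of leaf pixels in the binary mask."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    white, green = (255, 255, 255), (20, 90, 30)
    # Hypothetical 4x4 image with a dark 2x2 "leaf" in the middle.
    image = [[green if 1 <= y <= 2 and 1 <= x <= 2 else white
              for x in range(4)] for y in range(4)]
    mask = binarize(gaussian_smooth(to_gray(image)))
    print(leaf_area(mask))
```

Each step returns a new nested list, so the intermediate images can be inspected or swapped for a library implementation without changing the rest of the chain.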
 [44] arXiv:2106.08171 (cross-list from cs.LG) [pdf, other]

Title: Evaluating Modules in Graph Contrastive Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The recent emergence of contrastive learning approaches facilitates the research on graph representation learning (GRL), introducing graph contrastive learning (GCL) into the literature. These methods contrast semantically similar and dissimilar sample pairs to encode the semantics into node or graph embeddings. However, most existing works only performed model-level evaluation, and did not explore the combination space of modules for more comprehensive and systematic studies. For effective module-level evaluation, we propose a framework that decomposes GCL models into four modules: (1) a sampler to generate anchor, positive and negative data samples (nodes or graphs); (2) an encoder and a readout function to get sample embeddings; (3) a discriminator to score each sample pair (anchor-positive and anchor-negative); and (4) an estimator to define the loss function. Based on this framework, we conduct controlled experiments over a wide range of architectural designs and hyperparameter settings on node and graph classification tasks. Specifically, we manage to quantify the impact of a single module, investigate the interaction between modules, and compare the overall performance with current model architectures. Our key findings include a set of module-level guidelines for GCL, e.g., simple samplers from LINE and DeepWalk are strong and robust; an MLP encoder associated with Sum readout could achieve competitive performance on graph classification. Finally, we release our implementations and results as OpenGCL, a modularized toolkit that allows convenient reproduction, standard model and module evaluation, and easy extension.
 [45] arXiv:2106.08285 (cross-list from cs.CV) [pdf, other]

Title: Multi-StyleGAN: Towards Image-Based Simulation of Time-Lapse Live-Cell Microscopy
Comments: accepted to MICCAI 2021. (Tim Prangemeier and Christoph Reich - both authors contributed equally)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
Time-lapse fluorescent microscopy (TLFM) combined with predictive mathematical modelling is a powerful tool to study the inherently dynamic processes of life on the single-cell level. Such experiments are costly, complex and labour intensive. A complementary approach, and a step towards completely in silico experiments, is to synthesise the imagery itself. Here, we propose Multi-StyleGAN as a descriptive approach to simulate time-lapse fluorescence microscopy imagery of living cells, based on a past experiment. This novel generative adversarial network synthesises a multi-domain sequence of consecutive timesteps. We showcase Multi-StyleGAN on imagery of multiple live yeast cells in microstructured environments and train on a dataset recorded in our laboratory. The simulation captures underlying biophysical factors and time dependencies, such as cell morphology, growth, physical interactions, as well as the intensity of a fluorescent reporter protein. An immediate application is to generate additional training and validation data for feature extraction algorithms or to aid and expedite development of advanced experimental techniques such as online monitoring or control of cells.
Code and dataset are available at https://git.rwthaachen.de/bcs/projects/tp/multistylegan.
 [46] arXiv:2106.08297 (cross-list from math.PR) [pdf, ps, other]

Title: Diagonal sections of copulas, multivariate conditional hazard rates and distributions of order statistics for minimally stable lifetimes
Subjects: Probability (math.PR); Statistics Theory (math.ST); Methodology (stat.ME)
As a motivating problem, we aim to study some special aspects of the marginal distributions of the order statistics for exchangeable and (more generally) for minimally stable non-negative random variables $T_{1},...,T_{r}$. In any case, we assume that $T_{1},...,T_{r}$ are identically distributed, with a common survival function $\overline{G}$, and their survival copula is denoted by $K$. The diagonal and subdiagonal sections of $K$, along with $\overline{G}$, are possible tools to describe the information needed to recover the laws of order statistics.
When attention is restricted to the absolutely continuous case, such a joint distribution can be described in terms of the associated multivariate conditional hazard rate (m.c.h.r.) functions. We then study the distributions of the order statistics of $T_{1},...,T_{r}$ also in terms of the system of the m.c.h.r. functions. We compare and, in a sense, we combine the two different approaches in order to obtain different detailed formulas and to analyze some probabilistic aspects for the distributions of interest. This study also leads us to compare the two cases of exchangeable and minimally stable variables both in terms of copulas and of m.c.h.r. functions. The paper concludes with the analysis of two remarkable special cases of stochastic dependence, namely Archimedean copulas and load sharing models. This analysis will allow us to provide some illustrative examples, and some discussion about peculiar aspects of our results.
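As a short worked illustration of why diagonal sections carry the relevant information (a standard consequence of the definition of a survival copula, not a formula quoted from the paper), consider the smallest and largest order statistics. The symbols $\delta_{j}$ and $T_{(k)}$ are notation assumed here, and the second identity is stated for the exchangeable case only.

```latex
% Definition of the survival copula K with common marginal survival function \overline{G}:
\[
  P(T_{1} > t_{1}, \ldots, T_{r} > t_{r})
    = K\bigl(\overline{G}(t_{1}), \ldots, \overline{G}(t_{r})\bigr).
\]
% Assumed notation: the j-th subdiagonal section, with j arguments equal to u
% and the remaining r - j arguments equal to 1 (so \delta_{0} \equiv 1):
\[
  \delta_{j}(u) := K(\underbrace{u, \ldots, u}_{j}, 1, \ldots, 1),
  \qquad j = 0, 1, \ldots, r .
\]
% Minimum: all lifetimes exceed t, which is the full diagonal section.
\[
  P(T_{(1)} > t) = \delta_{r}\bigl(\overline{G}(t)\bigr).
\]
% Maximum (exchangeable case): inclusion-exclusion over the events \{T_{i} > t\}.
\[
  P(T_{(r)} \le t)
    = \sum_{j=0}^{r} (-1)^{j} \binom{r}{j}\, \delta_{j}\bigl(\overline{G}(t)\bigr).
\]
```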
Replacements for Wed, 16 Jun 21
 [47] arXiv:1901.10002 (replaced) [pdf, other]

Title: A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle
Comments: 11 pages plus references; updated with corrections to text and figures, new examples, and a more thorough walkthrough of ML
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [48] arXiv:1902.09653 (replaced) [pdf, other]

Title: Estimating Atmospheric Motion Winds from Satellite Image Data using Spacetime Drift Models
Subjects: Applications (stat.AP); Computation (stat.CO); Methodology (stat.ME)
 [49] arXiv:1903.04556 (replaced) [pdf, other]

Title: Embarrassingly parallel MCMC using deep invertible transformations
Comments: Accepted to UAI 2019
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [50] arXiv:1903.05631 (replaced) [pdf, other]

Title: ST-UNet: A Spatio-Temporal U-Network for Graph-structured Time Series Modeling
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [51] arXiv:1907.05689 (replaced) [pdf, other]

Title: Gittins' theorem under uncertainty
Subjects: Optimization and Control (math.OC); Probability (math.PR); Statistics Theory (math.ST); Computational Finance (q-fin.CP)
 [52] arXiv:1909.00453 (replaced) [pdf, other]

Title: Topics to Avoid: Demoting Latent Confounds in Text Classification
Comments: 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019)
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
 [53] arXiv:1910.10897 (replaced) [pdf, other]

Title: Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning
Authors: Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Avnish Narayan, Hayden Shively, Adithya Bellathur, Karol Hausman, Chelsea Finn, Sergey Levine
Comments: This is an update version of a manuscript that originally appeared at CoRL 2019. Videos are here: metaworld.github.io, open-sourced code are available at: this https URL, and the baselines can be found at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
 [54] arXiv:1910.12016 (replaced) [pdf, other]

Title: Tensor Q-Rank: New Data Dependent Definition of Tensor Rank
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [55] arXiv:1910.14215 (replaced) [pdf, other]

Title: Multivariate Uncertainty in Deep Learning
Comments: To be published in IEEE Transactions on Neural Networks and Learning Systems
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO); Machine Learning (stat.ML)
 [56] arXiv:1912.08421 (replaced) [pdf, other]

Title: Learning to Prevent Leakage: Privacy-Preserving Inference in the Mobile Cloud
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
 [57] arXiv:2001.10119 (replaced) [pdf, other]

Title: Unsupervised Program Synthesis for Images By Sampling Without Replacement
Comments: Accepted to UAI 2021
Journal-ref: UAI 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [58] arXiv:2002.03206 (replaced) [pdf, other]

Title: Characterizing Structural Regularities of Labeled Data in Overparameterized Models
Comments: 17 pages, 20 figures, ICML 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [59] arXiv:2002.11743 (replaced) [pdf, other]

Title: Composing Normalizing Flows for Inverse Problems
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
 [60] arXiv:2004.11231 (replaced) [pdf, other]

Title: Federated Stochastic Gradient Langevin Dynamics
Comments: Accepted to UAI 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [61] arXiv:2004.11468 (replaced) [pdf, other]

Title: How to find a unicorn: a novel model-free, unsupervised anomaly detection method for time series
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
 [62] arXiv:2004.14180 (replaced) [pdf, other]

Title: Quantized Adam with Error Feedback
Comments: Accepted to ACM Transactions on Intelligent Systems and Technology
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [63] arXiv:2005.08898 (replaced) [pdf, ps, other]

Title: Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent
Comments: Accepted to Journal of Machine Learning Research
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [64] arXiv:2006.01017 (replaced) [pdf, ps, other]

Title: Improved SVRG for quadratic functions
Authors: Nabil Kahale
Comments: 14 pages
Subjects: Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
 [65] arXiv:2006.07002 (replaced) [pdf, ps, other]

Title: Double Double Descent: On Generalization Errors in Transfer Learning between Linear Regression Tasks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [66] arXiv:2006.10246 (replaced) [pdf, other]

Title: The Recurrent Neural Tangent Kernel
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [67] arXiv:2006.14512 (replaced) [pdf, other]

Title: Uncovering the Connections Between Adversarial Transferability and Knowledge Transferability
Comments: Accepted to ICML 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [68] arXiv:2007.00674 (replaced) [pdf, other]

Title: Sliced Iterative Normalizing Flows
Comments: 19 pages, 12 figures, 7 tables. Code available at this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [69] arXiv:2007.04441 (replaced) [pdf, other]

Title: Sparse Regression for Extreme Values
Comments: 4 figures
Subjects: Methodology (stat.ME)
 [70] arXiv:2007.05426 (replaced) [pdf, other]

Title: Variational Inference with Continuously-Indexed Normalizing Flows
Comments: Accepted for publication at UAI 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [71] arXiv:2007.10306 (replaced) [pdf, other]

Title: An Empirical Characterization of Fair Machine Learning For Clinical Risk Prediction
Comments: Published in the Journal of Biomedical Informatics (this https URL). Version 3 updates acknowledgements and fixes typos
Journal-ref: Journal of Biomedical Informatics, Volume 113, January 2021, 103621
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG); Applications (stat.AP)
 [72] arXiv:2007.10725 (replaced) [pdf, other]

Title: Majorisation as a theory for uncertainty
Subjects: Statistics Theory (math.ST)
 [73] arXiv:2007.15588 (replaced) [pdf, other]

Title: Data-efficient Hindsight Off-policy Option Learning
Authors: Markus Wulfmeier, Dushyant Rao, Roland Hafner, Thomas Lampe, Abbas Abdolmaleki, Tim Hertweck, Michael Neunert, Dhruva Tirumala, Noah Siegel, Nicolas Heess, Martin Riedmiller
Comments: Published at ICML2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
 [74] arXiv:2008.07428 (replaced) [pdf, other]

Title: Fast decentralized nonconvex finite-sum optimization with recursive variance reduction
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Systems and Control (eess.SY); Machine Learning (stat.ML)
 [75] arXiv:2009.04651 (replaced) [pdf, other]

Title: Universal consistency of Wasserstein $k$-NN classifier
Authors: Donlapark Ponnoprat
Comments: 22 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [76] arXiv:2009.04832 (replaced) [pdf, other]

Title: A note on post-treatment selection in studying racial discrimination in policing
Comments: Accepted for publication in the American Political Science Review on 14th June, 2021
Subjects: Applications (stat.AP); Methodology (stat.ME)
 [77] arXiv:2009.07101 (replaced) [pdf, ps, other]

Title: Approximate spectral clustering using both reference vectors and topology of the network generated by growing neural gas
Authors: Kazuhisa Fujita
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
 [78] arXiv:2009.08372 (replaced) [pdf, other]

Title: A Principle of Least Action for the Training of Neural Networks
Comments: ECML PKDD 2020
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [79] arXiv:2009.09931 (replaced) [pdf, other]

Title: Field-Embedded Factorization Machines for Click-through rate prediction
Authors: Harshit Pande
Comments: 13 pages
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [80] arXiv:2009.13566 (replaced) [pdf, other]

Title: Graph Neural Networks with Heterophily
Comments: Proceedings version of AAAI 2021 with appendix and additional typo fixes; 12 pages, 4 figures
Journal-ref: Proceedings of the AAAI Conference on Artificial Intelligence. 35, 12 (May 2021), 11168-11176
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
 [81] arXiv:2010.00060 (replaced) [pdf, other]

Title: Constructions and Comparisons of Pooling Matrices for Pooled Testing of COVID-19
Subjects: Populations and Evolution (q-bio.PE); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Methodology (stat.ME)
 [82] arXiv:2010.06147 (replaced) [pdf, other]

Title: Treed distributed lag nonlinear models
Comments: 31 pages, 1 table, 4 figures
Subjects: Methodology (stat.ME)
 [83] arXiv:2010.13511 (replaced) [pdf, ps, other]

Title: Efficient Optimization Methods for Extreme Similarity Learning with Nonlinear Embeddings
Comments: Published as a conference paper at KDD 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [84] arXiv:2010.14860 (replaced) [pdf, other]

Title: The Evidence Lower Bound of Variational Autoencoders Converges to a Sum of Three Entropies
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [85] arXiv:2010.15727 (replaced) [pdf, other]

Title: Amortized Probabilistic Detection of Communities in Graphs
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [86] arXiv:2011.03639 (replaced) [pdf, other]

Title: Graph cuts always find a global optimum for Potts models (with a catch)
Comments: Published at ICML 2021. 18 pages, 2 figures
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
 [87] arXiv:2011.06931 (replaced) [pdf, other]

Title: The Safe Logrank Test: Error Control under Continuous Monitoring with Unlimited Horizon
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
 [88] arXiv:2012.02409 (replaced) [pdf, other]

Title: When does gradient descent with logistic loss find interpolating two-layer networks?
Comments: 44 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
 [89] arXiv:2012.07941 (replaced) [pdf, other]

Title: Variable Selection with Second-Generation P-Values
Subjects: Methodology (stat.ME)
 [90] arXiv:2101.04408 (replaced) [pdf, other]

Title: Statistical analysis of periodic data in neuroscience
Authors: Daniel H. Baker
Comments: 18 pages, 11 figures
Subjects: Methodology (stat.ME); Neurons and Cognition (q-bio.NC)
 [91] arXiv:2102.07367 (replaced) [pdf, other]

Title: A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum
Comments: 36 Pages, 10 Figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [92] arXiv:2102.09030 (replaced) [pdf, other]

Title: Bringing Differential Private SGD to Practice: On the Independence of Gaussian Noise and the Number of Training Rounds
Comments: arXiv admin note: text overlap with arXiv:2007.09208
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [93] arXiv:2102.10473 (replaced) [pdf, other]

Title: Diagnostics for Conditional Density Models and Bayesian Inference Algorithms
Comments: camera-ready version; accepted for the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021)
Subjects: Methodology (stat.ME)
 [94] arXiv:2102.10769 (replaced) [pdf, other]

Title: MobILE: Model-Based Imitation Learning From Observation Alone
Comments: 27 pages, 5 figures, 2 tabular columns
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [95] arXiv:2102.11086 (replaced) [pdf, other]

Title: Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding
Authors: Yangjun Ruan, Karen Ullrich, Daniel Severo, James Townsend, Ashish Khisti, Arnaud Doucet, Alireza Makhzani, Chris J. Maddison
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Computation (stat.CO)
 [96] arXiv:2102.11436 (replaced) [pdf, other]

Title: Model-Based Domain Generalization
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [97] arXiv:2102.12675 (replaced) [pdf]

Title: Computing Accurate Probabilistic Estimates of One-D Entropy from Equiprobable Random Samples
Authors: Hoshin V Gupta, Mohammed Reza Ehsani, Tirthankar Roy, Maria A Sans-Fuentes, Uwe Ehret, Ali Behrangi
Comments: 23 pages, 12 figures
Subjects: Methodology (stat.ME); Information Theory (cs.IT)
 [98] arXiv:2103.01400 (replaced) [pdf, other]

Title: Smoothness Analysis of Adversarial Training
Comments: 22 pages, 7 figures. In V3, we add the results of Entropy-SGD for adversarial training
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [99] arXiv:2104.00995 (replaced) [pdf, other]

Title: Exponential Reduction in Sample Complexity with Learning of Ising Model Dynamics
Comments: Accepted to ICML 2021
Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
 [100] arXiv:2104.02095 (replaced) [pdf, ps, other]

Title: Analytic function approximation by path norm regularized deep networks
Authors: Aleksandr Beknazaryan
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [101] arXiv:2104.03279 (replaced) [pdf, other]

Title: Modern Hopfield Networks for Few- and Zero-Shot Reaction Template Prediction
Authors: Philipp Seidl, Philipp Renz, Natalia Dyubankova, Paulo Neves, Jonas Verhoeven, Marwin Segler, Jörg K. Wegner, Sepp Hochreiter, Günter Klambauer
Comments: 14 pages + 12 pages appendix
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM); Machine Learning (stat.ML)
 [102] arXiv:2104.04975 (replaced) [pdf, other]

Title: Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning
Comments: ICML 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [103] arXiv:2104.05441 (replaced) [pdf, other]

Title: Unsuitability of NOTEARS for Causal Graph Discovery
Comments: 6 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [104] arXiv:2104.12672 (replaced) [pdf, other]

Title: A Novel Interaction-based Methodology Towards Explainable AI with Better Understanding of Pneumonia Chest X-ray Images
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)
 [105] arXiv:2105.02381 (replaced) [pdf, other]

Title: The Effect of Medicaid Expansion on Non-Elderly Adult Uninsurance Rates Among States that did not Expand Medicaid
Subjects: Applications (stat.AP)
 [106] arXiv:2105.04051 (replaced) [pdf, other]

Title: Aggregating From Multiple Target-Shifted Sources
Journal-ref: ICML2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [107] arXiv:2105.13493 (replaced) [pdf, other]

Title: Efficient and Accurate Gradients for Neural SDEs
Comments: Submitted to NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS); Machine Learning (stat.ML)
 [108] arXiv:2106.00774 (replaced) [pdf, other]

Title: Optimizing Functionals on the Space of Probabilities with Input Convex Neural Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA)
 [109] arXiv:2106.03640 (replaced) [pdf, other]

Title: Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [110] arXiv:2106.06044 (replaced) [pdf, other]

Title: Convergence and Alignment of Gradient Descent with Random Back Propagation Weights
Comments: 33 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [111] arXiv:2106.06885 (replaced) [pdf, other]

Title: Online Learning with Optimism and Delay
Authors: Genevieve Flaspohler, Francesco Orabona, Judah Cohen, Soukayna Mouatadid, Miruna Oprescu, Paulo Orenstein, Lester Mackey
Comments: ICML 2021. 9 pages of main paper and 26 pages of appendix text
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [112] arXiv:2106.06918 (replaced) [pdf, ps, other]

Title: A Phylogenetic Trees Analysis of SARS-CoV-2
Comments: 22 pages, 16 figures
Subjects: Methodology (stat.ME); Populations and Evolution (q-bio.PE)