

New submissions

[ total of 112 entries: 1-112 ]

New submissions for Wed, 16 Jun 21

[1]  arXiv:2106.07695 [pdf, other]
Title: Adaptive normalization for IPW estimation
Comments: 32 pages, 7 figures
Subjects: Methodology (stat.ME)

Inverse probability weighting (IPW) is a general tool in survey sampling and causal inference, used both in Horvitz-Thompson estimators, which normalize by the sample size, and Hájek (self-normalized) estimators, which normalize by the sum of the inverse probability weights. In this work we study a family of IPW estimators, first proposed by Trotter and Tukey in the context of Monte Carlo problems, that are normalized by an affine combination of these two terms. We show how selecting an estimator from this family in a data-dependent way to minimize asymptotic variance leads to an iterative procedure that converges to an estimator with connections to regression control methods. We refer to this estimator as an adaptively normalized estimator. For mean estimation in survey sampling, this estimator has asymptotic variance that is never worse than that of the Horvitz-Thompson or Hájek estimators, and is smaller except in edge cases. Going further, we show that adaptive normalization can be used to propose improvements to the augmented IPW (AIPW) estimator, average treatment effect (ATE) estimators, and policy learning objectives. Appealingly, these proposals preserve both the asymptotic efficiency of AIPW and the regret bounds for policy learning with IPW objectives, and deliver consistent finite-sample improvements in simulations for all three of mean estimation, ATE estimation, and policy learning.
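
To make the two normalizations concrete, here is a minimal Python sketch of the affine family for estimating a treated mean $E[Y(1)]$ from $n$ units with treatment indicators $t_i$ and known propensities $e_i$. The mixing parameter `lam` and the function name are hypothetical labels introduced here: `lam = 0` recovers Horvitz-Thompson and `lam = 1` recovers Hájek. The data-dependent (adaptive) choice of the normalization studied in the paper is not implemented.

```python
import numpy as np

def affine_normalized_ipw(y, t, e, lam):
    """IPW estimate of E[Y(1)], normalized by (1 - lam) * n + lam * sum(t / e).

    lam = 0 -> Horvitz-Thompson (divide by the sample size n)
    lam = 1 -> Hajek / self-normalized (divide by the sum of the IPW weights)
    `lam` is a hypothetical parametrization of the affine family.
    """
    y, t, e = (np.asarray(a, dtype=float) for a in (y, t, e))
    w = t / e                      # inverse probability weights (zero for untreated units)
    denom = (1.0 - lam) * len(y) + lam * w.sum()
    return float(np.sum(w * y) / denom)

y = [1.0, 2.0, 3.0, 4.0]
t = [1, 0, 1, 1]
e = [0.5, 0.5, 0.25, 0.5]
print(affine_normalized_ipw(y, t, e, 0.0))  # Horvitz-Thompson: 22 / 4 = 5.5
print(affine_normalized_ipw(y, t, e, 1.0))  # Hajek: 22 / 8 = 2.75
```

Intermediate values of `lam` interpolate between the two denominators; the paper's contribution is choosing that interpolation from the data.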

[2]  arXiv:2106.07717 [pdf, ps, other]
Title: Robust Inference for High-Dimensional Linear Models via Residual Randomization
Journal-ref: International Conference on Machine Learning 2021
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)

We propose a residual randomization procedure designed for robust Lasso-based inference in the high-dimensional setting. Compared to earlier work that focuses on sub-Gaussian errors, the proposed procedure is designed to work robustly in settings that also include heavy-tailed covariates and errors. Moreover, our procedure can be valid under clustered errors, which are important in practice but have been largely overlooked by earlier work. Through extensive simulations, we illustrate that our method applies in the wider range of settings suggested by theory. In particular, we show that our method outperforms state-of-the-art methods in challenging, yet more realistic, settings where the distribution of covariates is heavy-tailed or the sample size is small, while it remains competitive in the standard, "well-behaved" settings previously studied in the literature.

[3]  arXiv:2106.07725 [pdf, other]
Title: Generalized kernel distance covariance in high dimensions: non-null CLTs and power universality
Subjects: Statistics Theory (math.ST)

Distance covariance is a popular dependence measure for two random vectors $X$ and $Y$ of possibly different dimensions and types. Recent years have witnessed concentrated efforts in the literature to understand the distributional properties of the sample distance covariance in a high-dimensional setting, focusing exclusively on the null case in which $X$ and $Y$ are independent. This paper derives the first non-null central limit theorem for the sample distance covariance, and the more general sample (Hilbert-Schmidt) kernel distance covariance in high dimensions, primarily in the Gaussian case. The new non-null central limit theorem yields an asymptotically exact first-order power formula for the widely used generalized kernel distance correlation test of independence between $X$ and $Y$. The power formula in particular unveils an interesting universality phenomenon: the power of the generalized kernel distance correlation test is completely determined by $n\cdot \text{dcor}^2(X,Y)/\sqrt{2}$ in the high-dimensional limit, across a wide range of choices of the kernels and bandwidth parameters. Furthermore, this separation rate is also shown to be optimal in a minimax sense. The key step in the proof of the non-null central limit theorem is a precise expansion of the mean and variance of the sample distance covariance in high dimensions, which shows, among other things, that the non-null Gaussian approximation of the sample distance covariance involves a rather subtle interplay between the dimension-to-sample ratio and the dependence between $X$ and $Y$.
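
For reference, here is a minimal NumPy sketch of the sample distance covariance (the V-statistic of Székely, Rizzo and Bakirov, built from double-centered Euclidean distance matrices) and of the squared distance correlation $\text{dcor}^2$ that enters the power formula. It uses the plain Euclidean distance rather than the generalized kernels studied in the paper, and the function names are hypothetical.

```python
import numpy as np

def _double_centered_distances(z):
    """Euclidean distance matrix of the rows of z, double-centered."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def dcov2(x, y):
    """Squared sample distance covariance (V-statistic form)."""
    a, b = _double_centered_distances(x), _double_centered_distances(y)
    return float((a * b).mean())

def dcor2(x, y):
    """Squared sample distance correlation, in [0, 1]."""
    vx, vy = dcov2(x, x), dcov2(y, y)
    return dcov2(x, y) / np.sqrt(vx * vy) if vx * vy > 0 else 0.0

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
y_indep = rng.normal(size=(200, 3))
n = len(x)
print(n * dcor2(x, y_indep) / np.sqrt(2))  # the statistic appearing in the power formula
```

Because distance correlation is invariant to rescaling and shifting, `dcor2(x, 3 * x + 2)` equals 1 up to rounding.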

[4]  arXiv:2106.07761 [pdf, other]
Title: Linear-Time Probabilistic Solutions of Boundary Value Problems
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA)

We propose a fast algorithm for the probabilistic solution of boundary value problems (BVPs), which are ordinary differential equations subject to boundary conditions. In contrast to previous work, we introduce a Gauss-Markov prior and tailor it specifically to BVPs, which allows computing a posterior distribution over the solution in linear time, at a quality and cost comparable to that of well-established, non-probabilistic methods. Our model further delivers uncertainty quantification, mesh refinement, and hyperparameter adaptation. We demonstrate how these practical considerations positively impact the efficiency of the scheme. Altogether, this results in a practically usable probabilistic BVP solver that is (in contrast to non-probabilistic algorithms) natively compatible with other parts of the statistical modelling tool-chain.

[5]  arXiv:2106.07797 [pdf, other]
Title: Embracing Uncertainty in "Small Data" Problems: Estimating Earthquakes from Historical Anecdotes
Subjects: Applications (stat.AP); Geophysics (physics.geo-ph)

We apply the Bayesian inversion process to make principled estimates of the magnitude and location of a pre-instrumental earthquake in Eastern Indonesia in the mid-19th century, by combining anecdotal historical accounts of the resultant tsunami with our modern understanding of the geology of the region. Quantifying the seismic record prior to modern instrumentation is critical to a more thorough understanding of the current risks in Eastern Indonesia. In particular, the occurrence of such a major earthquake in the 1850s provides evidence that this region is susceptible to future seismic hazards on the same order of magnitude. More importantly, the approach taken here gives evidence that even "small data" that is limited in scope and extremely uncertain can still be used to yield information on past seismic events, which is key to an increased understanding of the current seismic state. Moreover, sensitivity bounds indicate that the results obtained here are robust despite the inherent uncertainty in the observations.

[6]  arXiv:2106.07816 [pdf, other]
Title: Tree-Values: selective inference for regression trees
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)

We consider conducting inference on the output of the Classification and Regression Tree (CART) [Breiman et al., 1984] algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.

[7]  arXiv:2106.07834 [pdf, other]
Title: A Non-ergodic Effective Amplitude Ground-Motion Model for California
Comments: 34 pages, 18 figures
Subjects: Applications (stat.AP)

A new non-ergodic ground-motion model (GMM) for effective amplitude spectral ($EAS$) values for California is presented in this study. $EAS$, which is defined in Goulet et al. (2018), is a smoothed rotation-independent Fourier amplitude spectrum of the two horizontal components of an acceleration time history. The main motivation for developing a non-ergodic $EAS$ GMM, rather than a spectral acceleration GMM, is that the scaling of $EAS$ does not depend on spectral shape, and therefore, the more frequent small magnitude events can be used in the estimation of the non-ergodic terms.
The model is developed using the California subset of the NGA-West2 dataset (Ancheta et al., 2013). The Bayless and Abrahamson (2019b) (BA18) ergodic $EAS$ GMM was used as the backbone to constrain the average source, path, and site scaling. The non-ergodic GMM is formulated as a Bayesian hierarchical model: the non-ergodic source and site terms are modeled as spatially varying coefficients following the approach of Landwehr et al. (2016), and the non-ergodic path effects are captured by cell-specific anelastic attenuation following the approach of Dawood and Rodriguez-Marek (2013). Close to stations and past events, the mean values of the non-ergodic terms deviate from zero to capture the systematic effects, and their epistemic uncertainty is small. In areas with sparse data, the epistemic uncertainty of the non-ergodic terms is large, as the systematic effects cannot be determined.
The non-ergodic total aleatory standard deviation is approximately $30$ to $40\%$ smaller than the total aleatory standard deviation of BA18. This reduction in the aleatory variability has a significant impact on hazard calculations at large return periods. The epistemic uncertainty of the ground motion predictions is small in areas close to stations and past events.

[8]  arXiv:2106.07875 [pdf, other]
Title: S-LIME: Stabilized-LIME for Model Explanation
Comments: In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), August 14--18, 2021, Virtual Event, Singapore
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

An increasing number of machine learning models have been deployed in high-stakes domains such as finance and healthcare. Despite their superior performance, many of these models are black boxes that are hard to explain. Researchers are making growing efforts to develop methods that interpret such black-box models. Post hoc explanations based on perturbations, such as LIME, are widely used to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, which undermines the effectiveness of the methods themselves and harms user trust. In this paper, we propose S-LIME, which uses a hypothesis testing framework based on the central limit theorem to determine the number of perturbation points needed to guarantee the stability of the resulting explanation. Experiments on both simulated and real-world data sets demonstrate the effectiveness of our method.

[9]  arXiv:2106.07898 [pdf, other]
Title: Divergence Frontiers for Generative Models: Sample Complexity, Quantization Level, and Frontier Integral
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The spectacular success of deep generative models calls for quantitative tools to measure their statistical performance. Divergence frontiers have recently been proposed as an evaluation framework for generative models, due to their ability to measure the quality-diversity trade-off inherent to deep generative modeling. However, the statistical behavior of divergence frontiers estimated from data remains unknown to this day. In this paper, we establish non-asymptotic bounds on the sample complexity of the plug-in estimator of divergence frontiers. Along the way, we introduce a novel integral summary of divergence frontiers. We derive the corresponding non-asymptotic bounds and discuss the choice of the quantization level by balancing the two types of approximation errors arising from its computation. We also augment the divergence frontier framework by investigating the statistical performance of smoothed distribution estimators such as the Good-Turing estimator. We illustrate the theoretical results with numerical examples from natural language processing and computer vision.

[10]  arXiv:2106.08086 [pdf, other]
Title: Decomposition of Global Feature Importance into Direct and Associative Components (DEDACT)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Global model-agnostic feature importance measures either quantify whether features are directly used for a model's predictions (direct importance) or whether they contain prediction-relevant information (associative importance). Direct importance provides causal insight into the model's mechanism, yet it fails to expose the leakage of information from associated but not directly used variables. In contrast, associative importance exposes information leakage but does not provide causal insight into the model's mechanism. We introduce DEDACT, a framework to decompose well-established direct and associative importance measures into their respective associative and direct components. DEDACT provides insight into both the sources of prediction-relevant information in the data and the direct and indirect feature pathways by which the information enters the model. We demonstrate the method's usefulness on simulated examples.

[11]  arXiv:2106.08105 [pdf, other]
Title: Employing an Adjusted Stability Measure for Multi-Criteria Model Fitting on Data Sets with Similar Features
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Fitting models with high predictive accuracy that include all relevant but no irrelevant or redundant features is a challenging task on data sets with similar (e.g., highly correlated) features. We propose the approach of tuning the hyperparameters of a predictive model in a multi-criteria fashion with respect to predictive accuracy and feature selection stability. We evaluate this approach on both simulated and real data sets and compare it to the standard approach of single-criteria tuning of the hyperparameters as well as to the state-of-the-art technique "stability selection". We conclude that our approach achieves the same or better predictive performance compared to the two established approaches. Considering the stability during tuning does not decrease the predictive accuracy of the resulting models. Our approach succeeds at selecting the relevant features while avoiding irrelevant or redundant features. The single-criteria approach fails at avoiding irrelevant or redundant features, and the stability selection approach fails at selecting enough relevant features for achieving acceptable predictive accuracy. When our approach is applied to data sets with many similar features, feature selection stability must be evaluated with an adjusted stability measure, that is, a measure that accounts for similarities between features. For data sets with only a few similar features, an unadjusted stability measure suffices and is faster to compute.
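
As an illustration of what an unadjusted measure computes, here is a minimal Python sketch that scores stability as the average pairwise Jaccard similarity of the feature sets selected on different resamples. This particular measure is an assumption chosen for illustration, not necessarily the one used in the paper, and the adjusted (similarity-aware) variant is not shown.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def unadjusted_stability(selected_sets):
    """Average pairwise Jaccard similarity across the selected feature sets."""
    pairs = list(combinations(selected_sets, 2))
    if not pairs:
        raise ValueError("need at least two selected feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Three resamples: the third swaps feature 2 for feature 3.
runs = [{1, 2}, {1, 2}, {1, 3}]
print(unadjusted_stability(runs))  # (1 + 1/3 + 1/3) / 3 = 5/9
```

If features 2 and 3 were near-duplicates, the swap in the third run would arguably not reflect real instability, yet the unadjusted score still drops; that is the situation an adjusted measure is meant to handle.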

[12]  arXiv:2106.08161 [pdf, other]
Title: Contrastive Mixture of Posteriors for Counterfactual Inference, Data Integration and Fairness
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Genomics (q-bio.GN)

Learning meaningful representations of data that can address challenges such as batch effect correction, data integration and counterfactual inference is a central problem in many domains including computational biology. Adopting a Conditional VAE framework, we identify the mathematical principle that unites these challenges: learning a representation that is marginally independent of a condition variable. We therefore propose the Contrastive Mixture of Posteriors (CoMP) method that uses a novel misalignment penalty to enforce this independence. This penalty is defined in terms of mixtures of the variational posteriors themselves, unlike prior work which uses external discrepancy measures such as MMD to ensure independence in latent space. We show that CoMP has attractive theoretical properties compared to previous approaches, especially when there is complex global structure in latent space. We further demonstrate state-of-the-art performance on a number of real-world problems, including the challenging tasks of aligning human tumour samples with cancer cell-lines and performing counterfactual inference on single-cell RNA sequencing data. Incidentally, we find parallels with the fair representation learning literature, and demonstrate CoMP has competitive performance in learning fair yet expressive latent representations.

[13]  arXiv:2106.08185 [pdf, other]
Title: Kernel Identification Through Transformers
Comments: 12 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models, as the chosen kernel determines both the inductive biases and prior support of functions under the GP prior. This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models. Drawing inspiration from recent progress in deep learning, we introduce a novel approach named KITT: Kernel Identification Through Transformers. KITT exploits a transformer-based architecture to generate kernel recommendations in under 0.1 seconds, which is several orders of magnitude faster than conventional kernel search algorithms. We train our model using synthetic data generated from priors over a vocabulary of known kernels. By exploiting the nature of the self-attention mechanism, KITT is able to process datasets with inputs of arbitrary dimension. We demonstrate that kernels chosen by KITT yield strong performance over a diverse collection of regression benchmarks.

[14]  arXiv:2106.08217 [pdf, other]
Title: RFpredInterval: An R Package for Prediction Intervals with Random Forests and Boosted Forests
Comments: 32 pages, 14 figures, 5 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Like many predictive models, random forests provide a point prediction for a new observation. Besides the point prediction, it is important to quantify the uncertainty in the prediction. Prediction intervals provide information about the reliability of the point predictions. We have developed a comprehensive R package, RFpredInterval, that integrates 16 methods to build prediction intervals with random forests and boosted forests. The methods implemented in the package are a new method to build prediction intervals with boosted forests (PIBF) and 15 different variants to produce prediction intervals with random forests proposed by Roy and Larocque (2020). We perform an extensive simulation study and real data analyses to compare the performance of the proposed method to ten existing methods for building prediction intervals with random forests. The results show that the proposed method is very competitive and, overall, outperforms the competing methods.

[15]  arXiv:2106.08247 [pdf, ps, other]
Title: Canonical-Correlation-Based Fast Feature Selection
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

This paper proposes a canonical-correlation-based filter method for feature selection. The sum of squared canonical correlation coefficients is adopted as the feature ranking criterion. The proposed method speeds up the computation of the ranking criterion in greedy search. The supporting theorems developed for the feature selection method are fundamental to the understanding of canonical correlation analysis. In empirical studies, a synthetic dataset is used to demonstrate the speed advantage of the proposed method, and eight real datasets are used to show the effectiveness of the proposed feature ranking criterion in both classification and regression. The results show that the proposed method is considerably faster than the definition-based method, and that the proposed ranking criterion is competitive with seven mutual-information-based criteria.
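
The ranking criterion can be written as $\sum_i \rho_i^2 = \operatorname{tr}(S_{XX}^{-1} S_{XY} S_{YY}^{-1} S_{YX})$, where $\rho_i$ are the canonical correlations between a candidate feature subset $X$ and the targets $Y$, and the $S$ terms are (cross-)covariance matrices. The NumPy sketch below evaluates it directly, i.e. the slow definition-based way, inside a greedy forward search; the paper's faster computation is not reproduced, and the function names are hypothetical.

```python
import numpy as np

def sum_sq_canonical_corr(X, Y):
    """Sum of squared canonical correlations between the columns of X and Y."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if Y.ndim == 1:
        Y = Y[:, None]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Sxx, Syy, Sxy = Xc.T @ Xc, Yc.T @ Yc, Xc.T @ Yc
    # tr(Sxx^{-1} Sxy Syy^{-1} Syx); the eigenvalues of this product are rho_i^2
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    return float(np.trace(M))

def greedy_select(X, Y, k):
    """Greedy forward selection of k columns of X maximizing the criterion."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: sum_sq_canonical_corr(X[:, selected + [j]], Y))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
Y = np.column_stack([X[:, 4] + 0.1 * rng.normal(size=300),
                     X[:, 1] + 0.1 * rng.normal(size=300)])
print(greedy_select(X, Y, 2))  # expected to select columns 1 and 4 (order may vary)
```

With a single feature and a single target, the criterion reduces to the squared Pearson correlation, which is a convenient sanity check.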

[16]  arXiv:2106.08277 [pdf, other]
Title: A Bayesian adaptive design for dual-agent phase I-II cancer clinical trials combining efficacy data across stages
Subjects: Methodology (stat.ME)

Integrated phase I-II clinical trial designs are efficient approaches to accelerate drug development. In cases where efficacy cannot be ascertained in a short period of time, two-stage approaches are usually employed. When different patient populations are involved across stages, how to use the efficacy data collected in both stages merits discussion. In this paper, we focus on a two-stage design that aims to estimate safe dose combinations with a certain level of efficacy. In stage I, conditional escalation with overdose control (EWOC) is used to allocate successive cohorts of patients. The maximum tolerated dose (MTD) curve is estimated based on a Bayesian dose-toxicity model. In stage II, we consider an adaptive allocation of patients to drug combinations that have a high probability of being efficacious along the obtained MTD curve. A robust Bayesian hierarchical model is proposed to allow sharing of information on the efficacy parameters across stages, assuming the related parameters are either exchangeable or nonexchangeable. Under the assumption of exchangeability, a random-effects distribution is specified for the main effects parameters to capture uncertainty about the between-stage differences. The proposed methodology is assessed with extensive simulations motivated by a real phase I-II drug combination trial using continuous doses.
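
The stage I allocation rule can be illustrated with the standard single-agent form of escalation with overdose control: the next cohort receives the $\alpha$-quantile of the current posterior of the MTD, so that the posterior probability of exceeding the MTD is about $\alpha$. The Python sketch below assumes posterior MTD draws are already available (for example from MCMC) and uses $\alpha = 0.25$, a common choice in the EWOC literature; the conditional, dual-agent version along the MTD curve used in the paper is not implemented, and the function name is hypothetical.

```python
import numpy as np

def ewoc_next_dose(mtd_draws, alpha=0.25, dose_min=None, dose_max=None):
    """Next dose = alpha-quantile of posterior MTD draws, clipped to the allowed range.

    Choosing this quantile keeps the posterior probability P(MTD < dose) at about alpha.
    """
    dose = float(np.quantile(np.asarray(mtd_draws, dtype=float), alpha))
    if dose_min is not None:
        dose = max(dose, dose_min)
    if dose_max is not None:
        dose = min(dose, dose_max)
    return dose

draws = np.arange(1, 101)                    # stand-in for posterior MTD samples
print(ewoc_next_dose(draws))                 # 25.75 (linear interpolation between 25 and 26)
print(ewoc_next_dose(draws, dose_max=10.0))  # 10.0
```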

[17]  arXiv:2106.08281 [pdf, other]
Title: A Horseshoe Pit mixture model for Bayesian screening with an application to light sheet fluorescence microscopy in brain imaging
Subjects: Methodology (stat.ME)

Finding parsimonious models through variable selection is a fundamental problem in many areas of statistical inference. Here, we focus on Bayesian regression models, where variable selection can be implemented through a regularizing prior imposed on the distribution of the regression coefficients. In the Bayesian literature, there are two main types of priors used to accomplish this goal: the spike-and-slab and the continuous scale mixtures of Gaussians. The former is a discrete mixture of two distributions characterized by low and high variance. In the latter, a continuous prior is elicited on the scale of a zero-mean Gaussian distribution. In contrast to these existing methods, we propose a new class of priors based on a discrete mixture of continuous scale mixtures, providing a more general framework for Bayesian variable selection. To this end, we substitute the observation-specific local shrinkage parameters (typical of continuous mixtures) with mixture component shrinkage parameters. Our approach drastically reduces the number of parameters needed and allows sharing information across the coefficients, improving the shrinkage effect. By using half-Cauchy distributions, this approach leads to a cluster-shrinkage version of the Horseshoe prior. We present the properties of our model and showcase its estimation and prediction performance in a simulation study. We then recast the model in a multiple hypothesis testing framework and apply it to a neurological dataset obtained using a novel whole-brain imaging technique.
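
To contrast the two shrinkage structures, here is a minimal NumPy sketch that draws regression coefficients from the standard Horseshoe prior, $\beta_j \sim N(0, \tau^2 \lambda_j^2)$ with $\lambda_j \sim C^+(0, 1)$, and from a cluster-shrinkage variant in which each coefficient is assigned to one of $K$ mixture components and shares that component's half-Cauchy scale. The component weights, the fixed global scale `tau`, and the function names are assumptions made for illustration; this is not the full hierarchy of the Horseshoe Pit model.

```python
import numpy as np

def sample_horseshoe(p, tau, rng):
    """Standard Horseshoe: one half-Cauchy local scale per coefficient."""
    lam = np.abs(rng.standard_cauchy(p))
    return rng.normal(0.0, tau * lam)

def sample_cluster_horseshoe(p, tau, weights, rng):
    """Cluster shrinkage: K half-Cauchy scales shared by the coefficients in each component."""
    K = len(weights)
    lam_k = np.abs(rng.standard_cauchy(K))   # one local scale per component
    z = rng.choice(K, size=p, p=weights)      # component label of each coefficient
    beta = rng.normal(0.0, tau * lam_k[z])
    return beta, z, lam_k

rng = np.random.default_rng(1)
beta_hs = sample_horseshoe(1000, 0.1, rng)                                 # 1000 local scales
beta_cl, z, lam_k = sample_cluster_horseshoe(1000, 0.1, [0.9, 0.1], rng)  # only 2 local scales
```

The cluster version needs only $K$ local scales instead of one per coefficient, which is where the reduction in the number of parameters comes from.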

[18]  arXiv:2106.08305 [pdf, ps, other]
Title: Markov Equivalence of Max-Linear Bayesian Networks
Comments: 19 pages, 5 figures, accepted for the 37th conference on Uncertainty in Artificial Intelligence (UAI 2021)
Subjects: Statistics Theory (math.ST); Algebraic Geometry (math.AG); Combinatorics (math.CO)

Max-linear Bayesian networks have emerged as highly applicable models for causal inference via extreme value data. However, conditional independence (CI) for max-linear Bayesian networks behaves differently than for classical Gaussian Bayesian networks. We establish the parallel between the two theories via tropicalization, and prove the surprising result that the Markov equivalence classes for max-linear Bayesian networks coincide with the ones obtained by regular CI. Our paper opens up many problems at the intersection of extreme value statistics, causal inference and tropical geometry.
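
For readers unfamiliar with the model class, here is a minimal NumPy sketch of sampling from a max-linear Bayesian network, using the common recursive form $X_i = \max\big(\max_{j \in \mathrm{pa}(i)} c_{ij} X_j,\; Z_i\big)$ with positive edge weights $c_{ij}$ and independent positive (typically heavy-tailed) innovations $Z_i$. The exact parametrization and the function name are assumptions for illustration.

```python
import numpy as np

def sample_max_linear(C, Z, order):
    """Recursive max-linear structural equations.

    C[i, j] > 0 is the weight of edge j -> i (0 means no edge);
    Z has shape (n_samples, d); `order` is a topological order of the DAG.
    """
    C, Z = np.asarray(C, dtype=float), np.asarray(Z, dtype=float)
    X = np.zeros_like(Z)
    for i in order:
        parents = np.nonzero(C[i])[0]
        X[:, i] = Z[:, i]
        if parents.size:
            X[:, i] = np.maximum(X[:, i], (X[:, parents] * C[i, parents]).max(axis=1))
    return X

# Chain 0 -> 1 -> 2 with weights c_10 = 2 and c_21 = 0.5
C = np.zeros((3, 3))
C[1, 0], C[2, 1] = 2.0, 0.5
Z = np.array([[1.0, 1.0, 5.0],
              [3.0, 1.0, 1.0]])
print(sample_max_linear(C, Z, order=[0, 1, 2]))
# [[1. 2. 5.]
#  [3. 6. 3.]]
```

In the first sample $X_2$ is set by its own innovation, while in the second it is attained along the path 0 -> 1 -> 2; which mechanism attains the maximum varies from sample to sample.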

[19]  arXiv:2106.08320 [pdf, other]
Title: Self-Supervised Learning with Kernel Dependence Maximization
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

We approach self-supervised learning of image representations from a statistical dependence perspective, proposing Self-Supervised Learning with the Hilbert-Schmidt Independence Criterion (SSL-HSIC). SSL-HSIC maximizes dependence between representations of transformed versions of an image and the image identity, while minimizing the kernelized variance of those features. This self-supervised learning framework yields a new understanding of InfoNCE, a variational lower bound on the mutual information (MI) between different transformations. While the MI itself is known to have pathologies which can result in meaningless representations being learned, its bound is much better behaved: we show that it implicitly approximates SSL-HSIC (with a slightly different regularizer). Our approach also gives us insight into BYOL, since SSL-HSIC similarly learns local neighborhoods of samples. SSL-HSIC allows us to directly optimize statistical dependence in time linear in the batch size, without restrictive data assumptions or indirect mutual information estimators. Trained with or without a target network, SSL-HSIC matches the current state-of-the-art for standard linear evaluation on ImageNet, semi-supervised learning and transfer to other classification and vision tasks such as semantic segmentation, depth estimation and object recognition.
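
As background for the dependence measure, here is a minimal NumPy sketch of the biased empirical HSIC estimator with Gaussian kernels, $\widehat{\mathrm{HSIC}} = \operatorname{tr}(K H L H)/(n-1)^2$, where $K$ and $L$ are kernel Gram matrices and $H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$ is the centering matrix. This is the generic estimator, not the SSL-HSIC training objective, and this direct version costs at least quadratic time in $n$; the kernel choice, bandwidths, and function names are assumptions.

```python
import numpy as np

def gaussian_gram(z, sigma):
    """Gaussian (RBF) kernel matrix of the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    sq = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def hsic_biased(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K, L = gaussian_gram(x, sigma_x), gaussian_gram(y, sigma_y)
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y_dep = x + 0.05 * rng.normal(size=(200, 2))
y_ind = rng.normal(size=(200, 2))
print(hsic_biased(x, y_dep), hsic_biased(x, y_ind))  # the dependent pair scores higher
```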

Cross-lists for Wed, 16 Jun 21

[20]  arXiv:2106.07644 (cross-list from math.OC) [pdf, other]
Title: A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip
Comments: arXiv admin note: substantial text overlap with arXiv:2102.06035
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Probability (math.PR); Machine Learning (stat.ML)

We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; and a discretization of the continuized process can be computed exactly with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters. We provide continuized Nesterov acceleration under deterministic as well as stochastic gradients, with either additive or multiplicative noise. Finally, using our continuized framework and expressing the gossip averaging problem as the stochastic minimization of a certain energy function, we provide the first rigorous acceleration of asynchronous gossip algorithms.

[21]  arXiv:2106.07682 (cross-list from cs.LG) [pdf, other]
Title: Revisiting Model Stitching to Compare Neural Representations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We revisit and extend model stitching (Lenc & Vedaldi 2015) as a methodology to study the internal representations of neural networks. Given two trained and frozen models $A$ and $B$, we consider a "stitched model" formed by connecting the bottom-layers of $A$ to the top-layers of $B$, with a simple trainable layer between them. We argue that model stitching is a powerful and perhaps under-appreciated tool, which reveals aspects of representations that measures such as centered kernel alignment (CKA) cannot. Through extensive experiments, we use model stitching to obtain quantitative verifications for intuitive statements such as "good networks learn similar representations", by demonstrating that good networks of the same architecture, but trained in very different ways (e.g., supervised vs. self-supervised learning), can be stitched to each other without a drop in performance. We also give evidence for the intuition that "more is better" by showing that representations learnt with (1) more data, (2) bigger width, or (3) more training time can be "plugged in" to weaker models to improve performance. Finally, our experiments reveal a new structural property of SGD which we call "stitching connectivity", akin to mode-connectivity: typical minima reached by SGD can all be stitched to each other with minimal change in accuracy.

[22]  arXiv:2106.07724 (cross-list from cs.LG) [pdf, other]
Title: An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

It is well known that modern deep neural networks are powerful enough to memorize datasets even when the labels have been randomized. Recently, Vershynin (2020) settled a long-standing question by Baum (1988), proving that deep threshold networks can memorize $n$ points in $d$ dimensions using $\widetilde{\mathcal{O}}(e^{1/\delta^2}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(e^{1/\delta^2}(d+\sqrt{n})+n)$ weights, where $\delta$ is the minimum distance between the points. In this work, we improve the dependence on $\delta$ from exponential to almost linear, proving that $\widetilde{\mathcal{O}}(\frac{1}{\delta}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(\frac{d}{\delta}+n)$ weights are sufficient. Our construction uses Gaussian random weights only in the first layer, while all the subsequent layers use binary or integer weights. We also prove new lower bounds by connecting memorization in neural networks to the purely geometric problem of separating $n$ points on a sphere using hyperplanes.

[23]  arXiv:2106.07754 (cross-list from cs.AI) [pdf, other]
Title: Counterfactual Explanations as Interventions in Latent Space
Comments: 34 pages, 4 figures, 4 tables
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Machine Learning (stat.ML)

Explainable Artificial Intelligence (XAI) is a set of techniques that allows the understanding of both technical and non-technical aspects of Artificial Intelligence (AI) systems. XAI is crucial to help satisfy the increasingly important demand for trustworthy Artificial Intelligence, characterized by fundamental properties such as respect for human autonomy, prevention of harm, transparency, and accountability. Within XAI techniques, counterfactual explanations aim to provide end users with a set of features (and their corresponding values) that need to be changed in order to achieve a desired outcome. Current approaches rarely take into account the feasibility of the actions needed to achieve the proposed explanations, and in particular they fall short of considering the causal impact of such actions. In this paper, we present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations that capture by design the underlying causal relations from the data, and at the same time provide feasible recommendations to reach the proposed profile. Moreover, our methodology has the advantage that it can be set on top of existing counterfactual generator algorithms, thus minimising the complexity of imposing additional causal constraints. We demonstrate the effectiveness of our approach with a set of different experiments using synthetic and real datasets (including a proprietary dataset from the financial domain).

[24]  arXiv:2106.07767 (cross-list from cs.LG) [pdf, other]
Title: Improving Robustness of Graph Neural Networks with Heterophily-Inspired Designs
Comments: preprint with appendix; 30 pages, 1 figure
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Recent studies have shown that many graph neural networks (GNNs) are sensitive to adversarial attacks and can suffer from performance loss if the graph structure is intentionally perturbed. A different line of research has shown that many GNN architectures implicitly assume that the underlying graph displays homophily, i.e., connected nodes are more likely to have similar features and class labels, and perform poorly if this assumption is not fulfilled. In this work, we formalize the relation between these two seemingly different issues. We theoretically show that in the standard scenario in which node features exhibit homophily, impactful structural attacks always lead to increased levels of heterophily. Then, inspired by GNN architectures that target heterophily, we present two designs -- (i) separate aggregators for ego- and neighbor-embeddings, and (ii) a reduced scope of aggregation -- that can significantly improve the robustness of GNNs. Our extensive empirical evaluations show that GNNs featuring merely these two designs can achieve significantly improved robustness compared to the best-performing unvaccinated model with 24.99% gain in average performance under targeted attacks, while having smaller computational overhead than existing defense mechanisms. Furthermore, these designs can be readily combined with explicit defense mechanisms to yield state-of-the-art robustness with up to 18.33% increase in performance under attacks compared to the best-performing vaccinated model.

[25]  arXiv:2106.07769 (cross-list from cs.LG) [pdf, other]
Title: The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization
Comments: 19 pages, 2 figures. Submitted to NeurIPS 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Among the most successful methods for sparsifying deep (neural) networks are those that adaptively mask the network weights throughout training. By examining this masking, or dropout, in the linear case, we uncover a duality between such adaptive methods and regularization through the so-called "$\eta$-trick" that casts both as iteratively reweighted optimizations. We show that any dropout strategy that adapts to the weights in a monotonic way corresponds to an effective subquadratic regularization penalty, and therefore leads to sparse solutions. We obtain the effective penalties for several popular sparsification strategies, which are remarkably similar to classical penalties commonly used in sparse optimization. Considering variational dropout as a case study, we demonstrate similar empirical behavior between the adaptive dropout method and classical methods on the task of deep network sparsification, validating our theory.
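To make the "$\eta$-trick" concrete, here is a minimal sketch of its classical use, not the paper's dropout analysis: the identity $|w| = \min_{\eta>0} \frac{1}{2}(w^2/\eta + \eta)$, attained at $\eta = |w|$, turns an $\ell_1$ (lasso) penalty into an iteratively reweighted ridge problem. The function names `eta_trick_abs` and `irls_lasso` are hypothetical.

```python
import numpy as np

def eta_trick_abs(w, etas=np.linspace(1e-3, 5.0, 20001)):
    """Evaluate min over eta > 0 of (w**2 / eta + eta) / 2 on a grid; equals |w| at eta = |w|."""
    return float(np.min((w**2 / etas + etas) / 2))

def irls_lasso(X, y, lam, n_iter=200, eps=1e-8):
    """Minimize 0.5*||y - X w||^2 + lam*||w||_1 by alternating the two eta-trick steps."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]     # least-squares starting point
    for _ in range(n_iter):
        eta = np.abs(w) + eps                    # eta-step: closed-form minimizer eta_j = |w_j|
        # w-step: ridge regression whose per-coordinate penalty lam / (2 * eta_j) * w_j**2
        # is the quadratic upper bound on lam * |w_j| supplied by the eta-trick.
        w = np.linalg.solve(X.T @ X + lam * np.diag(1.0 / eta), X.T @ y)
    return w
```

With an orthonormal design the iterates converge to soft-thresholding of $X^\top y$, which is the known lasso solution in that case. Coordinates whose weight is driven to zero receive an ever larger ridge penalty, which is the mechanism behind the sparsity discussed above.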

[26]  arXiv:2106.07779 (cross-list from cs.LG) [pdf, ps, other]
Title: Boosting in the Presence of Massart Noise
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the problem of boosting the accuracy of a weak learner in the (distribution-independent) PAC model with Massart noise. In the Massart noise model, the label of each example $x$ is independently misclassified with probability $\eta(x) \leq \eta$, where $\eta<1/2$. The Massart model lies between the random classification noise model and the agnostic model. Our main positive result is the first computationally efficient boosting algorithm in the presence of Massart noise that achieves misclassification error arbitrarily close to $\eta$. Prior to our work, no non-trivial booster was known in this setting. Moreover, we show that this error upper bound is best possible for polynomial-time black-box boosters, under standard cryptographic assumptions. Our upper and lower bounds characterize the complexity of boosting in the distribution-independent PAC model with Massart noise. As a simple application of our positive result, we give the first efficient Massart learner for unions of high-dimensional rectangles.
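The Massart noise model is easy to state as a data generator, which may help fix the definition. The following sketch (hypothetical helper `massart_labels`) samples labels from a halfspace and flips each one independently with an example-dependent probability $\eta(x) \leq \eta < 1/2$. It only simulates the noise model; it does not implement the paper's booster.

```python
import numpy as np

def massart_labels(X, w, eta_fn, eta_max, rng):
    """Sample labels from the halfspace sign(<w, x>) under Massart noise.

    Each clean label is flipped independently with probability eta_fn(x), where
    eta_fn(x) <= eta_max < 1/2 must hold for every example.
    """
    flip_prob = eta_fn(X)
    assert eta_max < 0.5 and np.all(flip_prob <= eta_max), "not a valid Massart noise rate"
    clean = np.where(X @ w >= 0, 1, -1)
    flips = rng.random(len(X)) < flip_prob
    return np.where(flips, -clean, clean), clean, flip_prob
```

For example, `eta_fn = lambda X: 0.3 * (np.abs(X[:, 0]) < 0.5)` corrupts labels only in a slab of the input space. Random classification noise is the special case of a constant `eta_fn`, while the agnostic model drops the bound on the flip rate altogether.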

[27]  arXiv:2106.07804 (cross-list from cs.LG) [pdf, other]
Title: Controlling Neural Networks with Rule Representations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose a novel training method that integrates rules into deep learning in such a way that their strengths are controllable at inference. Deep Neural Networks with Controllable Rule Representations (DeepCTRL) incorporates a rule encoder into the model coupled with a rule-based objective, enabling a shared representation for decision making. DeepCTRL is agnostic to data type and model architecture. It can be applied to any kind of rule defined for inputs and outputs. The key aspect of DeepCTRL is that it does not require retraining to adapt the rule strength -- at inference, the user can adjust it based on the desired operation point on accuracy vs. rule verification ratio. In real-world domains where incorporating rules is critical -- such as physics, retail, and healthcare -- we show the effectiveness of DeepCTRL in teaching rules for deep learning. DeepCTRL improves the trust and reliability of the trained models by significantly increasing their rule verification ratio, while also providing accuracy gains at downstream tasks. Additionally, DeepCTRL enables novel use cases such as hypothesis testing of the rules on data samples, and unsupervised adaptation based on shared rules between datasets.

[28]  arXiv:2106.07814 (cross-list from cs.LG) [pdf, other]
Title: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
Comments: ICML 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces. By contrast, the majority of theoretical RL literature requires the MDP to satisfy some form of linear structure, in order to guarantee sample efficient RL. Such efforts typically assume the transition dynamics or value function of the MDP are described by linear functions of the state features. To resolve this discrepancy between theory and practice, we introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions. We demonstrate that the EPW condition permits sample efficient RL, by providing an algorithm which provably solves MDPs satisfying this condition. Our algorithm requires minimal assumptions on the policy class, which can include multi-layer neural networks with nonlinear activation functions. Notably, the EPW condition is directly motivated by popular gaming benchmarks, and we show that many classic Atari games satisfy this condition. We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.

[29]  arXiv:2106.07830 (cross-list from cs.LG) [pdf, other]
Title: On the Convergence of Deep Learning with Differential Privacy
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In deep learning with differential privacy (DP), the neural network usually achieves privacy at the cost of slower convergence (and thus lower performance) than its non-private counterpart. This work gives the first convergence analysis of DP deep learning, through the lens of training dynamics and the neural tangent kernel (NTK). Our convergence theory successfully characterizes the effects of two key components in DP training: the per-sample clipping (flat or layerwise) and the noise addition. Our analysis not only initiates a general principled framework to understand DP deep learning with any network architecture and loss function, but also motivates a new clipping method, global clipping, which significantly improves the convergence while preserving the same privacy guarantee as the existing local clipping.
In terms of theoretical results, we establish the precise connection between the per-sample clipping and the NTK matrix. We show that in the gradient flow, i.e., with infinitesimal learning rate, the noise level of DP optimizers does not affect the convergence. We prove that DP gradient descent (GD) with global clipping guarantees monotone convergence to zero loss, which can be violated by the existing DP-GD with local clipping. Notably, our analysis framework easily extends to other optimizers, e.g., DP-Adam. Empirically, DP optimizers equipped with global clipping perform strongly on a wide range of classification and regression tasks. In particular, our global clipping is surprisingly effective at learning calibrated classifiers, in contrast to existing DP classifiers, which are often over-confident and unreliable. Implementation-wise, the new clipping can be realized by adding one line of code to the Opacus library.
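For reference, the two components the analysis focuses on, per-sample clipping and noise addition, look as follows in a plain NumPy sketch of one DP-SGD step with flat clipping. This is the standard "local" clipping baseline. The global clipping rule proposed in the paper is not defined in the abstract and is therefore not shown. The name `dp_sgd_step` is hypothetical and is not the Opacus API.

```python
import numpy as np

def dp_sgd_step(w, per_sample_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step with flat per-sample clipping and Gaussian noise.

    per_sample_grads has shape (batch, dim): one flattened gradient row per example.
    Returns the updated parameters and the clipped gradients.
    """
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    # Flat clipping: rescale each example's whole gradient so its norm is at most clip_norm.
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * factors
    # Gaussian noise calibrated to the per-example sensitivity clip_norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_sample_grads)
    return w - lr * noisy_mean, clipped
```

Layerwise clipping would apply the same rescaling separately to each layer's slice of the gradient, each with its own threshold.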

[30]  arXiv:2106.07832 (cross-list from cs.LG) [pdf, other]
Title: Learning Equivariant Energy Based Models with Equivariant Stein Variational Gradient Descent
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We focus on the problem of efficient sampling and learning of probability densities by incorporating symmetries in probabilistic models. We first introduce the Equivariant Stein Variational Gradient Descent (SVGD) algorithm -- an equivariant sampling method based on Stein's identity for sampling from densities with symmetries. Equivariant SVGD explicitly incorporates symmetry information in a density through equivariant kernels, which make the resulting sampler efficient both in terms of sample complexity and the quality of generated samples. Subsequently, we define equivariant energy-based models (EBMs) to model invariant densities that are learned using contrastive divergence. By utilizing our equivariant SVGD for training equivariant EBMs, we propose new ways of improving and scaling up the training of energy-based models. We apply these equivariant energy models to model joint densities in regression and classification tasks for image datasets, many-body particle systems and molecular structure generation.
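As background, the sketch below shows one update of standard SVGD with an ordinary RBF kernel. The particle update is $x_i \leftarrow x_i + \epsilon\,\frac{1}{n}\sum_j \big[k(x_j, x_i)\nabla\log p(x_j) + \nabla_{x_j} k(x_j, x_i)\big]$. The kernel here is not equivariant. The equivariant variant described above would swap in an equivariant kernel, which this sketch does not attempt. The step size and bandwidth values are illustrative assumptions.

```python
import numpy as np

def svgd_step(x, grad_logp, step=0.1, h=1.0):
    """One Stein variational gradient descent update with RBF kernel exp(-||x - y||^2 / h).

    x: particles of shape (n, d); grad_logp maps (n, d) -> (n, d) scores of the target.
    """
    diff = x[:, None, :] - x[None, :, :]          # diff[j, i] = x_j - x_i
    k = np.exp(-np.sum(diff**2, axis=-1) / h)     # k[j, i] = k(x_j, x_i)
    grad_k = -2.0 / h * diff * k[..., None]       # gradient of k(x_j, x_i) w.r.t. x_j
    # Driving term pulls particles toward high density; kernel-gradient term repels them.
    phi = (k.T @ grad_logp(x) + grad_k.sum(axis=0)) / len(x)
    return x + step * phi
```

For a standard normal target, `grad_logp = lambda z: -z`, and particles initialised far from the origin drift toward it while the repulsive term keeps them spread out.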

[31]  arXiv:2106.07836 (cross-list from cs.LG) [pdf, other]
Title: Improved Regret Bounds for Online Submodular Maximization
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

In this paper, we consider an online optimization problem over $T$ rounds where at each step $t\in[T]$, the algorithm chooses an action $x_t$ from the fixed convex and compact domain set $\mathcal{K}$. A utility function $f_t(\cdot)$ is then revealed and the algorithm receives the payoff $f_t(x_t)$. This problem has been previously studied under the assumption that the utilities are adversarially chosen monotone DR-submodular functions and $\mathcal{O}(\sqrt{T})$ regret bounds have been derived. We first characterize the class of strongly DR-submodular functions and then, we derive regret bounds for the following new online settings: $(1)$ $\{f_t\}_{t=1}^T$ are monotone strongly DR-submodular and chosen adversarially, $(2)$ $\{f_t\}_{t=1}^T$ are monotone submodular (while the average $\frac{1}{T}\sum_{t=1}^T f_t$ is strongly DR-submodular) and chosen by an adversary but they arrive in a uniformly random order, $(3)$ $\{f_t\}_{t=1}^T$ are drawn i.i.d. from some unknown distribution $f_t\sim \mathcal{D}$ where the expected function $f(\cdot)=\mathbb{E}_{f_t\sim\mathcal{D}}[f_t(\cdot)]$ is monotone DR-submodular. For $(1)$, we obtain the first logarithmic regret bounds. In terms of the second framework, we show that it is possible to obtain similar logarithmic bounds with high probability. Finally, for the i.i.d. model, we provide algorithms with $\tilde{\mathcal{O}}(\sqrt{T})$ stochastic regret bound, both in expectation and with high probability. Experimental results demonstrate that our algorithms outperform the previous techniques in the aforementioned three settings.

[32]  arXiv:2106.07841 (cross-list from cs.LG) [pdf, other]
Title: Randomized Exploration for Reinforcement Learning with General Value Function Approximation
Comments: 32 pages, 5 figures, in Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle. Unlike existing upper-confidence-bound (UCB) based approaches, which are often computationally intractable, our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noise. To attain optimistic value function estimation without resorting to a UCB-style bonus, we introduce an optimistic reward sampling procedure. When the value functions can be represented by a function class $\mathcal{F}$, our algorithm achieves a worst-case regret bound of $\widetilde{\mathcal{O}}(\mathrm{poly}(d_EH)\sqrt{T})$ where $T$ is the time elapsed, $H$ is the planning horizon and $d_E$ is the $\textit{eluder dimension}$ of $\mathcal{F}$. In the linear setting, our algorithm reduces to LSVI-PHE, a variant of RLSVI, that enjoys an $\widetilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret. We complement the theory with an empirical evaluation across exploration tasks known to be hard.
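The core idea of exploring by perturbing training targets can be illustrated with linear features. The sketch fits ridge regression to targets corrupted by i.i.d. Gaussian scalar noise, so each fit yields a randomly perturbed value estimate. Taking the maximum over several such fits is one simple way to obtain optimism; treat that step as an assumption here rather than the paper's exact procedure. Both function names are hypothetical.

```python
import numpy as np

def perturbed_ridge(features, targets, sigma, lam, rng):
    """Ridge regression on targets perturbed with i.i.d. N(0, sigma^2) scalar noise."""
    noisy = targets + rng.normal(0.0, sigma, size=targets.shape)
    d = features.shape[1]
    return np.linalg.solve(features.T @ features + lam * np.eye(d), features.T @ noisy)

def optimistic_value(query, features, targets, sigma, lam, num_samples, rng):
    """Largest predicted value at `query` over several independently perturbed fits
    (an assumed, simple optimism mechanism used here for illustration)."""
    return max(float(query @ perturbed_ridge(features, targets, sigma, lam, rng))
               for _ in range(num_samples))
```

Each call costs only a linear solve, in contrast to computing an explicit confidence-width bonus. With `sigma=0` the procedure reduces to ordinary ridge regression.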

[33]  arXiv:2106.07847 (cross-list from cs.LG) [pdf, other]
Title: Learning Stable Classifiers by Transferring Unstable Features
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We study transfer learning in the presence of spurious correlations. We experimentally demonstrate that directly transferring the stable feature extractor learned on the source task may not eliminate these biases for the target task. However, we hypothesize that the unstable features in the source task and those in the target task are directly related. By explicitly informing the target classifier of the source task's unstable features, we can regularize the biases in the target task. Specifically, we derive a representation that encodes the unstable features by contrasting different data environments in the source task. On the target task, we cluster data from this representation, and achieve robustness by minimizing the worst-case risk across all clusters. We evaluate our method on both text and image classification. Empirical results demonstrate that our algorithm is able to maintain robustness on the target task, outperforming the best baseline by 22.9% in absolute accuracy across 12 transfer settings. Our code is available at https://github.com/YujiaBao/Tofu.
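The robustness objective mentioned above, the worst-case risk across clusters, is the largest per-cluster average loss. A minimal sketch, assuming cluster assignments are already available (the clustering step itself is not shown, and the helper name is hypothetical):

```python
import numpy as np

def worst_case_cluster_risk(losses, cluster_ids):
    """Largest average loss over clusters, plus the per-cluster averages."""
    clusters = np.unique(cluster_ids)
    per_cluster = np.array([losses[cluster_ids == c].mean() for c in clusters])
    return float(per_cluster.max()), dict(zip(clusters.tolist(), per_cluster.tolist()))
```

Minimizing this quantity instead of the overall mean loss prevents a small cluster with a large loss from being averaged away by the majority.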

[34]  arXiv:2106.07908 (cross-list from cs.LG) [pdf, ps, other]
Title: Machine learning-based conditional mean filter: a generalization of the ensemble Kalman filter for nonlinear data assimilation
Authors: Truong-Vinh Hoang (1), Sebastian Krumscheid (1), Hermann G. Matthies (2), Raúl Tempone (1 and 3) ((1) Chair of Mathematics for Uncertainty Quantification, RWTH Aachen University, (2) Technische Universität Braunschweig (3) Computer, Electrical and Mathematical Sciences and Engineering, KAUST, and Alexander von Humboldt professor in Mathematics of Uncertainty Quantification, RWTH Aachen University)
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Computation (stat.CO); Machine Learning (stat.ML)

Filtering is a data assimilation technique that performs sequential inference of dynamical system states from noisy observations. Herein, we propose a machine learning-based ensemble conditional mean filter (ML-EnCMF) for tracking possibly high-dimensional non-Gaussian state models with nonlinear dynamics based on sparse observations. The proposed filtering method is developed based on the conditional expectation and numerically implemented using machine learning (ML) techniques combined with the ensemble method. The contribution of this work is twofold. First, we demonstrate that the ensembles assimilated using the ensemble conditional mean filter (EnCMF) provide an unbiased estimator of the Bayesian posterior mean, and their variance matches the expected conditional variance. Second, we implement the EnCMF using artificial neural networks, which have a significant advantage in representing nonlinear functions over high-dimensional domains such as the conditional mean. Finally, we demonstrate the effectiveness of the ML-EnCMF for tracking the states of Lorenz-63 and Lorenz-96 systems under the chaotic regime. Numerical results show that the ML-EnCMF outperforms the ensemble Kalman filter.
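A sketch of an ensemble conditional-mean analysis step may help, under our reading of the conditional mean filter (an assumption, not the authors' code). Each forecast member is shifted as $x_a^{(i)} = x_f^{(i)} + \varphi(y_{\text{obs}}) - \varphi(y_f^{(i)})$, where $\varphi(y) \approx \mathbb{E}[x \mid y]$ is learned from the forecast pairs. Here $\varphi$ is a linear least-squares fit, so the step behaves like a stochastic EnKF. The ML-EnCMF described above would replace `fit_linear_conditional_mean` with a neural network.

```python
import numpy as np

def fit_linear_conditional_mean(Y, X):
    """Least-squares fit of E[x | y] ~ [1, y] @ B from ensemble pairs (Y: (N, m), X: (N, n))."""
    design = np.hstack([np.ones((len(Y), 1)), Y])
    B, *_ = np.linalg.lstsq(design, X, rcond=None)
    return lambda y: np.hstack([np.ones((len(y), 1)), y]) @ B

def encmf_update(X_f, obs_op, obs_noise_std, y_obs, rng, fit=fit_linear_conditional_mean):
    """Conditional-mean analysis step: x_a = x_f + phi(y_obs) - phi(y_f), member by member."""
    # Simulated observations for each forecast member, including observation noise.
    Y_f = obs_op(X_f) + rng.normal(0.0, obs_noise_std, size=(len(X_f), len(y_obs)))
    phi = fit(Y_f, X_f)
    return X_f + phi(y_obs[None, :]) - phi(Y_f)
```

In a scalar linear-Gaussian check with prior $\mathcal{N}(0,1)$, observation $y = x + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, 0.5^2)$ and $y_{\text{obs}} = 1$, the analysis ensemble should have mean close to $0.8$ and variance close to $0.2$, matching the exact posterior.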

[35]  arXiv:2106.07909 (cross-list from cs.SI) [pdf, other]
Title: Evaluating the Effect of the Financial Status to the Mobility Customs
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Applications (stat.AP)

In this article, we explore the relationship between cellular phone data and housing prices in Budapest, Hungary. We derive mobility indicators from one month of Call Detail Records (CDR) data, while property price data are used to characterize socioeconomic status in the capital of Hungary. First, we validate the proposed methodology by comparing the Home and Work location estimates and the commuting patterns derived from the cellular network dataset with reports of the national mini census. We then investigate the statistical relationships between mobile phone indicators, such as the Radius of Gyration, the distance between Home and Work locations, or the Entropy of visited cells, and measures of economic status based on housing prices. Our findings show that mobility correlates significantly with socioeconomic status. We perform Principal Component Analysis (PCA) on combined vectors of mobility indicators in order to characterize the dependence of mobility habits on socioeconomic status. The results of the PCA show a remarkable correlation between housing prices and mobility customs.
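Two of the mobility indicators named above have standard definitions. The sketch below assumes visited locations are given in planar (projected) coordinates, one row per record, and cells by an identifier. The radius of gyration is $r_g = \sqrt{\frac{1}{N}\sum_i \lVert r_i - r_{\text{cm}}\rVert^2}$ and the entropy is the Shannon entropy of the visit distribution over cells. Whether the paper weights records in exactly this way is an assumption.

```python
import numpy as np

def radius_of_gyration(points):
    """Root-mean-square distance of visited locations from their centre of mass."""
    centre = points.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum((points - centre) ** 2, axis=1))))

def visit_entropy(cell_ids):
    """Shannon entropy (in nats) of the distribution of visits over cells."""
    _, counts = np.unique(cell_ids, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))
```

A subscriber who only ever appears in one cell has entropy 0, and one who splits visits evenly between two cells has entropy $\ln 2$.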

[36]  arXiv:2106.07911 (cross-list from math.OC) [pdf, other]
Title: Non-asymptotic convergence bounds for Wasserstein approximation using point clouds
Authors: Quentin Merigot (LMO, IUF), Filippo Santambrogio (ICJ, IUF), Clément Sarrazin (LMO)
Subjects: Optimization and Control (math.OC); Machine Learning (stat.ML)

Several issues in machine learning and inverse problems require generating discrete data, as if sampled from a model probability distribution. A common way to do so relies on constructing a uniform probability distribution over a set of $N$ points that minimizes the Wasserstein distance to the model distribution. This minimization problem, where the unknowns are the positions of the atoms, is non-convex. Yet, in most cases, a suitably adjusted version of Lloyd's algorithm -- in which Voronoi cells are replaced by Power cells -- leads to configurations with small Wasserstein error. This is surprising because of the non-convex nature of the problem, as well as the existence of spurious critical points. We provide explicit upper bounds for the convergence speed of this Lloyd-type algorithm, starting from a cloud of points sufficiently far from each other. These bounds already hold after one step of the iteration procedure, and similar bounds can be deduced for the corresponding gradient descent. These bounds naturally lead to a modified Polyak-Łojasiewicz inequality for the Wasserstein distance cost, with an error term depending on the distances between Dirac masses in the discrete distribution.

[37]  arXiv:2106.07914 (cross-list from cs.LG) [pdf, other]
Title: Control Variates for Slate Off-Policy Evaluation
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)

We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions, often termed slates. The problem is common to recommender systems and user-interface optimization, and it is particularly challenging because of the combinatorially-sized action space. Swaminathan et al. (2017) have proposed the pseudoinverse (PI) estimator under the assumption that the conditional mean rewards are additive in actions. Using control variates, we consider a large class of unbiased estimators that includes as specific cases the PI estimator and (asymptotically) its self-normalized variant. By optimizing over this class, we obtain new estimators with risk improvement guarantees over both the PI and self-normalized PI estimators. Experiments with real-world recommender data as well as synthetic data validate these improvements in practice.
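The control-variate idea can be seen in the single-action (non-slate) case. Importance weights have known mean 1 under the logging policy, so $\hat V = \overline{w r} - \beta(\overline{w} - 1)$ is still consistent for any $\beta$. The variance-minimizing $\beta = \mathrm{Cov}(wr, w)/\mathrm{Var}(w)$ can be estimated from data, and choosing $\beta \approx \hat V$ recovers the self-normalized estimator to first order. This is a generic sketch, not the slate PI-based estimators of the paper.

```python
import numpy as np

def ipw_control_variate(w, r):
    """Importance-weighted value estimate using the weights' known mean (1) as a control variate.

    w: importance weights pi(a|x) / mu(a|x); r: observed rewards.
    Returns (control-variate estimate, plain IPW estimate).
    """
    ipw = np.mean(w * r)
    cov = np.cov(w * r, w)
    beta = cov[0, 1] / cov[1, 1]      # variance-minimizing coefficient, estimated from the sample
    return ipw - beta * (np.mean(w) - 1.0), ipw
```

In a toy bandit where the reward is a deterministic function of a binary action, $wr$ is exactly affine in $w$. The control variate then removes all sampling noise, and the estimate equals the true value.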

[38]  arXiv:2106.07992 (cross-list from cs.LG) [pdf, other]
Title: Time Series Anomaly Detection for Cyber-physical Systems via Neural System Identification and Bayesian Filtering
Comments: Accepted to appear in KDD 2021
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

Recent advances in AIoT technologies have led to an increasing popularity of utilizing machine learning algorithms to detect operational failures for cyber-physical systems (CPS). In its basic form, an anomaly detection module monitors the sensor measurements and actuator states from the physical plant, and detects anomalies in these measurements to identify abnormal operation status. Nevertheless, building effective anomaly detection models for CPS is rather challenging as the model has to accurately detect anomalies in the presence of highly complicated system dynamics and an unknown amount of sensor noise. In this work, we propose a novel time series anomaly detection method called Neural System Identification and Bayesian Filtering (NSIBF) in which a specially crafted neural network architecture is posed for system identification, i.e., capturing the dynamics of CPS in a dynamical state-space model; then a Bayesian filtering algorithm is naturally applied on top of the "identified" state-space model for robust anomaly detection by tracking the uncertainty of the hidden state of the system recursively over time. We provide qualitative as well as quantitative experiments with the proposed method on one synthetic and three real-world CPS datasets, showing that NSIBF compares favorably to the state-of-the-art methods with considerable improvements on anomaly detection in CPS.
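The filtering half of this idea can be illustrated with a known linear-Gaussian state-space model standing in for the learned neural one, an assumption made for simplicity. A Kalman filter tracks the hidden state and its uncertainty. Each observation is then scored by the squared Mahalanobis distance of its innovation under the predicted covariance $S = C P C^\top + R$. NSIBF itself uses a neural state-space model and a nonlinear filter, which this sketch does not reproduce.

```python
import numpy as np

def kalman_anomaly_scores(ys, A, C, Q, R, x0, P0):
    """Run a linear Kalman filter and return the squared Mahalanobis distance of each innovation."""
    x, P, scores = x0, P0, []
    for y in ys:
        x_pred = A @ x                               # predict the hidden state
        P_pred = A @ P @ A.T + Q
        innov = y - C @ x_pred                       # observation minus its prediction
        S = C @ P_pred @ C.T + R                     # innovation covariance
        scores.append(float(innov @ np.linalg.solve(S, innov)))
        K = P_pred @ C.T @ np.linalg.inv(S)          # Kalman gain
        x = x_pred + K @ innov                       # update with the new observation
        P = (np.eye(len(x)) - K @ C) @ P_pred
    return np.array(scores)
```

Because the score accounts for the filter's current uncertainty, a large deviation is flagged only when it is surprising relative to both the modelled dynamics and the sensor noise level.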

[39]  arXiv:2106.08027 (cross-list from cs.LG) [pdf, other]
Title: Multivariate Business Process Representation Learning utilizing Gramian Angular Fields and Convolutional Neural Networks
Comments: Accepted at the Business Process Management Conference 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Learning meaningful representations of data is an important aspect of machine learning and has recently been successfully applied to many domains like language understanding or computer vision. Instead of training a model for one specific task, representation learning is about training a model to capture all useful information in the underlying data and make it accessible for a predictor. For predictive process analytics, it is essential to have all explanatory characteristics of a process instance available when making predictions about the future, as well as for clustering and anomaly detection. Due to the large variety of perspectives and types within business process data, generating a good representation is a challenging task. In this paper, we propose a novel approach for representation learning of business process instances which can process and combine most perspectives in an event log. In conjunction with a self-supervised pre-training method, we show the capabilities of the approach through a visualization of the representation space and case retrieval. Furthermore, the pre-trained model is fine-tuned to multiple process prediction tasks and demonstrates its effectiveness in comparison with existing approaches.
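The title refers to Gramian Angular Fields (GAF), a standard way to turn a numeric sequence into an image that a CNN can consume. The sketch below computes the summation and difference variants for a single series: rescale to $[-1, 1]$, map to angles $\phi_i = \arccos \tilde x_i$, then form $\cos(\phi_i + \phi_j)$ or $\sin(\phi_i - \phi_j)$. How the paper turns event-log perspectives into such series is not described in the abstract, so that step is left out. A multivariate input could be handled by stacking one field per attribute as image channels.

```python
import numpy as np

def gramian_angular_field(series, kind="summation"):
    """Encode a 1-D series as a Gramian Angular Summation or Difference Field."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("constant series cannot be rescaled to [-1, 1]")
    scaled = (2 * x - hi - lo) / (hi - lo)           # min-max rescaling to [-1, 1]
    phi = np.arccos(np.clip(scaled, -1.0, 1.0))      # polar-coordinate angle of each value
    if kind == "summation":
        return np.cos(phi[:, None] + phi[None, :])
    return np.sin(phi[:, None] - phi[None, :])
```

The diagonal of the summation field is $\cos 2\phi_i = 2\tilde x_i^2 - 1$, so the original (rescaled) values can be recovered up to sign from the image.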

[40]  arXiv:2106.08048 (cross-list from q-bio.PE) [pdf, other]
Title: Epidemic modelling of multiple virus strains: a case study of SARS-CoV-2 B.1.1.7 in Moscow
Subjects: Populations and Evolution (q-bio.PE); Machine Learning (cs.LG); Applications (stat.AP)

During a long-running pandemic, a pathogen can mutate, producing new strains with different epidemiological parameters. Existing approaches to epidemic modelling only consider one virus strain. We have developed a modified SEIR model to simulate multiple virus strains within the same population. As a case study, we investigate the potential effects of SARS-CoV-2 strain B.1.1.7 on the city of Moscow. Our analysis indicates a high risk of a new wave of infections in September-October 2021 with up to 35 000 daily infections at peak. We open-source our code and data.
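A multi-strain SEIR model can be sketched in a few lines. This is a hypothetical version, not the authors' modified model. Two strains share one susceptible pool, each has its own exposed and infectious compartments and transmission rate $\beta_k$, and recovery from either strain is assumed to protect against both. The parameter values in the usage example are illustrative only and are not estimates for Moscow.

```python
import numpy as np

def two_strain_seir(N, beta, sigma, gamma, E0, I0, days, dt=0.1):
    """Forward-Euler integration of a two-strain SEIR model (a hypothetical sketch).

    Both strains draw from one susceptible pool, and recovery from either strain is
    assumed to protect against both. beta, E0 and I0 hold one entry per strain.
    Returns an array with columns S, E1, E2, I1, I2, R, one row per time step.
    """
    beta, E, I = (np.asarray(v, dtype=float) for v in (beta, E0, I0))
    S, R = N - E.sum() - I.sum(), 0.0
    history = []
    for _ in range(int(round(days / dt))):
        new_exposed = beta * S * I / N        # new exposures per unit time, per strain
        dE = new_exposed - sigma * E          # exposed become infectious at rate sigma
        dI = sigma * E - gamma * I            # infectious recover at rate gamma
        S, R = S - new_exposed.sum() * dt, R + gamma * I.sum() * dt
        E, I = E + dE * dt, I + dI * dt
        history.append([S, *E, *I, R])
    return np.array(history)
```

For instance, `two_strain_seir(1e6, [0.3, 0.45], 0.2, 0.1, [0, 0], [100, 10], days=300)` seeds the more transmissible strain with fewer cases. The compartments always sum to the population size, since every outflow from one compartment is an inflow to another.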

[41]  arXiv:2106.08056 (cross-list from cs.LG) [pdf, other]
Title: Coupled Gradient Estimators for Discrete Latent Variables
Comments: Under Review
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Training models with discrete latent variables is challenging due to the high variance of unbiased gradient estimators. While low-variance reparameterization gradients of a continuous relaxation can provide an effective solution, a continuous relaxation is not always available or tractable. Dong et al. (2020) and Yin et al. (2020) introduced a performant estimator that does not rely on continuous relaxations; however, it is limited to binary random variables. We introduce a novel derivation of their estimator based on importance sampling and statistical couplings, which we extend to the categorical setting. Motivated by the construction of a stick-breaking coupling, we introduce gradient estimators based on reparameterizing categorical variables as sequences of binary variables and Rao-Blackwellization. In systematic experiments, we show that our proposed categorical gradient estimators provide state-of-the-art performance, whereas even with additional Rao-Blackwellization, previous estimators (Yin et al., 2019) underperform a simpler REINFORCE with a leave-one-out baseline estimator (Kool et al., 2019).
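The REINFORCE estimator with a leave-one-out baseline, used above as a strong reference point, is simple to write down. For $K$ samples $z_k \sim \mathrm{Cat}(\mathrm{softmax}(\theta))$, the estimator is $\frac{1}{K}\sum_k \big(f(z_k) - \frac{1}{K-1}\sum_{j \neq k} f(z_j)\big)\nabla_\theta \log p_\theta(z_k)$. It stays unbiased because each baseline is independent of the sample it multiplies. The sketch is our own illustration, and the function name is hypothetical.

```python
import numpy as np

def rloo_categorical_grad(logits, f, num_samples, rng):
    """REINFORCE gradient of E_{z~Cat(softmax(logits))}[f(z)] with a leave-one-out baseline."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    z = rng.choice(len(p), size=num_samples, p=p)
    fz = np.array([f(k) for k in z])
    baseline = (fz.sum() - fz) / (num_samples - 1)   # mean of f over the *other* samples
    score = np.eye(len(p))[z] - p                    # gradient of log softmax at each sample
    return ((fz - baseline)[:, None] * score).mean(axis=0)
```

The exact gradient here is $p_i\big(f(i) - \sum_z p_z f(z)\big)$ for component $i$, so the estimate can be checked directly on a small example.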

[42]  arXiv:2106.08068 (cross-list from cs.LG) [pdf, other]
Title: An Analytical Theory of Curriculum Learning in Teacher-Student Networks
Comments: 10 pages + appendix
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)

In humans and animals, curriculum learning -- presenting data in a curated order -- is critical to rapid learning and effective pedagogy. Yet in machine learning, curricula are not widely used and empirically often yield only moderate benefits. This stark difference in the importance of curriculum raises a fundamental theoretical question: when and why does curriculum learning help?
In this work, we analyse a prototypical neural network model of curriculum learning in the high-dimensional limit, employing statistical physics methods. Curricula could in principle change both the learning speed and asymptotic performance of a model. To study the former, we provide an exact description of the online learning setting, confirming the long-standing experimental observation that curricula can modestly speed up learning. To study the latter, we derive performance in a batch learning setting, in which a network trains to convergence in successive phases of learning on dataset slices of varying difficulty. With standard training losses, curriculum does not provide generalisation benefit, in line with empirical observations. However, we show that by connecting different learning phases through simple Gaussian priors, curriculum can yield a large improvement in test performance. Taken together, our reduced analytical descriptions help reconcile apparently conflicting empirical results and trace regimes where curriculum learning yields the largest gains. More broadly, our results suggest that fully exploiting a curriculum may require explicit changes to the loss function at curriculum boundaries.

[43]  arXiv:2106.08077 (cross-list from cs.CV) [pdf, other]
Title: Computer-aided Interpretable Features for Leaf Image Classification
Comments: 31 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)

Plant species identification is time-consuming and costly, and it requires considerable effort and expert knowledge. Recently, many researchers have used deep learning methods to classify plants directly from plant images. While deep learning models have achieved great success, their lack of interpretability limits their widespread application. To overcome this, we explore the use of interpretable, measurable, computer-aided features extracted from plant leaf images. Image processing is one of the most challenging and crucial steps in feature extraction. The purpose of image processing is to improve the leaf image by removing undesired distortion. The main image processing steps of our algorithm are: i) conversion of the original image to an RGB (Red-Green-Blue) image, ii) gray scaling, iii) Gaussian smoothing, iv) binary thresholding, v) stalk removal, vi) hole closing, and vii) image resizing. The next step after image processing is to extract features from the plant leaf images. We introduce 52 computationally efficient features to classify plant species. These features fall into four groups: i) shape-based features, ii) color-based features, iii) texture-based features, and iv) scagnostic features. Length, width, area, texture correlation, monotonicity and scagnostics are a few examples. We explore the ability of the features to discriminate the classes of interest under supervised and unsupervised learning settings. For that, a supervised dimensionality reduction technique, Linear Discriminant Analysis (LDA), and an unsupervised dimensionality reduction technique, Principal Component Analysis (PCA), are used to convert and visualize the images from digital-image space to feature space. The results show that the features are sufficient to discriminate the classes of interest under both supervised and unsupervised learning settings.
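Steps ii) to iv) of the pipeline and three simple shape features can be sketched with NumPy alone. Stalk removal, hole closing, and resizing are omitted. The sketch assumes a dark leaf on a light background, RGB values scaled to $[0, 1]$, and length and width taken as bounding-box extents in pixels. These are simplifying assumptions, not the authors' definitions, and the function names are hypothetical.

```python
import numpy as np

def gaussian_smooth(gray, sigma=1.0):
    """Separable Gaussian smoothing with edge padding so the output keeps the input shape."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    padded = np.pad(gray, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def leaf_mask(rgb, sigma=1.0, threshold=0.5):
    """Gray scaling, Gaussian smoothing and binary thresholding of an RGB image in [0, 1]."""
    gray = rgb[..., :3] @ np.array([0.299, 0.587, 0.114])   # ITU-R BT.601 luma weights
    return gaussian_smooth(gray, sigma) < threshold          # assumes a dark leaf on a light background

def shape_features(mask):
    """Area, length and width (bounding-box extents, in pixels) of the foreground region."""
    rows, cols = np.nonzero(mask)
    return {"area": int(mask.sum()),
            "length": int(rows.max() - rows.min() + 1),
            "width": int(cols.max() - cols.min() + 1)}
```

Edge padding keeps a uniform background from darkening near the image border, which zero padding would do and which could wrongly mark border pixels as leaf.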

[44]  arXiv:2106.08171 (cross-list from cs.LG) [pdf, other]
Title: Evaluating Modules in Graph Contrastive Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The recent emergence of contrastive learning approaches has facilitated research on graph representation learning (GRL), introducing graph contrastive learning (GCL) into the literature. These methods contrast semantically similar and dissimilar sample pairs to encode the semantics into node or graph embeddings. However, most existing works have performed only model-level evaluation and have not explored the combination space of modules for more comprehensive and systematic studies. For effective module-level evaluation, we propose a framework that decomposes GCL models into four modules: (1) a sampler to generate anchor, positive and negative data samples (nodes or graphs); (2) an encoder and a readout function to get sample embeddings; (3) a discriminator to score each sample pair (anchor-positive and anchor-negative); and (4) an estimator to define the loss function. Based on this framework, we conduct controlled experiments over a wide range of architectural designs and hyperparameter settings on node and graph classification tasks. Specifically, we manage to quantify the impact of a single module, investigate the interaction between modules, and compare the overall performance with current model architectures. Our key findings include a set of module-level guidelines for GCL, e.g., simple samplers from LINE and DeepWalk are strong and robust; an MLP encoder combined with Sum readout could achieve competitive performance on graph classification. Finally, we release our implementations and results as OpenGCL, a modularized toolkit that allows convenient reproduction, standard model and module evaluation, and easy extension.
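Modules (3) and (4) can be illustrated with two common choices: an inner-product discriminator and an InfoNCE estimator with temperature $\tau$, $-\log\frac{e^{s^+/\tau}}{e^{s^+/\tau} + \sum_k e^{s^-_k/\tau}}$. This is a hypothetical sketch. It does not reflect OpenGCL's API or claim that these are the configurations the paper found best.

```python
import numpy as np

def inner_product_discriminator(anchor, others):
    """Score the anchor embedding against each row of `others` by inner product."""
    return others @ anchor

def infonce_estimator(pos_score, neg_scores, tau=0.5):
    """InfoNCE loss for one anchor, given its positive score and its negative scores."""
    logits = np.concatenate([[pos_score], neg_scores]) / tau
    m = logits.max()                                        # log-sum-exp shift for stability
    log_denominator = m + np.log(np.sum(np.exp(logits - m)))
    return float(log_denominator - logits[0])
```

Because the modules communicate only through scores, swapping the discriminator (for example, for a bilinear form) or the estimator (for example, for a Jensen-Shannon bound) leaves the rest of the pipeline unchanged. This is the kind of substitution the module-level study relies on.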

[45]  arXiv:2106.08285 (cross-list from cs.CV) [pdf, other]
Title: Multi-StyleGAN: Towards Image-Based Simulation of Time-Lapse Live-Cell Microscopy
Comments: accepted to MICCAI 2021. (Tim Prangemeier and Christoph Reich --- both authors contributed equally)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

Time-lapse fluorescent microscopy (TLFM) combined with predictive mathematical modelling is a powerful tool to study the inherently dynamic processes of life on the single-cell level. Such experiments are costly, complex and labour intensive. A complementary approach, and a step towards completely in silico experiments, is to synthesise the imagery itself. Here, we propose Multi-StyleGAN as a descriptive approach to simulate time-lapse fluorescence microscopy imagery of living cells, based on a past experiment. This novel generative adversarial network synthesises a multi-domain sequence of consecutive timesteps. We showcase Multi-StyleGAN on imagery of multiple live yeast cells in microstructured environments and train on a dataset recorded in our laboratory. The simulation captures underlying biophysical factors and time dependencies, such as cell morphology, growth and physical interactions, as well as the intensity of a fluorescent reporter protein. An immediate application is to generate additional training and validation data for feature extraction algorithms or to aid and expedite development of advanced experimental techniques such as online monitoring or control of cells.
Code and dataset are available at https://git.rwth-aachen.de/bcs/projects/tp/multi-stylegan.

[46]  arXiv:2106.08297 (cross-list from math.PR) [pdf, ps, other]
Title: Diagonal sections of copulas, multivariate conditional hazard rates and distributions of order statistics for minimally stable lifetimes
Subjects: Probability (math.PR); Statistics Theory (math.ST); Methodology (stat.ME)

As a motivating problem, we aim to study some special aspects of the marginal distributions of the order statistics for exchangeable and (more generally) for minimally stable non-negative random variables $T_{1},\ldots,T_{r}$. In either case, we assume that $T_{1},\ldots,T_{r}$ are identically distributed with a common survival function $\overline{G}$, and we denote their survival copula by $K$. The diagonal and subdiagonal sections of $K$, along with $\overline{G}$, are possible tools to describe the information needed to recover the laws of the order statistics.
When attention is restricted to the absolutely continuous case, such a joint distribution can be described in terms of the associated multivariate conditional hazard rate (m.c.h.r.) functions. We then also study the distributions of the order statistics of $T_{1},\ldots,T_{r}$ in terms of the system of m.c.h.r. functions. We compare and, in a sense, combine the two approaches in order to obtain detailed formulas and to analyze some probabilistic aspects of the distributions of interest. This study also leads us to compare the cases of exchangeable and minimally stable variables, both in terms of copulas and of m.c.h.r. functions. The paper concludes with the analysis of two remarkable special cases of stochastic dependence, namely Archimedean copulas and load sharing models. This analysis allows us to provide some illustrative examples and some discussion of peculiar aspects of our results.
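
As a sketch of how diagonal sections can carry the laws of the order statistics, take $K$ to be the survival copula in the usual sense, $P(T_1>t_1,\ldots,T_r>t_r)=K(\overline{G}(t_1),\ldots,\overline{G}(t_r))$. For a subset $I$ with $|I|=j$, write $\delta_j(u)$ for the diagonal section of the $j$-dimensional marginal of $K$ on $I$, with $\delta_0\equiv 1$ (this notation is ours, not necessarily the paper's). We assume, as we understand minimal stability, that $\delta_j$ does not depend on which $I$ is chosen. With $N(t)=\#\{i: T_i>t\}$ and $T_{k:r}$ the $k$-th smallest value, the standard inclusion-exclusion (Jordan) formula then gives:

```latex
\begin{align*}
P(T_{1:r} > t) &= K\big(\overline{G}(t),\ldots,\overline{G}(t)\big) = \delta_r\big(\overline{G}(t)\big),\\
P\big(N(t) = m\big) &= \sum_{j=m}^{r} (-1)^{j-m}\binom{j}{m}\binom{r}{j}\,\delta_j\big(\overline{G}(t)\big),\\
P(T_{k:r} > t) &= P\big(N(t) \ge r-k+1\big) = \sum_{m=r-k+1}^{r} P\big(N(t) = m\big).
\end{align*}
```

As a check, under independence $\delta_j(u)=u^j$, and the middle line reduces to the binomial probability $\binom{r}{m}u^m(1-u)^{r-m}$ with $u=\overline{G}(t)$.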

Replacements for Wed, 16 Jun 21

[47]  arXiv:1901.10002 (replaced) [pdf, other]
Title: A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle
Comments: 11 pages plus references; updated with corrections to text and figures, new examples, and a more thorough walkthrough of ML
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[48]  arXiv:1902.09653 (replaced) [pdf, other]
Title: Estimating Atmospheric Motion Winds from Satellite Image Data using Space-time Drift Models
Subjects: Applications (stat.AP); Computation (stat.CO); Methodology (stat.ME)
[49]  arXiv:1903.04556 (replaced) [pdf, other]
Title: Embarrassingly parallel MCMC using deep invertible transformations
Comments: Accepted to UAI 2019
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[50]  arXiv:1903.05631 (replaced) [pdf, other]
Title: ST-UNet: A Spatio-Temporal U-Network for Graph-structured Time Series Modeling
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[51]  arXiv:1907.05689 (replaced) [pdf, other]
Title: Gittins' theorem under uncertainty
Subjects: Optimization and Control (math.OC); Probability (math.PR); Statistics Theory (math.ST); Computational Finance (q-fin.CP)
[52]  arXiv:1909.00453 (replaced) [pdf, other]
Title: Topics to Avoid: Demoting Latent Confounds in Text Classification
Comments: 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019)
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
[53]  arXiv:1910.10897 (replaced) [pdf, other]
Title: Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning
Comments: This is an updated version of a manuscript that originally appeared at CoRL 2019. Videos are here: meta-world.github.io, open-source code is available at this https URL, and the baselines can be found at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
[54]  arXiv:1910.12016 (replaced) [pdf, other]
Title: Tensor Q-Rank: New Data Dependent Definition of Tensor Rank
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[55]  arXiv:1910.14215 (replaced) [pdf, other]
Title: Multivariate Uncertainty in Deep Learning
Comments: To be published in IEEE Transactions on Neural Networks and Learning Systems
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO); Machine Learning (stat.ML)
[56]  arXiv:1912.08421 (replaced) [pdf, other]
Title: Learning to Prevent Leakage: Privacy-Preserving Inference in the Mobile Cloud
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[57]  arXiv:2001.10119 (replaced) [pdf, other]
Title: Unsupervised Program Synthesis for Images By Sampling Without Replacement
Comments: Accepted to UAI 2021
Journal-ref: UAI 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[58]  arXiv:2002.03206 (replaced) [pdf, other]
Title: Characterizing Structural Regularities of Labeled Data in Overparameterized Models
Comments: 17 pages, 20 figures, ICML 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[59]  arXiv:2002.11743 (replaced) [pdf, other]
Title: Composing Normalizing Flows for Inverse Problems
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
[60]  arXiv:2004.11231 (replaced) [pdf, other]
Title: Federated Stochastic Gradient Langevin Dynamics
Comments: Accepted to UAI 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[61]  arXiv:2004.11468 (replaced) [pdf, other]
Title: How to find a unicorn: a novel model-free, unsupervised anomaly detection method for time series
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
[62]  arXiv:2004.14180 (replaced) [pdf, other]
Title: Quantized Adam with Error Feedback
Comments: Accepted to ACM Transactions on Intelligent Systems and Technology
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
[63]  arXiv:2005.08898 (replaced) [pdf, ps, other]
Title: Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent
Comments: Accepted to Journal of Machine Learning Research
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP); Optimization and Control (math.OC); Machine Learning (stat.ML)
[64]  arXiv:2006.01017 (replaced) [pdf, ps, other]
Title: Improved SVRG for quadratic functions
Authors: Nabil Kahale
Comments: 14 pages
Subjects: Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
[65]  arXiv:2006.07002 (replaced) [pdf, ps, other]
Title: Double Double Descent: On Generalization Errors in Transfer Learning between Linear Regression Tasks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[66]  arXiv:2006.10246 (replaced) [pdf, other]
Title: The Recurrent Neural Tangent Kernel
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[67]  arXiv:2006.14512 (replaced) [pdf, other]
Title: Uncovering the Connections Between Adversarial Transferability and Knowledge Transferability
Comments: Accepted to ICML 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[68]  arXiv:2007.00674 (replaced) [pdf, other]
Title: Sliced Iterative Normalizing Flows
Comments: 19 pages, 12 figures, 7 tables. Code available at this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[69]  arXiv:2007.04441 (replaced) [pdf, other]
Title: Sparse Regression for Extreme Values
Comments: 4 figures
Subjects: Methodology (stat.ME)
[70]  arXiv:2007.05426 (replaced) [pdf, other]
Title: Variational Inference with Continuously-Indexed Normalizing Flows
Comments: Accepted for publication at UAI 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[71]  arXiv:2007.10306 (replaced) [pdf, other]
Title: An Empirical Characterization of Fair Machine Learning For Clinical Risk Prediction
Comments: Published in the Journal of Biomedical Informatics (this https URL). Version 3 updates acknowledgements and fixes typos
Journal-ref: Journal of Biomedical Informatics, Volume 113, January 2021, 103621
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG); Applications (stat.AP)
[72]  arXiv:2007.10725 (replaced) [pdf, other]
Title: Majorisation as a theory for uncertainty
Subjects: Statistics Theory (math.ST)
[73]  arXiv:2007.15588 (replaced) [pdf, other]
Title: Data-efficient Hindsight Off-policy Option Learning
Comments: Published at ICML 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
[74]  arXiv:2008.07428 (replaced) [pdf, other]
Title: Fast decentralized non-convex finite-sum optimization with recursive variance reduction
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Systems and Control (eess.SY); Machine Learning (stat.ML)
[75]  arXiv:2009.04651 (replaced) [pdf, other]
Title: Universal consistency of Wasserstein $k$-NN classifier
Comments: 22 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[76]  arXiv:2009.04832 (replaced) [pdf, other]
Title: A note on post-treatment selection in studying racial discrimination in policing
Comments: Accepted for publication in the American Political Science Review on 14th June, 2021
Subjects: Applications (stat.AP); Methodology (stat.ME)
[77]  arXiv:2009.07101 (replaced) [pdf, ps, other]
Title: Approximate spectral clustering using both reference vectors and topology of the network generated by growing neural gas
Authors: Kazuhisa Fujita
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[78]  arXiv:2009.08372 (replaced) [pdf, other]
Title: A Principle of Least Action for the Training of Neural Networks
Comments: ECML PKDD 2020
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[79]  arXiv:2009.09931 (replaced) [pdf, other]
Title: Field-Embedded Factorization Machines for Click-through rate prediction
Authors: Harshit Pande
Comments: 13 pages
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[80]  arXiv:2009.13566 (replaced) [pdf, other]
Title: Graph Neural Networks with Heterophily
Comments: Proceedings version of AAAI 2021 with appendix and additional typo fixes; 12 pages, 4 figures
Journal-ref: Proceedings of the AAAI Conference on Artificial Intelligence. 35, 12 (May 2021), 11168-11176
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
[81]  arXiv:2010.00060 (replaced) [pdf, other]
Title: Constructions and Comparisons of Pooling Matrices for Pooled Testing of COVID-19
Subjects: Populations and Evolution (q-bio.PE); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Methodology (stat.ME)
[82]  arXiv:2010.06147 (replaced) [pdf, other]
Title: Treed distributed lag nonlinear models
Comments: 31 pages, 1 table, 4 figures
Subjects: Methodology (stat.ME)
[83]  arXiv:2010.13511 (replaced) [pdf, ps, other]
Title: Efficient Optimization Methods for Extreme Similarity Learning with Nonlinear Embeddings
Comments: Published as a conference paper at KDD 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[84]  arXiv:2010.14860 (replaced) [pdf, other]
Title: The Evidence Lower Bound of Variational Autoencoders Converges to a Sum of Three Entropies
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[85]  arXiv:2010.15727 (replaced) [pdf, other]
Title: Amortized Probabilistic Detection of Communities in Graphs
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[86]  arXiv:2011.03639 (replaced) [pdf, other]
Title: Graph cuts always find a global optimum for Potts models (with a catch)
Comments: Published at ICML 2021. 18 pages, 2 figures
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
[87]  arXiv:2011.06931 (replaced) [pdf, other]
Title: The Safe Logrank Test: Error Control under Continuous Monitoring with Unlimited Horizon
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
[88]  arXiv:2012.02409 (replaced) [pdf, other]
Title: When does gradient descent with logistic loss find interpolating two-layer networks?
Comments: 44 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
[89]  arXiv:2012.07941 (replaced) [pdf, other]
Title: Variable Selection with Second-Generation P-Values
Subjects: Methodology (stat.ME)
[90]  arXiv:2101.04408 (replaced) [pdf, other]
Title: Statistical analysis of periodic data in neuroscience
Authors: Daniel H. Baker
Comments: 18 pages, 11 figures
Subjects: Methodology (stat.ME); Neurons and Cognition (q-bio.NC)
[91]  arXiv:2102.07367 (replaced) [pdf, other]
Title: A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum
Comments: 36 Pages, 10 Figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[92]  arXiv:2102.09030 (replaced) [pdf, other]
Title: Bringing Differential Private SGD to Practice: On the Independence of Gaussian Noise and the Number of Training Rounds
Comments: arXiv admin note: text overlap with arXiv:2007.09208
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[93]  arXiv:2102.10473 (replaced) [pdf, other]
Title: Diagnostics for Conditional Density Models and Bayesian Inference Algorithms
Comments: camera-ready version; accepted for the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021)
Subjects: Methodology (stat.ME)
[94]  arXiv:2102.10769 (replaced) [pdf, other]
Title: MobILE: Model-Based Imitation Learning From Observation Alone
Comments: 27 pages, 5 figures, 2 tabular columns
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[95]  arXiv:2102.11086 (replaced) [pdf, other]
Title: Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Computation (stat.CO)
[96]  arXiv:2102.11436 (replaced) [pdf, other]
Title: Model-Based Domain Generalization
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[97]  arXiv:2102.12675 (replaced) [pdf]
Title: Computing Accurate Probabilistic Estimates of One-D Entropy from Equiprobable Random Samples
Comments: 23 pages, 12 figures
Subjects: Methodology (stat.ME); Information Theory (cs.IT)
[98]  arXiv:2103.01400 (replaced) [pdf, other]
Title: Smoothness Analysis of Adversarial Training
Comments: 22 pages, 7 figures. In V3, we add the results of EntropySGD for adversarial training
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[99]  arXiv:2104.00995 (replaced) [pdf, other]
Title: Exponential Reduction in Sample Complexity with Learning of Ising Model Dynamics
Comments: Accepted to ICML 2021
Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
[100]  arXiv:2104.02095 (replaced) [pdf, ps, other]
Title: Analytic function approximation by path norm regularized deep networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[101]  arXiv:2104.03279 (replaced) [pdf, other]
Title: Modern Hopfield Networks for Few- and Zero-Shot Reaction Template Prediction
Comments: 14 pages + 12 pages appendix
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM); Machine Learning (stat.ML)
[102]  arXiv:2104.04975 (replaced) [pdf, other]
Title: Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning
Comments: ICML 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[103]  arXiv:2104.05441 (replaced) [pdf, other]
Title: Unsuitability of NOTEARS for Causal Graph Discovery
Comments: 6 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[104]  arXiv:2104.12672 (replaced) [pdf, other]
Title: A Novel Interaction-based Methodology Towards Explainable AI with Better Understanding of Pneumonia Chest X-ray Images
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)
[105]  arXiv:2105.02381 (replaced) [pdf, other]
Title: The Effect of Medicaid Expansion on Non-Elderly Adult Uninsurance Rates Among States that did not Expand Medicaid
Subjects: Applications (stat.AP)
[106]  arXiv:2105.04051 (replaced) [pdf, other]
Title: Aggregating From Multiple Target-Shifted Sources
Journal-ref: ICML2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[107]  arXiv:2105.13493 (replaced) [pdf, other]
Title: Efficient and Accurate Gradients for Neural SDEs
Comments: Submitted to NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS); Machine Learning (stat.ML)
[108]  arXiv:2106.00774 (replaced) [pdf, other]
Title: Optimizing Functionals on the Space of Probabilities with Input Convex Neural Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA)
[109]  arXiv:2106.03640 (replaced) [pdf, other]
Title: Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[110]  arXiv:2106.06044 (replaced) [pdf, other]
Title: Convergence and Alignment of Gradient Descent with Random Back Propagation Weights
Comments: 33 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[111]  arXiv:2106.06885 (replaced) [pdf, other]
Title: Online Learning with Optimism and Delay
Comments: ICML 2021. 9 pages of main paper and 26 pages of appendix text
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[112]  arXiv:2106.06918 (replaced) [pdf, ps, other]
Title: A Phylogenetic Trees Analysis of SARS-CoV-2
Comments: 22 pages, 16 figures
Subjects: Methodology (stat.ME); Populations and Evolution (q-bio.PE)