Machine Learning
New submissions
New submissions for Tue, 19 Oct 21
 [1] arXiv:2110.08418 [pdf, ps, other]

Title: Nuances in Margin Conditions Determine Gains in Active Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We consider nonparametric classification with smooth regression functions, where it is well known that notions of margin in $E[Y|X]$ determine fast or slow rates in both active and passive learning. Here we elucidate a striking distinction between the two settings. Namely, we show that some seemingly benign nuances in notions of margin (involving the uniqueness of the Bayes classifier, and which have no apparent effect on rates in passive learning) determine whether or not any active learner can outperform passive learning rates. In particular, for Audibert-Tsybakov's margin condition (allowing general situations with non-unique Bayes classifiers), no active learner can gain over passive learning in commonly studied settings where the marginal on $X$ is near uniform. Our results thus negate the usual intuition from past literature that active rates should improve over passive rates in nonparametric settings.
 [2] arXiv:2110.08449 [pdf, other]

Title: Adversarial Attacks on Gaussian Process Bandits
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Gaussian processes (GP) are a widely adopted tool used to sequentially optimize black-box functions, where evaluations are costly and potentially noisy. Recent works on GP bandits have proposed to move beyond random noise and devise algorithms robust to adversarial attacks. In this paper, we study this problem from the attacker's perspective, proposing various adversarial attack methods with differing assumptions on the attacker's strength and prior information. Our goal is to understand adversarial attacks on GP bandits from both a theoretical and practical perspective. We focus primarily on targeted attacks on the popular GP-UCB algorithm and a related elimination-based algorithm, based on adversarially perturbing the function $f$ to produce another function $\tilde{f}$ whose optima are in some region $\mathcal{R}_{\rm target}$. Based on our theoretical analysis, we devise both white-box attacks (known $f$) and black-box attacks (unknown $f$), with the former including a Subtraction attack and Clipping attack, and the latter including an Aggressive subtraction attack. We demonstrate that adversarial attacks on GP bandits can succeed in forcing the algorithm towards $\mathcal{R}_{\rm target}$ even with a low attack budget, and we compare our attacks' performance and efficiency on several real and synthetic functions.
 [3] arXiv:2110.08500 [pdf, other]

Title: On Model Selection Consistency of Lasso for High-Dimensional Ising Models on Tree-like Graphs
Comments: 30 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
We consider the problem of high-dimensional Ising model selection using the neighborhood-based least absolute shrinkage and selection operator (Lasso). It is rigorously proved that under some mild coherence conditions on the population covariance matrix of the Ising model, consistent model selection can be achieved with sample sizes $n=\Omega{(d^3\log{p})}$ for any tree-like graph in the paramagnetic phase, where $p$ is the number of variables and $d$ is the maximum node degree. When the same conditions are imposed directly on the sample covariance matrices, it is shown that a reduced sample size $n=\Omega{(d^2\log{p})}$ suffices. The obtained sufficient conditions for consistent model selection with Lasso are the same in the scaling of the sample complexity as those of $\ell_1$-regularized logistic regression. Given the popularity and efficiency of Lasso, our rigorous analysis provides a theoretical backing for its practical use in Ising model selection.
 [4] arXiv:2110.08505 [pdf, other]

Title: Mode and Ridge Estimation in Euclidean and Directional Product Spaces: A Mean Shift Approach
Comments: 51 pages, 10 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
The set of local modes and the ridge lines estimated from a dataset are important summary characteristics of the data-generating distribution. In this work, we consider estimating the local modes and ridges from point cloud data in a product space with two or more Euclidean/directional metric spaces. Specifically, we generalize the well-known (subspace constrained) mean shift algorithm to the product space setting and illuminate some pitfalls in such generalization. We derive the algorithmic convergence of the proposed method, provide practical guidelines on the implementation, and demonstrate its effectiveness on both simulated and real datasets.
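As a rough illustration of the classical building block named in the abstract above, the following is a minimal sketch of plain one-dimensional Euclidean mean shift with a Gaussian kernel. It is not the product-space or subspace-constrained variant the paper develops; the function name `mean_shift_mode`, the bandwidth, and the toy data are assumptions made for illustration only.

```python
import math

def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-8, max_iter=500):
    """Climb to a local mode of a Gaussian kernel density estimate in 1D."""
    x = start
    for _ in range(max_iter):
        # Gaussian kernel weight of every sample relative to the current point.
        weights = [math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for p in points]
        # The mean shift update moves x to the kernel-weighted mean of the samples.
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

# Hypothetical toy data: two well-separated clusters around 0 and 5.
points = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.8, 4.9, 5.0, 5.1, 5.2]
mode_left = mean_shift_mode(points, start=1.0, bandwidth=0.5)
mode_right = mean_shift_mode(points, start=4.0, bandwidth=0.5)
print(mode_left, mode_right)
```

Each starting point climbs to the mode of the cluster whose basin of attraction it lies in, which is why the choice of bandwidth and starting point matters in practice.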
 [5] arXiv:2110.08676 [pdf, other]

Title: Noise-Augmented Privacy-Preserving Empirical Risk Minimization with Dual-purpose Regularizer and Privacy Budget Retrieval and Recycling
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
We propose Noise-Augmented Privacy-Preserving Empirical Risk Minimization (NAPP-ERM), which solves ERM with differential privacy guarantees. Existing privacy-preserving ERM approaches may be subject to over-regularization with the employment of an l2 term to achieve strong convexity on top of the target regularization. NAPP-ERM improves over the current approaches and mitigates over-regularization by iteratively realizing target regularization through appropriately designed augmented data and delivering strong convexity via a single adaptively weighted dual-purpose l2 regularizer. When the target regularization is for variable selection, we propose a new regularizer that achieves both privacy and sparsity guarantees simultaneously. Finally, we propose a strategy to retrieve privacy budget when the strong convexity requirement is met, which can be returned to users such that the DP of ERM is guaranteed at a lower privacy cost than originally planned, or be recycled to the ERM optimization procedure to reduce the injected DP noise and improve the utility of DP-ERM. From an implementation perspective, NAPP-ERM can be achieved by optimizing a non-perturbed objective function given noise-augmented data and can thus leverage existing tools for non-private ERM optimization. We illustrate through extensive experiments the mitigation effect of the over-regularization and private budget retrieval by NAPP-ERM on variable selection and prediction.
 [6] arXiv:2110.08884 [pdf, other]

Title: Persuasion by Dimension Reduction
Comments: arXiv admin note: text overlap with arXiv:2102.10909
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); General Economics (econ.GN); Statistics Theory (math.ST); Methodology (stat.ME)
How should an agent (the sender) observing multi-dimensional data (the state vector) persuade another agent to take the desired action? We show that it is always optimal for the sender to perform a (non-linear) dimension reduction by projecting the state vector onto a lower-dimensional object that we call the "optimal information manifold." We characterize geometric properties of this manifold and link them to the sender's preferences. Optimal policy splits information into "good" and "bad" components. When the sender's marginal utility is linear, revealing the full magnitude of good information is always optimal. In contrast, with concave marginal utility, optimal information design conceals the extreme realizations of good information and only reveals its direction (sign). We illustrate these effects by explicitly solving several multi-dimensional Bayesian persuasion problems.
 [7] arXiv:2110.08936 [pdf, ps, other]

Title: Rejoinder: Learning Optimal Distributionally Robust Individualized Treatment Rules
Journal-ref: Journal of the American Statistical Association, 116:534, 699-707 (2021)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We thank the editors for the opportunity to take part in this discussion, and the discussants for their insightful comments and thoughtful contributions. We also want to congratulate Kallus (2020) for his inspiring work in improving the efficiency of policy learning by retargeting. Motivated by the discussion in Dukes and Vansteelandt (2020), we first point out interesting connections and distinctions between our work and Kallus (2020) in Section 1. In particular, the assumptions and sources of variation for consideration in these two papers lead to different research problems with different scopes and focuses. In Section 2, following the discussions in Li et al. (2020) and Liang and Zhao (2020), we also consider the efficient policy evaluation problem when we have some data from the testing distribution available at the training stage. We show that under the assumption that the sample sizes from training and testing grow at the same order, efficient value function estimates can deliver competitive performance. We further show some connections of these estimates with existing literature. However, when the testing sample size available for training grows at a slower order, efficient value function estimates may no longer perform well. In contrast, the requirement on the testing sample size for DR-ITR is not as strong as that of efficient policy evaluation using the combined data. Finally, we highlight the general applicability and usefulness of DR-ITR in Section 3.
 [8] arXiv:2110.08989 [pdf, other]

Title: Valid and Exact Statistical Inference for Multidimensional Multiple Change-Points by Selective Inference
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
In this paper, we study statistical inference of change-points (CPs) in a multidimensional sequence. In CP detection from a multidimensional sequence, it is often desirable not only to detect the location, but also to identify the subset of the components in which the change occurs. Several algorithms have been proposed for such problems, but no valid exact inference method has been established to evaluate the statistical reliability of the detected locations and components. In this study, we propose a method that can guarantee the statistical reliability of both the location and the components of the detected changes. We demonstrate the effectiveness of the proposed method by applying it to the problems of genomic abnormality identification and human behavior analysis.
 [9] arXiv:2110.09167 [pdf, other]

Title: RKHS-SHAP: Shapley Values for Kernel Methods
Comments: 11 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Feature attribution for kernel methods is often heuristic and not individualised for each prediction. To address this, we turn to the concept of Shapley values, a coalition game theoretical framework that has previously been applied to different machine learning model interpretation tasks, such as linear models, tree ensembles and deep networks. By analysing Shapley values from a functional perspective, we propose \textsc{RKHS-SHAP}, an attribution method for kernel machines that can efficiently compute both \emph{Interventional} and \emph{Observational Shapley values} using kernel mean embeddings of distributions. We show theoretically that our method is robust with respect to local perturbations, a key yet often overlooked desideratum for interpretability. Further, we propose the \emph{Shapley regulariser}, applicable to a general empirical risk minimisation framework, allowing learning while controlling the level of a specific feature's contribution to the model. We demonstrate that the Shapley regulariser enables learning which is robust to covariate shift of a given feature and fair learning which controls the Shapley values of sensitive features.
 [10] arXiv:2110.09360 [pdf, other]

Title: Prediction of liquid fuel properties using machine learning models with Gaussian processes and probabilistic conditional generative learning
Authors: Rodolfo S. M. Freitas, Ágatha P. F. Lima, Cheng Chen, Fernando A. Rochinha, Daniel Mira, Xi Jiang
Comments: 22 pages, 13 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Accurate determination of fuel properties of complex mixtures over a wide range of pressure and temperature conditions is essential to utilizing alternative fuels. The present work aims to construct cheap-to-compute machine learning (ML) models to act as closure equations for predicting the physical properties of alternative fuels. Those models can be trained using the database from MD simulations and/or experimental measurements in a data-fusion-fidelity approach. Here, Gaussian Process (GP) and probabilistic generative models are adopted. GP is a popular non-parametric Bayesian approach to build surrogate models mainly due to its capacity to handle the aleatory and epistemic uncertainties. Generative models have shown the ability of deep neural networks employed with the same intent. In this work, ML analysis is focused on a particular property, the fuel density, but it can also be extended to other physicochemical properties. This study explores the versatility of the ML models to handle multi-fidelity data. The results show that ML models can accurately predict the fuel properties over a wide range of pressure and temperature conditions.
 [11] arXiv:2110.09361 [pdf, other]

Title: Efficient Exploration in Binary and Preferential Bayesian Optimization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
Bayesian optimization (BO) is an effective approach to optimizing expensive black-box functions that seeks to trade off between exploitation (selecting parameters where the maximum is likely) and exploration (selecting parameters where we are uncertain about the objective function). In many real-world situations, direct measurements of the objective function are not possible, and only binary measurements such as success/failure or pairwise comparisons are available. To perform efficient exploration in this setting, we show that it is important for BO algorithms to distinguish between different types of uncertainty: epistemic uncertainty, about the unknown objective function, and aleatoric uncertainty, which comes from noisy observations and cannot be reduced. In effect, only the former is important for efficient exploration. Based on this, we propose several new acquisition functions that outperform state-of-the-art heuristics in binary and preferential BO, while being fast to compute and easy to implement. We then generalize these acquisition rules to batch learning, where multiple queries are performed simultaneously.
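The distinction between epistemic and aleatoric uncertainty in the abstract above can be made concrete with the law of total variance for a binary outcome. The sketch below is not one of the paper's acquisition functions; it assumes a Gaussian belief over a latent value `f` and a probit link (both assumptions for illustration), and the name `uncertainty_split` is hypothetical.

```python
import math
import random

def probit(z):
    """Standard normal CDF, mapping a latent value to a success probability."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def uncertainty_split(mean, std, n_samples=20000, seed=0):
    """Monte Carlo split of the variance of a binary outcome y.

    Assumes f ~ N(mean, std^2) and y ~ Bernoulli(probit(f)).
    Returns (total, epistemic, aleatoric) with total = epistemic + aleatoric.
    """
    rng = random.Random(seed)
    probs = [probit(rng.gauss(mean, std)) for _ in range(n_samples)]
    p_bar = sum(probs) / n_samples
    total = p_bar * (1.0 - p_bar)                                  # Var[y]
    aleatoric = sum(p * (1.0 - p) for p in probs) / n_samples      # E[Var[y | f]]
    epistemic = sum((p - p_bar) ** 2 for p in probs) / n_samples   # Var[E[y | f]]
    return total, epistemic, aleatoric

# Latent value well known: the outcome is still a coin flip (aleatoric noise).
known = uncertainty_split(mean=0.0, std=0.01)
# Latent value poorly known: most of the variance is epistemic.
unknown = uncertainty_split(mean=0.0, std=3.0)
print(known)
print(unknown)
```

Both cases have nearly the same total predictive variance, yet only the second offers something to learn by querying, which is the intuition behind exploring on epistemic uncertainty alone.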
Cross-lists for Tue, 19 Oct 21
 [12] arXiv:2008.08342 (cross-list from cond-mat.dis-nn) [pdf, other]

Title: Structure Learning in Inverse Ising Problems Using $\ell_2$-Regularized Linear Estimator
Comments: 35 pages, 8 figures
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); Machine Learning (stat.ML)
The inference performance of the pseudolikelihood method is discussed in the framework of the inverse Ising problem when the $\ell_2$-regularized (ridge) linear regression is adopted. This setup is introduced for theoretically investigating the situation where the data generation model is different from the inference one, namely the model mismatch situation. In the teacher-student scenario under the assumption that the teacher couplings are sparse, the analysis is conducted using the replica and cavity methods, with a special focus on whether the presence/absence of teacher couplings is correctly inferred or not. The result indicates that despite the model mismatch, one can perfectly identify the network structure using naive linear regression without regularization when the number of spins $N$ is smaller than the dataset size $M$, in the thermodynamic limit $N\to \infty$. Further, to access the underdetermined region $M < N$, we examine the effect of the $\ell_2$ regularization, and find that biases appear in all the coupling estimates, preventing the perfect identification of the network structure. We find, however, that the biases decay exponentially fast as the distance from the center spin chosen in the pseudolikelihood method grows. Based on this finding, we propose a two-stage estimator: in the first stage, the ridge regression is used and the estimates are pruned by a relatively small threshold; in the second stage, the naive linear regression is conducted only on the remaining couplings, and the resultant estimates are again pruned by another relatively large threshold. This estimator with the appropriate regularization coefficient and thresholds is shown to achieve the perfect identification of the network structure even in $0<M/N<1$. Results of extensive numerical experiments support these findings.
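The two-stage estimator described above (ridge, prune with a small threshold, refit by least squares, prune with a larger threshold) can be sketched as follows. This is a toy analogue on ordinary Gaussian regression data in the overdetermined regime, not the Ising pseudolikelihood setting or the $M < N$ regime analysed in the paper; the helper names, thresholds, and regularization value are all assumptions chosen for illustration.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ridge(xs, ys, cols, lam):
    """Ridge regression on the columns in cols (lam = 0 gives least squares)."""
    gram = [[sum(r[i] * r[j] for r in xs) + (lam if i == j else 0.0)
             for j in cols] for i in cols]
    rhs = [sum(r[i] * y for r, y in zip(xs, ys)) for i in cols]
    return dict(zip(cols, solve(gram, rhs)))

def two_stage_support(xs, ys, lam, small_thr, large_thr):
    """Stage 1: ridge + small threshold. Stage 2: least squares + large threshold."""
    p = len(xs[0])
    stage1 = ridge(xs, ys, list(range(p)), lam)
    kept = [j for j in range(p) if abs(stage1[j]) > small_thr]
    stage2 = ridge(xs, ys, kept, 0.0)
    return [j for j in kept if abs(stage2[j]) > large_thr]

# Hypothetical sparse "teacher": only coordinates 0 and 3 are nonzero.
rng = random.Random(0)
true_w = [1.5, 0.0, 0.0, -2.0, 0.0, 0.0]
xs = [[rng.gauss(0, 1) for _ in true_w] for _ in range(200)]
ys = [sum(w * v for w, v in zip(true_w, row)) + rng.gauss(0, 0.1) for row in xs]
support = two_stage_support(xs, ys, lam=1.0, small_thr=0.05, large_thr=0.5)
print(support)
```

The first threshold only needs to discard clearly irrelevant coordinates; the refit then removes the ridge shrinkage before the stricter second threshold is applied.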
 [13] arXiv:2110.08363 (cross-list from stat.AP) [pdf, other]

Title: Spatiotemporal extreme event modeling of terror insurgencies
Subjects: Applications (stat.AP); Machine Learning (stat.ML)
Extreme events with potentially deadly outcomes, such as those organized by terror groups, are highly unpredictable in nature and an imminent threat to society. In particular, quantifying the likelihood of a terror attack occurring in an arbitrary space-time region, and its relative societal risk, would facilitate informed measures that would strengthen national security. This paper introduces a novel self-exciting marked spatio-temporal model for attacks whose inhomogeneous baseline intensity is written as a function of covariates. Its triggering intensity is succinctly modeled with a Gaussian Process prior distribution to flexibly capture intricate spatio-temporal dependencies between an arbitrary attack and previous terror events. By inferring the parameters of this model, we highlight specific space-time areas in which attacks are likely to occur. Furthermore, by measuring the outcome of an attack in terms of the number of casualties it produces, we introduce a novel mixture distribution for the number of casualties. This distribution flexibly handles low and high numbers of casualties and the discrete nature of the data through a {\it Generalized ZipF} distribution. We rely on a customized Markov chain Monte Carlo (MCMC) method to estimate the model parameters. We illustrate the methodology with data from the open source Global Terrorism Database (GTD) that correspond to attacks in Afghanistan from 2013-2018. We show that our model is able to predict the intensity of future attacks for 2019-2021 while considering various covariates of interest such as population density, number of regional languages spoken, and the density of population supporting the opposing government.
 [14] arXiv:2110.08577 (cross-list from math.OC) [pdf, other]

Title: Nys-Curve: Nyström-Approximated Curvature for Stochastic Optimization
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
The quasi-Newton methods generally provide curvature information by approximating the Hessian using the secant equation. However, the secant equation becomes insipid in approximating the Newton step owing to its use of the first-order derivatives. In this study, we propose an approximate Newton step-based stochastic optimization algorithm for large-scale empirical risk minimization of convex functions with linear convergence rates. Specifically, we compute a partial column Hessian of size ($d\times k$) with $k\ll d$ randomly selected variables, then use the \textit{Nystr\"om method} to better approximate the full Hessian matrix. To further reduce the computational complexity per iteration, we directly compute the update step ($\Delta\boldsymbol{w}$) without computing and storing the full Hessian or its inverse. Furthermore, to address large-scale scenarios in which even computing a partial Hessian may require significant time, we use distribution-preserving (DP) sub-sampling to compute a partial Hessian. The DP sub-sampling generates $p$ sub-samples with similar first and second-order distribution statistics and selects a single sub-sample at each epoch in a round-robin manner to compute the partial Hessian. We integrate our approximated Hessian with stochastic gradient descent and stochastic variance-reduced gradients to solve the logistic regression problem. The numerical experiments show that the proposed approach obtains a better approximation of Newton's method, with performance competitive with the state-of-the-art first-order and stochastic quasi-Newton methods.
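The Nyström step mentioned above reconstructs a positive semidefinite matrix from a $d\times k$ block of its columns as $C W^{-1} C^\top$, where $W$ is the $k\times k$ block at the selected rows and columns. The following is a minimal sketch of that generic approximation only: it forms the full approximation explicitly for clarity, whereas the abstract states that the proposed method avoids forming the full Hessian. The helper names and the toy matrix are assumptions.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse(w):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(w)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(w)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * u for v, u in zip(m[r], m[col])]
    return [row[n:] for row in m]

def nystrom(h, idx):
    """Approximate a PSD matrix h from the columns in idx as C W^{-1} C^T."""
    c = [[row[j] for j in idx] for row in h]      # d x k block of selected columns
    w = [[h[i][j] for j in idx] for i in idx]     # k x k block at selected rows/columns
    return matmul(matmul(c, inverse(w)), transpose(c))

# Hypothetical rank-2 PSD "Hessian" H = A A^T with d = 4.
a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
h = matmul(a, transpose(a))
approx = nystrom(h, idx=[0, 2])
max_err = max(abs(x - y) for rh, ra in zip(h, approx) for x, y in zip(rh, ra))
print(max_err)
```

Because the toy matrix has rank 2 and the selected $2\times 2$ block is invertible, two columns suffice to recover it up to rounding; for a full-rank Hessian the result is only an approximation whose quality depends on which columns are sampled.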
 [15] arXiv:2110.08607 (cross-list from cs.LG) [pdf, other]

Title: Physics-guided Deep Markov Models for Learning Nonlinear Dynamical Systems with Uncertainty
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Chaotic Dynamics (nlin.CD); Machine Learning (stat.ML)
In this paper, we propose a probabilistic physics-guided framework, termed Physics-guided Deep Markov Model (PgDMM). The framework is especially targeted to the inference of the characteristics and latent structure of nonlinear dynamical systems from measurement data, where it is typically intractable to perform exact inference of latent variables. A recently surfaced option pertains to leveraging variational inference to perform approximate inference. In such a scheme, transition and emission functions of the system are parameterized via feed-forward neural networks (deep generative models). However, due to the generalized and highly versatile formulation of neural network functions, the learned latent space is often prone to lack physical interpretation and structured representation. To address this, we bridge physics-based state space models with Deep Markov Models, thus delivering a hybrid modeling framework for unsupervised learning and identification for nonlinear dynamical systems. Specifically, the transition process can be modeled as a physics-based model enhanced with an additive neural network component, which aims to learn the discrepancy between the physics-based model and the actual dynamical system being monitored. The proposed framework takes advantage of the expressive power of deep learning, while retaining the driving physics of the dynamical system by imposing physics-driven restrictions on the side of the latent space. We demonstrate the benefits of such a fusion in terms of achieving improved performance on illustrative simulation examples and experimental case studies of nonlinear systems. Our results indicate that the physics-based models involved in the employed transition and emission functions essentially enforce a more structured and physically interpretable latent space, which is essential to generalization and prediction capabilities.
 [16] arXiv:2110.08627 (cross-list from cs.LG) [pdf, other]

Title: On the Pareto Frontier of Regret Minimization and Best Arm Identification in Stochastic Bandits
Comments: 27 pages, 8 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)
We study the Pareto frontier of two archetypal objectives in stochastic bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon. It is folklore that the balance between exploitation and exploration is crucial for both RM and BAI, but exploration is more critical in achieving the optimal performance for the latter objective. To make this precise, we first design and analyze the BoBW-lil'UCB$({\gamma})$ algorithm, which achieves order-wise optimal performance for RM or BAI under different values of ${\gamma}$. Complementarily, we show that no algorithm can simultaneously perform optimally for both the RM and BAI objectives. More precisely, we establish non-trivial lower bounds on the regret achievable by any algorithm with a given BAI failure probability. This analysis shows that in some regimes BoBW-lil'UCB$({\gamma})$ achieves Pareto-optimality up to constant or small terms. Numerical experiments further demonstrate that when applied to difficult instances, BoBW-lil'UCB outperforms a close competitor UCB$_{\alpha}$ (Degenne et al., 2019), which is designed for RM and BAI with a fixed confidence.
 [17] arXiv:2110.08634 (cross-list from cs.SD) [pdf, other]

Title: Towards Robust Waveform-Based Acoustic Models
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
We propose an approach for learning robust acoustic models in adverse environments, characterized by a significant mismatch between training and test conditions. This problem is of paramount importance for the deployment of speech recognition systems that need to perform well in unseen environments. Our approach is an instance of vicinal risk minimization, which aims to improve risk estimates during training by replacing the delta functions that define the empirical density over the input space with an approximation of the marginal population density in the vicinity of the training samples. More specifically, we assume that local neighborhoods centered at training samples can be approximated using a mixture of Gaussians, and demonstrate theoretically that this can incorporate robust inductive bias into the learning process. We characterize the individual mixture components implicitly via data augmentation schemes, designed to address common sources of spurious correlations in acoustic models. To avoid potential confounding effects on robustness due to information loss, which has been associated with standard feature extraction techniques (e.g., FBANK and MFCC features), we focus our evaluation on the waveform-based setting. Our empirical results show that the proposed approach can generalize to unseen noise conditions, with 150% relative improvement in out-of-distribution generalization compared to training using the standard risk minimization principle. Moreover, the results demonstrate competitive performance relative to models learned using a training sample designed to match the acoustic conditions characteristic of test utterances (i.e., optimal vicinal densities).
 [18] arXiv:2110.08678 (cross-list from cs.LG) [pdf, other]

Title: Transformer with a Mixture of Gaussian Keys
Authors: Tam Nguyen, Tan M. Nguyen, Dung Le, Khuong Nguyen, Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher
Comments: 21 pages, 8 figures, 4 tables
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkable performance across a variety of natural language processing (NLP) and computer vision tasks. It has been observed that for many applications, those attention heads learn redundant embeddings, and most of them can be removed without degrading the performance of the model. Inspired by this observation, we propose Transformer with a Mixture of Gaussian Keys (Transformer-MGK), a novel transformer architecture that replaces redundant heads in transformers with a mixture of keys at each head. These mixtures of keys follow a Gaussian mixture model and allow each attention head to focus on different parts of the input sequence efficiently. Compared to its conventional transformer counterpart, Transformer-MGK accelerates training and inference, has fewer parameters, and requires fewer FLOPs to compute while achieving comparable or better accuracy across tasks. Transformer-MGK can also be easily extended to use with linear attentions. We empirically demonstrate the advantage of Transformer-MGK in a range of practical applications including language modeling and tasks that involve very long sequences. On the Wikitext-103 and Long Range Arena benchmarks, Transformer-MGKs with 4 heads attain comparable or better performance than the baseline transformers with 8 heads.
 [19] arXiv:2110.08691 (cross-list from cs.DS) [pdf, ps, other]

Title: Terminal Embeddings in Sublinear Time
Comments: Accepted to FOCS 2021
Subjects: Data Structures and Algorithms (cs.DS); Computational Geometry (cs.CG); Machine Learning (cs.LG); Machine Learning (stat.ML)
Recently (Elkin, Filtser, Neiman 2017) introduced the concept of a {\it terminal embedding} from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with a set of designated terminals $T\subset X$. Such an embedding $f$ is said to have distortion $\rho\ge 1$ if $\rho$ is the smallest value such that there exists a constant $C>0$ satisfying
\begin{equation*}
\forall x\in T\ \forall q\in X,\ C d_X(x, q) \le d_Y(f(x), f(q)) \le C \rho d_X(x, q) .
\end{equation*}
In the case that $X,Y$ are both Euclidean metrics with $Y$ being $m$-dimensional, recently (Narayanan, Nelson 2019), following work of (Mahabadi, Makarychev, Makarychev, Razenshteyn 2018), showed that distortion $1+\epsilon$ is achievable via such a terminal embedding with $m = O(\epsilon^{-2}\log n)$ for $n := |T|$. This generalizes the Johnson-Lindenstrauss lemma, which only preserves distances within $T$ and not to $T$ from the rest of space. The downside is that evaluating the embedding on some $q\in \mathbb{R}^d$ required solving a semidefinite program with $\Theta(n)$ constraints in $m$ variables and thus required some superlinear $\mathrm{poly}(n)$ runtime. Our main contribution in this work is to give a new data structure for computing terminal embeddings. We show how to pre-process $T$ to obtain an almost linear-space data structure that supports computing the terminal embedding image of any $q\in\mathbb{R}^d$ in sublinear time $n^{1-\Theta(\epsilon^2)+o(1)} + dn^{o(1)}$. To accomplish this, we leverage tools developed in the context of approximate nearest neighbor search.
 [20] arXiv:2110.08693 (cross-list from cs.LG) [pdf, other]

Title: On the Statistical Analysis of Complex Tree-shaped 3D Objects
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Graphics (cs.GR); Machine Learning (stat.ML)
How can one analyze detailed 3D biological objects, such as neurons and botanical trees, that exhibit complex geometrical and topological variation? In this paper, we develop a novel mathematical framework for representing, comparing, and computing geodesic deformations between the shapes of such tree-like 3D objects. A hierarchical organization of subtrees characterizes these objects (each subtree has a main branch with some side branches attached), and one needs to match these structures across objects for meaningful comparisons. We propose a novel representation that extends the Square-Root Velocity Function (SRVF), initially developed for Euclidean curves, to tree-shaped 3D objects. We then define a new metric that quantifies the bending, stretching, and branch sliding needed to deform one tree-shaped object into the other. Compared to the current metrics, such as the Quotient Euclidean Distance (QED) and the Tree Edit Distance (TED), the proposed representation and metric capture the full elasticity of the branches (i.e., bending and stretching) as well as the topological variations (i.e., branch death/birth and sliding). It completely avoids the shrinkage that results from the edge collapse and node split operations of the QED and TED metrics. We demonstrate the utility of this framework in comparing, matching, and computing geodesics between biological objects such as neurons and botanical trees. The framework is also applied to various shape analysis tasks: (i) symmetry analysis and symmetrization of tree-shaped 3D objects, (ii) computing summary statistics (means and modes of variations) of populations of tree-shaped 3D objects, (iii) fitting parametric probability distributions to such populations, and (iv) finally synthesizing novel tree-shaped 3D objects through random sampling from estimated probability distributions.
 [21] arXiv:2110.08695 (cross-list from cs.LG) [pdf, other]

Title: Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
Comments: NeurIPS, 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) using the data coming from a policy $\mu$. In particular, we consider the sample complexity problems of offline RL for finite-horizon MDPs. Prior works study this problem based on different data-coverage assumptions, and their learning guarantees are expressed by the covering coefficients which lack the explicit characterization of system quantities. In this work, we analyze the Adaptive Pessimistic Value Iteration (APVI) algorithm and derive the suboptimality upper bound that nearly matches \[ O\left(\sum_{h=1}^H\sum_{s_h,a_h}d^{\pi^\star}_h(s_h,a_h)\sqrt{\frac{\mathrm{Var}_{P_{s_h,a_h}}{(V^\star_{h+1}+r_h)}}{d^\mu_h(s_h,a_h)}}\sqrt{\frac{1}{n}}\right). \] Complementarily, we also prove a per-instance information-theoretical lower bound under the weak assumption that $d^\mu_h(s_h,a_h)>0$ if $d^{\pi^\star}_h(s_h,a_h)>0$. Different from the previous minimax lower bounds, the per-instance lower bound (via local minimaxity) is a much stronger criterion as it applies to individual instances separately. Here $\pi^\star$ is an optimal policy, $\mu$ is the behavior policy and $d_h^\mu$ is the marginal state-action probability. We call the above equation the intrinsic offline reinforcement learning bound since it directly implies all the existing optimal results: minimax rate under uniform data-coverage assumption, horizon-free setting, single policy concentrability, and the tight problem-dependent results. Later, we extend the result to the assumption-free regime (where we make no assumption on $\mu$) and obtain the assumption-free intrinsic bound. Due to its generic form, we believe the intrinsic bound could help illuminate what makes a specific problem hard and reveal the fundamental challenges in offline RL.
 [22] arXiv:2110.08710 (cross-list from cs.LG) [pdf, ps, other]

Title: NeuralArTS: Structuring Neural Architecture Search with Type Theory
Subjects: Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Programming Languages (cs.PL); Machine Learning (stat.ML)
Neural Architecture Search (NAS) algorithms automate the task of finding optimal deep learning architectures given an initial search space of possible operations. Developing these search spaces is usually a manual affair, with pre-optimized search spaces being more efficient than searching from scratch. In this paper we present a new framework called Neural Architecture Type System (NeuralArTS) that categorizes the infinite set of network operations in a structured type system. We further demonstrate how NeuralArTS can be applied to convolutional layers and propose several future directions.
 [23] arXiv:2110.08720 (cross-list from cs.LG) [pdf, other]

Title: Centroid Approximation for Bootstrap
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Bootstrap is a principled and powerful frequentist statistical tool for uncertainty quantification. Unfortunately, standard bootstrap methods are computationally intensive due to the need to draw a large i.i.d. bootstrap sample to approximate the ideal bootstrap distribution; this largely hinders their application in large-scale machine learning, especially deep learning problems. In this work, we propose an efficient method to explicitly \emph{optimize} a small set of high quality "centroid" points to better approximate the ideal bootstrap distribution. We achieve this by minimizing a simple objective function that is asymptotically equivalent to the Wasserstein distance to the ideal bootstrap distribution. This allows us to provide an accurate estimation of uncertainty with a small number of bootstrap centroids, outperforming the naive i.i.d. sampling approach. Empirically, we show that our method can boost the performance of bootstrap in a variety of applications.
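For orientation, the naive i.i.d. bootstrap that the abstract uses as its baseline can be sketched as follows. This is only an illustration of the standard resampling procedure, not the paper's centroid method; the function name `bootstrap_std_error`, its parameters, and the toy data are assumptions made for this sketch.

```python
import random
import statistics


def bootstrap_std_error(data, statistic, n_resamples=1000, seed=0):
    """Naive i.i.d. bootstrap estimate of the standard error of `statistic`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Draw n points with replacement from the observed sample.
        resample = rng.choices(data, k=n)
        replicates.append(statistic(resample))
    # The spread of the replicates approximates the sampling variability.
    return statistics.stdev(replicates)


data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]  # toy data, assumed
se = bootstrap_std_error(data, statistics.mean)
print(f"bootstrap standard error of the mean: {se:.3f}")
```

The cost grows linearly with `n_resamples`, and each replicate may require refitting a model, which is the computational burden the abstract refers to.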
 [24] arXiv:2110.08850 (cross-list from physics.soc-ph) [pdf]

Title: Understanding the network formation pattern for better link prediction
Comments: 21 pages, 3 figures, 18 tables, and 29 references
Subjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Molecular Networks (q-bio.MN); Machine Learning (stat.ML)
As a classical problem in the field of complex networks, link prediction has attracted much attention from researchers, and it is of great significance for understanding the evolution and dynamic development mechanisms of networks. Although various network type-specific algorithms have been proposed to tackle the link prediction problem, most of them suppose that the network structure is dominated by the Triadic Closure Principle. We still lack an adaptive and comprehensive understanding of network formation patterns for predicting potential links. In addition, it is valuable to investigate how network local information can be better utilized. To this end, we propose a novel method named Link prediction using Multiple Order Local Information (MOLI) that exploits the local information from the neighbors of different distances, with parameters that can be prior-driven based on prior knowledge, or data-driven by solving an optimization problem on observed networks. MOLI defines a local network diffusion process via random walks on the graph, resulting in better use of network information. We show that MOLI outperforms the other 11 widely used link prediction algorithms on 11 different types of simulated and real-world networks. We also conclude that there are different patterns of local information utilization for different networks, including social networks, communication networks, biological networks, etc. In particular, the classical common neighbor-based algorithm is not as adaptable to all social networks as it is perceived to be; instead, some of the social networks obey the Quadrilateral Closure Principle which preferentially connects paths of length three.
 [25] arXiv:2110.08871 (cross-list from cs.LG) [pdf, ps, other]

Title: Noise-robust Clustering
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper presents noise-robust clustering techniques in unsupervised machine learning. The uncertainty about the noise, consistency, and other ambiguities can become severe obstacles in data analytics. As a result, data quality, cleansing, management, and governance remain critical disciplines when working with Big Data. With this complexity, it is no longer sufficient to treat data deterministically as in a classical setting, and it becomes meaningful to account for noise distribution and its impact on data sample values. Classical clustering methods group data into "similarity classes" depending on their relative distances or similarities in the underlying space. This paper addresses this problem via the extension of classical $K$-means and $K$-medoids clustering over data distributions (rather than the raw data). This involves measuring distances among distributions using two types of measures: the optimal mass transport (also called Wasserstein distance, denoted $W_2$) and a novel distance measure proposed in this paper, the expected value of random variable distance (denoted ED). The presented distribution-based $K$-means and $K$-medoids algorithms cluster the data distributions first and then assign each raw data point to the cluster of its distribution.
 [26] arXiv:2110.08922 (cross-list from cs.LG) [pdf, other]

Title: Explaining generalization in deep learning: progress and fundamental limits
Authors: Vaishnavh Nagarajan
Comments: arXiv admin note: text overlap with arXiv:1902.04742
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error?
In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive {\em data-dependent} {\em uniform-convergence-based} generalization bounds with improved dependencies on the parameter count.
Uniform convergence has in fact been the most widely used tool in deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, {\em any} uniform convergence bound will provide only a vacuous generalization bound.
With this realization in mind, in the last part of the thesis, we will change course and introduce an {\em empirical} technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergence-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision.
We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.
 [27] arXiv:2110.08984 (cross-list from cs.LG) [pdf, ps, other]

Title: Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs). In this setting, both the reward function and the transition kernel are linear with respect to the given feature maps and are allowed to vary over time, as long as their respective parameter variations do not exceed certain variation budgets. We propose the $\underline{\text{p}}$eriodically $\underline{\text{r}}$estarted $\underline{\text{o}}$ptimistic $\underline{\text{p}}$olicy $\underline{\text{o}}$ptimization algorithm (PROPO), which is an optimistic policy optimization algorithm with linear function approximation. PROPO features two mechanisms: sliding-window-based policy evaluation and periodic-restart-based policy improvement, which are tailored for policy optimization in a non-stationary environment. In addition, using only the sliding-window technique, we propose a value-iteration algorithm. We establish dynamic upper bounds for the proposed methods and a matching minimax lower bound which shows the (near) optimality of the proposed methods. To the best of our knowledge, PROPO is the first provably efficient policy optimization algorithm that handles non-stationarity.
 [28] arXiv:2110.08985 (cross-list from cs.CV) [pdf, other]

Title: StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis
Comments: 24 pages, 19 figures. Project page: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We propose StyleNeRF, a 3D-aware generative model for photo-realistic high-resolution image synthesis with high multi-view consistency, which can be trained on unstructured 2D images. Existing approaches either cannot synthesize high-resolution images with fine details or yield noticeable 3D-inconsistent artifacts. In addition, many of them lack control over style attributes and explicit 3D camera poses. StyleNeRF integrates the neural radiance field (NeRF) into a style-based generator to tackle the aforementioned challenges, i.e., improving rendering efficiency and 3D consistency for high-resolution image generation. We perform volume rendering only to produce a low-resolution feature map and progressively apply upsampling in 2D to address the first issue. To mitigate the inconsistencies caused by 2D upsampling, we propose multiple designs, including a better upsampler and a new regularization loss. With these designs, StyleNeRF can synthesize high-resolution images at interactive rates while preserving 3D consistency at high quality. StyleNeRF also enables control of camera poses and different levels of styles, which can generalize to unseen views. It also supports challenging tasks, including zoom-in and -out, style mixing, inversion, and semantic editing.
 [29] arXiv:2110.09006 (cross-list from cs.CV) [pdf, other]

Title: Natural Image Reconstruction from fMRI using Deep Learning: A Survey
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
With the advent of brain imaging techniques and machine learning tools, much effort has been devoted to building computational models to capture the encoding of visual information in the human brain. One of the most challenging brain decoding tasks is the accurate reconstruction of the perceived natural images from brain activities measured by functional magnetic resonance imaging (fMRI). In this work, we survey the most recent deep learning methods for natural image reconstruction from fMRI. We examine these methods in terms of architectural design, benchmark datasets, and evaluation metrics and present a fair performance evaluation across standardized evaluation metrics. Finally, we discuss the strengths and limitations of existing studies and present potential future directions.
 [30] arXiv:2110.09040 (cross-list from stat.ME) [pdf, ps, other]

Title: A Bayesian approach to multi-task learning with network lasso
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
Network lasso is a method for solving a multi-task learning problem through the regularized maximum likelihood method. A characteristic of network lasso is setting a different model for each sample. The relationships among the models are represented by relational coefficients. A crucial issue in network lasso is to provide appropriate values for these relational coefficients. In this paper, we propose a Bayesian approach to solve multi-task learning problems by network lasso. This approach allows us to objectively determine the relational coefficients by Bayesian estimation. The effectiveness of the proposed method is shown in a simulation study and a real data analysis.
 [31] arXiv:2110.09042 (cross-list from math.ST) [pdf, other]

Title: Kernel-based estimation for partially functional linear model: Minimax rates and randomized sketches
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
This paper considers the partially functional linear model (PFLM) where all predictive features consist of a functional covariate and a high dimensional scalar vector. Over an infinite dimensional reproducing kernel Hilbert space, the proposed estimation for PFLM is a least square approach with two mixed regularizations of a function-norm and an $\ell_1$-norm. Our main task in this paper is to establish the minimax rates for PFLM under high dimensional setting, and the optimal minimax rates of estimation are established by using various techniques in empirical process theory for analyzing kernel classes. In addition, we propose an efficient numerical algorithm based on randomized sketches of the kernel matrix. Several numerical experiments are implemented to support our method and optimization strategy.
 [32] arXiv:2110.09140 (cross-list from cs.LG) [pdf, other]

Title: Learning Prototype-oriented Set Representations for Meta-Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Learning from set-structured data is a fundamental problem that has recently attracted increasing attention, where a series of summary networks are introduced to deal with the set input. In fact, many meta-learning problems can be treated as set-input tasks. Most existing summary networks aim to design different architectures for the input set in order to enforce permutation invariance. However, scant attention has been paid to the common cases where different sets in a meta-distribution are closely related and share certain statistical properties. Viewing each set as a distribution over a set of global prototypes, this paper provides a novel optimal transport (OT) based way to improve existing summary networks. To learn the distribution over the global prototypes, we minimize its OT distance to the set empirical distribution over data points, providing a natural unsupervised way to improve the summary network. Since our plug-and-play framework can be applied to many meta-learning problems, we further instantiate it to the cases of few-shot classification and implicit meta generative modeling. Extensive experiments demonstrate that our framework significantly improves the existing summary networks on learning more powerful summary statistics from sets and can be successfully integrated into metric-based few-shot classification and generative modeling applications, providing a promising tool for addressing set-input and meta-learning problems.
 [33] arXiv:2110.09192 (cross-list from cs.LG) [pdf, other]

Title: Learning Optimal Conformal Classifiers
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Methodology (stat.ME); Machine Learning (stat.ML)
Modern deep learning based classifiers show very high accuracy on test data but this does not provide sufficient guarantees for safe deployment, especially in high-stakes AI applications such as medical diagnosis. Usually, predictions are obtained without a reliable uncertainty estimate or a formal guarantee. Conformal prediction (CP) addresses these issues by using the classifier's probability estimates to predict confidence sets containing the true class with a user-specified probability. However, using CP as a separate processing step after training prevents the underlying model from adapting to the prediction of confidence sets. Thus, this paper explores strategies to differentiate through CP during training with the goal of training the model with the conformal wrapper end-to-end. In our approach, conformal training (ConfTr), we specifically "simulate" conformalization on mini-batches during training. We show that ConfTr outperforms state-of-the-art CP methods for classification by reducing the average confidence set size (inefficiency). Moreover, it allows one to "shape" the confidence sets predicted at test time, which is difficult for standard CP. On experiments with several datasets, we show ConfTr can influence how inefficiency is distributed across classes, or guide the composition of confidence sets in terms of the included classes, while retaining the guarantees offered by CP.
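As background for the abstract above, the post-hoc CP step it describes (calibrate a threshold on held-out data, then return every class whose predicted probability clears it) can be sketched as below. This is a minimal sketch of standard split conformal prediction with a probability threshold, not ConfTr itself, which additionally differentiates through this step during training. The function names `conformal_threshold` and `confidence_set` and all numbers are assumptions made for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a probability threshold tau on held-out calibration data.

    Hypothetical helper: the conformity score of an example is the
    probability the classifier assigns to its true class.
    """
    scores = sorted(p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Use the floor(alpha * (n + 1))-th smallest score (1-indexed), which
    # gives coverage of at least 1 - alpha for exchangeable data.
    k = math.floor(alpha * (n + 1))
    if k < 1:
        return 0.0  # too few calibration points: keep every class
    return scores[k - 1]


def confidence_set(probs, tau):
    """Return all classes whose predicted probability is at least tau."""
    return [c for c, p in enumerate(probs) if p >= tau]


# Toy calibration set (assumed): predicted probabilities and true labels.
cal_probs = [
    [0.9, 0.05, 0.05], [0.3, 0.6, 0.1], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8],
    [0.5, 0.3, 0.2], [0.2, 0.7, 0.1], [0.4, 0.4, 0.2],
]
cal_labels = [0, 1, 0, 2, 0, 1, 1]

tau = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
pred_set = confidence_set([0.5, 0.45, 0.05], tau)
print(tau, pred_set)  # prints: 0.4 [0, 1]
```

The average size of `pred_set` over test points is the inefficiency that the abstract says ConfTr reduces.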
 [34] arXiv:2110.09253 (cross-list from cs.CY) [pdf]

Title: A Sociotechnical View of Algorithmic Fairness
Comments: Accepted at Information Systems Journal
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Machine Learning (stat.ML)
Algorithmic fairness has been framed as a newly emerging technology that mitigates systemic discrimination in automated decision-making, providing opportunities to improve fairness in information systems (IS). However, based on a state-of-the-art literature review, we argue that fairness is an inherently social concept and that technologies for algorithmic fairness should therefore be approached through a sociotechnical lens. We advance the discourse on algorithmic fairness as a sociotechnical phenomenon. Our research objective is to embed algorithmic fairness (AF) in the sociotechnical view of IS. Specifically, we elaborate on why outcomes of a system that uses algorithmic means to assure fairness depend on mutual influences between technical and social structures. This perspective can generate new insights that integrate knowledge from both technical fields and social studies. Further, it spurs new directions for IS debates. We contribute as follows: First, we problematize fundamental assumptions in the current discourse on algorithmic fairness based on a systematic analysis of 310 articles. Second, we respond to these assumptions by theorizing algorithmic fairness as a sociotechnical construct. Third, we propose directions for IS researchers to enhance their impacts by pursuing a unique understanding of sociotechnical algorithmic fairness. We call for and undertake a holistic approach to AF. A sociotechnical perspective on algorithmic fairness can yield holistic solutions to systemic biases and discrimination.
 [35] arXiv:2110.09327 (cross-list from cs.LG) [pdf, other]

Title: Self-Supervised Representation Learning: Introduction, Advances and Challenges
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Self-supervised representation learning methods aim to provide powerful deep feature learning without the requirement of large annotated datasets, thus alleviating the annotation bottleneck that is one of the main barriers to practical deployment of deep learning today. These methods have advanced rapidly in recent years, with their efficacy approaching and sometimes surpassing fully supervised pre-training alternatives across a variety of data modalities including image, video, sound, text and graphs. This article introduces this vibrant area including key concepts, the four main families of approach and associated state of the art, and how self-supervised methods are applied to diverse modalities of data. We further discuss practical considerations including workflows, representation transferability, and compute cost. Finally, we survey the major open challenges in the field that provide fertile ground for future work.
 [36] arXiv:2110.09333 (cross-list from math.ST) [pdf, other]

Title: Regression with Missing Data, a Comparison Study of Techniques Based on Random Forests
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
In this paper we present the practical benefits of a new random forest algorithm to deal with missing values in the sample. The purpose of this work is to compare the different solutions to deal with missing values with random forests and describe our new algorithm performance as well as its algorithmic complexity. A variety of missing value mechanisms (such as MCAR, MAR, MNAR) are considered and simulated. We study the quadratic errors and the bias of our algorithm and compare it to the most popular missing values random forests algorithms in the literature. In particular, we compare those techniques for both a regression and prediction purpose. This work follows a first paper Gomez-Mendez and Joly (2020) on the consistency of this new algorithm.
 [37] arXiv:2110.09334 (cross-list from math.OC) [pdf, other]

Title: A portfolio approach to massively parallel Bayesian optimization
Subjects: Optimization and Control (math.OC); Machine Learning (stat.ML)
One way to reduce the time of conducting optimization studies is to evaluate designs in parallel rather than just one-at-a-time. For expensive-to-evaluate black-boxes, batch versions of Bayesian optimization have been proposed. They work by building a surrogate model of the black-box that can be used to select the designs to evaluate efficiently via an infill criterion. Still, with higher levels of parallelization becoming available, the strategies that work for a few tens of parallel evaluations become limiting, in particular due to the complexity of selecting more evaluations. It is even more crucial when the black-box is noisy, necessitating more evaluations as well as repeating experiments. Here we propose a scalable strategy that can keep up with massive batching natively, focused on the exploration/exploitation trade-off and a portfolio allocation. We compare the approach with related methods on deterministic and noisy functions, for mono- and multi-objective optimization tasks. These experiments show similar or better performance than existing methods, while being orders of magnitude faster.
 [38] arXiv:2110.09356 (cross-list from cs.LG) [pdf, other]

Title: Towards Federated Bayesian Network Structure Learning with Continuous Optimization
Comments: 16 pages; 5 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Traditionally, Bayesian network structure learning is often carried out at a central site, in which all data is gathered. However, in practice, data may be distributed across different parties (e.g., companies, devices) who intend to collectively learn a Bayesian network, but are not willing to disclose information related to their data owing to privacy or security concerns. In this work, we present a cross-silo federated learning approach to estimate the structure of a Bayesian network from data that is horizontally partitioned across different parties. We develop a distributed structure learning method based on continuous optimization, using the alternating direction method of multipliers (ADMM), such that only the model parameters have to be exchanged during the optimization process. We demonstrate the flexibility of our approach by adopting it for both linear and nonlinear cases. Experimental results on synthetic and real datasets show that it achieves an improved performance over the other methods, especially when there is a relatively large number of clients and each has a limited sample size.
 [39] arXiv:2110.09443 (cross-list from cs.LG) [pdf, other]

Title: Beltrami Flow and Neural Diffusion on Graphs
Authors: Benjamin Paul Chamberlain, James Rowbottom, Davide Eynard, Francesco Di Giovanni, Xiaowen Dong, Michael M Bronstein
Comments: 21 pages, 5 figures. Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS) 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
We propose a novel class of graph neural networks based on the discretised Beltrami flow, a non-Euclidean diffusion PDE. In our model, node features are supplemented with positional encodings derived from the graph topology and jointly evolved by the Beltrami flow, producing simultaneously continuous feature learning and topology evolution. The resulting model generalises many popular graph neural networks and achieves state-of-the-art results on several benchmarks.
 [40] arXiv:2110.09468 (cross-list from cs.LG) [pdf, other]

Title: Improving Robustness using Generated Data
Authors: Sven Gowal, Sylvestre-Alvise Rebuffi, Olivia Wiles, Florian Stimberg, Dan Andrei Calian, Timothy Mann
Comments: Accepted at NeurIPS 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Recent work argues that robust training requires substantially larger datasets than those required for standard classification. On CIFAR-10 and CIFAR-100, this translates into a sizable robust-accuracy gap between models trained solely on data from the original training set and those trained with additional data extracted from the "80 Million Tiny Images" dataset (TI-80M). In this paper, we explore how generative models trained solely on the original training set can be leveraged to artificially increase the size of the original training set and improve adversarial robustness to $\ell_p$ norm-bounded perturbations. We identify the sufficient conditions under which incorporating additional generated data can improve robustness, and demonstrate that it is possible to significantly reduce the robust-accuracy gap to models trained with additional real data. Surprisingly, we show that even the addition of non-realistic random data (generated by Gaussian sampling) can improve robustness. We evaluate our approach on CIFAR-10, CIFAR-100, SVHN and TinyImageNet against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $\epsilon = 8/255$ and $\epsilon = 128/255$, respectively. We show large absolute improvements in robust accuracy compared to previous state-of-the-art methods. Against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$, our models achieve 66.10% and 33.49% robust accuracy on CIFAR-10 and CIFAR-100, respectively (improving upon the state-of-the-art by +8.96% and +3.29%). Against $\ell_2$ norm-bounded perturbations of size $\epsilon = 128/255$, our model achieves 78.31% on CIFAR-10 (+3.81%). These results beat most prior works that use external data.
 [41] arXiv:2110.09476 (cross-list from cs.LG) [pdf, other]

Title: Recovery Guarantees for Kernel-based Clustering under Non-parametric Mixture Models
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Despite the ubiquity of kernel-based clustering, surprisingly few statistical guarantees exist beyond settings that consider strong structural assumptions on the data generation process. In this work, we take a step towards bridging this gap by studying the statistical performance of kernel-based clustering algorithms under non-parametric mixture models. We provide necessary and sufficient separability conditions under which these algorithms can consistently recover the underlying true clustering. Our analysis provides guarantees for kernel clustering approaches without structural assumptions on the form of the component distributions. Additionally, we establish a key equivalence between kernel-based data-clustering and kernel density-based clustering. This enables us to provide consistency guarantees for kernel-based estimators of non-parametric mixture models. Along with theoretical implications, this connection could have practical implications, including in the systematic choice of the bandwidth of the Gaussian kernel in the context of clustering.
 [42] arXiv:2110.09502 (cross-list from math.ST) [pdf, other]

Title: Minimum $\ell_{1}$-norm interpolators: Precise asymptotics and multiple descent
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
An evolving line of machine learning work observes empirical evidence suggesting that interpolating estimators (the ones that achieve zero training error) may not necessarily be harmful. This paper pursues theoretical understanding for an important type of interpolators: the minimum $\ell_{1}$-norm interpolator, which is motivated by the observation that several learning algorithms favor low $\ell_1$-norm solutions in the over-parameterized regime. Concretely, we consider the noisy sparse regression model under Gaussian design, focusing on linear sparsity and high-dimensional asymptotics (so that both the number of features and the sparsity level scale proportionally with the sample size).
We observe, and provide rigorous theoretical justification for, a curious multi-descent phenomenon; that is, the generalization risk of the minimum $\ell_1$-norm interpolator undergoes multiple (and possibly more than two) phases of descent and ascent as one increases the model capacity. This phenomenon stems from the special structure of the minimum $\ell_1$-norm interpolator as well as the delicate interplay between the over-parameterized ratio and the sparsity, thus unveiling a fundamental distinction in geometry from the minimum $\ell_2$-norm interpolator. Our finding is built upon an exact characterization of the risk behavior, which is governed by a system of two non-linear equations with two unknowns.
 [43] arXiv:2110.09507 (cross-list from cs.LG) [pdf, other]

Title: Provable Hierarchy-Based Meta-Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Hierarchical reinforcement learning (HRL) has seen widespread interest as an approach to tractable learning of complex modular behaviors. However, existing work either assumes access to expert-constructed hierarchies, or uses hierarchy-learning heuristics with no provable guarantees. To address this gap, we analyze HRL in the meta-RL setting, where a learner learns latent hierarchical structure during meta-training for use in a downstream task. We consider a tabular setting where natural hierarchical structure is embedded in the transition dynamics. Analogous to supervised meta-learning theory, we provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy. Furthermore, we provide regret bounds on a learner using the recovered hierarchy to solve a meta-test task. Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
 [44] arXiv:2110.09514 (crosslist from cs.LG) [pdf, other]

Title: Discovering and Achieving Goals via World Models
Comments: NeurIPS 2021. First two authors contributed equally. Website at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)
How can artificial agents learn to solve many diverse tasks in complex visual environments in the absence of any supervision? We decompose this question into two problems: discovering new goals and learning to reliably achieve them. We introduce Latent Explorer Achiever (LEXA), a unified solution to these that learns a world model from image inputs and uses it to train an explorer and an achiever policy from imagined rollouts. Unlike prior methods that explore by reaching previously visited states, the explorer plans to discover unseen surprising states through foresight, which are then used as diverse targets for the achiever to practice. After the unsupervised phase, LEXA solves tasks specified as goal images zero-shot without any additional learning. LEXA substantially outperforms previous approaches to unsupervised goal-reaching, both on prior benchmarks and on a new challenging benchmark with a total of 40 test tasks spanning across four standard robotic manipulation and locomotion domains. LEXA further achieves goals that require interacting with multiple objects in sequence. Finally, to demonstrate the scalability and generality of LEXA, we train a single general agent across four distinct environments. Code and videos at https://orybkin.github.io/lexa/
Replacements for Tue, 19 Oct 21
 [45] arXiv:1901.08057 (replaced) [pdf, ps, other]

Title: Large dimensional analysis of general margin based classification methods
Comments: 33 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
 [46] arXiv:2007.02794 (replaced) [pdf, other]

Title: Efficient Connected and Automated Driving System with Multi-agent Graph Reinforcement Learning
Comments: the paper is not even ready
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [47] arXiv:2007.04803 (replaced) [pdf, other]

Title: A Global Stochastic Optimization Particle Filter Algorithm
Comments: 61 pages, 4 figures
Subjects: Machine Learning (stat.ML); Statistics Theory (math.ST); Computation (stat.CO)
 [48] arXiv:2007.14052 (replaced) [pdf, other]

Title: Multi-output Gaussian Processes with Functional Data: A Study on Coastal Flood Hazard Assessment
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)
 [49] arXiv:2007.14861 (replaced) [pdf, ps, other]

Title: Efficient Sparse Secure Aggregation for Federated Learning
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
 [50] arXiv:2008.13443 (replaced) [pdf, other]

Title: On the Quality Requirements of Demand Prediction for Dynamic Public Transport
Comments: 26 pages, 9 tables, 6 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Signal Processing (eess.SP)
 [51] arXiv:2010.00373 (replaced) [pdf, other]

Title: Task Agnostic Continual Learning Using Online Variational Bayes with Fixed-Point Updates
Comments: The arXiv paper "Task Agnostic Continual Learning Using Online Variational Bayes" is a preliminary preprint of this paper. The main differences between the versions are: 1. We develop new algorithmic framework (FOO-VB). 2. We add multivariate Gaussian and matrix variate Gaussian versions of the algorithm. 3. We demonstrate the new algorithm performance in task agnostic scenarios
Journal-ref: Neural Comput 2021; 33 (11)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [52] arXiv:2105.03425 (replaced) [pdf, other]

Title: Kernel Two-Sample Tests for Manifold Data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [53] arXiv:2105.08532 (replaced) [pdf, other]

Title: Robust Learning in Heterogeneous Contexts
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [54] arXiv:2106.03227 (replaced) [pdf, other]

Title: Neural Tangent Kernel Maximum Mean Discrepancy
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [55] arXiv:2106.03762 (replaced) [pdf, other]

Title: Frustratingly Easy Uncertainty Estimation for Distribution Shift
Comments: 17 pages, 4 Tables, 9 Figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [56] arXiv:2106.05565 (replaced) [pdf, other]

Title: Identifiability of interaction kernels in mean-field equations of interacting particles
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
 [57] arXiv:2106.09215 (replaced) [pdf, other]

Title: Optimum-statistical Collaboration Towards General and Efficient Black-box Optimization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [58] arXiv:2106.15358 (replaced) [pdf, other]

Title: Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors
Comments: Accepted to NeurIPS 2021
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
 [59] arXiv:2107.10884 (replaced) [pdf, other]

Title: Structured second-order methods via natural gradient descent
Comments: Fixed some typos. ICML workshop paper. A short version of arXiv:2102.07405 with a focus on optimization tasks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [60] arXiv:2109.05578 (replaced) [pdf, other]

Title: Kernel PCA with the Nyström method
Authors: Fredrik Hallgren
Comments: 44 pages, 6 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [61] arXiv:2109.05583 (replaced) [pdf, ps, other]

Title: Automatic Componentwise Boosting: An Interpretable AutoML System
Comments: 6 pages, 4 figures, ECML-PKDD Workshop on Automating Data Science 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [62] arXiv:2110.00629 (replaced) [pdf, other]

Title: Factored couplings in multi-marginal optimal transport via difference of convex programming
Comments: Fix typo and correct the corollary 3.3
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
 [63] arXiv:2110.01593 (replaced) [pdf, other]

Title: Generalized Kernel Thinning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)
 [64] arXiv:2110.06021 (replaced) [pdf, other]

Title: Embedded-model flows: Combining the inductive biases of model-free deep learning and explicit probabilistic modeling
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [65] arXiv:1712.01145 (replaced) [pdf, other]

Title: Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection
Authors: Ruimin Sun, Xiaoyong Yuan, Pan He, Qile Zhu, Aokun Chen, Andre Gregio, Daniela Oliveira, Xiaolin Li
Comments: 12 pages, 4 figures. This paper has been accepted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [66] arXiv:1902.04742 (replaced) [pdf, other]

Title: Uniform convergence may be unable to explain generalization in deep learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [67] arXiv:1910.09714 (replaced) [pdf, other]

Title: Smoothness-Adaptive Contextual Bandits
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [68] arXiv:1911.02319 (replaced) [pdf, other]

Title: Improving reinforcement learning algorithms: towards optimal learning rate policies
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [69] arXiv:2003.00660 (replaced) [pdf, ps, other]

Title: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [70] arXiv:2003.06566 (replaced) [pdf, other]

Title: On the benefits of defining vicinal distributions in latent space
Comments: Accepted at Elsevier Pattern Recognition Letters (2021), Best Paper Award at CVPR 2021 Workshop on Adversarial Machine Learning in Real-World Computer Vision (AMLCV), Also accepted at ICLR 2021 Workshops on Robust-Reliable Machine Learning (Oral) and Generalization beyond the training distribution (Abstract)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [71] arXiv:2005.03566 (replaced) [pdf, other]

Title: Noisy Differentiable Architecture Search
Comments: BMVC 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [72] arXiv:2006.05842 (replaced) [pdf, other]

Title: The Emergence of Individuality
Comments: The extended version of ICML 2021 paper
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
 [73] arXiv:2007.00823 (replaced) [pdf, other]

Title: Dropout as a Regularizer of Interaction Effects
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [74] arXiv:2007.03408 (replaced) [pdf, other]

Title: A Generative Model for Texture Synthesis based on Optimal Transport between Feature Distributions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [75] arXiv:2009.06087 (replaced) [pdf, other]

Title: Neural Networks Enhancement with Logical Knowledge
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Machine Learning (stat.ML)
 [76] arXiv:2010.01777 (replaced) [pdf, other]

Title: A Unified View on Graph Neural Networks as Graph Signal Denoising
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [77] arXiv:2011.05348 (replaced) [pdf, other]

Title: SALR: Sharpness-aware Learning Rate Scheduler for Improved Generalization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [78] arXiv:2101.12353 (replaced) [pdf, other]

Title: On the capacity of deep generative networks for approximating distributions
Subjects: Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [79] arXiv:2104.07084 (replaced) [pdf, other]

Title: Grouped Variable Selection with Discrete Optimization: Computational and Statistical Perspectives
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Optimization and Control (math.OC); Computation (stat.CO); Machine Learning (stat.ML)
 [80] arXiv:2104.11734 (replaced) [pdf, other]

Title: Exact marginal prior distributions of finite Bayesian neural networks
Comments: 12+9 pages, 4 figures; v3: Accepted as NeurIPS 2021 Spotlight
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)
 [81] arXiv:2105.07025 (replaced) [pdf, other]

Title: Minimal Cycle Representatives in Persistent Homology using Linear Programming: an Empirical Study with User's Guide
Subjects: Algebraic Topology (math.AT); Computational Geometry (cs.CG); Machine Learning (stat.ML)
 [82] arXiv:2105.08024 (replaced) [pdf, other]

Title: Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [83] arXiv:2106.05232 (replaced) [pdf, ps, other]

Title: Realizing GANs via a Tunable Loss Function
Comments: Extended version of a paper accepted to ITW 2021. 8 pages, 2 figures
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
 [84] arXiv:2106.06134 (replaced) [pdf, other]

Title: Is Homophily a Necessity for Graph Neural Networks?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [85] arXiv:2106.10065 (replaced) [pdf, other]

Title: Being a Bit Frequentist Improves Bayesian Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [86] arXiv:2106.12423 (replaced) [pdf, other]

Title: Alias-Free Generative Adversarial Networks
Authors: Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, Timo Aila
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
 [87] arXiv:2106.13423 (replaced) [pdf, other]
 [88] arXiv:2107.00520 (replaced) [pdf, other]

Title: Predictive Modeling in the Presence of Nuisance-Induced Spurious Correlations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [89] arXiv:2107.00758 (replaced) [pdf, other]

Title: The Spotlight: A General Method for Discovering Systematic Errors in Deep Learning Models
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [90] arXiv:2107.05686 (replaced) [pdf, other]

Title: The Role of Pretrained Representations for the OOD Generalization of RL Agents
Authors: Andrea Dittadi, Frederik Träuble, Manuel Wüthrich, Felix Widmaier, Peter Gehler, Ole Winther, Francesco Locatello, Olivier Bachem, Bernhard Schölkopf, Stefan Bauer
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [91] arXiv:2108.08987 (replaced) [pdf, other]

Title: Uniformity Testing in the Shuffle Model: Simpler, Better, Faster
Comments: Accepted to the SIAM Symposium on Simplicity in Algorithms (SOSA 2022). Added some details and discussions
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)
 [92] arXiv:2108.09676 (replaced) [pdf, other]

Title: Efficient Gaussian Neural Processes for Regression
Comments: 6 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [93] arXiv:2108.10566 (replaced) [pdf, other]

Title: sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [94] arXiv:2109.05675 (replaced) [pdf, other]

Title: Online Unsupervised Learning of Visual Representations and Categories
Comments: Technical report, 28 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [95] arXiv:2110.07959 (replaced) [pdf, other]

Title: Low-rank Matrix Recovery With Unknown Correspondence
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
 [96] arXiv:2110.08211 (replaced) [pdf, other]

Title: Astronomical source finding services for the CIRASA visual analytic platform
Authors: S. Riggi, C. Bordiu, F. Vitello, G. Tudisco, E. Sciacca, D. Magro, R. Sortino, C. Pino, M. Molinaro, M. Benedettini, S. Leurini, F. Bufano, M. Raciti, U. Becciani
Comments: 16 pages, 6 figures
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Computation (stat.CO); Machine Learning (stat.ML)