We gratefully acknowledge support from
the Simons Foundation and member institutions.

Machine Learning

New submissions

[ total of 30 entries: 1-30 ]

New submissions for Fri, 30 Jul 21

[1]  arXiv:2107.13721 [pdf, other]
Title: Amplitude Mean of Functional Data on $\mathbb{S}^2$
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Functional Analysis (math.FA); Applications (stat.AP)

Manifold-valued functional data analysis (FDA) has recently become an active area of research, motivated by the rising availability of trajectories or longitudinal data observed on nonlinear manifolds. The challenges of analyzing such data come from many aspects, including infinite dimensionality and nonlinearity, as well as time-domain or phase variability. In this paper, we study the amplitude part of manifold-valued functions on $\mathbb{S}^2$, which is invariant to random time warping or re-parameterization of the function. Utilizing the nice geometry of $\mathbb{S}^2$, we develop a set of efficient and accurate tools for temporal alignment of functions, geodesic computation, and sample mean calculation. At the heart of these tools are gradient descent algorithms with carefully derived gradients. We show the advantages of these newly developed tools over their competitors with extensive simulations and real data, and demonstrate the importance of considering the amplitude part of functions instead of mixing it with phase variability in manifold-valued FDA.
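
The paper's tools operate on whole curves and include time warping; the minimal sketch below only illustrates the pointwise geometric building blocks on $\mathbb{S}^2$ that such tools typically rest on: the logarithm and exponential maps, the geodesic between two points, and a sample (Karcher) mean computed by Riemannian gradient descent. It is not the authors' amplitude-mean algorithm, and all function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def normalize(a):
    n = norm(a)
    return tuple(x / n for x in a)

def log_map(p, q):
    """Tangent vector at p pointing toward q, with length equal to the arc distance."""
    c = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(c)
    if theta < 1e-12:
        return (0.0, 0.0, 0.0)
    if math.pi - theta < 1e-9:
        raise ValueError("log map is undefined for antipodal points")
    scale = theta / math.sin(theta)
    return tuple(scale * (qk - c * pk) for pk, qk in zip(p, q))

def exp_map(p, v):
    """Follow the great circle leaving p in direction v for arc length |v|."""
    t = norm(v)
    if t < 1e-12:
        return p
    point = tuple(math.cos(t) * pk + math.sin(t) * vk / t for pk, vk in zip(p, v))
    return normalize(point)  # guard against floating-point drift off the sphere

def geodesic_point(p, q, s):
    """Point a fraction s of the way along the shortest great-circle arc from p to q."""
    v = log_map(p, q)
    return exp_map(p, tuple(s * vk for vk in v))

def karcher_mean(points, step=1.0, tol=1e-10, max_iter=100):
    """Sample (Karcher) mean on S^2 by Riemannian gradient descent."""
    mu = points[0]
    for _ in range(max_iter):
        # The Riemannian gradient of (1/2n) * sum_i d(mu, x_i)^2 is minus the mean log map.
        logs = [log_map(mu, x) for x in points]
        g = tuple(sum(v[k] for v in logs) / len(points) for k in range(3))
        if norm(g) < tol:
            break
        mu = exp_map(mu, tuple(step * gk for gk in g))
    return mu
```

The unit step size is a common default for concentrated data; points spread over a large part of the sphere may need a smaller step or a better starting point than the first sample.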

[2]  arXiv:2107.13735 [pdf, other]
Title: Learning the temporal evolution of multivariate densities via normalizing flows
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Dynamical Systems (math.DS); Probability (math.PR)

In this work, we propose a method to learn probability distributions using sample path data from stochastic differential equations. Specifically, we consider temporally evolving probability distributions (e.g., those produced by integrating local or nonlocal Fokker-Planck equations). We analyze this evolution through the machine-learning-assisted construction of a time-dependent mapping that takes a reference distribution (say, a Gaussian) to each and every instance of our evolving distribution. If the reference distribution is the initial condition of a Fokker-Planck equation, what we learn is the time-$T$ map of the corresponding solution. More precisely, the learned map is a normalizing flow that deforms the support of the reference density to the support of each and every density snapshot in time. We demonstrate that this approach can learn solutions to nonlocal Fokker-Planck equations, such as those arising in systems driven by both Brownian and Lévy noise. We present examples with two- and three-dimensional, uni- and multimodal distributions to validate the method.
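
The paper learns a general nonlinear normalizing flow and also handles Lévy noise. The sketch below uses the simplest possible flow instead: a time-indexed affine map $T_t(z) = m_t + s_t z$ from a standard Gaussian, fitted to each snapshot by maximum likelihood (for this flow, the sample mean and standard deviation). The test process is an assumption chosen for checkability: an Ornstein-Uhlenbeck process $dX = -X\,dt + \sqrt{2}\,dW$ with $X_0 = 0$, whose Fokker-Planck solution is Gaussian with variance $1 - e^{-2t}$.

```python
import math
import random

def simulate_ou(n_paths, record_times, dt, theta=1.0, sigma=math.sqrt(2.0), seed=0):
    """Euler-Maruyama sample paths of dX = -theta*X dt + sigma dW with X_0 = 0.

    Returns {t: list of samples} for each requested snapshot time t.
    """
    rng = random.Random(seed)
    record_steps = {round(t / dt): t for t in record_times}
    xs = [0.0] * n_paths
    snapshots = {}
    for step in range(1, max(record_steps) + 1):
        xs = [x - theta * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) for x in xs]
        if step in record_steps:
            snapshots[record_steps[step]] = list(xs)
    return snapshots

def fit_affine_flow(samples):
    """Maximum-likelihood T(z) = m + s*z mapping N(0, 1) onto the samples."""
    n = len(samples)
    m = sum(samples) / n
    s = math.sqrt(sum((x - m) ** 2 for x in samples) / n)
    return m, s

def flow_density(x, m, s):
    """Change of variables: p(x) = p_ref(T^{-1}(x)) * |d T^{-1}/dx| with p_ref = N(0, 1)."""
    z = (x - m) / s
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * s)

if __name__ == "__main__":
    snapshots = simulate_ou(n_paths=5000, record_times=[0.5, 1.0], dt=0.01)
    for t, xs in sorted(snapshots.items()):
        m, s = fit_affine_flow(xs)
        print(f"t={t}: fitted var {s * s:.3f}, exact var {1 - math.exp(-2 * t):.3f}")
```

An affine flow can only represent Gaussian snapshots; the multimodal and Lévy-driven examples in the paper are exactly where a more expressive flow is needed.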

[3]  arXiv:2107.14203 [pdf, other]
Title: Did the Model Change? Efficiently Assessing Machine Learning API Shifts
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Applications (stat.AP)

Machine learning (ML) prediction APIs are increasingly widely used. An ML API can change over time due to model updates or retraining. This presents a key challenge in using the API, because it is often not clear to the user whether and how the ML model has changed. Model shifts can affect downstream application performance and also create oversight issues (e.g., if consistency is desired). In this paper, we initiate a systematic investigation of ML API shifts. We first quantify the 2020-to-2021 performance shifts of popular ML APIs from Google, Microsoft, Amazon, and others on a variety of datasets. We identified significant model shifts in 12 out of the 36 cases we investigated. Interestingly, we found several datasets where the API's predictions became significantly worse over time. This motivated us to formulate the API shift assessment problem at a more fine-grained level: estimating how the API model's confusion matrix changes over time when the data distribution is constant. Monitoring confusion matrix shifts using standard random sampling can require a large number of samples, which is expensive because each API call costs a fee. We propose a principled adaptive sampling algorithm, MASA, to efficiently estimate confusion matrix shifts. MASA can accurately estimate the confusion matrix shifts in commercial ML APIs using up to 90% fewer samples than random sampling. This work establishes ML API shifts as an important problem to study and provides a cost-effective approach to monitoring such shifts.
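
MASA's adaptive sampling rule is not reproduced here. As a hypothetical stand-in, the sketch shows the quantity being estimated (the change in the row-normalized confusion matrix between two API versions, evaluated on the same labeled samples) together with classical Neyman allocation, which splits a fixed labeling budget across true-class strata in proportion to each stratum's weight times its estimated standard deviation. Function names and inputs are illustrative assumptions.

```python
import math

def confusion_matrix(labels, preds, classes):
    """Row-normalized matrix: entry [i][j] estimates P(pred = classes[j] | label = classes[i])."""
    index = {c: k for k, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for y, p in zip(labels, preds):
        counts[index[y]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def confusion_shift(labels, old_preds, new_preds, classes):
    """Entry-wise difference new - old between the two row-normalized confusion matrices."""
    old = confusion_matrix(labels, old_preds, classes)
    new = confusion_matrix(labels, new_preds, classes)
    return [[n - o for n, o in zip(new_row, old_row)] for new_row, old_row in zip(new, old)]

def neyman_allocation(weights, stds, budget):
    """Split an integer sample budget across strata in proportion to weight * std."""
    scores = [w * s for w, s in zip(weights, stds)]
    total = sum(scores)
    raw = [budget * sc / total for sc in scores]
    alloc = [math.floor(r) for r in raw]
    # Hand out leftover samples to the strata with the largest fractional remainders.
    leftover = budget - sum(alloc)
    order = sorted(range(len(raw)), key=lambda k: raw[k] - alloc[k], reverse=True)
    for k in order[:leftover]:
        alloc[k] += 1
    return alloc
```

Strata with more uncertain confusion-matrix rows receive more of the budget, which is the general intuition behind spending API calls adaptively rather than uniformly.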

Cross-lists for Fri, 30 Jul 21

[4]  arXiv:2107.13610 (cross-list from cs.LG) [pdf, other]
Title: Large sample spectral analysis of graph-based multi-manifold clustering
Subjects: Machine Learning (cs.LG); Differential Geometry (math.DG); Spectral Theory (math.SP); Machine Learning (stat.ML)

In this work we study statistical properties of graph-based algorithms for multi-manifold clustering (MMC). In MMC the goal is to retrieve the multi-manifold structure underlying a given Euclidean data set, assuming the data are sampled from a distribution on a union of manifolds $\mathcal{M} = \mathcal{M}_1 \cup \dots \cup \mathcal{M}_N$ that may intersect with each other and may have different dimensions. We investigate sufficient conditions that similarity graphs on data sets must satisfy in order for their corresponding graph Laplacians to capture the right geometric information to solve the MMC problem. Specifically, we provide high-probability error bounds for the spectral approximation of a tensorized Laplacian on $\mathcal{M}$ by a suitable graph Laplacian built from the observations; the recovered tensorized Laplacian contains all the geometric information of the individual underlying manifolds. We provide an example of a family of similarity graphs, which we call annular proximity graphs with angle constraints, that satisfies these sufficient conditions. We contrast our family of graphs with other constructions in the literature based on the alignment of tangent planes. Extensive numerical experiments expand on the insights that our theory provides into the MMC problem.
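
The central objects in this abstract are similarity graphs and their Laplacians. The sketch below is not the paper's annular proximity graph (its angle constraints are not reproduced); it builds a plain ε-proximity graph and its unnormalized Laplacian $L = D - W$, and counts connected components, whose number equals the multiplicity of the eigenvalue 0 of $L$. On two lines that cross, the plain graph joins both lines into a single component, one illustration of why intersecting manifolds call for more carefully designed graphs. Names and the ε value are illustrative.

```python
import math
from collections import deque

def epsilon_graph(points, eps):
    """0/1 weights: connect distinct points within Euclidean distance eps."""
    n = len(points)
    return [[1.0 if i != j and math.dist(points[i], points[j]) <= eps else 0.0
             for j in range(n)] for i in range(n)]

def laplacian(w):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]

def quadratic_form(lap, x):
    """x^T L x, which equals (1/2) * sum_ij w_ij (x_i - x_j)^2."""
    n = len(x)
    return sum(lap[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def connected_components(w):
    """Breadth-first search; the count equals the dimension of the null space of L."""
    n = len(w)
    seen = [False] * n
    count = 0
    for start in range(n):
        if seen[start]:
            continue
        count += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if w[i][j] > 0 and not seen[j]:
                    seen[j] = True
                    queue.append(j)
    return count
```

With one line sampled along the x-axis and another along the y-axis, the ε-graph reports one component; shifting the second line away from the first gives two.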

[5]  arXiv:2107.13656 (cross-list from cs.LG) [pdf, ps, other]
Title: Characterizing the Generalization Error of Gibbs Algorithm with Symmetrized KL information
Comments: The first and second authors contributed equally to the paper. This paper was accepted at the ICML-21 Workshop on Information-Theoretic Methods for Rigorous, Responsible, and Reliable Machine Learning: this https URL
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)

Bounding the generalization error of a supervised learning algorithm is one of the most important problems in learning theory, and various approaches have been developed. However, existing bounds are often loose and lack guarantees. As a result, they may fail to characterize the exact generalization ability of a learning algorithm. Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm in terms of the symmetrized KL information between the input training samples and the output hypothesis. This result can be applied to tighten existing expected generalization error bounds. Our analysis provides more insight into the fundamental role the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.
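
An exact characterization of this kind can be checked numerically on a toy problem. Assume the standard Gibbs posterior $P(w \mid s) \propto \pi(w)\exp(-\gamma L_s(w))$ with inverse temperature $\gamma$ and empirical risk $L_s$. The log-ratio $\log P(w \mid s)/P(w)$ then splits into a term in $w$ only, a term in $s$ only, and $-\gamma L_s(w)$; only the last survives the difference between joint and product-of-marginals expectations, giving expected generalization error $= I_{\mathrm{SKL}}(W;S)/\gamma$. The data distribution, loss, prior, and hypothesis set below are arbitrary illustrative choices, and the paper's exact statement and conditions may be more general.

```python
import itertools
import math

def gibbs_generalization_check(p=0.3, n=3, gamma=2.0, hypotheses=(0.0, 0.5, 1.0)):
    """Return (expected generalization error, symmetrized KL information) for a toy Gibbs algorithm.

    Data: n i.i.d. Bernoulli(p) samples z in {0, 1}; squared loss (w - z)^2;
    uniform prior over a finite hypothesis set.
    """
    prior = [1.0 / len(hypotheses)] * len(hypotheses)
    datasets = list(itertools.product([0, 1], repeat=n))
    p_data = [math.prod(p if z else 1.0 - p for z in s) for s in datasets]

    def emp_risk(w, s):
        return sum((w - z) ** 2 for z in s) / len(s)

    def pop_risk(w):
        return p * (w - 1) ** 2 + (1.0 - p) * w ** 2

    # Gibbs posterior: P(w | s) proportional to prior(w) * exp(-gamma * empirical risk).
    posterior = []
    for s in datasets:
        weights = [pr * math.exp(-gamma * emp_risk(w, s)) for pr, w in zip(prior, hypotheses)]
        total = sum(weights)
        posterior.append([wt / total for wt in weights])
    marginal = [sum(ps * post[k] for ps, post in zip(p_data, posterior))
                for k in range(len(hypotheses))]

    gen, i_skl = 0.0, 0.0
    for s, ps, post in zip(datasets, p_data, posterior):
        for k, w in enumerate(hypotheses):
            joint, product = ps * post[k], ps * marginal[k]
            gen += joint * (pop_risk(w) - emp_risk(w, s))
            # Symmetrized KL: sum (P - Q) log(P / Q), P = joint law, Q = product of marginals.
            i_skl += (joint - product) * math.log(post[k] / marginal[k])
    return gen, i_skl
```

Because everything is enumerated exactly, the two quantities agree to floating-point precision for any choice of `p`, `n`, and `gamma`.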

[6]  arXiv:2107.13772 (cross-list from cs.LG) [pdf, ps, other]
Title: Bayesian Optimization for Min Max Optimization
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

A solution that is only reliable under favourable conditions is hardly a safe solution. Min Max Optimization is an approach that returns optima that are robust against worst-case conditions. We propose algorithms that perform Min Max Optimization in a setting where the function to be optimized is not known a priori and hence has to be learned through experiments. We therefore extend the Bayesian Optimization setting, which is tailored to maximization problems, to Min Max Optimization problems. While related work extends the two acquisition functions Expected Improvement and Gaussian Process Upper Confidence Bound, we extend the two acquisition functions Entropy Search and Knowledge Gradient. These acquisition functions are able to gain knowledge about the optimum instead of just looking for points that are supposed to be optimal. In our evaluation we show that these acquisition functions yield better solutions, converging faster to the optimum than the benchmark settings.
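
For intuition about what a min-max method returns, the brute-force sketch below (not the paper's Bayesian Optimization algorithm) evaluates a known toy cost on finite grids and picks the decision `x` that minimizes the worst case over an uncontrolled condition `c`. Bayesian Optimization is needed precisely when the function is unknown and expensive to evaluate, so exhaustive evaluation like this is infeasible there; the toy cost and grids are assumptions.

```python
import math

def minmax_grid(f, xs, cs):
    """Brute-force min over x of max over c of f(x, c) on finite grids."""
    best_x, best_worst = None, math.inf
    for x in xs:
        worst = max(f(x, c) for c in cs)  # worst case over the uncontrolled condition
        if worst < best_worst:
            best_x, best_worst = x, worst
    return best_x, best_worst

def toy_cost(x, c):
    # Hypothetical cost: penalize mismatch between the decision x and the condition c.
    return (x - c) ** 2

if __name__ == "__main__":
    xs = [i / 10 - 2 for i in range(41)]  # decisions on a grid over [-2, 2]
    print(minmax_grid(toy_cost, xs, [1.0]))        # nominal condition only: (1.0, 0.0)
    print(minmax_grid(toy_cost, xs, [-1.0, 1.0]))  # robust to both conditions: (0.0, 1.0)
```

The nominal optimum `x = 1.0` is perfect for one condition but costs 4.0 under the other, while the min-max choice `x = 0.0` guarantees a cost of 1.0 either way.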

[7]  arXiv:2107.13944 (cross-list from cs.LG) [pdf]
Title: Lyapunov-based uncertainty-aware safe reinforcement learning
Comments: Submitted to IEEE Transactions on Neural Networks and Learning Systems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Reinforcement learning (RL) has shown promising performance in learning optimal policies for a variety of sequential decision-making tasks. However, in many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety (e.g., avoiding collisions in autonomous driving). While RL problems are commonly formalized as Markov decision processes (MDPs), safety constraints are incorporated via constrained Markov decision processes (CMDPs). Although recent advances in safe RL have enabled learning safe policies in CMDPs, these safety requirements should be satisfied during both training and deployment. Furthermore, these methods have been shown to fail to maintain safety over unseen out-of-distribution observations in memory-based and partially observable environments. To address these limitations, we propose a Lyapunov-based uncertainty-aware safe RL model. The introduced model adopts a Lyapunov function that converts trajectory-based constraints to a set of local linear constraints. Furthermore, to ensure the safety of the agent in highly uncertain environments, we develop an uncertainty quantification method that identifies risk-averse actions by estimating the probability of constraint violations. Moreover, a Transformer model is integrated to provide the agent with memory to process long time horizons of information via the self-attention mechanism. The proposed model is evaluated in grid-world navigation tasks where safety is defined as avoiding static and dynamic obstacles in fully and partially observable environments. The results of these experiments show a significant improvement in the performance of the agent, both in achieving optimality and in satisfying safety constraints.
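
The abstract does not specify how the violation probability is estimated. A hypothetical sketch of the general idea of identifying risk-averse actions: assume several sampled cost predictions per action (for example from an ensemble), estimate the violation probability as the fraction exceeding the cost limit, and pick the highest-value action whose estimated risk stays below a threshold, falling back to the least risky action. The Lyapunov constraints and Transformer memory are not modeled, and all names are illustrative.

```python
def violation_probability(cost_samples, cost_limit):
    """Fraction of sampled cost predictions that exceed the constraint limit."""
    return sum(c > cost_limit for c in cost_samples) / len(cost_samples)

def select_safe_action(q_values, cost_samples_per_action, cost_limit, max_risk):
    """Highest-value action whose estimated violation probability is <= max_risk.

    Falls back to the least risky action if no action meets the risk threshold.
    """
    risks = [violation_probability(s, cost_limit) for s in cost_samples_per_action]
    safe = [a for a, r in enumerate(risks) if r <= max_risk]
    if not safe:
        return min(range(len(risks)), key=lambda a: risks[a])
    return max(safe, key=lambda a: q_values[a])
```

With value estimates `[1.0, 5.0, 3.0]` and violation risks of 0, 0.75, and 0.25, a risk threshold of 0.3 rejects the highest-value action and selects action 2.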

[8]  arXiv:2107.14151 (cross-list from stat.ME) [pdf, other]
Title: Modern Non-Linear Function-on-Function Regression
Comments: 6 figures, 5 tables (including supplementary material), 16 pages (including supplementary material). arXiv admin note: text overlap with arXiv:2104.09371
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)

We introduce a new class of non-linear function-on-function regression models for functional data using neural networks. We propose a framework using a hidden layer consisting of continuous neurons, called a continuous hidden layer, for functional response modeling and give two model fitting strategies, Functional Direct Neural Network (FDNN) and Functional Basis Neural Network (FBNN). Both are designed explicitly to exploit the structure inherent in functional data and capture the complex relations existing between the functional predictors and the functional response. We fit these models by deriving functional gradients and implement regularization techniques for more parsimonious results. We demonstrate the power and flexibility of our proposed method in handling complex functional models through extensive simulation studies as well as real data examples.
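
The abstract does not give the FDNN or FBNN equations. As a hypothetical sketch of what a continuous hidden layer can mean: a hidden unit indexed by a continuous variable t, with weights that are functions and sums replaced by integrals, evaluated with the trapezoid rule on a shared grid. In practice the weight, bias, and coefficient functions would be learned; fitting by functional gradients is not shown, and all names are illustrative.

```python
import math

def trapezoid(values, grid):
    """Trapezoid-rule integral of sampled values over a 1-D grid."""
    return sum((grid[k + 1] - grid[k]) * (values[k] + values[k + 1]) / 2
               for k in range(len(grid) - 1))

def continuous_layer(x, grid, weight, bias, activation=math.tanh):
    """h(t) = activation(bias(t) + integral of weight(t, u) * x(u) du), for each t on the grid."""
    return [activation(bias(t) + trapezoid([weight(t, u) * xu for u, xu in zip(grid, x)], grid))
            for t in grid]

def functional_output(h, grid, beta):
    """y(s) = integral of beta(s, t) * h(t) dt, for each s on the grid."""
    return [trapezoid([beta(s, t) * ht for t, ht in zip(grid, h)], grid) for s in grid]
```

For example, with `x(u) = 1`, `weight(t, u) = 1`, zero bias, an identity activation, and `beta(s, t) = s` on [0, 1], the hidden function is identically 1 and the predicted response is `y(s) = s`.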

[9]  arXiv:2107.14226 (cross-list from cs.LG) [pdf, other]
Title: Learning more skills through optimistic exploration
Comments: Steven Hansen and DJ Strouse contributed equally to this work
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Unsupervised skill learning objectives (Gregor et al., 2016; Eysenbach et al., 2018) allow agents to learn rich repertoires of behavior in the absence of extrinsic rewards. They work by simultaneously training a policy to produce distinguishable latent-conditioned trajectories, and a discriminator to evaluate distinguishability by trying to infer latents from trajectories. The hope is for the agent to explore and master the environment by encouraging each skill (latent) to reliably reach different states. However, an inherent exploration problem lingers: when a novel state is actually encountered, the discriminator will necessarily not have seen enough training data to produce accurate and confident skill classifications, leading to low intrinsic reward for the agent and effective penalization of the sort of exploration needed to actually maximize the objective. To combat this inherent pessimism towards exploration, we derive an information gain auxiliary objective that involves training an ensemble of discriminators and rewarding the policy for their disagreement. Our objective directly estimates the epistemic uncertainty that comes from the discriminator not having seen enough training examples, thus providing an intrinsic reward more tailored to the true objective compared to pseudocount-based methods (Burda et al., 2019). We call this exploration bonus discriminator disagreement intrinsic reward, or DISDAIN. We demonstrate empirically that DISDAIN improves skill learning both in a tabular grid world (Four Rooms) and the 57 games of the Atari Suite (from pixels). Thus, we encourage researchers to treat pessimism with DISDAIN.
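
An ensemble-disagreement bonus of this kind can be computed, for one state, as the entropy of the ensemble's average skill distribution minus the average entropy of the individual discriminators: it is zero when all members agree and grows with disagreement. This minimal sketch assumes each discriminator outputs a categorical distribution over a finite set of skills; training of the ensemble and the policy is omitted, and the function names are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def disdain_reward(ensemble_probs):
    """Entropy of the ensemble-mean skill distribution minus the mean member entropy."""
    n = len(ensemble_probs)
    k = len(ensemble_probs[0])
    mean = [sum(member[j] for member in ensemble_probs) / n for j in range(k)]
    return entropy(mean) - sum(entropy(member) for member in ensemble_probs) / n
```

Two confident discriminators that disagree completely, `[[1, 0], [0, 1]]`, yield a bonus of ln 2, while identical discriminators yield 0 no matter how uncertain each one is, so the bonus targets disagreement (epistemic uncertainty) rather than ordinary classification ambiguity.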

Replacements for Fri, 30 Jul 21

[10]  arXiv:1811.11891 (replaced) [pdf, other]
Title: Manifold Coordinates with Physical Meaning
Comments: Submitted to JMLR. Improved over v2 (added appendix). Improved over v1 (revisions)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[11]  arXiv:1905.10848 (replaced) [pdf, ps, other]
Title: Learning Gaussian DAGs from Network Data
Comments: 14 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[12]  arXiv:1910.10692 (replaced) [pdf, other]
Title: Deterministic tensor completion with hypergraph expanders
Comments: 35 pages, 4 figures. To appear in SIAM Journal on Mathematics of Data Science
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
[13]  arXiv:2101.11815 (replaced) [pdf, other]
Title: Interpolating Classifiers Make Few Mistakes
Comments: 23 pages, 2 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Statistics Theory (math.ST)
[14]  arXiv:2107.10970 (replaced) [pdf, other]
Title: The decomposition of the higher-order homology embedding constructed from the $k$-Laplacian
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[15]  arXiv:1801.02982 (replaced) [pdf, other]
Title: How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD
Authors: Zeyuan Allen-Zhu
Comments: V2 added two applications to nonconvex stochastic optimization, and V3 corrects a citation. arXiv admin note: text overlap with arXiv:1708.08694
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Machine Learning (stat.ML)
[16]  arXiv:1906.01005 (replaced) [pdf, other]
Title: Gated recurrent units viewed through the lens of continuous time dynamical systems
Journal-ref: Frontiers in Computational Neuroscience, 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[17]  arXiv:1908.01656 (replaced) [pdf, ps, other]
Title: Distributed Deep Convolutional Neural Networks for the Internet-of-Things
Journal-ref: in IEEE Transactions on Computers, vol. 70, no. 8, pp. 1239-1252, 1 Aug. 2021
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
[18]  arXiv:1909.03194 (replaced) [pdf, other]
Title: On Sample Complexity Upper and Lower Bounds for Exact Ranking from Noisy Comparisons
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[19]  arXiv:1910.01845 (replaced) [pdf, ps, other]
Title: The Complexity of Finding Stationary Points with Stochastic Gradient Descent
Comments: Corrected the attribution of Ghadimi and Lan's result
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[20]  arXiv:2003.08773 (replaced) [pdf, other]
Title: Do CNNs Encode Data Augmentations?
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[21]  arXiv:2006.09858 (replaced) [pdf, other]
Title: Geometry of Similarity Comparisons
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[22]  arXiv:2012.14415 (replaced) [pdf, other]
Title: Stochastic Approximation for Online Tensorial Independent Component Analysis
Comments: To appear in Conference on Learning Theory (COLT), 2021
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[23]  arXiv:2102.06387 (replaced) [pdf, other]
Title: The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation
Comments: Accepted for publication at the 38th International Conference on Machine Learning (ICML 2021).
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
[24]  arXiv:2105.14397 (replaced) [pdf, ps, other]
Title: The Sample Fréchet Mean (or Median) Graph of Sparse Graphs is Sparse
Comments: 21 pages
Subjects: Combinatorics (math.CO); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Applications (stat.AP); Machine Learning (stat.ML)
[25]  arXiv:2106.04156 (replaced) [pdf, other]
Title: Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[26]  arXiv:2106.07832 (replaced) [pdf, other]
Title: Learning Equivariant Energy Based Models with Equivariant Stein Variational Gradient Descent
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[27]  arXiv:2107.10731 (replaced) [pdf, other]
Title: Neural Variational Gradient Descent
Subjects: Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
[28]  arXiv:2107.10960 (replaced) [pdf, other]
Title: Implicit Rate-Constrained Optimization of Non-decomposable Objectives
Comments: ICML 2021; Code available at this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[29]  arXiv:2107.12525 (replaced) [pdf, ps, other]
Title: Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates
Subjects: Statistics Theory (math.ST); Databases (cs.DB); Machine Learning (cs.LG); Machine Learning (stat.ML)
[30]  arXiv:2107.13494 (replaced) [pdf, ps, other]
Title: Limit Distribution Theory for the Smooth 1-Wasserstein Distance with Applications
Subjects: Statistics Theory (math.ST); Probability (math.PR); Machine Learning (stat.ML)