Machine Learning
New submissions
New submissions for Fri, 30 Jul 21
 [1] arXiv:2107.13721 [pdf, other]

Title: Amplitude Mean of Functional Data on $\mathbb{S}^2$
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Functional Analysis (math.FA); Applications (stat.AP)
Manifold-valued functional data analysis (FDA) has recently become an active area of research, motivated by the rising availability of trajectories or longitudinal data observed on nonlinear manifolds. The challenges of analyzing such data come from many aspects, including infinite dimensionality and nonlinearity, as well as time-domain or phase variability. In this paper, we study the amplitude part of manifold-valued functions on $\mathbb{S}^2$, which is invariant to random time warping or reparameterization of the function. Utilizing the nice geometry of $\mathbb{S}^2$, we develop a set of efficient and accurate tools for temporal alignment of functions, geodesic and sample mean calculation. At their heart, these tools rely on gradient descent algorithms with carefully derived gradients. We show the advantages of these newly developed tools over their competitors with extensive simulations and real data, and demonstrate the importance of considering the amplitude part of functions instead of mixing it with phase variability in manifold-valued FDA.
 [2] arXiv:2107.13735 [pdf, other]

Title: Learning the temporal evolution of multivariate densities via normalizing flows
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Dynamical Systems (math.DS); Probability (math.PR)
In this work, we propose a method to learn probability distributions using sample path data from stochastic differential equations. Specifically, we consider temporally evolving probability distributions (e.g., those produced by integrating local or nonlocal Fokker-Planck equations). We analyze this evolution through machine-learning-assisted construction of a time-dependent mapping that takes a reference distribution (say, a Gaussian) to each and every instance of our evolving distribution. If the reference distribution is the initial condition of a Fokker-Planck equation, what we learn is the time-T map of the corresponding solution. In particular, the learned map is a normalizing flow that deforms the support of the reference density to the support of each and every density snapshot in time. We demonstrate that this approach can learn solutions to nonlocal Fokker-Planck equations, such as those arising in systems driven by both Brownian and Lévy noise. We present examples with two- and three-dimensional, uni- and multimodal distributions to validate the method.
 [3] arXiv:2107.14203 [pdf, other]

Title: Did the Model Change? Efficiently Assessing Machine Learning API Shifts
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Applications (stat.AP)
Machine learning (ML) prediction APIs are increasingly widely used. An ML API can change over time due to model updates or retraining. This presents a key challenge in the usage of the API because it is often not clear to the user if and how the ML model has changed. Model shifts can affect downstream application performance and also create oversight issues (e.g. if consistency is desired). In this paper, we initiate a systematic investigation of ML API shifts. We first quantify the performance shifts from 2020 to 2021 of popular ML APIs from Google, Microsoft, Amazon, and others on a variety of datasets. We identified significant model shifts in 12 out of 36 cases we investigated. Interestingly, we found several datasets where the API's predictions became significantly worse over time. This motivated us to formulate the API shift assessment problem at a more fine-grained level as estimating how the API model's confusion matrix changes over time when the data distribution is constant. Monitoring confusion matrix shifts using standard random sampling can require a large number of samples, which is expensive as each API call costs a fee. We propose a principled adaptive sampling algorithm, MASA, to efficiently estimate confusion matrix shifts. MASA can accurately estimate the confusion matrix shifts in commercial ML APIs using up to 90% fewer samples compared to random sampling. This work establishes ML API shifts as an important problem to study and provides a cost-effective approach to monitor such shifts.
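To make the monitored quantity concrete, the following is a minimal sketch of the uniform random sampling baseline mentioned in the abstract. It is not MASA. The function names, the joint-frequency normalization of the confusion matrix, and the toy data are all assumptions made for illustration.

```python
import random

def confusion_matrix(labels, preds, num_classes):
    """Joint-frequency confusion matrix (an assumed normalization):
    entry [i][j] is the fraction of samples with true label i and prediction j."""
    n = len(labels)
    cm = [[0.0] * num_classes for _ in range(num_classes)]
    for y, p in zip(labels, preds):
        cm[y][p] += 1.0 / n
    return cm

def estimate_shift(labels, old_preds, new_preds, num_classes, budget, seed=0):
    """Estimate (new - old) confusion matrix from `budget` uniformly sampled points.
    In practice each sampled point would cost one call to the updated API."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(labels)), budget)
    ys = [labels[i] for i in idx]
    old = confusion_matrix(ys, [old_preds[i] for i in idx], num_classes)
    new = confusion_matrix(ys, [new_preds[i] for i in idx], num_classes)
    return [[new[i][j] - old[i][j] for j in range(num_classes)]
            for i in range(num_classes)]

# Hypothetical toy data: four labelled points, predictions before and after an update.
labels = [0, 0, 1, 1]
old_preds = [0, 0, 1, 0]
new_preds = [0, 1, 1, 1]
shift = estimate_shift(labels, old_preds, new_preds, num_classes=2, budget=4)
```

Because both matrices are normalized over the same sample, the entries of the estimated shift sum to zero; a smaller `budget` trades API cost for a noisier estimate.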
Cross-lists for Fri, 30 Jul 21
 [4] arXiv:2107.13610 (cross-list from cs.LG) [pdf, other]

Title: Large sample spectral analysis of graph-based multi-manifold clustering
Subjects: Machine Learning (cs.LG); Differential Geometry (math.DG); Spectral Theory (math.SP); Machine Learning (stat.ML)
In this work we study statistical properties of graph-based algorithms for multi-manifold clustering (MMC). In MMC the goal is to retrieve the multi-manifold structure underlying a given Euclidean data set when the data set is assumed to be obtained by sampling a distribution on a union of manifolds $\mathcal{M} = \mathcal{M}_1 \cup\dots \cup \mathcal{M}_N$ that may intersect with each other and that may have different dimensions. We investigate sufficient conditions that similarity graphs on data sets must satisfy in order for their corresponding graph Laplacians to capture the right geometric information to solve the MMC problem. Precisely, we provide high probability error bounds for the spectral approximation of a tensorized Laplacian on $\mathcal{M}$ with a suitable graph Laplacian built from the observations; the recovered tensorized Laplacian contains all geometric information of all the individual underlying manifolds. We provide an example of a family of similarity graphs, which we call annular proximity graphs with angle constraints, satisfying these sufficient conditions. We contrast our family of graphs with other constructions in the literature based on the alignment of tangent planes. Extensive numerical experiments expand the insights that our theory provides on the MMC problem.
 [5] arXiv:2107.13656 (cross-list from cs.LG) [pdf, ps, other]

Title: Characterizing the Generalization Error of Gibbs Algorithm with Symmetrized KL information
Comments: The first and second author have contributed equally to the paper. This paper is accepted in the ICML-21 Workshop on Information-Theoretic Methods for Rigorous, Responsible, and Reliable Machine Learning: this https URL
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
Bounding the generalization error of a supervised learning algorithm is one of the most important problems in learning theory, and various approaches have been developed. However, existing bounds are often loose and lack guarantees. As a result, they may fail to characterize the exact generalization ability of a learning algorithm. Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm in terms of symmetrized KL information between the input training samples and the output hypothesis. Such a result can be applied to tighten existing expected generalization error bounds. Our analysis provides more insight into the fundamental role the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.
 [6] arXiv:2107.13772 (cross-list from cs.LG) [pdf, ps, other]

Title: Bayesian Optimization for Min Max Optimization
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
A solution that is only reliable under favourable conditions is hardly a safe solution. Min Max Optimization is an approach that returns optima that are robust against worst case conditions. We propose algorithms that perform Min Max Optimization in a setting where the function that should be optimized is not known a priori and hence has to be learned by experiments. Therefore we extend the Bayesian Optimization setting, which is tailored to maximization problems, to Min Max Optimization problems. While related work extends the two acquisition functions Expected Improvement and Gaussian Process Upper Confidence Bound, we extend the two acquisition functions Entropy Search and Knowledge Gradient. These acquisition functions are able to gain knowledge about the optimum instead of just looking for points that are supposed to be optimal. In our evaluation we show that these acquisition functions allow for better solutions, converging faster to the optimum than the benchmark settings.
 [7] arXiv:2107.13944 (cross-list from cs.LG) [pdf]

Title: Lyapunov-based uncertainty-aware safe reinforcement learning
Comments: Submitted to IEEE Transactions on Neural Networks and Learning Systems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Reinforcement learning (RL) has shown promising performance in learning optimal policies for a variety of sequential decision-making tasks. However, in many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety (e.g., avoiding collisions in autonomous driving). While RL problems are commonly formalized as Markov decision processes (MDPs), safety constraints are incorporated via constrained Markov decision processes (CMDPs). Although recent advances in safe RL have enabled learning safe policies in CMDPs, these safety requirements should be satisfied during both training and deployment. Furthermore, it has been shown that in memory-based and partially observable environments, these methods fail to maintain safety over unseen out-of-distribution observations. To address these limitations, we propose a Lyapunov-based uncertainty-aware safe RL model. The introduced model adopts a Lyapunov function that converts trajectory-based constraints to a set of local linear constraints. Furthermore, to ensure the safety of the agent in highly uncertain environments, an uncertainty quantification method is developed that enables identifying risk-averse actions through estimating the probability of constraint violations. Moreover, a Transformer model is integrated to provide the agent with memory to process long time horizons of information via the self-attention mechanism. The proposed model is evaluated in grid-world navigation tasks where safety is defined as avoiding static and dynamic obstacles in fully and partially observable environments. The results of these experiments show a significant improvement in the performance of the agent both in achieving optimality and satisfying safety constraints.
 [8] arXiv:2107.14151 (cross-list from stat.ME) [pdf, other]

Title: Modern Non-Linear Function-on-Function Regression
Comments: 6 figures, 5 tables (including supplementary material), 16 pages (including supplementary material). arXiv admin note: text overlap with arXiv:2104.09371
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
We introduce a new class of non-linear function-on-function regression models for functional data using neural networks. We propose a framework using a hidden layer consisting of continuous neurons, called a continuous hidden layer, for functional response modeling and give two model fitting strategies, Functional Direct Neural Network (FDNN) and Functional Basis Neural Network (FBNN). Both are designed explicitly to exploit the structure inherent in functional data and capture the complex relations existing between the functional predictors and the functional response. We fit these models by deriving functional gradients and implement regularization techniques for more parsimonious results. We demonstrate the power and flexibility of our proposed method in handling complex functional models through extensive simulation studies as well as real data examples.
 [9] arXiv:2107.14226 (cross-list from cs.LG) [pdf, other]

Title: Learning more skills through optimistic exploration
Comments: Steven Hansen and DJ Strouse contributed equally to this work
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Unsupervised skill learning objectives (Gregor et al., 2016; Eysenbach et al., 2018) allow agents to learn rich repertoires of behavior in the absence of extrinsic rewards. They work by simultaneously training a policy to produce distinguishable latent-conditioned trajectories, and a discriminator to evaluate distinguishability by trying to infer latents from trajectories. The hope is for the agent to explore and master the environment by encouraging each skill (latent) to reliably reach different states. However, an inherent exploration problem lingers: when a novel state is actually encountered, the discriminator will necessarily not have seen enough training data to produce accurate and confident skill classifications, leading to low intrinsic reward for the agent and effective penalization of the sort of exploration needed to actually maximize the objective. To combat this inherent pessimism towards exploration, we derive an information gain auxiliary objective that involves training an ensemble of discriminators and rewarding the policy for their disagreement. Our objective directly estimates the epistemic uncertainty that comes from the discriminator not having seen enough training examples, thus providing an intrinsic reward more tailored to the true objective compared to pseudocount-based methods (Burda et al., 2019). We call this exploration bonus discriminator disagreement intrinsic reward, or DISDAIN. We demonstrate empirically that DISDAIN improves skill learning both in a tabular grid world (Four Rooms) and the 57 games of the Atari Suite (from pixels). Thus, we encourage researchers to treat pessimism with DISDAIN.
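As a rough illustration of rewarding ensemble disagreement, the sketch below scores a state by the entropy of the averaged discriminator prediction minus the average of the individual entropies. This is one standard information-gain style disagreement measure; whether it matches the paper's exact bonus is an assumption, and the function names and toy probabilities are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def disagreement_bonus(ensemble_probs):
    """Entropy of the mean prediction minus the mean of the member entropies.
    `ensemble_probs` holds one skill distribution per discriminator (assumed input)."""
    n = len(ensemble_probs)
    k = len(ensemble_probs[0])
    mean = [sum(p[j] for p in ensemble_probs) / n for j in range(k)]
    return entropy(mean) - sum(entropy(p) for p in ensemble_probs) / n

# Two discriminators that agree yield no bonus; two that confidently
# disagree yield the maximum bonus for two skills, log 2.
agree = disagreement_bonus([[0.9, 0.1], [0.9, 0.1]])
disagree = disagreement_bonus([[1.0, 0.0], [0.0, 1.0]])
```

The bonus is non-negative by concavity of entropy and vanishes when all ensemble members make the same prediction, so it fades in states where the discriminators have converged.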
Replacements for Fri, 30 Jul 21
 [10] arXiv:1811.11891 (replaced) [pdf, other]

Title: Manifold Coordinates with Physical Meaning
Comments: Submitted to JMLR. Improved over v2 (added appendix). Improved over v1 (revisions)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [11] arXiv:1905.10848 (replaced) [pdf, ps, other]

Title: Learning Gaussian DAGs from Network Data
Comments: 14 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [12] arXiv:1910.10692 (replaced) [pdf, other]

Title: Deterministic tensor completion with hypergraph expanders
Comments: 35 pages, 4 figures. To appear in SIAM Journal on Mathematics of Data Science
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
 [13] arXiv:2101.11815 (replaced) [pdf, other]

Title: Interpolating Classifiers Make Few Mistakes
Comments: 23 pages, 2 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Statistics Theory (math.ST)
 [14] arXiv:2107.10970 (replaced) [pdf, other]

Title: The decomposition of the higher-order homology embedding constructed from the $k$-Laplacian
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [15] arXiv:1801.02982 (replaced) [pdf, other]

Title: How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD
Authors: Zeyuan Allen-Zhu
Comments: V2 added two applications to nonconvex stochastic optimization, and V3 corrects a citation. arXiv admin note: text overlap with arXiv:1708.08694
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [16] arXiv:1906.01005 (replaced) [pdf, other]

Title: Gated recurrent units viewed through the lens of continuous time dynamical systems
Journal-ref: Frontiers in Computational Neuroscience, 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [17] arXiv:1908.01656 (replaced) [pdf, ps, other]

Title: Distributed Deep Convolutional Neural Networks for the Internet-of-Things
Journal-ref: in IEEE Transactions on Computers, vol. 70, no. 8, pp. 1239-1252, 1 Aug. 2021
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
 [18] arXiv:1909.03194 (replaced) [pdf, other]

Title: On Sample Complexity Upper and Lower Bounds for Exact Ranking from Noisy Comparisons
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [19] arXiv:1910.01845 (replaced) [pdf, ps, other]

Title: The Complexity of Finding Stationary Points with Stochastic Gradient Descent
Comments: Corrected the attribution of Ghadimi and Lan's result
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [20] arXiv:2003.08773 (replaced) [pdf, other]

Title: Do CNNs Encode Data Augmentations?
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
 [21] arXiv:2006.09858 (replaced) [pdf, other]

Title: Geometry of Similarity Comparisons
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
 [22] arXiv:2012.14415 (replaced) [pdf, other]

Title: Stochastic Approximation for Online Tensorial Independent Component Analysis
Comments: To appear in Conference on Learning Theory (COLT), 2021
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [23] arXiv:2102.06387 (replaced) [pdf, other]

Title: The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation
Comments: Accepted for publication at the 38th International Conference on Machine Learning (ICML 2021).
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
 [24] arXiv:2105.14397 (replaced) [pdf, ps, other]

Title: The Sample Fréchet Mean (or Median) Graph of Sparse Graphs is Sparse
Comments: 21 pages
Subjects: Combinatorics (math.CO); Social and Information Networks (cs.SI); Data Analysis, Statistics and Probability (physics.data-an); Applications (stat.AP); Machine Learning (stat.ML)
 [25] arXiv:2106.04156 (replaced) [pdf, other]

Title: Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [26] arXiv:2106.07832 (replaced) [pdf, other]

Title: Learning Equivariant Energy Based Models with Equivariant Stein Variational Gradient Descent
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [27] arXiv:2107.10731 (replaced) [pdf, other]

Title: Neural Variational Gradient Descent
Subjects: Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
 [28] arXiv:2107.10960 (replaced) [pdf, other]

Title: Implicit Rate-Constrained Optimization of Non-decomposable Objectives
Comments: ICML 2021; Code available at this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [29] arXiv:2107.12525 (replaced) [pdf, ps, other]

Title: Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates
Subjects: Statistics Theory (math.ST); Databases (cs.DB); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [30] arXiv:2107.13494 (replaced) [pdf, ps, other]

Title: Limit Distribution Theory for the Smooth 1-Wasserstein Distance with Applications
Subjects: Statistics Theory (math.ST); Probability (math.PR); Machine Learning (stat.ML)