Statistics

New submissions

New submissions for Wed, 11 Dec 19

[1]  arXiv:1912.04406 [pdf, other]
Title: Semiparametric Regression for Dual Population Mortality
Comments: 28 pages, 8 graphs
Subjects: Applications (stat.AP)

Parameter shrinkage applied optimally can always reduce error and projection variances from those of maximum likelihood estimation. Many variables that actuaries use are on numerical scales, like age or year, which require parameters at each point. Rather than shrinking these towards zero, nearby parameters are better shrunk towards each other. Semiparametric regression is a statistical discipline for building curves across parameter classes using shrinkage methodology. It is similar to but more parsimonious than cubic splines. We introduce it in the context of Bayesian shrinkage and apply it to joint mortality modeling for related populations, with Swedish and Danish mortality as an illustration. Bayesian shrinkage of slope changes of linear splines is an approach to semiparametric modeling that evolved in the actuarial literature. It has some theoretical and practical advantages, like closed-form curves, direct and transparent determination of degree of shrinkage and of placing knots for the splines, and quantifying goodness of fit. It is also relatively easy to apply to the many nonlinear models that arise in actuarial work.

[2]  arXiv:1912.04432 [pdf]
Title: Variable selection for transportability
Comments: Under Review
Subjects: Methodology (stat.ME); Other Statistics (stat.OT)

Transportability provides a principled framework to address the problem of applying study results to new populations. Here, we consider the problem of selecting variables to include in transport estimators. We provide a brief overview of the transportability framework and illustrate that while selection diagrams are a vital first step in variable selection, these graphs alone identify a sufficient but not strictly necessary set of variables for generating an unbiased transport estimate. Next, we conduct a simulation experiment assessing the impact of including unnecessary variables on the performance of the parametric g-computation transport estimator. Our results highlight that the types of variables included can affect the bias, variance, and mean squared error of the estimates. We find that addition of variables that are not causes of the outcome but whose distributions differ between the source and target populations can increase the variance and mean squared error of the transported estimates. On the other hand, inclusion of variables that are causes of the outcome (regardless of whether they modify the causal contrast of interest or differ in distribution between the populations) reduces the variance of the estimates without increasing the bias. Finally, exclusion of variables that cause the outcome but do not modify the causal contrast of interest does not increase bias. These findings suggest that variable selection approaches for transport should prioritize identifying and including all causes of the outcome in the study population rather than focusing on variables whose distribution may differ between the study sample and target population.

[3]  arXiv:1912.04435 [pdf, other]
Title: Stylised Choropleth Maps for New Zealand Regions and District Health Boards
Authors: Thomas Lumley
Subjects: Applications (stat.AP)

New Zealand has two top-level sets of administrative divisions: the District Health Boards and the Regions. In this note I describe a hexagonal layout for creating stylised maps of these divisions, and using colour, size, and triangular subdivisions to compare data between divisions and across multiple variables. I present an implementation in the DHBins package for R using both base graphics and ggplot2; the concepts and specific hexagonal layout could be used in any software.

[4]  arXiv:1912.04439 [pdf, other]
Title: Privacy-preserving data sharing via probabilistic modelling
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Differential privacy allows quantifying privacy loss from computations on sensitive personal data. This loss grows with the number of accesses to the data, making it hard to open the use of such data while respecting privacy. To avoid this limitation, we propose privacy-preserving release of a synthetic version of a data set, which can be used for an unlimited number of analyses with any methods, without affecting the privacy guarantees. The synthetic data generation is based on differentially private learning of a generative probabilistic model which can capture the probability distribution of the original data. We demonstrate empirically that we can reliably reproduce statistical discoveries from the synthetic data. We expect the method to have broad use in sharing anonymized versions of key data sets for research.

[5]  arXiv:1912.04542 [pdf, ps, other]
Title: What is the best predictor that you can compute in five minutes using a given Bayesian hierarchical model?
Subjects: Methodology (stat.ME)

The goal of this paper is to provide a way for statisticians to answer the question posed in the title of this article using any Bayesian hierarchical model of their choosing and without imposing additional restrictive model assumptions. We are motivated by the fact that the rise of ``big data'' has made it difficult for statisticians to apply their methods directly to big datasets. We introduce a ``data subset model'' to the popular ``data model, process model, and parameter model'' framework used to summarize Bayesian hierarchical models. The hyperparameters of the data subset model are specified constructively in that they are chosen such that the implied size of the subset satisfies pre-defined computational constraints. Thus, these hyperparameters effectively calibrate the statistical model to the computer itself to obtain predictions/estimations in a pre-specified amount of time. Several properties of the data subset model are provided, including propriety, partial sufficiency, and semi-parametric properties. Furthermore, we show that subsets of normally distributed data are asymptotically partially sufficient under reasonable constraints. Results from a simulated dataset will be presented across different computers, to show the effect of the computer on the statistical analysis. Additionally, we provide a joint spatial analysis of two different environmental datasets.

[6]  arXiv:1912.04571 [pdf, other]
Title: Spatial hierarchical modeling of threshold exceedances using rate mixtures
Subjects: Methodology (stat.ME)

We develop new flexible univariate models for light-tailed and heavy-tailed data, which extend a hierarchical representation of the generalized Pareto (GP) limit for threshold exceedances. These models can accommodate departure from asymptotic threshold stability in finite samples while keeping the asymptotic GP distribution as a special (or boundary) case and can capture the tails and the bulk jointly without losing much flexibility. Spatial dependence is modeled through a latent process, while the data are assumed to be conditionally independent. We design penalized complexity priors for crucial model parameters, shrinking our proposed spatial Bayesian hierarchical model toward a simpler reference whose marginal distributions are GP with moderately heavy tails. Our model can be fitted in fairly high dimensions using Markov chain Monte Carlo by exploiting the Metropolis-adjusted Langevin algorithm (MALA), which guarantees fast convergence of Markov chains with efficient block proposals for the latent variables. We also develop an adaptive scheme to calibrate the MALA tuning parameters. Moreover, our models avoid the expensive numerical evaluations of multifold integrals in censored likelihood expressions. We demonstrate our new methodology by simulation and application to a dataset of extreme rainfall episodes that occurred in Germany. Our fitted model provides a satisfactory performance and can be successfully used to predict rainfall extremes at unobserved locations.

[7]  arXiv:1912.04607 [pdf, other]
Title: Controlling false discovery exceedance for heterogeneous tests
Subjects: Methodology (stat.ME)

Several classical methods exist for controlling the false discovery exceedance (FDX) for large scale multiple testing problems, among them the Lehmann-Romano procedure ([LR] below) and the Guo-Romano procedure ([GR] below). While these two procedures are the most prominent, they were originally designed for homogeneous test statistics, that is, when the null distribution functions of the $p$-values $F_i$, $1\leq i\leq m$, are all equal. In many applications, however, the data are heterogeneous which leads to heterogeneous null distribution functions. Ignoring this heterogeneity usually induces a conservativeness for the aforementioned procedures. In this paper, we develop three new procedures that incorporate the $F_i$'s, while ensuring the FDX control. The heterogeneous version of [LR], denoted [HLR], is based on the arithmetic average of the $F_i$'s, while the heterogeneous version of [GR], denoted [HGR], is based on the geometric average of the $F_i$'s. We also introduce a procedure [PB], that is based on the Poisson-binomial distribution and that uniformly improves [HLR] and [HGR], at the price of a higher computational complexity. Perhaps surprisingly, this shows that, contrary to the known theory of false discovery rate (FDR) control under heterogeneity, the way to incorporate the $F_i$'s can be particularly simple in the case of FDX control, and does not require any further correction term. The performances of the new proposed procedures are illustrated by real and simulated data in two important heterogeneous settings: first, when the test statistics are continuous but the $p$-values are weighted by some known independent weight vector, e.g., coming from co-data sets; second, when the test statistics are discretely distributed, as is the case for data representing frequencies or counts.

[8]  arXiv:1912.04629 [pdf, ps, other]
Title: Classification under local differential privacy
Comments: 12 pages
Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)

We consider the binary classification problem in a setup that preserves the privacy of the original sample. We provide a privacy mechanism that is locally differentially private and then construct a classifier based on the private sample that is universally consistent in Euclidean spaces. Under stronger assumptions, we establish the minimax rates of convergence of the excess risk and see that they are slower than in the case when the original sample is available.

[9]  arXiv:1912.04677 [pdf, other]
Title: Testing and Estimating Change-Points in the Covariance Matrix of a High-Dimensional Time Series
Authors: Ansgar Steland
Subjects: Statistics Theory (math.ST); Probability (math.PR); Applications (stat.AP)

This paper studies methods for testing and estimating change-points in the covariance structure of a high-dimensional linear time series. The assumed framework allows for a large class of multivariate linear processes (including vector autoregressive moving average (VARMA) models) of growing dimension and spiked covariance models. The approach uses bilinear forms of the centered or non-centered sample variance-covariance matrix. Change-point testing and estimation are based on maximally selected weighted cumulated sum (CUSUM) statistics. Large sample approximations under a change-point regime are provided including a multivariate CUSUM transform of increasing dimension. For the unknown asymptotic variance and covariance parameters associated to (pairs of) CUSUM statistics we propose consistent estimators. Based on weak laws of large numbers for their sequential versions, we also consider stopped sample estimation where observations until the estimated change-point are used. Finite sample properties of the procedures are investigated by simulations and their application is illustrated by analyzing a real data set from environmetrics.

[10]  arXiv:1912.04681 [pdf, other]
Title: Accelerated Sampling on Discrete Spaces with Non-Reversible Markov Processes
Comments: 31 pages, 8 figures
Subjects: Computation (stat.CO)

We consider the task of MCMC sampling from a distribution defined on a discrete space. Building on recent insights provided in [Zan19], we devise a class of efficient continuous-time, non-reversible algorithms which make active use of the structure of the underlying space. Particular emphasis is placed on how symmetries and other group-theoretic notions can be used to improve exploration of the space. We test our algorithms on a range of examples from statistics, computational physics, machine learning, and cryptography, which show improvement on alternative algorithms. We provide practical recommendations on how to design and implement these algorithms, and close with remarks on the outlook for both discrete sampling and continuous-time Monte Carlo more broadly.

[11]  arXiv:1912.04738 [pdf, other]
Title: Histogram Transform Ensembles for Large-scale Regression
Comments: arXiv admin note: text overlap with arXiv:1911.11581
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

We propose a novel algorithm for large-scale regression problems named histogram transform ensembles (HTE), composed of random rotations, stretchings, and translations. First of all, we investigate the theoretical properties of HTE when the regression function lies in the H\"{o}lder space $C^{k,\alpha}$, $k \in \mathbb{N}_0$, $\alpha \in (0,1]$. In the case that $k=0, 1$, we adopt the constant regressors and develop the na\"{i}ve histogram transforms (NHT). Within the space $C^{0,\alpha}$, although almost optimal convergence rates can be derived for both single and ensemble NHT, we fail to show the benefits of ensembles over single estimators theoretically. In contrast, in the subspace $C^{1,\alpha}$, we prove that if $d \geq 2(1+\alpha)/\alpha$, the lower bound of the convergence rates for single NHT turns out to be worse than the upper bound of the convergence rates for ensemble NHT. In the other case when $k \geq 2$, the NHT may no longer be appropriate in predicting smoother regression functions. Instead, we apply kernel histogram transforms (KHT) equipped with smoother regressors such as support vector machines (SVMs), and it turns out that both single and ensemble KHT enjoy almost optimal convergence rates. Then we validate the above theoretical results by numerical experiments. On the one hand, simulations are conducted to elucidate that ensemble NHT outperform single NHT. On the other hand, the effects of bin sizes on accuracy of both NHT and KHT also accord with theoretical analysis. Last but not least, in the real-data experiments, comparisons between the ensemble KHT, equipped with adaptive histogram transforms, and other state-of-the-art large-scale regression estimators verify the effectiveness and accuracy of our algorithm.

[12]  arXiv:1912.04753 [pdf]
Title: Optimizing and accelerating space-time Ripley's K function based on Apache Spark for distributed spatiotemporal point pattern analysis
Comments: 35 pages, 23 figures, Future Generation Computer Systems
Journal-ref: Future Generation Computer Systems, 2020
Subjects: Computation (stat.CO); Computational Geometry (cs.CG); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Software Engineering (cs.SE)

With increasing point of interest (POI) datasets available with fine-grained spatial and temporal attributes, the space-time Ripley's K function has been regarded as a powerful approach to analyzing spatiotemporal point processes. However, the space-time Ripley's K function is computationally intensive because of point-wise distance comparisons, edge correction and simulations for significance testing. Parallel computing technologies like OpenMP, MPI and CUDA have been leveraged to accelerate the K function, and related experiments have demonstrated substantial acceleration. Nevertheless, previous works have not extended optimization of Ripley's K function from the space dimension to the space-time dimension. Without sophisticated spatiotemporal query and partitioning mechanisms, extra computational overhead can be problematic. Meanwhile, these studies were limited by the restricted scalability and relatively expensive programming cost of parallel frameworks, which impeded their application to large POI datasets and Ripley's K function variations. This paper presents a distributed computing method to accelerate the space-time Ripley's K function on the state-of-the-art distributed computing framework Apache Spark; four strategies are adopted to simplify calculation procedures and accelerate distributed computing. Based on the optimized method, a web-based visual analytics framework prototype has been developed. Experiments show the feasibility and time efficiency of the proposed method, and also demonstrate its value in promoting applications of the space-time Ripley's K function in ecology, geography, sociology, economics, urban transportation and other fields.
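
To make the computational cost mentioned in this abstract concrete, here is a minimal single-machine sketch of a naive space-time K estimate. It is not the paper's Spark method: it omits edge correction and significance simulations, the function name `space_time_k` is hypothetical, and the normalisation by n² is only one convention among several in use.

```python
from math import hypot

def space_time_k(points, s, t, area, duration):
    """Naive space-time K estimate for points given as (x, y, time).

    Counts ordered pairs (i, j), i != j, whose spatial distance is at
    most s and whose time difference is at most t, then scales by
    area * duration / n**2. No edge correction is applied (assumption).
    """
    n = len(points)
    count = 0
    for i in range(n):
        xi, yi, ti = points[i]
        for j in range(n):
            if i == j:
                continue
            xj, yj, tj = points[j]
            if hypot(xi - xj, yi - yj) <= s and abs(ti - tj) <= t:
                count += 1
    return area * duration * count / (n * n)

# Two events close in space and time, one far away in both.
events = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
k_value = space_time_k(events, s=1.5, t=1.0, area=100.0, duration=10.0)
print(k_value)
```

The double loop visits every ordered pair, so the cost grows quadratically with the number of events; this pairwise comparison is the part that spatial partitioning and distributed execution aim to reduce.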

[13]  arXiv:1912.04758 [pdf, other]
Title: Generalised Network Autoregressive Processes and the GNAR package
Subjects: Methodology (stat.ME)

This article introduces the GNAR package, which fits, predicts, and simulates from a powerful new class of generalised network autoregressive processes. Such processes consist of a multivariate time series along with a real, or inferred, network that provides information about inter-variable relationships. The GNAR model relates values of a time series for a given variable and time to earlier values of the same variable and of neighbouring variables, with inclusion controlled by the network structure. The GNAR package is designed to fit this new model, while working with standard ts objects and the igraph package for ease of use.
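
The dependence structure described in this abstract can be sketched in a few lines. The following is an illustrative Python sketch, not the GNAR package's R interface: it assumes a single lag, first-stage neighbours only, equal neighbour weights, and coefficients shared across nodes. The names `gnar_step` and `simulate` are hypothetical.

```python
import random

def gnar_step(x_prev, neighbours, alpha, beta):
    """Noise-free one-step update: own lag plus the mean of neighbours' lags."""
    x_next = []
    for i, x_i in enumerate(x_prev):
        nbrs = neighbours[i]
        # Nodes without neighbours receive no network contribution.
        nbr_mean = sum(x_prev[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        x_next.append(alpha * x_i + beta * nbr_mean)
    return x_next

def simulate(x0, neighbours, alpha, beta, steps, noise_sd=1.0, seed=0):
    """Simulate the network time series with Gaussian innovations."""
    rng = random.Random(seed)
    series = [list(x0)]
    for _ in range(steps):
        mean = gnar_step(series[-1], neighbours, alpha, beta)
        series.append([m + rng.gauss(0.0, noise_sd) for m in mean])
    return series

# Path graph 0 - 1 - 2, given as a neighbour list per node.
path = {0: [1], 1: [0, 2], 2: [1]}
one_step = gnar_step([1.0, 2.0, 3.0], path, alpha=0.5, beta=0.2)
print(one_step)
sample = simulate([0.0, 0.0, 0.0], path, alpha=0.5, beta=0.2, steps=5)
```

Only the entries listed in `neighbours` enter each node's update, which is the sense in which the network structure controls which variables are included.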

[14]  arXiv:1912.04869 [pdf, other]
Title: Adaptive Manifold Clustering
Subjects: Statistics Theory (math.ST)

We extend the theoretical study of a recently proposed nonparametric clustering algorithm called Adaptive Weights Clustering (AWC). In particular, we are interested in the case of high-dimensional data lying in the vicinity of a lower-dimensional non-linear submanifold with positive reach. After a slight adjustment and under rather general assumptions for the cluster structure, the algorithm turns out to be nearly optimal in detecting local inhomogeneities, while aggregating homogeneous data with a high probability. We also address the problem of parameter tuning.

[15]  arXiv:1912.04884 [pdf, other]
Title: Statistically Robust Neural Network Classification
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Recently there has been much interest in quantifying the robustness of neural network classifiers through adversarial risk metrics. However, for problems where test-time corruptions occur in a probabilistic manner, rather than being generated by an explicit adversary, adversarial metrics typically do not provide an accurate or reliable indicator of robustness. To address this, we introduce a statistically robust risk (SRR) framework which measures robustness in expectation over both network inputs and a corruption distribution. Unlike many adversarial risk metrics, which typically require separate applications on a point-by-point basis, the SRR can easily be directly estimated for an entire network and used as a training objective in a stochastic gradient scheme. Furthermore, we show both theoretically and empirically that it can scale to higher-dimensional networks by providing superior generalization performance compared with comparable adversarial risks.

Cross-lists for Wed, 11 Dec 19

[16]  arXiv:1912.04278 (cross-list from eess.IV) [pdf, other]
Title: Deep Efficient End-to-end Reconstruction (DEER) Network for Low-dose Few-view Breast CT from Projection Data
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

Breast CT provides image volumes with isotropic resolution in high contrast, enabling detection of calcifications (down to a few hundred microns in size) and subtle density differences. Since the breast is sensitive to x-ray radiation, dose reduction of breast CT is an important topic, and for this purpose low-dose few-view scanning is a main approach. In this article, we propose a Deep Efficient End-to-end Reconstruction (DEER) network for low-dose few-view breast CT. The major merits of our network include high dose efficiency, excellent image quality, and low model complexity. By design, the proposed network can learn the reconstruction process with as few as O(N) parameters, where N is the size of an image to be reconstructed, which represents orders of magnitude improvements relative to the state-of-the-art deep-learning based reconstruction methods that map projection data to tomographic images directly. As a result, our method does not require expensive GPUs to train and run. Also, validated on a cone-beam breast CT dataset prepared by Koning Corporation on a commercial scanner, our method demonstrates competitive performance over the state-of-the-art reconstruction networks in terms of image quality.

[17]  arXiv:1912.04345 (cross-list from physics.chem-ph) [pdf, ps, other]
Title: Noisy, sparse, nonlinear: Navigating the Bermuda Triangle of physical inference with deep filtering
Subjects: Chemical Physics (physics.chem-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)

Capturing the microscopic interactions that determine molecular reactivity poses a challenge across the physical sciences. Even a basic understanding of the underlying reaction mechanisms can substantially accelerate materials and compound design, including the development of new catalysts or drugs. Given the difficulties routinely faced by both experimental and theoretical investigations that aim to improve our mechanistic understanding of a reaction, recent advances have focused on data-driven routes to derive structure-property relationships directly from high-throughput screens. However, even these high-quality, high-volume data are noisy, sparse and biased -- placing them in a regime where machine-learning is extremely challenging. Here we show that a statistical approach based on deep filtering of nonlinear feature networks results in physicochemical models that are more robust, transparent and generalize better than standard machine-learning architectures. Using diligent descriptor design and data post-processing, we exemplify the approach using both literature and fresh data on asymmetric catalytic hydrogenation, Palladium-catalyzed cross-coupling reactions, and drug-drug synergy. We illustrate how the sparse models uncovered by the filtering help us formulate physicochemical reaction ``pharmacophores'', investigate experimental bias and derive strategies for mechanism detection and classification.

[18]  arXiv:1912.04370 (cross-list from eess.AS) [pdf, other]
Title: Cross-Language Aphasia Detection using Optimal Transport Domain Adaptation
Comments: Accepted to ML4H at NeurIPS 2019
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)

Multi-language speech datasets are scarce and often have small sample sizes in the medical domain. Robust transfer of linguistic features across languages could improve rates of early diagnosis and therapy for speakers of low-resource languages when detecting health conditions from speech. We utilize out-of-domain, unpaired, single-speaker, healthy speech data for training multiple Optimal Transport (OT) domain adaptation systems. We learn mappings from other languages to English and detect aphasia from linguistic characteristics of speech, and show that OT domain adaptation improves aphasia detection over unilingual baselines for French (6% increased F1) and Mandarin (5% increased F1). Further, we show that adding aphasic data to the domain adaptation system significantly increases performance for both French and Mandarin, increasing the F1 scores further (10% and 8% increase in F1 scores for French and Mandarin, respectively, over unilingual baselines).

[19]  arXiv:1912.04378 (cross-list from cs.LG) [pdf, ps, other]
Title: Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)

Understanding the representational power of Deep Neural Networks (DNNs) and how their structural properties (e.g., depth, width, type of activation unit) affect the functions they can compute has been an important yet challenging question in deep learning and approximation theory. In a seminal paper, Telgarsky highlighted the benefits of depth by presenting a family of functions (based on simple triangular waves) for which DNNs achieve zero classification error, whereas shallow networks with fewer than exponentially many nodes incur constant error. Even though Telgarsky's work reveals the limitations of shallow neural networks, it does not tell us why these functions are difficult to represent; in fact, he states as a tantalizing open question the characterization of those functions that cannot be well-approximated by smaller depths.
In this work, we point to a new connection between DNN expressivity and Sharkovsky's Theorem from dynamical systems that enables us to characterize the depth-width trade-offs of ReLU networks for representing functions based on the presence of a generalized notion of fixed points, called periodic points (a fixed point is a point of period 1). Motivated by our observation that the triangle waves used in Telgarsky's work contain points of period 3 - a period that is special in that it implies chaotic behavior based on the celebrated result by Li-Yorke - we proceed to give general lower bounds for the width needed to represent periodic functions as a function of the depth. Technically, the crux of our approach is based on an eigenvalue analysis of the dynamical system associated with such functions.
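
The period-3 observation can be checked by hand on a single triangular wave. The sketch below assumes the standard tent map on [0, 1] as the triangle wave (the abstract does not spell out the exact function) and uses exact rational arithmetic; `tent`, `orbit` and `period` are illustrative names, not taken from the paper.

```python
from fractions import Fraction

def tent(x):
    """Tent map on [0, 1]: a single triangular wave peaking at x = 1/2."""
    return 2 * x if x <= Fraction(1, 2) else 2 * (1 - x)

def orbit(x, steps):
    """Return x followed by its first `steps` iterates under the tent map."""
    points = [x]
    for _ in range(steps):
        points.append(tent(points[-1]))
    return points

def period(x, max_period=10):
    """Smallest p <= max_period with tent^p(x) == x, or None if there is none."""
    y = x
    for p in range(1, max_period + 1):
        y = tent(y)
        if y == x:
            return p
    return None

print(orbit(Fraction(2, 7), 3))   # 2/7 -> 4/7 -> 6/7 -> 2/7
print(period(Fraction(2, 7)))     # 3
print(period(Fraction(2, 3)))     # 1 (a fixed point)
```

Using `Fraction` keeps the iterates exact, so the equality test in `period` is not affected by floating-point rounding.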

[20]  arXiv:1912.04379 (cross-list from cs.PF) [pdf, ps, other]
Title: General Matrix-Matrix Multiplication Using SIMD features of the PIII
Comments: arXiv admin note: substantial text overlap with arXiv:1911.05181
Journal-ref: Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing (2000) Pages 980-983
Subjects: Performance (cs.PF); Machine Learning (stat.ML)

Generalised matrix-matrix multiplication forms the kernel of many mathematical algorithms. A faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement efficient matrix multiplication for large matrices using the floating point Intel Pentium SIMD (Single Instruction Multiple Data) architecture. A description of the issues and our solution is presented, paying attention to all levels of the memory hierarchy. Our results demonstrate an average performance of 2.09 times faster than the leading public domain matrix-matrix multiply routines.

[21]  arXiv:1912.04381 (cross-list from eess.AS) [pdf]
Title: A Dataset for measuring reading levels in India at scale
Comments: 5 pages, 4 figures, 3 Tables, submitted the paper to ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Computers and Society (cs.CY); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)

One out of four children in India leaves grade eight without basic reading skills. Measuring the reading levels in a vast country like India poses significant hurdles. Recent advances in machine learning open up the possibility of automating this task. However, the datasets are primarily in English. To solve this assessment problem and advance deep learning research in regional Indian languages, we present the ASER dataset of children in the age group of 6-14. The dataset consists of 5,300 subjects generating 81,658 labeled audio clips in Hindi, Marathi and English. These labels represent expert opinions on the ability of the child to read at a specified level. Using this dataset, we built a simple ASR-based classifier. Early results indicate that we can achieve a prediction accuracy of 86 percent for the English language. Considering the ASER survey spans half a million subjects, this dataset can grow to those scales.

[22]  arXiv:1912.04391 (cross-list from cs.LG) [pdf, other]
Title: Semi-supervised Learning Approach to Generate Neuroimaging Modalities with Adversarial Training
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)

Magnetic Resonance Imaging (MRI) of the brain can come in the form of different modalities such as T1-weighted and Fluid Attenuated Inversion Recovery (FLAIR), which have been used to investigate a wide range of neurological disorders. Current state-of-the-art models for brain tissue segmentation and disease classification require multiple modalities for training and inference. However, the acquisition of all of these modalities is expensive, time-consuming and inconvenient, and the required modalities are often not available. As a result, these datasets contain large amounts of \emph{unpaired} data, where examples in the dataset do not contain all modalities. On the other hand, there is a smaller fraction of examples that contain all modalities (\emph{paired} data), and furthermore each modality is high dimensional when compared to the number of datapoints. In this work, we develop a method to address these issues with semi-supervised learning in translating between two neuroimaging modalities. Our proposed model, Semi-Supervised Adversarial CycleGAN (SSA-CGAN), uses an adversarial loss to learn from \emph{unpaired} data points, a cycle loss to enforce consistent reconstructions of the mappings, and another adversarial loss to take advantage of \emph{paired} data points. Our experiments demonstrate that our proposed framework produces an improvement in reconstruction error and reduced variance for the pairwise translation of multiple modalities and is more robust to thermal noise when compared to existing methods.

[23]  arXiv:1912.04408 (cross-list from eess.SY) [pdf, other]
Title: Exploiting Model Sparsity in Adaptive MPC: A Compressed Sensing Viewpoint
Comments: Both authors contributed equally. arXiv admin note: text overlap with arXiv:1804.09790
Subjects: Systems and Control (eess.SY); Machine Learning (stat.ML)

This paper proposes an Adaptive Stochastic Model Predictive Control (MPC) strategy for stable linear time-invariant systems in the presence of bounded disturbances. We consider multi-input, multi-output systems that can be expressed by a Finite Impulse Response (FIR) model. The parameters of the FIR model corresponding to each output are unknown but assumed sparse. We estimate these parameters using the Recursive Least Squares algorithm. The estimates are then improved using set-based bounds obtained by solving the Basis Pursuit Denoising [1] problem. Our approach is able to handle hard input constraints and probabilistic output constraints. Using tools from distributionally robust optimization, we reformulate the probabilistic output constraints as tractable convex second-order cone constraints, which enables us to pose our MPC design task as a convex optimization problem. The efficacy of the developed algorithm is highlighted with a thorough numerical example, where we demonstrate performance gain over the counterpart algorithm of [2], which does not utilize the sparsity information of the system impulse response parameters during control design.
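
The parameter-estimation step named in this abstract can be illustrated with a textbook Recursive Least Squares update applied to a single-output FIR model. This is a sketch of that one step only, under assumed noise-free data and a hypothetical sparse impulse response. It does not cover the set-based bounds, Basis Pursuit Denoising, or the MPC design, and the names `rls_update` and `identify_fir` are illustrative.

```python
import random

def rls_update(theta, P, phi, y, forgetting=1.0):
    """One Recursive Least Squares step for the model y = phi . theta."""
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    error = y - sum(phi[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so phi^T P equals (P phi)^T and P_phi can be reused.
    P_new = [[(P[i][j] - gain[i] * P_phi[j]) / forgetting for j in range(n)]
             for i in range(n)]
    return theta_new, P_new

def identify_fir(true_taps, steps=200, seed=0):
    """Estimate FIR taps from random inputs and noise-free outputs."""
    rng = random.Random(seed)
    n = len(true_taps)
    theta = [0.0] * n
    # Large initial covariance encodes weak prior knowledge of the taps.
    P = [[1e6 if i == j else 0.0 for j in range(n)] for i in range(n)]
    history = [0.0] * n  # most recent input first
    for _ in range(steps):
        history = [rng.gauss(0.0, 1.0)] + history[:-1]
        y = sum(h * u for h, u in zip(true_taps, history))
        theta, P = rls_update(theta, P, history, y)
    return theta

sparse_taps = [0.8, 0.0, 0.0, -0.3]  # hypothetical sparse impulse response
estimate = identify_fir(sparse_taps)
print([round(v, 4) for v in estimate])
```

Plain RLS does not exploit sparsity: it estimates the zero taps like any others, which is the gap that the sparsity-aware refinement described in the abstract is meant to address.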

[24]  arXiv:1912.04427 (cross-list from cs.LG) [pdf, other]
Title: Winning the Lottery with Continuous Sparsification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The Lottery Ticket Hypothesis from Frankle & Carbin (2019) conjectures that, for typically-sized neural networks, it is possible to find small sub-networks which train faster and perform better than their original counterparts. The proposed algorithm to search for "winning tickets", Iterative Magnitude Pruning, consistently finds sub-networks with $90-95\%$ fewer parameters which train faster and better than the overparameterized models they were extracted from, creating potential applications to problems such as transfer learning.
In this paper, we propose Continuous Sparsification, a new algorithm to search for winning tickets which continuously removes parameters from a network during training, and learns the sub-network's structure with gradient-based methods instead of relying on pruning strategies. We show empirically that our method is capable of finding tickets that outperform the ones learned by Iterative Magnitude Pruning, while also providing faster search, when measured in number of training epochs or wall-clock time.

[25]  arXiv:1912.04472 (cross-list from cs.LG) [pdf, other]
Title: Deep Bayesian Reward Learning from Preferences
Comments: Workshop on Safety and Robustness in Decision Making at the 33rd Conference on Neural Information Processing Systems (NeurIPS) 2019
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Bayesian inverse reinforcement learning (IRL) methods are ideal for safe imitation learning, as they allow a learning agent to reason about reward uncertainty and the safety of a learned policy. However, Bayesian IRL is computationally intractable for high-dimensional problems because each sample from the posterior requires solving an entire Markov Decision Process (MDP). While there exist non-Bayesian deep IRL methods, these methods typically infer point estimates of reward functions, precluding rigorous safety and uncertainty analysis. We propose Bayesian Reward Extrapolation (B-REX), a highly efficient, preference-based Bayesian reward learning algorithm that scales to high-dimensional, visual control tasks. Our approach uses successor feature representations and preferences over demonstrations to efficiently generate samples from the posterior distribution over the demonstrator's reward function without requiring an MDP solver. Using samples from the posterior, we demonstrate how to calculate high-confidence bounds on policy performance in the imitation learning setting, in which the ground-truth reward function is unknown. We evaluate our proposed approach on the task of learning to play Atari games via imitation learning from pixel inputs, with no access to the game score. We demonstrate that B-REX learns imitation policies that are competitive with a state-of-the-art deep imitation learning method that only learns a point estimate of the reward function. Furthermore, we demonstrate that samples from the posterior generated via B-REX can be used to compute high-confidence performance bounds for a variety of evaluation policies. We show that high-confidence performance bounds are useful for accurately ranking different evaluation policies when the reward function is unknown. We also demonstrate that high-confidence performance bounds may be useful for detecting reward hacking.

[26]  arXiv:1912.04508 (cross-list from cs.LG) [pdf, ps, other]
Title: Reducing Catastrophic Forgetting in Modular Neural Networks by Dynamic Information Balancing
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Lifelong learning is a very important step toward realizing robust autonomous artificial agents. Neural networks are the main engine of deep learning, which is the current state-of-the-art technique in formulating adaptive artificial intelligent systems. However, neural networks suffer from catastrophic forgetting when stressed with the challenge of continual learning. We investigate how to exploit modular topology in neural networks in order to dynamically balance the information load between different modules by routing inputs based on the information content in each module so that information interference is minimized. Our dynamic information balancing (DIB) technique adapts a reinforcement learning technique to guide the routing of different inputs based on a reward signal derived from a measure of the information load in each module. Our empirical results show that DIB combined with elastic weight consolidation (EWC) regularization outperforms models with similar capacity and EWC regularization across different task formulations and datasets.

[27]  arXiv:1912.04511 (cross-list from cs.LG) [pdf, other]
Title: A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
Authors: Pan Xu, Quanquan Gu
Comments: 23 pages, 1 table. Under review by ICLR 2020
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with an $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To the best of our knowledge, our result is the first finite-time analysis of neural Q-learning under a non-i.i.d. data assumption.

[28]  arXiv:1912.04521 (cross-list from cs.LG) [pdf, other]
Title: Transfer Learning-Based Outdoor Position Recovery with Telco Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Telecommunication (Telco) outdoor position recovery aims to localize outdoor mobile devices by leveraging measurement report (MR) data. Unfortunately, Telco position recovery requires a sufficient amount of MR samples across different areas and suffers from high data collection cost. For an area with scarce MR samples, it is hard to achieve good accuracy. In this paper, by leveraging recently developed transfer learning techniques, we design a novel Telco position recovery framework, called TLoc, to transfer good models from carefully selected source domains (fine-grained small subareas) to a target domain which originally suffers from poor localization accuracy. Specifically, TLoc introduces three dedicated components: 1) a new coordinate space to divide an area of interest into smaller domains, 2) a similarity measurement to select the best source domains, and 3) an adaptation of an existing transfer learning approach. To the best of our knowledge, TLoc is the first framework that demonstrates the efficacy of applying transfer learning to Telco outdoor position recovery. For example, on the 2G GSM and 4G LTE MR datasets in Shanghai, TLoc reduces median errors by 27.58% and 26.12% relative to a non-transfer approach, and by 47.77% and 49.22% relative to a recent fingerprinting approach, NBL.

[29]  arXiv:1912.04523 (cross-list from cs.CV) [pdf, other]
Title: Context-Dependent Models for Predicting and Characterizing Facial Expressiveness
Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)

In recent years, extensive research has emerged in affective computing on topics like automatic emotion recognition and determining the signals that characterize individual emotions. Much less studied, however, is expressiveness, or the extent to which someone shows any feeling or emotion. Expressiveness is related to personality and mental health and plays a crucial role in social interaction. As such, the ability to automatically detect or predict expressiveness can facilitate significant advancements in areas ranging from psychiatric care to artificial social intelligence. Motivated by these potential applications, we present an extension of the BP4D+ dataset with human ratings of expressiveness and develop methods for (1) automatically predicting expressiveness from visual data and (2) defining relationships between interpretable visual signals and expressiveness. In addition, we study the emotional context in which expressiveness occurs and hypothesize that different sets of signals are indicative of expressiveness in different contexts (e.g., in response to surprise or in response to pain). Analysis of our statistical models confirms our hypothesis. Consequently, by looking at expressiveness separately in distinct emotional contexts, our predictive models show significant improvements over baselines and achieve comparable results to human performance in terms of correlation with the ground truth.

[30]  arXiv:1912.04527 (cross-list from cs.LG) [pdf, other]
Title: Learning Pose Estimation for UAV Autonomous Navigation and Landing Using Visual-Inertial Sensor Data
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)

In this work, we propose a robust network-in-the-loop control system that allows an Unmanned Aerial Vehicle to navigate and land autonomously on a desired target. To estimate the global pose of the aerial vehicle, we develop a deep neural network architecture for visual-inertial odometry, which provides a robust alternative to traditional techniques for autonomous navigation of Unmanned Aerial Vehicles. We first provide experimental results on the accuracy of the estimation by comparing the prediction of our model to traditional visual-inertial approaches on the publicly available EuRoC MAV dataset. The results indicate a clear improvement in the accuracy of the pose estimation of up to 25% against the baseline. Second, we use Airsim, a simulator available as a plugin for Unreal Engine, to create new datasets of photorealistic images and inertial measurements to train and test our model. We finally integrate the proposed architecture for global localization with the Airsim closed-loop control system, and we provide simulation results for the autonomous landing of the aerial vehicle.

[31]  arXiv:1912.04530 (cross-list from cs.LG) [pdf, ps, other]
Title: No-Trick (Treat) Kernel Adaptive Filtering using Deterministic Features
Comments: 12 pages, 7 figures
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)

Kernel methods form a powerful, versatile, and theoretically-grounded unifying framework to solve nonlinear problems in signal processing and machine learning. The standard approach relies on the kernel trick to perform pairwise evaluations of a kernel function, which leads to scalability issues for large datasets due to its linear and superlinear growth with respect to the training data. A popular approach to tackle this problem, known as random Fourier features (RFFs), samples from a distribution to obtain the data-independent basis of a higher finite-dimensional feature space, where the dot product approximates the kernel function. Recently, deterministic, rather than random, construction has been shown to outperform RFFs by approximating the kernel in the frequency domain using Gaussian quadrature. In this paper, we view the dot product of these explicit mappings not as an approximation, but as an equivalent positive-definite kernel that induces a new finite-dimensional reproducing kernel Hilbert space (RKHS). This opens the door to no-trick (NT) online kernel adaptive filtering (KAF) that is scalable and robust. Random features are prone to large variances in performance, especially for smaller dimensions. Here, we focus on deterministic feature-map construction based on polynomial-exact solutions and show their superiority over random constructions. Without loss of generality, we apply this approach to classical adaptive filtering algorithms and validate the methodology to show that deterministic features are faster to generate and outperform state-of-the-art kernel methods based on random Fourier features.
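
As a point of reference for the random Fourier feature baseline mentioned in this abstract, the following is a minimal sketch of the standard RFF construction for a Gaussian kernel. It is not the deterministic construction proposed in the paper; the function names (`gaussian_kernel`, `make_rff_map`, `dot`) and the parameters `sigma`, `num_features`, and `seed` are illustrative assumptions.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def make_rff_map(dim, num_features, sigma=1.0, seed=0):
    """Return an explicit feature map whose dot product approximates the kernel.

    Frequencies are drawn from N(0, 1/sigma^2) and phases from U(0, 2*pi);
    both are sampled once and are independent of the data.
    """
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return feature_map


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    phi = make_rff_map(dim=2, num_features=4000, sigma=1.0, seed=0)
    x, y = [0.3, -0.2], [0.1, 0.4]
    print("exact kernel :", gaussian_kernel(x, y))
    print("RFF estimate :", dot(phi(x), phi(y)))
```

Because the estimate is a Monte Carlo average, its error shrinks only as the number of features grows, which is consistent with the abstract's remark that random features show large variance at smaller dimensions.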

[32]  arXiv:1912.04533 (cross-list from cs.LG) [pdf, other]
Title: Exact expressions for double descent and implicit regularization via surrogate random design
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)

Double descent refers to the phase transition that is exhibited by the generalization error of unregularized learning models when varying the ratio between the number of parameters and the number of training samples. The recent success of highly over-parameterized machine learning models such as deep neural networks has motivated a theoretical analysis of the double descent phenomenon in classical models such as linear regression which can also generalize well in the over-parameterized regime. We build on recent advances in Randomized Numerical Linear Algebra (RandNLA) to provide the first exact non-asymptotic expressions for double descent of the minimum norm linear estimator. Our approach involves constructing what we call a surrogate random design to replace the standard i.i.d. design of the training sample. This surrogate design admits exact expressions for the mean squared error of the estimator while preserving the key properties of the standard design. We also establish an exact implicit regularization result for over-parameterized training samples. In particular, we show that, for the surrogate design, the implicit bias of the unregularized minimum norm estimator precisely corresponds to solving a ridge-regularized least squares problem on the population distribution.

[33]  arXiv:1912.04549 (cross-list from cs.LG) [pdf, other]
Title: Expansion of Cyber Attack Data From Unbalanced Datasets Using Generative Techniques
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Machine learning techniques help to understand patterns in a dataset in order to create a defense mechanism against cyber attacks. However, it is difficult to construct a theoretical model for discriminating attacks from the overall dataset due to imbalances in the dataset. A Multilayer Perceptron (MLP) technique improves accuracy and performance in detecting attack and benign data from a balanced dataset. We used the publicly available UGR'16 dataset for this work. Data wrangling was performed to prepare a test set from the original set. We fed the neural network classifier increasingly large inputs (i.e., 10000, 50000, 1 million samples) to see how the distribution of features affects accuracy. We have implemented a GAN model that can produce samples of different attack labels (e.g., blacklist, anomaly spam, ssh scan). We have been able to generate as many samples as necessary based on the data sample we have taken from UGR'16. We tested the accuracy of our model first on the imbalanced dataset and then with an increased number of attack samples, and found improved classification performance for the latter.

[34]  arXiv:1912.04556 (cross-list from cs.LG) [pdf]
Title: Accurate Entrance Position Detection Based on Wi-Fi and GPS Signals Using Machine Learning
Authors: Ahmad Abadleh
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

This paper aims to accurately detect the position of the main entrance of a building. The proposed approach relies on the fact that GPS signals drop significantly when the user enters a building. Moreover, as most public buildings provide Wi-Fi services, the Wi-Fi received signal strength (RSS) can be utilized to detect the entrance of a building. The rationale behind this paper is that the GPS signal decreases and the Wi-Fi signal increases as the user approaches the main entrance. Several real experiments have been conducted to assess the feasibility of the proposed approach. The experimental results show that the accuracy of the whole system was one meter.

[35]  arXiv:1912.04565 (cross-list from q-fin.TR) [pdf, other]
Title: Market Price of Trading Liquidity Risk and Market Depth
Comments: 46 Pages, 12 Figures, To appear in the International Journal of Theoretical and Applied Finance
Subjects: Trading and Market Microstructure (q-fin.TR); Econometrics (econ.EM); Computational Finance (q-fin.CP); Mathematical Finance (q-fin.MF); Computation (stat.CO)

Price impact of a trade is an important element in pre-trade and post-trade analyses. We introduce a framework to analyze the market price of liquidity risk, which allows us to derive an inhomogeneous Bernoulli ordinary differential equation. We obtain two closed form solutions, one of which reproduces the linear function of the order flow in Kyle (1985) for informed traders. However, when traders are not as asymmetrically informed, an S-shape function of the order flow is obtained. We perform an empirical intra-day analysis on Nikkei futures to quantify the price impact of order flow and compare our results with the industry's heuristic price impact functions. Our model of order flow yields a rich framework not only for estimating the liquidity risk parameters, but also for providing a plausible explanation of why volatility and correlation are stochastic in nature. Finally, we find that the market depth encapsulates the market price of liquidity risk.

[36]  arXiv:1912.04635 (cross-list from cs.LG) [pdf, ps, other]
Title: Backprop Diffusion is Biologically Plausible
Comments: 6 pages, 3 figures
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

The Backpropagation algorithm relies on the abstraction of using a neural model that gets rid of the notion of time, since the input is mapped instantaneously to the output. In this paper, we claim that this abstraction of ignoring time, along with the abrupt input changes that occur when feeding the training set, are in fact the reasons why, in some papers, the biological plausibility of Backprop is regarded as an arguable issue. We show that as soon as a deep feedforward network operates with neurons with time-delayed response, the backprop weight update turns out to be the basic equation of a biologically plausible diffusion process based on forward-backward waves. We also show that such a process approximates the gradient very well for inputs that are not too fast with respect to the depth of the network. These remarks somewhat disclose the diffusion process behind the backprop equation and lead us to interpret the corresponding algorithm as a degeneration of a more general diffusion process that also takes place in neural networks with cyclic connections.

[37]  arXiv:1912.04661 (cross-list from econ.EM) [pdf, other]
Title: Adaptive Dynamic Model Averaging with an Application to House Price Forecasting
Subjects: Econometrics (econ.EM); Methodology (stat.ME)

Dynamic model averaging (DMA) combines the forecasts of a large number of dynamic linear models (DLMs) to predict the future value of a time series. The performance of DMA critically depends on the appropriate choice of two forgetting factors. The first of these controls the speed of adaptation of the coefficient vector of each DLM, while the second enables time variation in the model averaging stage. In this paper we develop a novel, adaptive dynamic model averaging (ADMA) methodology. The proposed methodology employs a stochastic optimisation algorithm that sequentially updates the forgetting factor of each DLM, and uses a state-of-the-art non-parametric model combination algorithm from the prediction with expert advice literature, which offers finite-time performance guarantees. An empirical application to quarterly UK house price data suggests that ADMA produces more accurate forecasts than the benchmark autoregressive model, as well as competing DMA specifications.
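
To make the role of the second forgetting factor concrete, here is a minimal sketch of the forgetting-based model-probability recursion commonly used in the DMA literature. It is not the ADMA procedure proposed in the paper; the function name `dma_weight_update` and the argument names are hypothetical, and `alpha` stands for the model-averaging forgetting factor.

```python
def dma_weight_update(weights, likelihoods, alpha=0.99):
    """One step of a forgetting-based model-probability update.

    weights     : posterior model probabilities from the previous time step
    likelihoods : predictive density of the new observation under each model
    alpha       : forgetting factor in [0, 1]; alpha = 1 gives standard
                  Bayesian model averaging, smaller values flatten the
                  weights and let the preferred model change over time
    """
    # Prediction step: raise each weight to the power alpha and renormalise.
    powered = [w ** alpha for w in weights]
    total = sum(powered)
    predicted = [p / total for p in powered]

    # Update step: reweight by how well each model predicted the new point.
    unnormalised = [p * lik for p, lik in zip(predicted, likelihoods)]
    norm = sum(unnormalised)
    return [u / norm for u in unnormalised]


if __name__ == "__main__":
    weights = [0.5, 0.5]
    # Hypothetical predictive densities for three successive observations.
    for liks in ([0.2, 0.8], [0.3, 0.7], [0.9, 0.1]):
        weights = dma_weight_update(weights, liks, alpha=0.95)
        print(weights)
```

The choice of `alpha` trades stability against responsiveness, which is why the abstract describes DMA performance as critically dependent on the forgetting factors.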

[38]  arXiv:1912.04684 (cross-list from cs.LG) [pdf, other]
Title: Neural Network Based Explicit MPC for Chemical Reactor Control
Comments: Preprint submitted to Acta Chimica Slovaca, ISSN: 1339-3065
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)

In this paper, we show the implementation of deep neural networks applied in process control. In our approach, we base the training of the neural network on model predictive control (MPC). Model predictive control is popular because it can be tuned through the weighting matrices and because it respects the constraints. We present a neural network that approximates the behavior of the MPC by mimicking the control input trajectory, while the constraints on states and control input remain unaffected by the value of the weighting matrices. This approach is demonstrated in a simulation case study involving a continuous stirred tank reactor, where a multi-component chemical reaction takes place.

[39]  arXiv:1912.04690 (cross-list from cs.LG) [pdf]
Title: Reconstructing Multi-echo Magnetic Resonance Images via Structured Deep Dictionary Learning
Comments: Final version accepted at Neurocomputing
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)

Multi-echo magnetic resonance (MR) images are acquired by changing the echo times (for T2 weighted) or relaxation times (for T1 weighted) of scans. The resulting (multi-echo) images are usually used for quantitative MR imaging. Acquiring MR images is a slow process, and acquiring multiple scans of the same cross section for multi-echo imaging is even slower. In order to accelerate the scan, compressed sensing (CS) based techniques have advocated partial K-space (Fourier domain) scans; the resulting images are reconstructed via structured CS algorithms. In recent times, it has been shown that instead of using off-the-shelf CS, better results can be obtained by adaptive reconstruction algorithms based on structured dictionary learning. In this work, we show that the reconstruction results can be further improved by using structured deep dictionaries. Experimental results on real datasets show that by using our proposed technique the scan-time can be cut by half compared to the state-of-the-art.

[40]  arXiv:1912.04695 (cross-list from cs.LG) [pdf, other]
Title: Transparent Classification with Multilayer Logical Perceptrons and Random Binarization
Comments: AAAI-20 oral
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Models with transparent inner structure and high classification performance are required to reduce potential risk and provide trust for users in domains like health care, finance, security, etc. However, it is hard for existing models to satisfy both properties simultaneously. In this paper, we propose a new hierarchical rule-based model for classification tasks, named Concept Rule Sets (CRS), which has both a strong expressive ability and a transparent inner structure. To address the challenge of efficiently learning the non-differentiable CRS model, we propose a novel neural network architecture, Multilayer Logical Perceptron (MLLP), which is a continuous version of CRS. Using MLLP and the Random Binarization (RB) method we propose, we can search for the discrete solution of CRS in continuous space using gradient descent and ensure the discrete CRS acts almost the same as the corresponding continuous MLLP. Experiments on 12 public data sets show that CRS outperforms the state-of-the-art approaches and that the complexity of the learned CRS is close to that of a simple decision tree.

[41]  arXiv:1912.04734 (cross-list from cs.LG) [pdf]
Title: Transformed Subspace Clustering
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Subspace clustering assumes that the data is separable into separate subspaces. Such a simple assumption does not always hold. We assume that, even if the raw data is not separable into subspaces, one can learn a representation (transform coefficients) such that the learnt representation is separable into subspaces. To achieve the intended goal, we embed subspace clustering techniques (locally linear manifold clustering, sparse subspace clustering and low rank representation) into transform learning. The entire formulation is jointly learnt, giving rise to a new class of methods called transformed subspace clustering (TSC). In order to account for non-linearity, kernelized extensions of TSC are also proposed. To test the performance of the proposed techniques, benchmarking is performed on image clustering and document clustering datasets. Comparison with state-of-the-art clustering techniques shows that our formulation improves upon them.

[42]  arXiv:1912.04747 (cross-list from cs.LG) [pdf, other]
Title: Oversampling Log Messages Using a Sequence Generative Adversarial Network for Anomaly Detection and Classification
Comments: 23 pages, 4 figures, 2 tables. arXiv admin note: text overlap with arXiv:1911.08744
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Dealing with imbalanced data is one of the main challenges in machine/deep learning algorithms for classification. This issue is more important with log message data as it is typically imbalanced and negative logs are rare. In this paper, a model is proposed to generate text log messages using a SeqGAN network. Then features are extracted using an Autoencoder, and anomaly detection and classification are performed using a GRU network. The proposed model is evaluated with two imbalanced log data sets, namely BGL and Openstack. Results are presented which show that oversampling and balancing data increases the accuracy of anomaly detection and classification.

[43]  arXiv:1912.04754 (cross-list from cs.LG) [pdf]
Title: Deep Latent Factor Model for Collaborative Filtering
Comments: This is an initial draft of the accepted paper at Elsevier Signal Processing
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)

Latent factor models have been used widely in collaborative filtering based recommender systems. In recent years, deep learning has been successful in solving a wide variety of machine learning problems. Motivated by the success of deep learning, we propose a deeper version of the latent factor model. Experiments on benchmark datasets show that our proposed technique significantly outperforms all state-of-the-art collaborative filtering techniques.

[44]  arXiv:1912.04783 (cross-list from cs.LG) [pdf, other]
Title: Removable and/or Repeated Units Emerge in Overparametrized Deep Neural Networks
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Deep neural networks (DNNs) perform well on a variety of tasks despite the fact that most networks used in practice are vastly overparametrized and even capable of perfectly fitting randomly labeled data. Recent evidence suggests that developing compressible representations is key for adjusting the complexity of overparametrized networks to the task at hand. In this paper, we provide new empirical evidence that supports this hypothesis by identifying two types of units that emerge when the network's width is increased: removable units which can be dropped out of the network without significant change to the output and repeated units whose activities are highly correlated with other units. The emergence of these units implies capacity constraints as the function the network represents could be expressed by a smaller network without these units. In a series of experiments with AlexNet, ResNet and Inception networks in the CIFAR-10 and ImageNet datasets, and also using shallow networks with synthetic data, we show that DNNs consistently increase either the number of removable units, repeated units, or both at greater widths for a comprehensive set of hyperparameters. These results suggest that the mechanisms by which networks in the deep learning regime adjust their complexity operate at the unit level and highlight the need for additional research into what drives the emergence of such units.

[45]  arXiv:1912.04792 (cross-list from cs.LG) [pdf, other]
Title: On Certifying Robust Models by Polyhedral Envelope
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Certifying neural networks enables one to offer guarantees on a model's robustness. In this work, we use linear approximation to obtain an upper and lower bound of the model's output when the input data is perturbed within a predefined adversarial budget. This allows us to bound the adversary-free region in the data neighborhood by a polyhedral envelope, and calculate robustness guarantees based on this geometric approximation. Compared with existing methods, our approach gives a finer-grained quantitative evaluation of a model's robustness. Therefore, the certification method not only obtains better certified bounds than the state-of-the-art techniques given the same adversarial budget but also yields a faster search scheme for the optimal adversarial budget. Furthermore, we introduce a simple regularization scheme based on our method that enables us to effectively train robust models.

[46]  arXiv:1912.04825 (cross-list from cs.LG) [pdf, other]
Title: Integration of Neural Network-Based Symbolic Regression in Deep Learning for Scientific Discovery
Comments: 11 pages, 8 figures
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)

Symbolic regression is a powerful technique that can discover analytical equations that describe data, which can lead to explainable models and generalizability outside of the training data set. In contrast, neural networks have achieved amazing levels of accuracy on image recognition and natural language processing tasks, but are often seen as black-box models that are difficult to interpret and typically extrapolate poorly. Here we use a neural network-based architecture for symbolic regression called the Equation Learner (EQL) network and integrate it with other deep learning architectures such that the whole system can be trained end-to-end through backpropagation. To demonstrate the power of such systems, we study their performance on several substantially different tasks. First, we show that the neural network can perform symbolic regression and learn the form of several functions. Next, we present an MNIST arithmetic task where a separate part of the neural network extracts the digits. Finally, we demonstrate prediction of dynamical systems where an unknown parameter is extracted through an encoder. We find that the EQL-based architecture can extrapolate quite well outside of the training data set compared to a standard neural network-based architecture, paving the way for deep learning to be applied in scientific exploration and discovery.

[47]  arXiv:1912.04832 (cross-list from cs.LG) [pdf, other]
Title: Feature Relevance Determination for Ordinal Regression in the Context of Feature Redundancies and Privileged Information
Comments: Preprint accepted at Neurocomputing
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)

Advances in machine learning technologies have led to increasingly powerful models, in particular in the context of big data. Yet, many application scenarios demand robustly interpretable models rather than optimum model accuracy; as an example, this is the case if potential biomarkers or causal factors should be discovered based on a set of given measurements. In this contribution, we focus on feature selection paradigms, which enable us to uncover relevant factors of a given regularity based on a sparse model. We focus on the important specific setting of linear ordinal regression, i.e.\ data have to be ranked into one of a finite number of ordered categories by a linear projection. Unlike previous work, we consider the case that features are potentially redundant, such that no unique minimum set of relevant features exists. We aim for an identification of all strongly and all weakly relevant features as well as their type of relevance (strong or weak); we achieve this goal by determining feature relevance bounds, which correspond to the minimum and maximum feature relevance, respectively, if searched over all equivalent models. In addition, we discuss how this setting enables us to substitute some of the features, e.g.\ due to their semantics, and how to extend the framework of feature relevance intervals to the setting of privileged information, i.e.\ potentially relevant information is available for training purposes only, but cannot be used for the prediction itself.

[48]  arXiv:1912.04838 (cross-list from cs.CV) [pdf, other]
Title: Scalability in Perception for Autonomous Driving: An Open Dataset Benchmark
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)

The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed diversity metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at this http URL

[49]  arXiv:1912.04845 (cross-list from cs.LG) [pdf, other]
Title: Magnitude and Uncertainty Pruning Criterion for Neural Networks
Comments: 10 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Neural networks have achieved dramatic improvements in recent years and now represent the state-of-the-art methods for many real-world tasks. One drawback is, however, that many of these models are overparameterized, which makes them both computationally and memory intensive. Furthermore, overparameterization can also lead to undesired overfitting side-effects. Inspired by recently proposed magnitude-based pruning schemes and the Wald test from the field of statistics, we introduce a novel magnitude and uncertainty (M&U) pruning criterion that helps to lessen such shortcomings. One important advantage of our M&U pruning criterion is that it is scale-invariant, whereas the magnitude-based pruning criterion suffers from a lack of scale invariance. In addition, we present a "pseudo bootstrap" scheme, which can efficiently estimate the uncertainty of the weights by using their update information during training. Our experimental evaluation, which is based on various neural network architectures and datasets, shows that our new criterion leads to more compressed models compared to models that are solely based on magnitude-based pruning criteria, with, at the same time, less loss in predictive power.

[50]  arXiv:1912.04858 (cross-list from math.PR) [pdf, ps, other]
Title: Rates of convergence to the local time of Oscillating and Skew Brownian Motions
Authors: Sara Mazzonetto
Subjects: Probability (math.PR); Statistics Theory (math.ST)

In this paper a class of statistics based on high frequency observations of oscillating Brownian motions and skew Brownian motions is considered. Their convergence rate towards the local time of the underlying process is obtained in the form of a Central Limit Theorem.

[51]  arXiv:1912.04862 (cross-list from cs.LG) [pdf, other]
Title: Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint
Comments: 26 pages
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)

Motivated by the gap between theoretical optimal approximation rates of deep neural networks (DNNs) and the accuracy realized in practice, we seek to improve the training of DNNs. The adoption of an adaptive basis viewpoint of DNNs leads to novel initializations and a hybrid least squares/gradient descent optimizer. We provide analysis of these techniques and illustrate via numerical examples dramatic increases in accuracy and convergence rate for benchmarks characterizing scientific applications where DNNs are currently used, including regression problems and physics-informed neural networks for the solution of partial differential equations.

[52]  arXiv:1912.04871 (cross-list from cs.LG) [pdf, other]
Title: Deep symbolic regression: Recovering mathematical expressions from data via policy gradients
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Discovering the underlying mathematical expressions describing a dataset is a core challenge for artificial intelligence. This is the problem of symbolic regression. Despite recent advances in training neural networks to solve complex tasks, deep learning approaches to symbolic regression are lacking. We propose a framework that combines deep learning with symbolic regression via a simple idea: use a large model to search the space of small models. More specifically, we use a recurrent neural network to emit a distribution over tractable mathematical expressions, and employ reinforcement learning to train the network to generate better-fitting expressions. Our algorithm significantly outperforms standard genetic programming-based symbolic regression in its ability to exactly recover symbolic expressions on a series of benchmark problems, both with and without added noise. More broadly, our contributions include a framework that can be applied to optimize hierarchical, variable-length objects under a black-box performance metric, with the ability to incorporate a priori constraints in situ.

Replacements for Wed, 11 Dec 19

[53]  arXiv:1709.01062 (replaced) [pdf, ps, other]
Title: A hierarchical loss and its problems when classifying non-hierarchically
Comments: 19 pages, 4 figures, 7 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[54]  arXiv:1806.09548 (replaced) [pdf, other]
Title: Learning dynamical systems with particle stochastic approximation EM
Subjects: Computation (stat.CO); Computational Engineering, Finance, and Science (cs.CE); Signal Processing (eess.SP); Machine Learning (stat.ML)
[55]  arXiv:1811.08968 (replaced) [pdf, other]
Title: Spread Divergences
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[56]  arXiv:1812.01097 (replaced) [pdf, other]
Title: LEAF: A Benchmark for Federated Settings
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[57]  arXiv:1812.07929 (replaced) [pdf, other]
Title: Importance Sampling-based Transport Map Hamiltonian Monte Carlo for Bayesian Hierarchical Models
Subjects: Computation (stat.CO)
[58]  arXiv:1901.10837 (replaced) [pdf, other]
Title: Noise-tolerant fair classification
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (stat.ML)
[59]  arXiv:1902.00610 (replaced) [pdf, other]
Title: On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems
Comments: Advances in Neural Information Processing Systems 32 (NIPS 2019)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[60]  arXiv:1902.04972 (replaced) [pdf, other]
Title: Provable Low Rank Phase Retrieval and Compressive PCA
Comments: A short version of this work is in ICML 2019, this longer version is revised and resubmitted to IEEE Trans. Info. Th
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[61]  arXiv:1902.11153 (replaced) [pdf, other]
Title: On the generalization of GAN image forensics
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[62]  arXiv:1903.08114 (replaced) [pdf, other]
Title: Exact Gaussian Processes on a Million Data Points
Comments: Published at NeurIPS 2019
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
[63]  arXiv:1904.03920 (replaced) [pdf, other]
Title: A Generalization Bound for Online Variational Inference
Comments: Published in the proceedings of ACML 2019
Journal-ref: Proceedings in Machine Learning Research, 2019, vol. 101, pp. 662-677
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Computation (stat.CO)
[64]  arXiv:1904.11132 (replaced) [pdf, other]
Title: TreeGrad: Transferring Tree Ensembles to Neural Networks
Authors: Chapman Siu
Comments: Technical Report on Implementation of Deep Neural Decision Forests Algorithm. To accompany implementation here: this https URL Update: Please cite as: Siu, C. (2019). "Transferring Tree Ensembles to Neural Networks". International Conference on Neural Information Processing. Springer, 2019. arXiv admin note: text overlap with arXiv:1909.11790
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[65]  arXiv:1905.00441 (replaced) [pdf, other]
Title: NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[66]  arXiv:1905.12558 (replaced) [pdf, other]
Title: Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[67]  arXiv:1906.00696 (replaced) [pdf, other]
Title: Transformed Central Quantile Subspace
Authors: Eliana Christou
Comments: arXiv admin note: text overlap with arXiv:1906.00694
Subjects: Methodology (stat.ME)
[68]  arXiv:1906.03849 (replaced) [pdf, other]
Title: Robustness Verification of Tree-based Models
Comments: Hongge Chen and Huan Zhang contributed equally
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[69]  arXiv:1906.05200 (replaced) [pdf, other]
Title: Macro-action Multi-time scale Dynamic Programming for Energy Management in Buildings with Phase Change Materials
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[70]  arXiv:1906.10075 (replaced) [pdf, other]
Title: Distribution-Independent PAC Learning of Halfspaces with Massart Noise
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST); Machine Learning (stat.ML)
[71]  arXiv:1906.12074 (replaced) [pdf, other]
Title: Recursion scheme for the largest $β$-Wishart-Laguerre eigenvalue and Landauer conductance in quantum transport
Comments: Published version; 20 pages, 2 figures in the main text + 2 in the Mathematica code towards the end
Journal-ref: Journal of Physics A: Mathematical and Theoretical, Volume 52, Page 42LT02, Year 2019
Subjects: Mathematical Physics (math-ph); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Statistical Mechanics (cond-mat.stat-mech); Applications (stat.AP); Computation (stat.CO)
[72]  arXiv:1906.12331 (replaced) [pdf, ps, other]
Title: Modeling Food Popularity Dependencies using Social Media data
Comments: 5 pages
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
[73]  arXiv:1907.06560 (replaced) [pdf]
Title: Eliciting Priors for Bayesian Prediction of Daily Response Propensity in Responsive Survey Design: Historical Data Analysis vs. Literature Review
Comments: 47 pages, 10 figures, two tables
Subjects: Methodology (stat.ME); Applications (stat.AP)
[74]  arXiv:1907.09617 (replaced) [pdf, other]
Title: Hierarchical Transformed Scale Mixtures for Flexible Modeling of Spatial Extremes on Datasets with Many Locations
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
[75]  arXiv:1907.11792 (replaced) [pdf, other]
Title: Maximum Causal Entropy Specification Inference from Demonstrations
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)
[76]  arXiv:1908.04457 (replaced) [pdf, other]
Title: On the Convergence of AdaBound and its Connection to SGD
Authors: Pedro Savarese
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[77]  arXiv:1908.09377 (replaced) [pdf, other]
Title: Probabilistic Forecasting of the Arctic Sea Ice Edge with Contour Modeling
Subjects: Applications (stat.AP)
[78]  arXiv:1909.00719 (replaced) [pdf, other]
Title: Pathologies of Factorised Gaussian and MC Dropout Posteriors in Bayesian Neural Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[79]  arXiv:1909.02940 (replaced) [pdf, ps, other]
Title: Reinforcement Learning with Non-Markovian Rewards
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
[80]  arXiv:1909.04239 (replaced) [pdf, other]
Title: PMD: An Optimal Transportation-based User Distance for Recommender Systems
Comments: This paper is accepted by European Conference on Information Retrieval (ECIR 2020)
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
[81]  arXiv:1909.06342 (replaced) [pdf, ps, other]
Title: Explainable Machine Learning in Deployment
Comments: Accepted to the ACM Conference on Fairness, Accountability, and Transparency (ACM FAT* 2020)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
[82]  arXiv:1909.10024 (replaced) [pdf, other]
Title: Distribution-free consistent independence tests via Hallin's multivariate rank
Comments: In this (3rd) version, we added more references
Subjects: Statistics Theory (math.ST)
[83]  arXiv:1910.08520 (replaced) [pdf, other]
Title: Optimization Hierarchy for Fair Statistical Decision Problems
Subjects: Statistics Theory (math.ST); Optimization and Control (math.OC)
[84]  arXiv:1910.10196 (replaced) [pdf, other]
Title: Online Meta-Learning on Non-convex Setting
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[85]  arXiv:1910.12566 (replaced) [pdf, other]
Title: The spectral dimension of simplicial complexes: a renormalization group theory
Comments: (30 pages, 5 figures)
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)
[86]  arXiv:1911.00197 (replaced) [pdf, other]
Title: Phase transitions and optimal algorithms for semi-supervised classifications on graphs: from belief propagation to graph convolution network
Comments: 18 pages, 21 figures
Subjects: Statistical Mechanics (cond-mat.stat-mech); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)
[87]  arXiv:1911.04489 (replaced) [pdf, other]
Title: Making Good on LSTMs' Unfulfilled Promise
Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada. arXiv admin note: text overlap with arXiv:1812.02340
Subjects: Machine Learning (cs.LG); Computational Finance (q-fin.CP); Portfolio Management (q-fin.PM); Machine Learning (stat.ML)
[88]  arXiv:1911.07891 (replaced) [pdf, other]
Title: Basic Principles of Clustering Methods
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[89]  arXiv:1911.11607 (replaced) [pdf, other]
Title: Deep Learning with Gaussian Differential Privacy
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[90]  arXiv:1911.11610 (replaced) [pdf, other]
Title: Improving EEG based Continuous Speech Recognition
Comments: On preparation for submission to EUSIPCO 2020. arXiv admin note: text overlap with arXiv:1911.04261, arXiv:1906.08871
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[91]  arXiv:1912.01792 (replaced) [pdf, ps, other]
Title: Learn Electronic Health Records by Fully Decentralized Federated Learning
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[92]  arXiv:1912.01823 (replaced) [pdf, other]
Title: Domain-independent Dominance of Adaptive Methods
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[93]  arXiv:1912.02427 (replaced) [pdf, other]
Title: Analysis of the Optimization Landscapes for Overcomplete Representation Learning
Comments: 68 pages, 5 figures
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP); Optimization and Control (math.OC); Machine Learning (stat.ML)
[94]  arXiv:1912.03011 (replaced) [pdf, other]
Title: A priori generalization error for two-layer ReLU neural network through minimum norm solution
Comments: 15 pages, 1 figure
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[95]  arXiv:1912.04151 (replaced) [pdf, other]
Title: Identification of causal intervention effects under contagion
Subjects: Applications (stat.AP); Statistics Theory (math.ST); Populations and Evolution (q-bio.PE)