Statistics
New submissions for Wed, 11 Dec 19
 [1] arXiv:1912.04406 [pdf, other]

Title: Semiparametric Regression for Dual Population Mortality
Comments: 28 pages, 8 graphs
Subjects: Applications (stat.AP)
Parameter shrinkage applied optimally can always reduce error and projection variances from those of maximum likelihood estimation. Many variables that actuaries use are on numerical scales, like age or year, which require parameters at each point. Rather than shrinking these towards zero, nearby parameters are better shrunk towards each other. Semiparametric regression is a statistical discipline for building curves across parameter classes using shrinkage methodology. It is similar to but more parsimonious than cubic splines. We introduce it in the context of Bayesian shrinkage and apply it to joint mortality modeling for related populations, with Swedish and Danish mortality as an illustration. Bayesian shrinkage of slope changes of linear splines is an approach to semiparametric modeling that evolved in the actuarial literature. It has some theoretical and practical advantages, like closed-form curves, direct and transparent determination of degree of shrinkage and of placing knots for the splines, and quantifying goodness of fit. It is also relatively easy to apply to the many nonlinear models that arise in actuarial work.
 [2] arXiv:1912.04432 [pdf]

Title: Variable selection for transportability
Comments: Under Review
Subjects: Methodology (stat.ME); Other Statistics (stat.OT)
Transportability provides a principled framework to address the problem of applying study results to new populations. Here, we consider the problem of selecting variables to include in transport estimators. We provide a brief overview of the transportability framework and illustrate that while selection diagrams are a vital first step in variable selection, these graphs alone identify a sufficient but not strictly necessary set of variables for generating an unbiased transport estimate. Next, we conduct a simulation experiment assessing the impact of including unnecessary variables on the performance of the parametric g-computation transport estimator. Our results highlight that the types of variables included can affect the bias, variance, and mean squared error of the estimates. We find that addition of variables that are not causes of the outcome but whose distributions differ between the source and target populations can increase the variance and mean squared error of the transported estimates. On the other hand, inclusion of variables that are causes of the outcome (regardless of whether they modify the causal contrast of interest or differ in distribution between the populations) reduces the variance of the estimates without increasing the bias. Finally, exclusion of variables that cause the outcome but do not modify the causal contrast of interest does not increase bias. These findings suggest that variable selection approaches for transport should prioritize identifying and including all causes of the outcome in the study population rather than focusing on variables whose distribution may differ between the study sample and target population.
 [3] arXiv:1912.04435 [pdf, other]

Title: Stylised Choropleth Maps for New Zealand Regions and District Health Boards
Authors: Thomas Lumley
Subjects: Applications (stat.AP)
New Zealand has two top-level sets of administrative divisions: the District Health Boards and the Regions. In this note I describe a hexagonal layout for creating stylised maps of these divisions, and the use of colour, size, and triangular subdivisions to compare data between divisions and across multiple variables. I present an implementation in the DHBins package for R using both base graphics and ggplot2; the concepts and specific hexagonal layout could be used in any software.
 [4] arXiv:1912.04439 [pdf, other]

Title: Privacy-preserving data sharing via probabilistic modelling
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Differential privacy allows quantifying privacy loss from computations on sensitive personal data. This loss grows with the number of accesses to the data, making it hard to open the use of such data while respecting privacy. To avoid this limitation, we propose privacy-preserving release of a synthetic version of a data set, which can be used for an unlimited number of analyses with any methods, without affecting the privacy guarantees. The synthetic data generation is based on differentially private learning of a generative probabilistic model which can capture the probability distribution of the original data. We demonstrate empirically that we can reliably reproduce statistical discoveries from the synthetic data. We expect the method to have broad use in sharing anonymized versions of key data sets for research.
 [5] arXiv:1912.04542 [pdf, ps, other]

Title: What is the best predictor that you can compute in five minutes using a given Bayesian hierarchical model?
Authors: Jonathan R. Bradley
Subjects: Methodology (stat.ME)
The goal of this paper is to provide a way for statisticians to answer the question posed in the title of this article using any Bayesian hierarchical model of their choosing and without imposing additional restrictive model assumptions. We are motivated by the fact that the rise of "big data" has made it difficult for statisticians to directly apply their methods to big datasets. We introduce a "data subset model" to the popular "data model, process model, and parameter model" framework used to summarize Bayesian hierarchical models. The hyperparameters of the data subset model are specified constructively in that they are chosen such that the implied size of the subset satisfies predefined computational constraints. Thus, these hyperparameters effectively calibrate the statistical model to the computer itself to obtain predictions/estimations in a prespecified amount of time. Several properties of the data subset model are provided including: propriety, partial sufficiency, and semiparametric properties. Furthermore, we show that subsets of normally distributed data are asymptotically partially sufficient under reasonable constraints. Results from a simulated dataset will be presented across different computers, to show the effect of the computer on the statistical analysis. Additionally, we provide a joint spatial analysis of two different environmental datasets.
 [6] arXiv:1912.04571 [pdf, other]

Title: Spatial hierarchical modeling of threshold exceedances using rate mixtures
Subjects: Methodology (stat.ME)
We develop new flexible univariate models for light-tailed and heavy-tailed data, which extend a hierarchical representation of the generalized Pareto (GP) limit for threshold exceedances. These models can accommodate departure from asymptotic threshold stability in finite samples while keeping the asymptotic GP distribution as a special (or boundary) case and can capture the tails and the bulk jointly without losing much flexibility. Spatial dependence is modeled through a latent process, while the data are assumed to be conditionally independent. We design penalized complexity priors for crucial model parameters, shrinking our proposed spatial Bayesian hierarchical model toward a simpler reference whose marginal distributions are GP with moderately heavy tails. Our model can be fitted in fairly high dimensions using Markov chain Monte Carlo by exploiting the Metropolis-adjusted Langevin algorithm (MALA), which guarantees fast convergence of Markov chains with efficient block proposals for the latent variables. We also develop an adaptive scheme to calibrate the MALA tuning parameters. Moreover, our models avoid the expensive numerical evaluations of multifold integrals in censored likelihood expressions. We demonstrate our new methodology by simulation and application to a dataset of extreme rainfall episodes that occurred in Germany. Our fitted model provides a satisfactory performance and can be successfully used to predict rainfall extremes at unobserved locations.
 [7] arXiv:1912.04607 [pdf, other]

Title: Controlling false discovery exceedance for heterogeneous tests
Subjects: Methodology (stat.ME)
Several classical methods exist for controlling the false discovery exceedance (FDX) for large scale multiple testing problems, among them the Lehmann-Romano procedure ([LR] below) and the Guo-Romano procedure ([GR] below). While these two procedures are the most prominent, they were originally designed for homogeneous test statistics, that is, when the null distribution functions of the $p$-values $F_i$, $1\leq i\leq m$, are all equal. In many applications, however, the data are heterogeneous which leads to heterogeneous null distribution functions. Ignoring this heterogeneity usually induces a conservativeness for the aforementioned procedures. In this paper, we develop three new procedures that incorporate the $F_i$'s, while ensuring the FDX control. The heterogeneous version of [LR], denoted [HLR], is based on the arithmetic average of the $F_i$'s, while the heterogeneous version of [GR], denoted [HGR], is based on the geometric average of the $F_i$'s. We also introduce a procedure [PB], that is based on the Poisson-binomial distribution and that uniformly improves [HLR] and [HGR], at the price of a higher computational complexity. Perhaps surprisingly, this shows that, contrary to the known theory of false discovery rate (FDR) control under heterogeneity, the way to incorporate the $F_i$'s can be particularly simple in the case of FDX control, and does not require any further correction term. The performances of the new proposed procedures are illustrated by real and simulated data in two important heterogeneous settings: first, when the test statistics are continuous but the $p$-values are weighted by some known independent weight vector, e.g., coming from co-data sets; second, when the test statistics are discretely distributed, as is the case for data representing frequencies or counts.
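For orientation, the classical homogeneous [LR] procedure that this abstract starts from can be sketched as a step-down test. The sketch below assumes the standard Lehmann-Romano critical values $\alpha_i = (\lfloor \gamma i \rfloor + 1)\alpha / (m + \lfloor \gamma i \rfloor + 1 - i)$, where $\gamma$ is the tolerated false discovery proportion. The function names are illustrative, and this is not the heterogeneous [HLR], [HGR] or [PB] procedure proposed in the paper.

```python
import math


def lehmann_romano_critical_values(m, alpha, gamma):
    """Step-down critical values for the classical (homogeneous) LR procedure.

    alpha_i = (floor(gamma * i) + 1) * alpha / (m + floor(gamma * i) + 1 - i)
    for i = 1..m. Floating-point products such as gamma * i can land just
    below an integer, so callers may prefer exact fractions for gamma.
    """
    values = []
    for i in range(1, m + 1):
        k = math.floor(gamma * i)
        values.append((k + 1) * alpha / (m + k + 1 - i))
    return values


def step_down_rejections(p_values, critical_values):
    """Reject ordered p-values until the first one exceeds its critical value.

    Returns the (sorted) original indices of the rejected hypotheses.
    """
    order = sorted(range(len(p_values)), key=lambda j: p_values[j])
    rejected = []
    for rank, j in enumerate(order):
        if p_values[j] <= critical_values[rank]:
            rejected.append(j)
        else:
            break  # step-down: stop at the first non-rejection
    return sorted(rejected)
```

With `m = 5`, `alpha = 0.05` and `gamma = 0.5`, the p-values `[0.001, 0.2, 0.015, 0.03, 0.5]` lead to rejection of the hypotheses at indices 0 and 2: the third-smallest p-value, 0.03, exceeds its critical value of 0.025, so the procedure stops there.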
 [8] arXiv:1912.04629 [pdf, ps, other]

Title: Classification under local differential privacyComments: 12 pagesSubjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
We consider the binary classification problem in a setup that preserves the privacy of the original sample. We provide a privacy mechanism that is locally differentially private and then construct a classifier based on the private sample that is universally consistent in Euclidean spaces. Under stronger assumptions, we establish the minimax rates of convergence of the excess risk and see that they are slower than in the case when the original sample is available.
 [9] arXiv:1912.04677 [pdf, other]

Title: Testing and Estimating Change-Points in the Covariance Matrix of a High-Dimensional Time Series
Authors: Ansgar Steland
Subjects: Statistics Theory (math.ST); Probability (math.PR); Applications (stat.AP)
This paper studies methods for testing and estimating change-points in the covariance structure of a high-dimensional linear time series. The assumed framework allows for a large class of multivariate linear processes (including vector autoregressive moving average (VARMA) models) of growing dimension and spiked covariance models. The approach uses bilinear forms of the centered or non-centered sample variance-covariance matrix. Change-point testing and estimation are based on maximally selected weighted cumulated sum (CUSUM) statistics. Large sample approximations under a change-point regime are provided including a multivariate CUSUM transform of increasing dimension. For the unknown asymptotic variance and covariance parameters associated with (pairs of) CUSUM statistics we propose consistent estimators. Based on weak laws of large numbers for their sequential versions, we also consider stopped sample estimation where observations until the estimated change-point are used. Finite sample properties of the procedures are investigated by simulations and their application is illustrated by analyzing a real data set from environmetrics.
 [10] arXiv:1912.04681 [pdf, other]

Title: Accelerated Sampling on Discrete Spaces with Non-Reversible Markov Processes
Comments: 31 pages, 8 figures
Subjects: Computation (stat.CO)
We consider the task of MCMC sampling from a distribution defined on a discrete space. Building on recent insights provided in [Zan19], we devise a class of efficient continuous-time, non-reversible algorithms which make active use of the structure of the underlying space. Particular emphasis is placed on how symmetries and other group-theoretic notions can be used to improve exploration of the space. We test our algorithms on a range of examples from statistics, computational physics, machine learning, and cryptography, which show improvement on alternative algorithms. We provide practical recommendations on how to design and implement these algorithms, and close with remarks on the outlook for both discrete sampling and continuous-time Monte Carlo more broadly.
 [11] arXiv:1912.04738 [pdf, other]

Title: Histogram Transform Ensembles for Large-scale Regression
Comments: arXiv admin note: text overlap with arXiv:1911.11581
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
We propose a novel algorithm for large-scale regression problems named histogram transform ensembles (HTE), composed of random rotations, stretchings, and translations. First of all, we investigate the theoretical properties of HTE when the regression function lies in the Hölder space $C^{k,\alpha}$, $k \in \mathbb{N}_0$, $\alpha \in (0,1]$. In the case that $k=0, 1$, we adopt the constant regressors and develop the naïve histogram transforms (NHT). Within the space $C^{0,\alpha}$, although almost optimal convergence rates can be derived for both single and ensemble NHT, we fail to show the benefits of ensembles over single estimators theoretically. In contrast, in the subspace $C^{1,\alpha}$, we prove that if $d \geq 2(1+\alpha)/\alpha$, the lower bound of the convergence rates for single NHT turns out to be worse than the upper bound of the convergence rates for ensemble NHT. In the other case when $k \geq 2$, the NHT may no longer be appropriate in predicting smoother regression functions. Instead, we apply kernel histogram transforms (KHT) equipped with smoother regressors such as support vector machines (SVMs), and it turns out that both single and ensemble KHT enjoy almost optimal convergence rates. Then we validate the above theoretical results by numerical experiments. On the one hand, simulations are conducted to elucidate that ensemble NHT outperform single NHT. On the other hand, the effects of bin sizes on accuracy of both NHT and KHT also accord with theoretical analysis. Last but not least, in the real-data experiments, comparisons between the ensemble KHT, equipped with adaptive histogram transforms, and other state-of-the-art large-scale regression estimators verify the effectiveness and accuracy of our algorithm.
 [12] arXiv:1912.04753 [pdf]

Title: Optimizing and accelerating space-time Ripley's K function based on Apache Spark for distributed spatiotemporal point pattern analysis
Comments: 35 pages, 23 figures, Future Generation Computer Systems
Journal-ref: Future Generation Computer Systems, 2020
Subjects: Computation (stat.CO); Computational Geometry (cs.CG); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Software Engineering (cs.SE)
With increasing point of interest (POI) datasets available with fine-grained spatial and temporal attributes, the space-time Ripley's K function has been regarded as a powerful approach to analyze spatiotemporal point processes. However, the space-time Ripley's K function is computationally intensive because of point-wise distance comparisons, edge correction and simulations for significance testing. Parallel computing technologies like OpenMP, MPI and CUDA have been leveraged to accelerate the K function, and related experiments have demonstrated substantial acceleration. Nevertheless, previous works have not extended optimization of Ripley's K function from the space dimension to the space-time dimension. Without sophisticated spatiotemporal query and partitioning mechanisms, extra computational overhead can be problematic. Meanwhile, these studies were limited by the restricted scalability and relatively expensive programming cost of parallel frameworks, which impeded their application to large POI datasets and Ripley's K function variations. This paper presents a distributed computing method to accelerate the space-time Ripley's K function on the state-of-the-art distributed computing framework Apache Spark, and four strategies are adopted to simplify calculation procedures and accelerate distributed computing respectively. Based on the optimized method, a web-based visual analytics framework prototype has been developed. Experiments prove the feasibility and time efficiency of the proposed method, and also demonstrate its value in promoting applications of the space-time Ripley's K function in ecology, geography, sociology, economics, urban transportation and other fields.
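To make the computational cost mentioned above concrete, here is a minimal single-machine sketch of a space-time K estimate. It assumes one common normalisation, $\hat K(r,t) = \frac{|A|\,T}{n^2}\sum_{i \neq j} \mathbf{1}(d_{ij} \le r)\,\mathbf{1}(|t_i - t_j| \le t)$, and omits edge correction and significance simulation entirely. The function name and argument layout are illustrative and are not taken from the paper or from Spark.

```python
import math


def space_time_k(points, r, t, area, duration):
    """Naive space-time Ripley's K estimate without edge correction.

    points   : list of (x, y, time) tuples
    r, t     : spatial and temporal distance thresholds
    area     : area |A| of the study region
    duration : length T of the study period

    The double loop over ordered pairs is the O(n^2) bottleneck that
    motivates parallel and distributed implementations.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    count = 0
    for i in range(n):
        xi, yi, ti = points[i]
        for j in range(n):
            if i == j:
                continue
            xj, yj, tj = points[j]
            close_in_space = math.hypot(xi - xj, yi - yj) <= r
            close_in_time = abs(ti - tj) <= t
            if close_in_space and close_in_time:
                count += 1
    return area * duration * count / (n * n)
```

Each ordered pair is counted once, so a pair of points that are close in both space and time contributes twice to the sum.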
 [13] arXiv:1912.04758 [pdf, other]

Title: Generalised Network Autoregressive Processes and the GNAR package
Subjects: Methodology (stat.ME)
This article introduces the GNAR package, which fits, predicts, and simulates from a powerful new class of generalised network autoregressive processes. Such processes consist of a multivariate time series along with a real, or inferred, network that provides information about inter-variable relationships. The GNAR model relates values of a time series for a given variable and time to earlier values of the same variable and of neighbouring variables, with inclusion controlled by the network structure. The GNAR package is designed to fit this new model, while working with standard ts objects and the igraph package for ease of use.
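The dependence structure described above can be illustrated with a small simulation of a first-order network autoregression. This is a sketch under simplifying assumptions (one lag, immediate neighbours only, equal neighbour weights, a single `alpha` and `beta` shared by all nodes). The function below is hypothetical and is not the interface of the GNAR package, which is written for R.

```python
import random


def simulate_network_ar(neighbours, alpha, beta, n_steps,
                        noise_sd=1.0, initial=None, seed=0):
    """Simulate X[i, t] = alpha * X[i, t-1] + beta * mean_j X[j, t-1] + noise.

    neighbours : dict mapping each node to a list of its neighbouring nodes
    alpha      : weight on the node's own previous value
    beta       : weight on the average of its neighbours' previous values
    initial    : optional dict of starting values (defaults to all zeros)
    Returns a list of rows, one per time step, ordered by sorted node label.
    """
    rng = random.Random(seed)
    nodes = sorted(neighbours)
    x = {v: 0.0 for v in nodes}
    if initial is not None:
        x.update(initial)
    series = []
    for _ in range(n_steps):
        new_x = {}
        for v in nodes:
            nbrs = neighbours[v]
            # Isolated nodes receive no neighbour contribution.
            nbr_mean = sum(x[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
            new_x[v] = alpha * x[v] + beta * nbr_mean + rng.gauss(0.0, noise_sd)
        x = new_x
        series.append([x[v] for v in nodes])
    return series
```

Setting `noise_sd=0.0` gives the deterministic part of the recursion, which is convenient for checking how a shock at one node propagates to its neighbours.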
 [14] arXiv:1912.04869 [pdf, other]

Title: Adaptive Manifold Clustering
Subjects: Statistics Theory (math.ST)
We extend the theoretical study of a recently proposed nonparametric clustering algorithm called Adaptive Weights Clustering (AWC). In particular, we are interested in the case of high-dimensional data lying in the vicinity of a lower-dimensional nonlinear submanifold with positive reach. After a slight adjustment and under rather general assumptions for the cluster structure, the algorithm turns out to be nearly optimal in detecting local inhomogeneities, while aggregating homogeneous data with a high probability. We also address the problem of parameter tuning.
 [15] arXiv:1912.04884 [pdf, other]

Title: Statistically Robust Neural Network Classification
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Recently there has been much interest in quantifying the robustness of neural network classifiers through adversarial risk metrics. However, for problems where test-time corruptions occur in a probabilistic manner, rather than being generated by an explicit adversary, adversarial metrics typically do not provide an accurate or reliable indicator of robustness. To address this, we introduce a statistically robust risk (SRR) framework which measures robustness in expectation over both network inputs and a corruption distribution. Unlike many adversarial risk metrics, which typically require separate applications on a point-by-point basis, the SRR can easily be directly estimated for an entire network and used as a training objective in a stochastic gradient scheme. Furthermore, we show both theoretically and empirically that it can scale to higher-dimensional networks by providing superior generalization performance compared with comparable adversarial risks.
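The idea of a risk taken in expectation over both inputs and a corruption distribution can be sketched with a plain Monte Carlo estimate. The sketch assumes a 0-1 loss and a user-supplied corruption sampler; the paper's exact SRR definition and estimator may differ, and every name below is illustrative.

```python
import random


def statistically_robust_risk(classify, data, corrupt,
                              n_corruptions=100, seed=0):
    """Monte Carlo estimate of an expected 0-1 loss under random corruptions.

    classify : function mapping an input to a predicted label
    data     : iterable of (input, label) pairs
    corrupt  : function (input, rng) -> corrupted input, drawing one sample
               from the assumed corruption distribution
    Returns the fraction of (input, corruption) draws that are misclassified.
    """
    rng = random.Random(seed)
    errors = 0
    total = 0
    for x, label in data:
        for _ in range(n_corruptions):
            if classify(corrupt(x, rng)) != label:
                errors += 1
            total += 1
    if total == 0:
        raise ValueError("data must be non-empty and n_corruptions positive")
    return errors / total
```

Unlike a worst-case adversarial metric, this quantity averages over corruptions, so a classifier that fails only on rare perturbations receives a small but non-zero risk.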
Cross-lists for Wed, 11 Dec 19
 [16] arXiv:1912.04278 (cross-list from eess.IV) [pdf, other]

Title: Deep Efficient End-to-end Reconstruction (DEER) Network for Low-dose Few-view Breast CT from Projection Data
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Breast CT provides image volumes with isotropic resolution in high contrast, enabling detection of calcifications (down to a few hundred microns in size) and subtle density differences. Since the breast is sensitive to x-ray radiation, dose reduction of breast CT is an important topic, and for this purpose low-dose few-view scanning is a main approach. In this article, we propose a Deep Efficient End-to-end Reconstruction (DEER) network for low-dose few-view breast CT. The major merits of our network include high dose efficiency, excellent image quality, and low model complexity. By design, the proposed network can learn the reconstruction process in terms of as few as O(N) parameters, where N is the size of an image to be reconstructed, which represents orders of magnitude improvements relative to the state-of-the-art deep-learning-based reconstruction methods that map projection data to tomographic images directly. As a result, our method does not require expensive GPUs to train and run. Also, validated on a cone-beam breast CT dataset prepared by Koning Corporation on a commercial scanner, our method demonstrates competitive performance over the state-of-the-art reconstruction networks in terms of image quality.
 [17] arXiv:1912.04345 (cross-list from physics.chem-ph) [pdf, ps, other]

Title: Noisy, sparse, nonlinear: Navigating the Bermuda Triangle of physical inference with deep filtering
Subjects: Chemical Physics (physics.chem-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)
Capturing the microscopic interactions that determine molecular reactivity poses a challenge across the physical sciences. Even a basic understanding of the underlying reaction mechanisms can substantially accelerate materials and compound design, including the development of new catalysts or drugs. Given the difficulties routinely faced by both experimental and theoretical investigations that aim to improve our mechanistic understanding of a reaction, recent advances have focused on data-driven routes to derive structure-property relationships directly from high-throughput screens. However, even these high-quality, high-volume data are noisy, sparse and biased, placing them in a regime where machine learning is extremely challenging. Here we show that a statistical approach based on deep filtering of nonlinear feature networks results in physicochemical models that are more robust, transparent and generalize better than standard machine-learning architectures. Using diligent descriptor design and data post-processing, we exemplify the approach using both literature and fresh data on asymmetric catalytic hydrogenation, palladium-catalyzed cross-coupling reactions, and drug-drug synergy. We illustrate how the sparse models uncovered by the filtering help us formulate physicochemical reaction "pharmacophores", investigate experimental bias and derive strategies for mechanism detection and classification.
 [18] arXiv:1912.04370 (cross-list from eess.AS) [pdf, other]

Title: Cross-Language Aphasia Detection using Optimal Transport Domain Adaptation
Authors: Aparna Balagopalan, Jekaterina Novikova, Matthew B. A. McDermott, Bret Nestor, Tristan Naumann, Marzyeh Ghassemi
Comments: Accepted to ML4H at NeurIPS 2019
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
Multi-language speech datasets are scarce and often have small sample sizes in the medical domain. Robust transfer of linguistic features across languages could improve rates of early diagnosis and therapy for speakers of low-resource languages when detecting health conditions from speech. We utilize out-of-domain, unpaired, single-speaker, healthy speech data for training multiple Optimal Transport (OT) domain adaptation systems. We learn mappings from other languages to English and detect aphasia from linguistic characteristics of speech, and show that OT domain adaptation improves aphasia detection over unilingual baselines for French (6% increased F1) and Mandarin (5% increased F1). Further, we show that adding aphasic data to the domain adaptation system significantly increases performance for both French and Mandarin, increasing the F1 scores further (10% and 8% increase in F1 scores for French and Mandarin, respectively, over unilingual baselines).
 [19] arXiv:1912.04378 (cross-list from cs.LG) [pdf, ps, other]

Title: Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)
Understanding the representational power of Deep Neural Networks (DNNs) and how their structural properties (e.g., depth, width, type of activation unit) affect the functions they can compute has been an important yet challenging question in deep learning and approximation theory. In a seminal paper, Telgarsky highlighted the benefits of depth by presenting a family of functions (based on simple triangular waves) for which DNNs achieve zero classification error, whereas shallow networks with fewer than exponentially many nodes incur constant error. Even though Telgarsky's work reveals the limitations of shallow neural networks, it does not inform us on why these functions are difficult to represent and in fact he states it as a tantalizing open question to characterize those functions that cannot be well-approximated by smaller depths.
In this work, we point to a new connection between DNNs' expressivity and Sharkovsky's Theorem from dynamical systems, that enables us to characterize the depth-width trade-offs of ReLU networks for representing functions based on the presence of a generalized notion of fixed points, called periodic points (a fixed point is a point of period 1). Motivated by our observation that the triangle waves used in Telgarsky's work contain points of period 3 (a period that is special in that it implies chaotic behavior, based on the celebrated result by Li-Yorke), we proceed to give general lower bounds for the width needed to represent periodic functions as a function of the depth. Technically, the crux of our approach is based on an eigenvalue analysis of the dynamical system associated with such functions.
 [20] arXiv:1912.04379 (cross-list from cs.PF) [pdf, ps, other]

Title: General Matrix-Matrix Multiplication Using SIMD features of the PIII
Comments: arXiv admin note: substantial text overlap with arXiv:1911.05181
Journal-ref: Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing (2000) Pages 980-983
Subjects: Performance (cs.PF); Machine Learning (stat.ML)
Generalised matrix-matrix multiplication forms the kernel of many mathematical algorithms. A faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement efficient matrix multiplication for large matrices using the floating point Intel Pentium SIMD (Single Instruction Multiple Data) architecture. A description of the issues and our solution is presented, paying attention to all levels of the memory hierarchy. Our results demonstrate an average performance of 2.09 times faster than the leading public domain matrix-matrix multiply routines.
 [21] arXiv:1912.04381 (cross-list from eess.AS) [pdf]
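Attention to the memory hierarchy in matrix multiplication usually means some form of blocking (tiling), so that small sub-blocks are reused while they are still in cache. The sketch below shows only that loop structure, assuming plain nested lists and an arbitrary block size. It does not use SIMD instructions, is not the paper's implementation, and will not run faster in pure Python.

```python
def blocked_matmul(A, B, block=2):
    """Multiply A (n x k) by B (k x m) using a tiled loop order.

    The three outer loops walk over tiles; the three inner loops multiply
    one tile of A by one tile of B and accumulate into a tile of C. In a
    compiled implementation the tile size is chosen so the working set
    fits in cache (or registers) and the innermost loop is vectorised.
    """
    n, k, m = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions do not match")
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for k0 in range(0, k, block):
            for j0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    for kk in range(k0, min(k0 + block, k)):
                        a = A[i][kk]  # reused across the whole inner j loop
                        for j in range(j0, min(j0 + block, m)):
                            C[i][j] += a * B[kk][j]
    return C
```

The result is identical to the naive triple loop for any block size, since only the order of the additions into each entry of `C` changes.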

Title: A Dataset for measuring reading levels in India at scale
Comments: 5 pages, 4 figures, 3 Tables, submitted the paper to ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Computers and Society (cs.CY); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
One out of four children in India is leaving grade eight without basic reading skills. Measuring the reading levels in a vast country like India poses significant hurdles. Recent advances in machine learning open up the possibility of automating this task. However, the datasets are primarily in English. To solve this assessment problem and advance deep learning research in regional Indian languages, we present the ASER dataset of children in the age group of 6-14. The dataset consists of 5,300 subjects generating 81,658 labeled audio clips in Hindi, Marathi and English. These labels represent expert opinions on the ability of the child to read at a specified level. Using this dataset, we built a simple ASR-based classifier. Early results indicate that we can achieve a prediction accuracy of 86 percent for the English language. Considering the ASER survey spans half a million subjects, this dataset can grow to those scales.
 [22] arXiv:1912.04391 (cross-list from cs.LG) [pdf, other]

Title: Semi-supervised Learning Approach to Generate Neuroimaging Modalities with Adversarial Training
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Magnetic Resonance Imaging (MRI) of the brain can come in the form of different modalities such as T1-weighted and Fluid Attenuated Inversion Recovery (FLAIR), which have been used to investigate a wide range of neurological disorders. Current state-of-the-art models for brain tissue segmentation and disease classification require multiple modalities for training and inference. However, the acquisition of all of these modalities is expensive, time-consuming and inconvenient, and the required modalities are often not available. As a result, these datasets contain large amounts of unpaired data, where examples in the dataset do not contain all modalities. On the other hand, there is a smaller fraction of examples that contain all modalities (paired data), and furthermore each modality is high dimensional when compared to the number of datapoints. In this work, we develop a method to address these issues with semi-supervised learning in translating between two neuroimaging modalities. Our proposed model, Semi-Supervised Adversarial CycleGAN (SSA-CGAN), uses an adversarial loss to learn from unpaired data points, cycle loss to enforce consistent reconstructions of the mappings and another adversarial loss to take advantage of paired data points. Our experiments demonstrate that our proposed framework produces an improvement in reconstruction error and reduced variance for the pairwise translation of multiple modalities and is more robust to thermal noise when compared to existing methods.
 [23] arXiv:1912.04408 (cross-list from eess.SY) [pdf, other]

Title: Exploiting Model Sparsity in Adaptive MPC: A Compressed Sensing Viewpoint
Comments: Both authors contributed equally. arXiv admin note: text overlap with arXiv:1804.09790
Subjects: Systems and Control (eess.SY); Machine Learning (stat.ML)
This paper proposes an Adaptive Stochastic Model Predictive Control (MPC) strategy for stable linear time-invariant systems in the presence of bounded disturbances. We consider multi-input, multi-output systems that can be expressed by a Finite Impulse Response (FIR) model. The parameters of the FIR model corresponding to each output are unknown but assumed sparse. We estimate these parameters using the Recursive Least Squares algorithm. The estimates are then improved using set-based bounds obtained by solving the Basis Pursuit Denoising [1] problem. Our approach is able to handle hard input constraints and probabilistic output constraints. Using tools from distributionally robust optimization, we reformulate the probabilistic output constraints as tractable convex second-order cone constraints, which enables us to pose our MPC design task as a convex optimization problem. The efficacy of the developed algorithm is highlighted with a thorough numerical example, where we demonstrate performance gain over the counterpart algorithm of [2], which does not utilize the sparsity information of the system impulse response parameters during control design.
 [24] arXiv:1912.04427 (cross-list from cs.LG) [pdf, other]
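The parameter-estimation step mentioned above, Recursive Least Squares on an FIR model, can be sketched for a single-input, single-output system as follows. This is the textbook RLS recursion under illustrative assumptions (noise-free handling, a large diagonal initial covariance, hypothetical function names). It does not include the Basis Pursuit Denoising refinement or any part of the MPC design.

```python
def rls_update(theta, P, phi, y, forgetting=1.0):
    """One Recursive Least Squares step for the model y = phi . theta + noise.

    theta : current parameter estimate (list of floats)
    P     : current covariance-like matrix (list of lists, symmetric)
    phi   : regressor vector for this sample
    y     : measured output for this sample
    """
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    error = y - sum(phi[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so phi^T P equals (P phi)^T.
    new_P = [[(P[i][j] - gain[i] * P_phi[j]) / forgetting for j in range(n)]
             for i in range(n)]
    return new_theta, new_P


def fit_fir(inputs, outputs, order, init_scale=1e4):
    """Estimate FIR taps h[0..order-1] in y[k] = sum_i h[i] * u[k - i]."""
    theta = [0.0] * order
    P = [[init_scale if i == j else 0.0 for j in range(order)]
         for i in range(order)]
    for k in range(order - 1, len(inputs)):
        phi = [inputs[k - i] for i in range(order)]  # most recent input first
        theta, P = rls_update(theta, P, phi, outputs[k])
    return theta
```

On noise-free data from a sparse impulse response such as `[0.5, 0.0, -0.25]`, the estimate approaches the true taps once the input has been sufficiently exciting; plain RLS does not by itself enforce sparsity.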

Title: Winning the Lottery with Continuous SparsificationSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The Lottery Ticket Hypothesis from Frankle & Carbin (2019) conjectures that, for typically-sized neural networks, it is possible to find small sub-networks which train faster and yield superior performance than their original counterparts. The proposed algorithm to search for "winning tickets", Iterative Magnitude Pruning, consistently finds sub-networks with $90-95\%$ fewer parameters which train faster and better than the overparameterized models they were extracted from, creating potential applications to problems such as transfer learning.
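The pruning step at the heart of Iterative Magnitude Pruning can be sketched as below. This is a minimal illustration, not the procedure from either paper: the helper name `magnitude_prune_mask`, the 20% per-round pruning fraction, and the fixed random weights are assumptions, and the training and weight-rewinding steps that would sit between pruning rounds are omitted.

```python
import numpy as np

def magnitude_prune_mask(weights, mask, prune_fraction):
    """Deactivate the smallest-magnitude fraction of the still-active weights."""
    active = np.flatnonzero(mask)                    # flat indices of surviving weights
    n_prune = int(round(prune_fraction * active.size))
    if n_prune == 0:
        return mask.copy()
    # Sort surviving weights by magnitude and drop the smallest ones.
    order = np.argsort(np.abs(weights.ravel()[active]))
    new_mask = mask.copy().ravel()
    new_mask[active[order[:n_prune]]] = False
    return new_mask.reshape(mask.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((20, 10))                    # stand-in for trained weights
mask = np.ones_like(w, dtype=bool)
for _ in range(3):
    # A full run would train the masked network here before each pruning round.
    mask = magnitude_prune_mask(w, mask, 0.2)
```

Each round removes a fixed fraction of the remaining weights, so sparsity compounds geometrically across rounds.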
In this paper, we propose Continuous Sparsification, a new algorithm to search for winning tickets which continuously removes parameters from a network during training, and learns the sub-network's structure with gradient-based methods instead of relying on pruning strategies. We show empirically that our method is capable of finding tickets that outperform the ones learned by Iterative Magnitude Pruning, while at the same time providing faster search, when measured in number of training epochs or wall-clock time.
 [25] arXiv:1912.04472 (cross-list from cs.LG) [pdf, other]

Title: Deep Bayesian Reward Learning from Preferences
Comments: Workshop on Safety and Robustness in Decision Making at the 33rd Conference on Neural Information Processing Systems (NeurIPS) 2019
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Bayesian inverse reinforcement learning (IRL) methods are ideal for safe imitation learning, as they allow a learning agent to reason about reward uncertainty and the safety of a learned policy. However, Bayesian IRL is computationally intractable for high-dimensional problems because each sample from the posterior requires solving an entire Markov Decision Process (MDP). While there exist non-Bayesian deep IRL methods, these methods typically infer point estimates of reward functions, precluding rigorous safety and uncertainty analysis. We propose Bayesian Reward Extrapolation (B-REX), a highly efficient, preference-based Bayesian reward learning algorithm that scales to high-dimensional, visual control tasks. Our approach uses successor feature representations and preferences over demonstrations to efficiently generate samples from the posterior distribution over the demonstrator's reward function without requiring an MDP solver. Using samples from the posterior, we demonstrate how to calculate high-confidence bounds on policy performance in the imitation learning setting, in which the ground-truth reward function is unknown. We evaluate our proposed approach on the task of learning to play Atari games via imitation learning from pixel inputs, with no access to the game score. We demonstrate that B-REX learns imitation policies that are competitive with a state-of-the-art deep imitation learning method that only learns a point estimate of the reward function. Furthermore, we demonstrate that samples from the posterior generated via B-REX can be used to compute high-confidence performance bounds for a variety of evaluation policies. We show that high-confidence performance bounds are useful for accurately ranking different evaluation policies when the reward function is unknown. We also demonstrate that high-confidence performance bounds may be useful for detecting reward hacking.
 [26] arXiv:1912.04508 (cross-list from cs.LG) [pdf, ps, other]

Title: Reducing Catastrophic Forgetting in Modular Neural Networks by Dynamic Information Balancing
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Lifelong learning is a very important step toward realizing robust autonomous artificial agents. Neural networks are the main engine of deep learning, which is the current state-of-the-art technique in formulating adaptive artificial intelligent systems. However, neural networks suffer from catastrophic forgetting when stressed with the challenge of continual learning. We investigate how to exploit modular topology in neural networks in order to dynamically balance the information load between different modules by routing inputs based on the information content in each module so that information interference is minimized. Our dynamic information balancing (DIB) technique adapts a reinforcement learning technique to guide the routing of different inputs based on a reward signal derived from a measure of the information load in each module. Our empirical results show that DIB combined with elastic weight consolidation (EWC) regularization outperforms models with similar capacity and EWC regularization across different task formulations and datasets.
 [27] arXiv:1912.04511 (cross-list from cs.LG) [pdf, other]

Title: A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
Comments: 23 pages, 1 table. Under review by ICLR 2020
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To the best of our knowledge, our result is the first finite-time analysis of neural Q-learning under a non-i.i.d. data assumption.
 [28] arXiv:1912.04521 (cross-list from cs.LG) [pdf, other]

Title: Transfer Learning-Based Outdoor Position Recovery with Telco Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Telecommunication (Telco) outdoor position recovery aims to localize outdoor mobile devices by leveraging measurement report (MR) data. Unfortunately, Telco position recovery requires a sufficient amount of MR samples across different areas and suffers from high data collection cost. For an area with scarce MR samples, it is hard to achieve good accuracy. In this paper, by leveraging recently developed transfer learning techniques, we design a novel Telco position recovery framework, called TLoc, to transfer good models from carefully selected source domains (fine-grained small subareas) to a target domain that originally suffers from poor localization accuracy. Specifically, TLoc introduces three dedicated components: 1) a new coordinate space to divide an area of interest into smaller domains, 2) a similarity measurement to select the best source domains, and 3) an adaptation of an existing transfer learning approach. To the best of our knowledge, TLoc is the first framework that demonstrates the efficacy of applying transfer learning to Telco outdoor position recovery. For example, on 2G GSM and 4G LTE MR datasets in Shanghai, TLoc achieves 27.58% and 26.12% lower median errors than a non-transfer approach, and 47.77% and 49.22% lower median errors than a recent fingerprinting approach, NBL.
 [29] arXiv:1912.04523 (cross-list from cs.CV) [pdf, other]

Title: Context-Dependent Models for Predicting and Characterizing Facial Expressiveness
Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)
In recent years, extensive research has emerged in affective computing on topics like automatic emotion recognition and determining the signals that characterize individual emotions. Much less studied, however, is expressiveness, or the extent to which someone shows any feeling or emotion. Expressiveness is related to personality and mental health and plays a crucial role in social interaction. As such, the ability to automatically detect or predict expressiveness can facilitate significant advancements in areas ranging from psychiatric care to artificial social intelligence. Motivated by these potential applications, we present an extension of the BP4D+ dataset with human ratings of expressiveness and develop methods for (1) automatically predicting expressiveness from visual data and (2) defining relationships between interpretable visual signals and expressiveness. In addition, we study the emotional context in which expressiveness occurs and hypothesize that different sets of signals are indicative of expressiveness in different contexts (e.g., in response to surprise or in response to pain). Analysis of our statistical models confirms our hypothesis. Consequently, by looking at expressiveness separately in distinct emotional contexts, our predictive models show significant improvements over baselines and achieve comparable results to human performance in terms of correlation with the ground truth.
 [30] arXiv:1912.04527 (cross-list from cs.LG) [pdf, other]

Title: Learning Pose Estimation for UAV Autonomous Navigation and Landing Using Visual-Inertial Sensor Data
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)
In this work, we propose a robust network-in-the-loop control system that allows an Unmanned Aerial Vehicle to navigate and land autonomously on a desired target. To estimate the global pose of the aerial vehicle, we develop a deep neural network architecture for visual-inertial odometry, which provides a robust alternative to traditional techniques for autonomous navigation of Unmanned Aerial Vehicles. We first provide experimental results on the accuracy of the estimation by comparing the prediction of our model to traditional visual-inertial approaches on the publicly available EuRoC MAV dataset. The results indicate a clear improvement in the accuracy of the pose estimation of up to 25% against the baseline. Second, we use Airsim, a simulator available as a plugin for Unreal Engine, to create new datasets of photorealistic images and inertial measurements to train and test our model. We finally integrate the proposed architecture for global localization with the Airsim closed-loop control system, and we provide simulation results for the autonomous landing of the aerial vehicle.
 [31] arXiv:1912.04530 (cross-list from cs.LG) [pdf, ps, other]

Title: No-Trick (Treat) Kernel Adaptive Filtering using Deterministic Features
Comments: 12 pages, 7 figures
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Kernel methods form a powerful, versatile, and theoretically-grounded unifying framework to solve nonlinear problems in signal processing and machine learning. The standard approach relies on the kernel trick to perform pairwise evaluations of a kernel function, which leads to scalability issues for large datasets due to its linear and superlinear growth with respect to the training data. A popular approach to tackle this problem, known as random Fourier features (RFFs), samples from a distribution to obtain the data-independent basis of a higher finite-dimensional feature space, where its dot product approximates the kernel function. Recently, deterministic, rather than random, construction has been shown to outperform RFFs, by approximating the kernel in the frequency domain using Gaussian quadrature. In this paper, we view the dot product of these explicit mappings not as an approximation, but as an equivalent positive-definite kernel that induces a new finite-dimensional reproducing kernel Hilbert space (RKHS). This opens the door to no-trick (NT) online kernel adaptive filtering (KAF) that is scalable and robust. Random features are prone to large variances in performance, especially for smaller dimensions. Here, we focus on deterministic feature-map construction based on polynomial-exact solutions and show their superiority over random constructions. Without loss of generality, we apply this approach to classical adaptive filtering algorithms and validate the methodology to show that deterministic features are faster to generate and outperform state-of-the-art kernel methods based on random Fourier features.
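For readers unfamiliar with the random Fourier feature baseline that the abstract compares against, here is a minimal sketch of the standard construction for a Gaussian kernel. It illustrates the random approach only, not the deterministic quadrature-based features proposed in the paper; the function name `rff_features`, the bandwidth `sigma`, and the sample sizes are assumptions.

```python
import numpy as np

def rff_features(X, n_features, sigma, rng):
    """Map X to random Fourier features whose dot products approximate
    the Gaussian kernel exp(-||x - x'||^2 / (2 * sigma^2))."""
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) / sigma   # frequencies from the kernel's spectrum
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)      # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
sigma = 1.5

Z = rff_features(X, 5000, sigma, rng)
K_approx = Z @ Z.T                                     # explicit feature-space dot products

sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_exact = np.exp(-sq_dists / (2.0 * sigma**2))         # pairwise kernel evaluations
max_err = np.max(np.abs(K_approx - K_exact))
```

The approximation error shrinks roughly as the inverse square root of the number of features, which is the variance issue at small dimensions that the abstract refers to.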
 [32] arXiv:1912.04533 (cross-list from cs.LG) [pdf, other]

Title: Exact expressions for double descent and implicit regularization via surrogate random design
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Double descent refers to the phase transition that is exhibited by the generalization error of unregularized learning models when varying the ratio between the number of parameters and the number of training samples. The recent success of highly over-parameterized machine learning models such as deep neural networks has motivated a theoretical analysis of the double descent phenomenon in classical models such as linear regression which can also generalize well in the over-parameterized regime. We build on recent advances in Randomized Numerical Linear Algebra (RandNLA) to provide the first exact non-asymptotic expressions for double descent of the minimum norm linear estimator. Our approach involves constructing what we call a surrogate random design to replace the standard i.i.d. design of the training sample. This surrogate design admits exact expressions for the mean squared error of the estimator while preserving the key properties of the standard design. We also establish an exact implicit regularization result for over-parameterized training samples. In particular, we show that, for the surrogate design, the implicit bias of the unregularized minimum norm estimator precisely corresponds to solving a ridge-regularized least squares problem on the population distribution.
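The minimum norm linear estimator discussed in the abstract can be computed with a pseudoinverse. The sketch below, on assumed synthetic Gaussian data, shows only the well-known facts that it interpolates the training sample in the over-parameterized regime and coincides with the limit of ridge regression on that sample as the penalty vanishes; it does not reproduce the paper's surrogate design or its population-level result.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 50                        # fewer samples than parameters (assumed sizes)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Minimum-norm solution of X w = y via the Moore-Penrose pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

# Ridge regression on the same sample with a tiny penalty (dual form).
lam = 1e-8
w_ridge = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

train_residual = np.max(np.abs(X @ w_min_norm - y))
```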
 [33] arXiv:1912.04549 (cross-list from cs.LG) [pdf, other]

Title: Expansion of Cyber Attack Data From Unbalanced Datasets Using Generative Techniques
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Machine learning techniques help to understand patterns in a dataset in order to create a defense mechanism against cyber attacks. However, it is difficult to construct a theoretical model for discriminating attacks from the overall dataset because of imbalances in the dataset. The Multilayer Perceptron (MLP) technique improves accuracy and increases the performance of distinguishing attack data from benign data when the dataset is balanced. We worked on the publicly available UGR'16 dataset. Data wrangling was performed to prepare a test set from the original set. We fed the neural network classifier inputs of increasing size (i.e., 10000, 50000, 1 million) to observe the distribution of features over the accuracy. We implemented a GAN model that can produce samples of different attack labels (e.g., blacklist, anomaly spam, ssh scan). We were able to generate as many samples as necessary based on the data sample we took from UGR'16. We tested the accuracy of our model first with the imbalanced dataset and then with an increasing number of attack samples, and found improved classification performance for the latter.
 [34] arXiv:1912.04556 (cross-list from cs.LG) [pdf]

Title: Accurate Entrance Position Detection Based on WiFi and GPS Signals Using Machine Learning
Authors: Ahmad Abadleh
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
This paper aims at accurately detecting the position of the main entrance of buildings. The proposed approach relies on the fact that GPS signals drop significantly when the user enters a building. Moreover, as most public buildings provide WiFi services, the WiFi received signal strength (RSS) can be utilized to detect the entrance of the buildings. The rationale behind this paper is that the GPS signal decreases and the WiFi signal increases as the user approaches the main entrance. Several real experiments have been conducted to verify the feasibility of the proposed approach. The experimental results are encouraging, and the accuracy of the whole system was one meter.
 [35] arXiv:1912.04565 (cross-list from q-fin.TR) [pdf, other]

Title: Market Price of Trading Liquidity Risk and Market Depth
Comments: 46 Pages, 12 Figures, To appear in the International Journal of Theoretical and Applied Finance
Subjects: Trading and Market Microstructure (q-fin.TR); Econometrics (econ.EM); Computational Finance (q-fin.CP); Mathematical Finance (q-fin.MF); Computation (stat.CO)
Price impact of a trade is an important element in pre-trade and post-trade analyses. We introduce a framework to analyze the market price of liquidity risk, which allows us to derive an inhomogeneous Bernoulli ordinary differential equation. We obtain two closed-form solutions, one of which reproduces the linear function of the order flow in Kyle (1985) for informed traders. However, when traders are not as asymmetrically informed, an S-shape function of the order flow is obtained. We perform an empirical intra-day analysis on Nikkei futures to quantify the price impact of order flow and compare our results with the industry's heuristic price impact functions. Our model of order flow yields a rich framework not only for estimating the liquidity risk parameters, but also for providing a plausible explanation of why volatility and correlation are stochastic in nature. Finally, we find that the market depth encapsulates the market price of liquidity risk.
 [36] arXiv:1912.04635 (cross-list from cs.LG) [pdf, ps, other]

Title: Backprop Diffusion is Biologically Plausible
Comments: 6 pages, 3 figures
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
The Backpropagation algorithm relies on the abstraction of using a neural model that gets rid of the notion of time, since the input is mapped instantaneously to the output. In this paper, we claim that this abstraction of ignoring time, along with the abrupt input changes that occur when feeding the training set, are in fact the reasons why, in some papers, the biological plausibility of Backprop is regarded as an arguable issue. We show that as soon as a deep feedforward network operates with neurons with time-delayed response, the backprop weight update turns out to be the basic equation of a biologically plausible diffusion process based on forward-backward waves. We also show that such a process approximates the gradient very well for inputs that are not too fast with respect to the depth of the network. These remarks somewhat disclose the diffusion process behind the backprop equation and lead us to interpret the corresponding algorithm as a degeneration of a more general diffusion process that takes place also in neural networks with cyclic connections.
 [37] arXiv:1912.04661 (cross-list from econ.EM) [pdf, other]

Title: Adaptive Dynamic Model Averaging with an Application to House Price Forecasting
Subjects: Econometrics (econ.EM); Methodology (stat.ME)
Dynamic model averaging (DMA) combines the forecasts of a large number of dynamic linear models (DLMs) to predict the future value of a time series. The performance of DMA critically depends on the appropriate choice of two forgetting factors. The first of these controls the speed of adaptation of the coefficient vector of each DLM, while the second enables time variation in the model averaging stage. In this paper we develop a novel, adaptive dynamic model averaging (ADMA) methodology. The proposed methodology employs a stochastic optimisation algorithm that sequentially updates the forgetting factor of each DLM, and uses a state-of-the-art non-parametric model combination algorithm from the prediction with expert advice literature, which offers finite-time performance guarantees. An empirical application to quarterly UK house price data suggests that ADMA produces more accurate forecasts than the benchmark autoregressive model, as well as competing DMA specifications.
 [38] arXiv:1912.04684 (cross-list from cs.LG) [pdf, other]

Title: Neural Network Based Explicit MPC for Chemical Reactor Control
Comments: Preprint submitted to Acta Chimica Slovaca, ISSN: 1339-3065
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
In this paper, we show the implementation of deep neural networks applied in process control. In our approach, we base the training of the neural network on model predictive control (MPC). Model predictive control is popular because it can be tuned through the weighting matrices and because it respects the constraints. We present a neural network that approximates the behavior of the MPC by mimicking the control input trajectory, while the constraints on states and control input remain satisfied regardless of the values of the weighting matrices. This approach is demonstrated in a simulation case study involving a continuous stirred tank reactor, where a multi-component chemical reaction takes place.
 [39] arXiv:1912.04690 (cross-list from cs.LG) [pdf]

Title: Reconstructing Multiecho Magnetic Resonance Images via Structured Deep Dictionary Learning
Comments: Final version accepted at Neurocomputing
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Multiecho magnetic resonance (MR) images are acquired by changing the echo times (for T2 weighted) or relaxation times (for T1 weighted) of scans. The resulting (multiecho) images are usually used for quantitative MR imaging. Acquiring MR images is a slow process, and acquiring multiple scans of the same cross section for multiecho imaging is even slower. In order to accelerate the scan, compressed sensing (CS) based techniques have advocated partial K-space (Fourier domain) scans; the resulting images are reconstructed via structured CS algorithms. In recent times, it has been shown that instead of using off-the-shelf CS, better results can be obtained by adaptive reconstruction algorithms based on structured dictionary learning. In this work, we show that the reconstruction results can be further improved by using structured deep dictionaries. Experimental results on real datasets show that by using our proposed technique the scan time can be cut by half compared to the state-of-the-art.
 [40] arXiv:1912.04695 (cross-list from cs.LG) [pdf, other]

Title: Transparent Classification with Multilayer Logical Perceptrons and Random Binarization
Comments: AAAI-20 oral
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Models with transparent inner structure and high classification performance are required to reduce potential risk and provide trust for users in domains like health care, finance, security, etc. However, existing models struggle to satisfy both properties simultaneously. In this paper, we propose a new hierarchical rule-based model for classification tasks, named Concept Rule Sets (CRS), which has both a strong expressive ability and a transparent inner structure. To address the challenge of efficiently learning the non-differentiable CRS model, we propose a novel neural network architecture, Multilayer Logical Perceptron (MLLP), which is a continuous version of CRS. Using MLLP and the Random Binarization (RB) method we propose, we can search for the discrete solution of CRS in continuous space using gradient descent and ensure that the discrete CRS acts almost the same as the corresponding continuous MLLP. Experiments on 12 public data sets show that CRS outperforms the state-of-the-art approaches and that the complexity of the learned CRS is close to that of a simple decision tree.
 [41] arXiv:1912.04734 (cross-list from cs.LG) [pdf]

Title: Transformed Subspace Clustering
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Subspace clustering assumes that the data is separable into separate subspaces. Such a simple assumption does not always hold. We assume that, even if the raw data is not separable into subspaces, one can learn a representation (transform coefficients) such that the learnt representation is separable into subspaces. To achieve the intended goal, we embed subspace clustering techniques (locally linear manifold clustering, sparse subspace clustering and low rank representation) into transform learning. The entire formulation is jointly learnt, giving rise to a new class of methods called transformed subspace clustering (TSC). In order to account for non-linearity, kernelized extensions of TSC are also proposed. To test the performance of the proposed techniques, benchmarking is performed on image clustering and document clustering datasets. Comparison with state-of-the-art clustering techniques shows that our formulation improves upon them.
 [42] arXiv:1912.04747 (cross-list from cs.LG) [pdf, other]

Title: Oversampling Log Messages Using a Sequence Generative Adversarial Network for Anomaly Detection and Classification
Comments: 23 pages, 4 figures, 2 tables. arXiv admin note: text overlap with arXiv:1911.08744
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Dealing with imbalanced data is one of the main challenges in machine/deep learning algorithms for classification. This issue is more important with log message data, as it is typically imbalanced and negative logs are rare. In this paper, a model is proposed to generate text log messages using a SeqGAN network. Features are then extracted using an Autoencoder, and anomaly detection and classification are done using a GRU network. The proposed model is evaluated with two imbalanced log data sets, namely BGL and Openstack. Results are presented which show that oversampling and balancing data increases the accuracy of anomaly detection and classification.
 [43] arXiv:1912.04754 (cross-list from cs.LG) [pdf]

Title: Deep Latent Factor Model for Collaborative Filtering
Comments: This is an initial draft of the accepted paper at Elsevier Signal Processing
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Latent factor models have been used widely in collaborative filtering based recommender systems. In recent years, deep learning has been successful in solving a wide variety of machine learning problems. Motivated by the success of deep learning, we propose a deeper version of the latent factor model. Experiments on benchmark datasets show that our proposed technique significantly outperforms all state-of-the-art collaborative filtering techniques.
 [44] arXiv:1912.04783 (cross-list from cs.LG) [pdf, other]

Title: Removable and/or Repeated Units Emerge in Overparametrized Deep Neural Networks
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Deep neural networks (DNNs) perform well on a variety of tasks despite the fact that most networks used in practice are vastly overparametrized and even capable of perfectly fitting randomly labeled data. Recent evidence suggests that developing compressible representations is key for adjusting the complexity of overparametrized networks to the task at hand. In this paper, we provide new empirical evidence that supports this hypothesis by identifying two types of units that emerge when the network's width is increased: removable units which can be dropped out of the network without significant change to the output and repeated units whose activities are highly correlated with other units. The emergence of these units implies capacity constraints as the function the network represents could be expressed by a smaller network without these units. In a series of experiments with AlexNet, ResNet and Inception networks in the CIFAR-10 and ImageNet datasets, and also using shallow networks with synthetic data, we show that DNNs consistently increase either the number of removable units, repeated units, or both at greater widths for a comprehensive set of hyperparameters. These results suggest that the mechanisms by which networks in the deep learning regime adjust their complexity operate at the unit level and highlight the need for additional research into what drives the emergence of such units.
 [45] arXiv:1912.04792 (cross-list from cs.LG) [pdf, other]

Title: On Certifying Robust Models by Polyhedral Envelope
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Certifying neural networks enables one to offer guarantees on a model's robustness. In this work, we use linear approximation to obtain an upper and lower bound of the model's output when the input data is perturbed within a predefined adversarial budget. This allows us to bound the adversary-free region in the data neighborhood by a polyhedral envelope, and calculate robustness guarantees based on this geometric approximation. Compared with existing methods, our approach gives a finer-grained quantitative evaluation of a model's robustness. Therefore, the certification method can not only obtain better certified bounds than the state-of-the-art techniques given the same adversarial budget but also derive a faster search scheme for the optimal adversarial budget. Furthermore, we introduce a simple regularization scheme based on our method that enables us to effectively train robust models.
 [46] arXiv:1912.04825 (cross-list from cs.LG) [pdf, other]

Title: Integration of Neural Network-Based Symbolic Regression in Deep Learning for Scientific Discovery
Authors: Samuel Kim, Peter Lu, Srijon Mukherjee, Michael Gilbert, Li Jing, Vladimir Ceperic, Marin Soljacic
Comments: 11 pages, 8 figures
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
Symbolic regression is a powerful technique that can discover analytical equations that describe data, which can lead to explainable models and generalizability outside of the training data set. In contrast, neural networks have achieved amazing levels of accuracy on image recognition and natural language processing tasks, but are often seen as black-box models that are difficult to interpret and typically extrapolate poorly. Here we use a neural network-based architecture for symbolic regression that we call the Sequential Equation Learner (SEQL) network and integrate it with other deep learning architectures such that the whole system can be trained end-to-end through backpropagation. To demonstrate the power of such systems, we study their performance on several substantially different tasks. First, we show that the neural network can perform symbolic regression and learn the form of several functions. Next, we present an MNIST arithmetic task where a separate part of the neural network extracts the digits. Finally, we demonstrate prediction of dynamical systems where an unknown parameter is extracted through an encoder. We find that the EQL-based architecture can extrapolate quite well outside of the training data set compared to a standard neural network-based architecture, paving the way for deep learning to be applied in scientific exploration and discovery.
 [47] arXiv:1912.04832 (cross-list from cs.LG) [pdf, other]

Title: Feature Relevance Determination for Ordinal Regression in the Context of Feature Redundancies and Privileged Information
Authors: Lukas Pfannschmidt, Jonathan Jakob, Fabian Hinder, Michael Biehl, Peter Tino, Barbara Hammer
Comments: Preprint accepted at Neurocomputing
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Advances in machine learning technologies have led to increasingly powerful models, in particular in the context of big data. Yet, many application scenarios demand robustly interpretable models rather than optimum model accuracy; as an example, this is the case if potential biomarkers or causal factors should be discovered based on a set of given measurements. In this contribution, we focus on feature selection paradigms, which enable us to uncover relevant factors of a given regularity based on a sparse model. We focus on the important specific setting of linear ordinal regression, i.e., data have to be ranked into one of a finite number of ordered categories by a linear projection. Unlike previous work, we consider the case that features are potentially redundant, such that no unique minimum set of relevant features exists. We aim for an identification of all strongly and all weakly relevant features as well as their type of relevance (strong or weak); we achieve this goal by determining feature relevance bounds, which correspond to the minimum and maximum feature relevance, respectively, if searched over all equivalent models. In addition, we discuss how this setting enables us to substitute some of the features, e.g., due to their semantics, and how to extend the framework of feature relevance intervals to the setting of privileged information, i.e., potentially relevant information is available for training purposes only, but cannot be used for the prediction itself.
 [48] arXiv:1912.04838 (cross-list from cs.CV) [pdf, other]

Title: Scalability in Perception for Autonomous Driving: An Open Dataset Benchmark
Authors: Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jon Shlens, Zhifeng Chen, Dragomir Anguelov
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall viability of the technology. In an effort to help align the research community's contributions with real-world self-driving problems, we introduce a new large scale, high quality, diverse dataset. Our new dataset consists of 1150 scenes that each span 20 seconds, consisting of well synchronized and calibrated high quality LiDAR and camera data captured across a range of urban and suburban geographies. It is 15x more diverse than the largest camera+LiDAR dataset available based on our proposed diversity metric. We exhaustively annotated this data with 2D (camera image) and 3D (LiDAR) bounding boxes, with consistent identifiers across frames. Finally, we provide strong baselines for 2D as well as 3D detection and tracking tasks. We further study the effects of dataset size and generalization across geographies on 3D detection methods. Find data, code and more up-to-date information at this http URL
 [49] arXiv:1912.04845 (cross-list from cs.LG) [pdf, other]

Title: Magnitude and Uncertainty Pruning Criterion for Neural Networks
Comments: 10 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Neural networks have achieved dramatic improvements in recent years and now represent the state-of-the-art methods for many real-world tasks. One drawback, however, is that many of these models are overparameterized, which makes them both computationally and memory intensive. Furthermore, overparameterization can also lead to undesired overfitting side effects. Inspired by recently proposed magnitude-based pruning schemes and the Wald test from the field of statistics, we introduce a novel magnitude and uncertainty (M&U) pruning criterion that helps to lessen such shortcomings. One important advantage of our M&U pruning criterion is that it is scale-invariant, whereas the magnitude-based pruning criterion is sensitive to scale. In addition, we present a ``pseudo bootstrap'' scheme, which can efficiently estimate the uncertainty of the weights by using their update information during training. Our experimental evaluation, which is based on various neural network architectures and datasets, shows that our new criterion leads to more compressed models than those obtained solely with magnitude-based pruning criteria, while losing less predictive power.
 [50] arXiv:1912.04858 (cross-list from math.PR) [pdf, ps, other]

Title: Rates of convergence to the local time of Oscillating and Skew Brownian Motions
Authors: Sara Mazzonetto
Subjects: Probability (math.PR); Statistics Theory (math.ST)
In this paper a class of statistics based on high frequency observations of oscillating Brownian motions and skew Brownian motions is considered. Their convergence rate towards the local time of the underlying process is obtained in the form of a Central Limit Theorem.
 [51] arXiv:1912.04862 (cross-list from cs.LG) [pdf, other]

Title: Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint
Comments: 26 pages
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
Motivated by the gap between theoretical optimal approximation rates of deep neural networks (DNNs) and the accuracy realized in practice, we seek to improve the training of DNNs. The adoption of an adaptive basis viewpoint of DNNs leads to novel initializations and a hybrid least squares/gradient descent optimizer. We provide analysis of these techniques and illustrate via numerical examples dramatic increases in accuracy and convergence rate for benchmarks characterizing scientific applications where DNNs are currently used, including regression problems and physics-informed neural networks for the solution of partial differential equations.
 [52] arXiv:1912.04871 (cross-list from cs.LG) [pdf, other]

Title: Deep symbolic regression: Recovering mathematical expressions from data via policy gradients
Authors: Brenden K. Petersen
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Discovering the underlying mathematical expressions describing a dataset is a core challenge for artificial intelligence. This is the problem of symbolic regression. Despite recent advances in training neural networks to solve complex tasks, deep learning approaches to symbolic regression are lacking. We propose a framework that combines deep learning with symbolic regression via a simple idea: use a large model to search the space of small models. More specifically, we use a recurrent neural network to emit a distribution over tractable mathematical expressions, and employ reinforcement learning to train the network to generate better-fitting expressions. Our algorithm significantly outperforms standard genetic programming-based symbolic regression in its ability to exactly recover symbolic expressions on a series of benchmark problems, both with and without added noise. More broadly, our contributions include a framework that can be applied to optimize hierarchical, variable-length objects under a black-box performance metric, with the ability to incorporate a priori constraints in situ.
Replacements for Wed, 11 Dec 19
 [53] arXiv:1709.01062 (replaced) [pdf, ps, other]

Title: A hierarchical loss and its problems when classifying non-hierarchically
Comments: 19 pages, 4 figures, 7 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [54] arXiv:1806.09548 (replaced) [pdf, other]

Title: Learning dynamical systems with particle stochastic approximation EM
Subjects: Computation (stat.CO); Computational Engineering, Finance, and Science (cs.CE); Signal Processing (eess.SP); Machine Learning (stat.ML)
 [55] arXiv:1811.08968 (replaced) [pdf, other]

Title: Spread Divergences
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [56] arXiv:1812.01097 (replaced) [pdf, other]

Title: LEAF: A Benchmark for Federated Settings
Authors: Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konečný, H. Brendan McMahan, Virginia Smith, Ameet Talwalkar
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [57] arXiv:1812.07929 (replaced) [pdf, other]

Title: Importance Sampling-based Transport Map Hamiltonian Monte Carlo for Bayesian Hierarchical Models
Subjects: Computation (stat.CO)
 [58] arXiv:1901.10837 (replaced) [pdf, other]

Title: Noise-tolerant fair classification
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (stat.ML)
 [59] arXiv:1902.00610 (replaced) [pdf, other]

Title: On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems
Comments: Advances in Neural Information Processing Systems 32 (NIPS 2019)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [60] arXiv:1902.04972 (replaced) [pdf, other]

Title: Provable Low Rank Phase Retrieval and Compressive PCA
Comments: A short version of this work is in ICML 2019, this longer version is revised and resubmitted to IEEE Trans. Info. Th
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [61] arXiv:1902.11153 (replaced) [pdf, other]

Title: On the generalization of GAN image forensics
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [62] arXiv:1903.08114 (replaced) [pdf, other]

Title: Exact Gaussian Processes on a Million Data Points
Authors: Ke Alexander Wang, Geoff Pleiss, Jacob R. Gardner, Stephen Tyree, Kilian Q. Weinberger, Andrew Gordon Wilson
Comments: Published at NeurIPS 2019
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
 [63] arXiv:1904.03920 (replaced) [pdf, other]

Title: A Generalization Bound for Online Variational Inference
Comments: Published in the proceedings of ACML 2019
Journal-ref: Proceedings in Machine Learning Research, 2019, vol. 101, pp. 662-677
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Computation (stat.CO)
 [64] arXiv:1904.11132 (replaced) [pdf, other]

Title: TreeGrad: Transferring Tree Ensembles to Neural Networks
Authors: Chapman Siu
Comments: Technical Report on Implementation of Deep Neural Decision Forests Algorithm. To accompany implementation here: this https URL Update: Please cite as: Siu, C. (2019). "Transferring Tree Ensembles to Neural Networks". International Conference on Neural Information Processing. Springer, 2019. arXiv admin note: text overlap with arXiv:1909.11790
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [65] arXiv:1905.00441 (replaced) [pdf, other]

Title: NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [66] arXiv:1905.12558 (replaced) [pdf, other]

Title: Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [67] arXiv:1906.00696 (replaced) [pdf, other]

Title: Transformed Central Quantile Subspace
Authors: Eliana Christou
Comments: arXiv admin note: text overlap with arXiv:1906.00694
Subjects: Methodology (stat.ME)
 [68] arXiv:1906.03849 (replaced) [pdf, other]

Title: Robustness Verification of Tree-based Models
Comments: Hongge Chen and Huan Zhang contributed equally
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [69] arXiv:1906.05200 (replaced) [pdf, other]

Title: Macro-action Multi-time scale Dynamic Programming for Energy Management in Buildings with Phase Change Materials
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [70] arXiv:1906.10075 (replaced) [pdf, other]

Title: Distribution-Independent PAC Learning of Halfspaces with Massart Noise
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [71] arXiv:1906.12074 (replaced) [pdf, other]

Title: Recursion scheme for the largest $β$-Wishart-Laguerre eigenvalue and Landauer conductance in quantum transport
Comments: Published version; 20 pages, 2 figures in the main text + 2 in the Mathematica code towards the end
Journal-ref: Journal of Physics A: Mathematical and Theoretical, Volume 52, Page 42LT02, Year 2019
Subjects: Mathematical Physics (math-ph); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Statistical Mechanics (cond-mat.stat-mech); Applications (stat.AP); Computation (stat.CO)
 [72] arXiv:1906.12331 (replaced) [pdf, ps, other]

Title: Modeling Food Popularity Dependencies using Social Media data
Comments: 5 pages
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
 [73] arXiv:1907.06560 (replaced) [pdf]

Title: Eliciting Priors for Bayesian Prediction of Daily Response Propensity in Responsive Survey Design: Historical Data Analysis vs. Literature Review
Comments: 47 pages, 10 figures, two tables
Subjects: Methodology (stat.ME); Applications (stat.AP)
 [74] arXiv:1907.09617 (replaced) [pdf, other]

Title: Hierarchical Transformed Scale Mixtures for Flexible Modeling of Spatial Extremes on Datasets with Many Locations
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
 [75] arXiv:1907.11792 (replaced) [pdf, other]

Title: Maximum Causal Entropy Specification Inference from Demonstrations
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)
 [76] arXiv:1908.04457 (replaced) [pdf, other]

Title: On the Convergence of AdaBound and its Connection to SGD
Authors: Pedro Savarese
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [77] arXiv:1908.09377 (replaced) [pdf, other]

Title: Probabilistic Forecasting of the Arctic Sea Ice Edge with Contour Modeling
Subjects: Applications (stat.AP)
 [78] arXiv:1909.00719 (replaced) [pdf, other]

Title: Pathologies of Factorised Gaussian and MC Dropout Posteriors in Bayesian Neural Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [79] arXiv:1909.02940 (replaced) [pdf, ps, other]

Title: Reinforcement Learning with Non-Markovian Rewards
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
 [80] arXiv:1909.04239 (replaced) [pdf, other]

Title: PMD: An Optimal Transportation-based User Distance for Recommender Systems
Authors: Yitong Meng, Xinyan Dai, Xiao Yan, James Cheng, Weiwen Liu, Benben Liao, Jun Guo, Guangyong Chen
Comments: This paper is accepted by European Conference on Information Retrieval (ECIR 2020)
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [81] arXiv:1909.06342 (replaced) [pdf, ps, other]

Title: Explainable Machine Learning in Deployment
Authors: Umang Bhatt, Alice Xiang, Shubham Sharma, Adrian Weller, Ankur Taly, Yunhan Jia, Joydeep Ghosh, Ruchir Puri, José M. F. Moura, Peter Eckersley
Comments: Accepted to the ACM Conference on Fairness, Accountability, and Transparency (ACM FAT* 2020)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
 [82] arXiv:1909.10024 (replaced) [pdf, other]

Title: Distribution-free consistent independence tests via Hallin's multivariate rank
Comments: In this (3rd) version, we added more references
Subjects: Statistics Theory (math.ST)
 [83] arXiv:1910.08520 (replaced) [pdf, other]

Title: Optimization Hierarchy for Fair Statistical Decision Problems
Subjects: Statistics Theory (math.ST); Optimization and Control (math.OC)
 [84] arXiv:1910.10196 (replaced) [pdf, other]

Title: Online Meta-Learning on Non-convex Setting
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [85] arXiv:1910.12566 (replaced) [pdf, other]

Title: The spectral dimension of simplicial complexes: a renormalization group theory
Comments: (30 pages, 5 figures)
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Statistical Mechanics (cond-mat.stat-mech); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)
 [86] arXiv:1911.00197 (replaced) [pdf, other]

Title: Phase transitions and optimal algorithms for semi-supervised classifications on graphs: from belief propagation to graph convolution network
Comments: 18 pages, 21 figures
Subjects: Statistical Mechanics (cond-mat.stat-mech); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)
 [87] arXiv:1911.04489 (replaced) [pdf, other]

Title: Making Good on LSTMs' Unfulfilled Promise
Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada. arXiv admin note: text overlap with arXiv:1812.02340
Subjects: Machine Learning (cs.LG); Computational Finance (q-fin.CP); Portfolio Management (q-fin.PM); Machine Learning (stat.ML)
 [88] arXiv:1911.07891 (replaced) [pdf, other]

Title: Basic Principles of Clustering Methods
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [89] arXiv:1911.11607 (replaced) [pdf, other]

Title: Deep Learning with Gaussian Differential Privacy
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
 [90] arXiv:1911.11610 (replaced) [pdf, other]

Title: Improving EEG based Continuous Speech Recognition
Comments: In preparation for submission to EUSIPCO 2020. arXiv admin note: text overlap with arXiv:1911.04261, arXiv:1906.08871
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
 [91] arXiv:1912.01792 (replaced) [pdf, ps, other]

Title: Learn Electronic Health Records by Fully Decentralized Federated Learning
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
 [92] arXiv:1912.01823 (replaced) [pdf, other]

Title: Domain-independent Dominance of Adaptive Methods
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [93] arXiv:1912.02427 (replaced) [pdf, other]

Title: Analysis of the Optimization Landscapes for Overcomplete Representation Learning
Comments: 68 pages, 5 figures
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [94] arXiv:1912.03011 (replaced) [pdf, other]

Title: A priori generalization error for two-layer ReLU neural network through minimum norm solution
Comments: 15 pages, 1 figure
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [95] arXiv:1912.04151 (replaced) [pdf, other]

Title: Identification of causal intervention effects under contagion
Subjects: Applications (stat.AP); Statistics Theory (math.ST); Populations and Evolution (q-bio.PE)