New submissions

New submissions for Tue, 22 Aug 17

[1]  arXiv:1708.05712 [pdf]
Title: Extensions of Morse-Smale Regression with Application to Actuarial Science
Comments: 14 pages, 10 figures
Subjects: Machine Learning (stat.ML); Applications (stat.AP)

The problem of subgroups is ubiquitous in scientific research (e.g., disease heterogeneity or spatial distributions in ecology), and piecewise regression is one way to deal with this phenomenon. Morse-Smale regression offers a way to partition the regression function based on level sets of a defined function and that function's basins of attraction. This topologically-based piecewise regression algorithm has shown promise in its initial applications, but the current implementation in the literature has been limited to elastic net and generalized linear regression. It is possible that nonparametric methods, such as random forest or conditional inference trees, may provide better prediction and insight through modeling interaction terms and other nonlinear relationships between predictors and a given outcome.
This study explores the use of several machine learning algorithms within a Morse-Smale piecewise regression framework, including boosted regression with linear baselearners, homotopy-based LASSO, conditional inference trees, random forest, and a wide neural network framework called extreme learning machines. Simulations on Tweedie regression problems with varying Tweedie parameter and dispersion suggest that many machine learning approaches to Morse-Smale piecewise regression improve the original algorithm's performance, particularly for outcomes with lower dispersion and linear or a mix of linear and nonlinear predictor relationships. On a real actuarial problem, several of these new algorithms perform as well as or better than the original Morse-Smale regression algorithm, and most provide information on the nature of predictor relationships within each partition, giving insight into differences between dataset partitions.

[2]  arXiv:1708.05715 [pdf, other]
Title: The Stochastic Replica Approach to Machine Learning: Stability and Parameter Optimization
Comments: 30 pages, 42 figures
Subjects: Machine Learning (stat.ML); Statistical Mechanics (cond-mat.stat-mech); Data Analysis, Statistics and Probability (physics.data-an)

We introduce a statistical physics inspired supervised machine learning algorithm for classification and regression problems. The method is based on the invariances or stability of predicted results when known data is represented as expansions in terms of various stochastic functions. The algorithm predicts the classification/regression values of new data by combining (via voting) the outputs of these numerous linear expansions in randomly chosen functions. The few parameters (typically only one parameter is used in all studied examples) that this model has may be automatically optimized. The algorithm has been tested on 10 diverse training data sets of various types and feature space dimensions. It has been shown to consistently exhibit high accuracy and readily allow for optimization of parameters, while simultaneously avoiding pitfalls of existing algorithms such as those associated with class imbalance. We very briefly speculate on whether spatial coordinates in physical theories may be viewed as emergent "features" that enable a robust machine learning type description of data with generic low order smooth functions.

[3]  arXiv:1708.05768 [pdf, other]
Title: Data-Driven Tree Transforms and Metrics
Comments: 16 pages, 5 figures. Accepted to IEEE Transactions on Signal and Information Processing over Networks
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Quantitative Methods (q-bio.QM)

We consider the analysis of high dimensional data given in the form of a matrix with columns consisting of observations and rows consisting of features. Often the data is such that the observations do not reside on a regular grid, and the given order of the features is arbitrary and does not convey a notion of locality. Therefore, traditional transforms and metrics cannot be used for data organization and analysis. In this paper, our goal is to organize the data by defining an appropriate representation and metric such that they respect the smoothness and structure underlying the data. We also aim to generalize the joint clustering of observations and features in the case the data does not fall into clear disjoint groups. For this purpose, we propose multiscale data-driven transforms and metrics based on trees. Their construction is implemented in an iterative refinement procedure that exploits the co-dependencies between features and observations. Beyond the organization of a single dataset, our approach enables us to transfer the organization learned from one dataset to another and to integrate several datasets together. We present an application to breast cancer gene expression analysis: learning metrics on the genes to cluster the tumor samples into cancer sub-types and validating the joint organization of both the genes and the samples. We demonstrate that using our approach to combine information from multiple gene expression cohorts, acquired by different profiling technologies, improves the clustering of tumor samples.

[4]  arXiv:1708.05789 [pdf, other]
Title: Semi-supervised Conditional GANs
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

We introduce a new model for building conditional generative models in a semi-supervised setting to conditionally generate data given attributes by adapting the GAN framework. The proposed semi-supervised GAN (SS-GAN) model uses a pair of stacked discriminators to learn the marginal distribution of the data, and the conditional distribution of the attributes given the data respectively. In the semi-supervised setting, the marginal distribution (which is often harder to learn) is learned from the labeled + unlabeled data, and the conditional distribution is learned purely from the labeled data. Our experimental results demonstrate that this model performs significantly better compared to existing semi-supervised conditional GAN models.

[5]  arXiv:1708.05836 [pdf, ps, other]
Title: Common change point estimation in panel data from the least squares and maximum likelihood viewpoints
Subjects: Statistics Theory (math.ST)

We establish the convergence rates and asymptotic distributions of the common break change-point estimators, obtained by least squares and maximum likelihood in panel data models and compare their asymptotic variances. Our model assumptions accommodate a variety of commonly encountered probability distributions and, in particular, models of particular interest in econometrics beyond the commonly analyzed Gaussian model, including the zero-inflated Poisson model for count data, and the probit and tobit models. We also provide novel results for time dependent data in the signal-plus-noise model, with emphasis on a wide array of noise processes, including Gaussian process, MA$(\infty)$ and $m$-dependent processes. The obtained results show that maximum likelihood estimation requires a stronger signal-to-noise model identifiability condition compared to its least squares counterpart. Finally, since there are three different asymptotic regimes that depend on the behavior of the norm difference of the model parameters before and after the change point, which cannot be realistically assumed to be known, we develop a novel data driven adaptive procedure that provides valid confidence intervals for the common break, without requiring a priori knowledge of the asymptotic regime the problem falls in.
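As an illustration of the least squares viewpoint in the abstract above, here is a minimal sketch of a common-break estimator. It assumes a simple mean-shift model in which every series has one mean before the break and another after it; that model, the function name `common_break_least_squares`, and the array layout are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def common_break_least_squares(panel):
    """Least-squares estimate of a break point shared by all series.

    panel: array of shape (n_series, n_time). Assumes (for illustration)
    a mean-shift model: one mean per series before the break, another after.
    Returns tau, the number of observations before the break.
    """
    panel = np.asarray(panel, dtype=float)
    n_time = panel.shape[1]
    best_tau, best_sse = None, np.inf
    for tau in range(1, n_time):
        pre, post = panel[:, :tau], panel[:, tau:]
        # Pooled sum of squared residuals around per-series segment means.
        sse = ((pre - pre.mean(axis=1, keepdims=True)) ** 2).sum() \
            + ((post - post.mean(axis=1, keepdims=True)) ** 2).sum()
        if sse < best_sse:
            best_tau, best_sse = tau, sse
    return best_tau
```

Pooling the squared residuals across series is what makes the break "common": every series votes on the same candidate location.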

[6]  arXiv:1708.05840 [pdf, other]
Title: A Data and Model-Parallel, Distributed and Scalable Framework for Training of Deep Networks in Apache Spark
Comments: 12 pages
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Training deep networks is expensive and time-consuming, with the training period increasing with data size and growth in model parameters. In this paper, we provide a framework for distributed training of deep networks over a cluster of CPUs in Apache Spark. The framework implements both Data Parallelism and Model Parallelism, making it suitable for deep networks that require huge amounts of training data and whose model parameters are too big to fit into the memory of a single machine. It can be scaled easily over a cluster of cheap commodity hardware to attain significant speedup and obtain better results, making it quite economical compared to a farm of GPUs and supercomputers. We have proposed a new algorithm for training deep networks for the case when the network is partitioned across the machines (Model Parallelism), along with a detailed cost analysis and proof of convergence. We have developed implementations for Fully-Connected Feedforward Networks, Convolutional Neural Networks, Recurrent Neural Networks and Long Short-Term Memory architectures. We present the results of extensive simulations demonstrating the speedup and accuracy obtained by our framework for different sizes of the data and model parameters with variation in the number of worker cores/partitions, thereby showing that our proposed framework can achieve significant speedup (up to 11X for CNN) and is also quite scalable.

[7]  arXiv:1708.05879 [pdf, other]
Title: Regularized Estimation and Testing for High-Dimensional Multi-Block Vector-Autoregressive Models
Subjects: Methodology (stat.ME)

Dynamical systems comprising multiple components that can be partitioned into distinct blocks arise in many scientific areas. A pertinent example is the interaction between financial assets and selected macroeconomic indicators, which has been studied extensively at the aggregate level---e.g. a stock index and an employment index---in the macroeconomics literature. A key shortcoming of this approach is that it ignores potential influences from other related components (e.g. Gross Domestic Product) that may affect the system's dynamics and structure, and thus produces incorrect results. To mitigate this issue, we consider a multi-block linear dynamical system with Granger-causal ordering between blocks, wherein the blocks' temporal dynamics are described by vector autoregressive processes and are influenced by blocks higher in the system hierarchy. We derive the maximum likelihood estimator for the posited model for Gaussian data in the high-dimensional setting based on appropriate regularization schemes for the parameters of the block components. To optimize the underlying non-convex likelihood function, we develop an iterative algorithm with convergence guarantees. We establish theoretical properties of the maximum likelihood estimates, leveraging the decomposability of the regularizers and a careful analysis of the iterates. Finally, we develop testing procedures for the null hypothesis of whether a block "Granger-causes" another block of variables. The performance of the model and the testing procedures is evaluated on synthetic data, and illustrated on a data set involving log-returns of the US S&P100 component stocks and key macroeconomic variables for the 2001--16 period.

[8]  arXiv:1708.05894 [pdf, other]
Title: An Improved Multi-Output Gaussian Process RNN with Real-Time Validation for Early Sepsis Detection
Comments: Presented at Machine Learning for Healthcare 2017, Boston, MA
Subjects: Machine Learning (stat.ML); Applications (stat.AP); Methodology (stat.ME)

Sepsis is a poorly understood and potentially life-threatening complication that can occur as a result of infection. Early detection and treatment improves patient outcomes, and as such it poses an important challenge in medicine. In this work, we develop a flexible classifier that leverages streaming lab results, vitals, and medications to predict sepsis before it occurs. We model patient clinical time series with multi-output Gaussian processes, maintaining uncertainty about the physiological state of a patient while also imputing missing values. The mean function takes into account the effects of medications administered on the trajectories of the physiological variables. Latent function values from the Gaussian process are then fed into a deep recurrent neural network to classify patient encounters as septic or not, and the overall model is trained end-to-end using back-propagation. We train and validate our model on a large dataset of 18 months of heterogeneous inpatient stays from the Duke University Health System, and develop a new "real-time" validation scheme for simulating the performance of our model as it will actually be used. Our proposed method substantially outperforms clinical baselines, and improves on a previous related model for detecting sepsis. Our model's predictions will be displayed in a real-time analytics dashboard to be used by a sepsis rapid response team to help detect and improve treatment of sepsis.

[9]  arXiv:1708.05895 [pdf, other]
Title: Identifying down and up-regulated chromosome regions using RNA-Seq data
Subjects: Applications (stat.AP)

The number of studies dealing with RNA-Seq data analysis has increased rapidly in recent years, making this type of gene expression data a strong competitor to DNA microarrays. This paper proposes a Bayesian model to detect down and up-regulated chromosome regions using RNA-Seq data. The methodology is based on recent work developed to detect up-regulated regions in the context of microarray data. A hidden Markov model is developed by considering a mixture of Gaussian distributions with ordered means, such that the first and last mixture components accommodate the under and overexpressed genes, respectively. The model is flexible enough to deal efficiently with the highly irregular spacing of the data by assuming a hierarchical Markov dependence structure. The analysis of four cancer data sets (breast, lung, ovarian and uterus) is presented. Results indicate that the proposed model is selective in determining the regulation status, robust with respect to prior specifications and provides tools for a global or local search of under and overexpressed chromosome regions.

[10]  arXiv:1708.05917 [pdf, ps, other]
Title: Accelerating Kernel Classifiers Through Borders Mapping
Authors: Peter Mills
Comments: Stuck even deeper in peer-review limbo
Subjects: Machine Learning (stat.ML); Learning (cs.LG)

Support vector machines (SVM) and other kernel techniques represent a family of powerful statistical classification methods with high accuracy and broad applicability. Because they use all or a significant portion of the training data, however, they can be slow, especially for large problems. Piecewise linear classifiers are similarly versatile, yet have the additional advantages of simplicity, ease of interpretation and, if the number of component linear classifiers is not too large, speed. Here we show how a simple, piecewise linear classifier can be trained from a kernel-based classifier in order to improve the classification speed. The method works by finding the root of the difference in conditional probabilities between pairs of opposite classes to build up a representation of the decision boundary. When tested on 17 different datasets, it succeeded in improving the classification speed of an SVM for 9 of them by factors as high as 88 times or more. The method is best suited to problems with continuous feature data and smooth probability functions. Because the component linear classifiers are built up individually from an existing classifier, rather than through a simultaneous optimization procedure, the classifier is also fast to train.
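The root-finding step described in the abstract above can be sketched as follows. This is a minimal illustration, not the author's implementation: the Gaussian-kernel estimate `prob_difference` stands in for whatever kernel classifier is being accelerated, and bisection along the segment joining two opposite-class points is an assumed choice of root finder. All names are hypothetical.

```python
import numpy as np

def prob_difference(x, train_x, train_y, bandwidth=1.0):
    """Kernel estimate of P(+1 | x) - P(-1 | x); labels in train_y are -1 or +1."""
    sq_dist = ((train_x - x) ** 2).sum(axis=1)
    weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    return float(weights @ train_y / weights.sum())

def border_point(x_neg, x_pos, r, n_iter=60):
    """Bisect along the segment from x_neg to x_pos for a root of r."""
    if not (r(x_neg) < 0 < r(x_pos)):
        raise ValueError("endpoints must lie on opposite sides of the border")
    lo, hi = 0.0, 1.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        value = r(x_neg + mid * (x_pos - x_neg))
        if value == 0.0:
            lo = hi = mid
            break
        if value < 0:
            lo = mid   # root lies further toward x_pos
        else:
            hi = mid   # root lies further toward x_neg
    t = 0.5 * (lo + hi)
    return x_neg + t * (x_pos - x_neg)
```

Repeating this for many opposite-class pairs yields a set of points on the decision boundary, from which component linear classifiers could then be built.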

[11]  arXiv:1708.05931 [pdf]
Title: Innovations orthogonalization: a solution to the major pitfalls of EEG/MEG "leakage correction"
Comments: preprint, technical report, under license "Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)", this https URL
Subjects: Methodology (stat.ME); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)

The problem of interest here is the study of brain functional and effective connectivity based on non-invasive EEG-MEG inverse solution time series. These signals generally have low spatial resolution, such that an estimated signal at any one site is an instantaneous linear mixture of the true, unobserved signals across all cortical sites. False connectivity can result from analysis of these low-resolution signals. Recent approaches to "unmixing" have been developed under the name of "leakage correction". One recent noteworthy approach is that by Colclough et al (2015 NeuroImage, 117:439-448), which forces the inverse solution signals to have zero cross-correlation at lag zero. One goal is to show that Colclough's method produces false human connectomes under very broad conditions. The second major goal is to develop a new solution that appropriately "unmixes" the inverse solution signals, based on innovations orthogonalization. The new method first fits a multivariate autoregression to the inverse solution signals, giving the mixed innovations. Second, the mixed innovations are orthogonalized. Third, the mixed and orthogonalized innovations allow the estimation of the "unmixing" matrix, which is then finally used to "unmix" the inverse solution signals. It is shown that under very broad conditions, the new method produces proper human connectomes, even when the signals are not generated by an autoregressive model.
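The four steps listed in the abstract above can be sketched roughly as follows. The abstract does not give the model order, the orthogonalization variant, or the estimator of the unmixing matrix, so this sketch assumes a first-order autoregression without intercept (zero-mean signals), symmetric orthogonalization via the inverse square root of the innovation covariance, and a least-squares fit for the unmixing matrix. The function name is hypothetical.

```python
import numpy as np

def innovations_unmixing(signals):
    """signals: array (n_time, n_sites), assumed zero-mean.

    Returns (unmixing, unmixed_signals, orthogonalized_innovations).
    """
    x = np.asarray(signals, dtype=float)
    past, present = x[:-1], x[1:]
    # Step 1: fit a VAR(1) by least squares; residuals are the mixed innovations.
    coef, *_ = np.linalg.lstsq(past, present, rcond=None)
    innov = present - past @ coef
    # Step 2: symmetric orthogonalization (assumed variant): innov @ C^{-1/2}.
    cov = innov.T @ innov / len(innov)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    orth_innov = innov @ inv_sqrt
    # Step 3: unmixing matrix that maps mixed onto orthogonalized innovations.
    unmixing, *_ = np.linalg.lstsq(innov, orth_innov, rcond=None)
    # Step 4: apply the same matrix to the signals themselves.
    return unmixing, x @ unmixing, orth_innov
```

Under these assumptions the orthogonalized innovations have identity sample covariance by construction, which is the property the unmixing matrix is meant to transfer to the signals.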

[12]  arXiv:1708.05932 [pdf, other]
Title: Fundamental Limits of Weak Recovery with Applications to Phase Retrieval
Comments: 46 pages, 3 figures
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT)

In phase retrieval we want to recover an unknown signal $\boldsymbol x\in\mathbb C^d$ from $n$ quadratic measurements of the form $y_i = |\langle\boldsymbol a_i,\boldsymbol x\rangle|^2+w_i$ where $\boldsymbol a_i\in \mathbb C^d$ are known sensing vectors and $w_i$ is measurement noise. We ask the following weak recovery question: what is the minimum number of measurements $n$ needed to produce an estimator $\hat{\boldsymbol x}(\boldsymbol y)$ that is positively correlated with the signal $\boldsymbol x$? We consider the case of Gaussian vectors $\boldsymbol a_i$. We prove that - in the high-dimensional limit - a sharp phase transition takes place, and we locate the threshold in the regime of vanishingly small noise. For $n\le d-o(d)$ no estimator can do significantly better than random and achieve a strictly positive correlation. For $n\ge d+o(d)$ a simple spectral estimator achieves a positive correlation. Surprisingly, numerical simulations with the same spectral estimator demonstrate promising performances with realistic sensing matrices as well. Spectral methods are used to initialize non-convex optimization algorithms in phase retrieval, and our approach can boost performances in this setting as well.
Our impossibility result is based on classical information-theory arguments. The spectral algorithm computes the leading eigenvector of a weighted empirical covariance matrix. We obtain a sharp characterization of the spectral properties of this random matrix using tools from free probability and generalizing a recent result by Lu and Li. Both the upper and lower bound generalize beyond phase retrieval to measurements $y_i$ produced according to a generalized linear model.
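The spectral algorithm mentioned above, the leading eigenvector of a weighted empirical covariance matrix, can be sketched as below. The weight function is an assumption: the sketch defaults to the identity $T(y)=y$, whereas the abstract does not say which preprocessing the paper uses. The names `spectral_estimate`, `correlation`, and `preprocess` are hypothetical.

```python
import numpy as np

def spectral_estimate(sensing, y, preprocess=lambda t: t):
    """Leading eigenvector of D = (1/n) * sum_i T(y_i) a_i a_i^H.

    sensing: (n, d) complex array whose i-th row is a_i^H, so that
    sensing @ x gives the inner products <a_i, x>.
    preprocess: the weight function T (identity by default, an assumption).
    """
    n = sensing.shape[0]
    weights = preprocess(np.asarray(y, dtype=float))
    d_matrix = (sensing.conj().T * weights) @ sensing / n
    _, eigvecs = np.linalg.eigh(d_matrix)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def correlation(x, x_hat):
    """Normalized absolute inner product, invariant to a global phase."""
    return abs(np.vdot(x, x_hat)) / (np.linalg.norm(x) * np.linalg.norm(x_hat))
```

The correlation is taken in absolute value because quadratic measurements determine the signal only up to a global phase.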

[13]  arXiv:1708.05963 [pdf, ps, other]
Title: Neural Networks Compression for Language Modeling
Comments: Keywords: LSTM, RNN, language modeling, low-rank factorization, pruning, quantization. Published by Springer in the LNCS series, 7th International Conference on Pattern Recognition and Machine Intelligence, 2017
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

In this paper, we consider several compression techniques for the language modeling problem based on recurrent neural networks (RNNs). It is known that conventional RNNs, e.g., LSTM-based networks in language modeling, are characterized by either high space complexity or substantial inference time. This problem is especially crucial for mobile applications, in which constant interaction with a remote server is inappropriate. Using the Penn Treebank (PTB) dataset, we compare pruning, quantization, low-rank factorization, and tensor train decomposition for LSTM networks in terms of model size and suitability for fast inference.

[14]  arXiv:1708.06077 [pdf, ps, other]
Title: ExSIS: Extended Sure Independence Screening for Ultrahigh-dimensional Linear Models
Comments: 22 pages (single-column version); 10 figures; submitted for journal publication
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (stat.ML)

Statistical inference can be computationally prohibitive in ultrahigh-dimensional linear models. Correlation-based variable screening, in which one leverages marginal correlations for removal of irrelevant variables from the model prior to statistical inference, can be used to overcome this challenge. Prior works on correlation-based variable screening either impose strong statistical priors on the linear model or assume specific post-screening inference methods. This paper first extends the analysis of correlation-based variable screening to arbitrary linear models and post-screening inference techniques. In particular, ($i$) it shows that a condition---termed the screening condition---is sufficient for successful correlation-based screening of linear models, and ($ii$) it provides insights into the dependence of marginal correlation-based screening on different problem parameters. Numerical experiments confirm that these insights are not mere artifacts of analysis; rather, they are reflective of the challenges associated with marginal correlation-based variable screening. Second, the paper explicitly derives the screening condition for two families of linear models, namely, sub-Gaussian linear models and arbitrary (random or deterministic) linear models. In the process, it establishes that---under appropriate conditions---it is possible to reduce the dimension of an ultrahigh-dimensional, arbitrary linear model to almost the sample size even when the number of active variables scales almost linearly with the sample size.
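For readers unfamiliar with correlation-based variable screening, a minimal sketch of the generic procedure follows: rank the columns of the design matrix by absolute marginal correlation with the response and keep the top few. This illustrates the general technique only; the screening condition and any specific thresholds from the paper are not reproduced, and the function name and `keep` parameter are hypothetical.

```python
import numpy as np

def correlation_screen(design, response, keep):
    """Indices of the `keep` columns with the largest |marginal correlation|."""
    x = np.asarray(design, dtype=float)
    y = np.asarray(response, dtype=float)
    x = x / np.linalg.norm(x, axis=0)      # scale every column to unit norm
    scores = np.abs(x.T @ y)               # one marginal score per variable
    return np.sort(np.argsort(scores)[-keep:])
```

Any post-screening inference method can then be run on the retained columns only, which is where the computational saving comes from.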

[15]  arXiv:1708.06152 [pdf, other]
Title: Physiological Gaussian Process Priors for the Hemodynamics in fMRI Analysis
Comments: 15 pages, 12 figures
Subjects: Applications (stat.AP)

Inference from fMRI data faces the challenge that the hemodynamic system, that relates the underlying neural activity to the observed BOLD fMRI signal, is not known. We propose a new Bayesian model for task fMRI data with the following features: (i) joint estimation of brain activity and the underlying hemodynamics, (ii) the hemodynamics is modeled nonparametrically with a Gaussian process (GP) prior guided by physiological information and (iii) the predicted BOLD is not necessarily generated by a linear time-invariant (LTI) system. We place a GP prior directly on the predicted BOLD time series, rather than on the hemodynamic response function as in previous literature. This allows us to incorporate physiological information via the GP prior mean in a flexible way. The prior mean function may be generated from a standard LTI system, based on a canonical hemodynamic response function, or a more elaborate physiological model such as the Balloon model. This gives us the nonparametric flexibility of the GP, but allows the posterior to fall back on the physiologically based prior when the data are weak. Results on simulated data show that even with an erroneous prior for the GP, the proposed model is still able to discriminate between active and non-active voxels in a satisfactory way. The proposed model is also applied to real fMRI data, where our Gaussian process model in several cases finds brain activity where a baseline model with fixed hemodynamics does not.

[16]  arXiv:1708.06160 [pdf]
Title: Economic Design of Memory-Type Control Charts: The Fallacy of the Formula Proposed by Lorenzen and Vance (1986)
Subjects: Applications (stat.AP); Computational Engineering, Finance, and Science (cs.CE); Mathematical Software (cs.MS); Optimization and Control (math.OC); Economics (q-fin.EC)

The memory-type control charts, such as EWMA and CUSUM, are powerful tools for detecting small quality changes in univariate and multivariate processes. Many papers on economic design of these control charts use the formula proposed by Lorenzen and Vance (1986) [Lorenzen, T. J., & Vance, L. C. (1986). The economic design of control charts: A unified approach. Technometrics, 28(1), 3-10, DOI: 10.2307/1269598]. This paper shows that this formula is not correct for memory-type control charts and its values can significantly deviate from the original values even if the ARL values used in this formula are accurately computed. Consequently, the use of this formula can result in charts that are not economically optimal. The formula is corrected for memory-type control charts, but unfortunately the modified formula is not a helpful tool from a computational perspective. We show that simulation-based optimization is a possible alternative method.

[17]  arXiv:1708.06235 [pdf, other]
Title: Deep Convolutional Neural Networks for Massive MIMO Fingerprint-Based Positioning
Comments: Accepted in the IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) 2017
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT)

This paper provides an initial investigation into the application of convolutional neural networks (CNNs) for fingerprint-based positioning using measured massive MIMO channels. When represented in appropriate domains, massive MIMO channels have a sparse structure which can be efficiently learned by CNNs for positioning purposes. We evaluate the positioning accuracy of state-of-the-art CNNs with channel fingerprints generated from a channel model with a rich clustered structure: the COST 2100 channel model. We find that moderately deep CNNs can achieve fractional-wavelength positioning accuracies, provided that a sufficiently representative data set is available for training.

[18]  arXiv:1708.06302 [pdf, other]
Title: A general framework for Vecchia approximations of Gaussian processes
Subjects: Methodology (stat.ME); Computation (stat.CO)

Gaussian processes (GPs) are commonly used as models for functions, time series, and spatial fields, but they are computationally infeasible for large datasets. Focusing on the typical setting of modeling observations as a GP plus an additive nugget or noise term, we propose a generalization of the Vecchia (1988) approach as a framework for GP approximations. We show that our general Vecchia approach contains many popular existing GP approximations as special cases, allowing for comparisons among the different methods within a unified framework. Representing the models by directed acyclic graphs, we determine the sparsity of the matrices necessary for inference, which leads to new insights regarding the computational properties. Based on these results, we propose a novel sparse general Vecchia approximation, which ensures computational feasibility for large datasets but can lead to tremendous improvements in approximation accuracy over Vecchia's original approach. We provide several theoretical results, and conduct numerical comparisons. We conclude with guidelines for the use of Vecchia approximations.
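To make the idea behind a Vecchia-type approximation concrete, the sketch below implements the classical form applied directly to noisy observations: the joint Gaussian density is replaced by a product of conditionals, each conditioning on at most `m` previously ordered points. It is not the paper's general framework. The exponential covariance, the nearest-neighbour choice of conditioning sets, the nugget value, and all function names are assumptions made for illustration.

```python
import numpy as np

def exp_cov(x1, x2, variance=1.0, length=1.0):
    """Exponential covariance between two sets of 1-D locations."""
    return variance * np.exp(-np.abs(x1[:, None] - x2[None, :]) / length)

def vecchia_loglik(x, y, m, nugget=0.1):
    """Sum of log p(y_i | y_c(i)), with c(i) the m nearest earlier points."""
    total = 0.0
    for i in range(len(x)):
        prev = np.arange(i)
        if len(prev) > m:
            prev = prev[np.argsort(np.abs(x[prev] - x[i]))[:m]]
        var = exp_cov(x[i:i + 1], x[i:i + 1])[0, 0] + nugget
        mean = 0.0
        if len(prev) > 0:
            k_pp = exp_cov(x[prev], x[prev]) + nugget * np.eye(len(prev))
            k_ip = exp_cov(x[i:i + 1], x[prev])[0]
            w = np.linalg.solve(k_pp, k_ip)
            mean = w @ y[prev]          # conditional mean
            var = var - w @ k_ip        # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total

def exact_loglik(x, y, nugget=0.1):
    """Exact zero-mean Gaussian log-likelihood, for comparison."""
    k = exp_cov(x, x) + nugget * np.eye(len(x))
    _, logdet = np.linalg.slogdet(k)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(k, y))
```

When `m` is at least `len(x) - 1`, every conditioning set contains all earlier points and the product of conditionals equals the exact likelihood; smaller `m` trades accuracy for solving only small linear systems.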

[19]  arXiv:1708.06332 [pdf, other]
Title: Efficient Nonparametric Bayesian Inference For X-Ray Transforms
Comments: 39 pages, 6 figures
Subjects: Statistics Theory (math.ST); Analysis of PDEs (math.AP); Methodology (stat.ME)

We consider the statistical inverse problem of recovering a function $f: M \to \mathbb R$, where $M$ is a smooth compact Riemannian manifold with boundary, from measurements of general $X$-ray transforms $I_a(f)$ of $f$, corrupted by additive Gaussian noise. For $M$ equal to the unit disk with `flat' geometry and $a=0$ this reduces to the standard Radon transform, but our general setting allows for anisotropic media $M$ and can further model local `attenuation' effects -- both highly relevant in practical imaging problems such as SPECT tomography. We propose a nonparametric Bayesian inference approach based on standard Gaussian process priors for $f$. The posterior reconstruction of $f$ corresponds to a Tikhonov regulariser with a reproducing kernel Hilbert space norm penalty that does not require the calculation of the singular value decomposition of the forward operator $I_a$. We prove Bernstein-von Mises theorems that entail that posterior-based inferences such as credible sets are valid and optimal from a frequentist point of view for a large family of semi-parametric aspects of $f$. In particular we derive the asymptotic distribution of smooth linear functionals of the Tikhonov regulariser, which is shown to attain the semi-parametric Cram\'er-Rao information bound. The proofs rely on an invertibility result for the `Fisher information' operator $I_a^*I_a$ between suitable function spaces, a result of independent interest that relies on techniques from microlocal analysis. We illustrate the performance of the proposed method via simulations in various settings.

[20]  arXiv:1708.06337 [pdf, other]
Title: Nonlinear association structures in flexible Bayesian additive joint models
Subjects: Methodology (stat.ME)

Joint models of longitudinal and survival data have become an important tool for modeling associations between longitudinal biomarkers and event processes. This association, which is the effect of the marker on the log-hazard, is assumed to be linear in existing shared random effects models with this assumption usually remaining unchecked. We present an extended framework of flexible additive joint models that allows the estimation of nonlinear, covariate specific associations by making use of Bayesian P-splines. The ability to capture truly linear and nonlinear associations is assessed in simulations and illustrated on the widely studied biomedical data on the rare fatal liver disease primary biliary cirrhosis. Our joint models are estimated in a Bayesian framework using structured additive predictors allowing for great flexibility in the specification of smooth nonlinear, time-varying and random effects terms. The model is implemented in the R package bamlss to facilitate the application of this flexible joint model.

Cross-lists for Tue, 22 Aug 17

[21]  arXiv:1708.05757 (cross-list from physics.flu-dyn) [pdf, ps, other]
Title: Identification of individual coherent sets associated with flow trajectories using Coherent Structure Coloring
Comments: In press at Chaos
Subjects: Fluid Dynamics (physics.flu-dyn); Dynamical Systems (math.DS); Machine Learning (stat.ML)

We present a method for identifying the coherent structures associated with individual Lagrangian flow trajectories even where only sparse particle trajectory data is available. The method, based on techniques in spectral graph theory, uses the Coherent Structure Coloring vector and associated eigenvectors to analyze the distance in higher-dimensional eigenspace between a selected reference trajectory and other tracer trajectories in the flow. By analyzing this distance metric in a hierarchical clustering, the coherent structure of which the reference particle is a member can be identified. This algorithm is proven successful in identifying coherent structures of varying complexities in canonical unsteady flows. Additionally, the method is able to assess the relative coherence of the associated structure in comparison to the surrounding flow. Although the method is demonstrated here in the context of fluid flow kinematics, the generality of the approach allows for its potential application to other unsupervised clustering problems in dynamical systems such as neuronal activity, gene expression, or social networks.

[22]  arXiv:1708.05859 (cross-list from math.PR) [pdf, ps, other]
Title: Decomposition of mean-field Gibbs distributions into product measures
Subjects: Probability (math.PR); Mathematical Physics (math-ph); Statistics Theory (math.ST)

We show that under a low complexity condition on the gradient of a Hamiltonian, Gibbs distributions on the Boolean hypercube are approximate mixtures of product measures whose probability vectors are critical points of an associated mean-field functional. This extends a previous work by the first author. As an application, we demonstrate how this framework helps characterize both Ising models satisfying a mean-field condition and the conditional distributions which arise in the emerging theory of nonlinear large deviations.

[23]  arXiv:1708.05866 (cross-list from cs.LG) [pdf, other]
Title: A Brief Survey of Deep Reinforcement Learning
Comments: To appear in IEEE Signal Processing Magazine, Special Issue on Deep Learning for Image Understanding
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep $Q$-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.

[24]  arXiv:1708.05929 (cross-list from cs.LG) [pdf, other]
Title: X-PACS: eXPlaining Anomalies by Characterizing Subspaces
Comments: 10 pages, 5 figures, 5 tables
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Anomaly detection has numerous critical applications in finance, security, etc. and has been studied extensively. In this paper, we address a gap in the literature and consider a complementary problem: anomaly description. Interpretation of anomalies has important implications for decision makers, from being able to troubleshoot and prioritize their actions to making policy changes for prevention. We present a new method called X-PACS which "reverse-engineers" the known anomalies in a dataset by identifying a few anomalous patterns that they form along with the characterizing subspace of features that separates them from normal instances. From a descriptive data mining perspective, our solution has five key desired properties. It can unearth anomalous patterns (i) of multiple different types, (ii) hidden in arbitrary subspaces of a high dimensional space, (iii) interpretable by the end-users, (iv) succinct, providing the shortest data description, and finally (v) different from normal patterns of the data. There is no existing work on anomaly description that satisfies all of these desiderata simultaneously. While not our primary goal, the anomalous patterns X-PACS finds can further be seen as multiple, interpretable "signatures" and can be used for detection. We show the effectiveness of X-PACS in explanation as well as detection tasks on 9 real-world datasets.

[25]  arXiv:1708.05978 (cross-list from cs.LG) [pdf, other]
Title: Stochastic Primal-Dual Proximal ExtraGradient Descent for Compositely Regularized Optimization
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

We consider a wide range of regularized stochastic minimization problems with two regularization terms, one of which is composed with a linear function. This optimization model abstracts a number of important applications in artificial intelligence and machine learning, such as fused Lasso, fused logistic regression, and a class of graph-guided regularized minimization. The computational challenges of this model are twofold. On one hand, the closed-form solution of the proximal mapping associated with the composed regularization term or the expected objective function is not available. On the other hand, the calculation of the full gradient of the expectation in the objective is very expensive when the number of input data samples is considerably large. To address these issues, we propose a stochastic variant of extra-gradient type methods, namely Stochastic Primal-Dual Proximal ExtraGradient descent (SPDPEG), and analyze its convergence property for both convex and strongly convex objectives. For general convex objectives, the uniformly averaged iterates generated by SPDPEG converge in expectation with $O(1/\sqrt{t})$ rate. For strongly convex objectives, the uniformly and non-uniformly averaged iterates generated by SPDPEG converge with $O(\log(t)/t)$ and $O(1/t)$ rates, respectively. The order of the rate of the proposed algorithm is known to match the best convergence rate for first-order stochastic algorithms. Experiments on fused logistic regression and graph-guided regularized logistic regression problems show that the proposed algorithm performs very efficiently and consistently outperforms other competing algorithms.

[26]  arXiv:1708.06018 (cross-list from math.NA) [pdf, ps, other]
Title: Conversion of Mersenne Twister to double-precision floating-point numbers
Authors: Shin Harase
Comments: 14 pages
Subjects: Numerical Analysis (math.NA); Numerical Analysis (cs.NA); Computation (stat.CO)

The 32-bit Mersenne Twister generator MT19937 is a widely used random number generator. To generate numbers with more than 32 bits in bit length, and particularly when converting into 53-bit double-precision floating-point numbers in $[0,1)$ in the IEEE 754 format, the typical implementation concatenates two successive 32-bit integers and divides them by a power of $2$. In this case, the 32-bit MT19937 is optimized in terms of its equidistribution properties (the so-called dimension of equidistribution with $v$-bit accuracy) under the assumption that one will mainly be using 32-bit output values, and hence the concatenation sometimes degrades the dimension of equidistribution compared with the simple use of 32-bit outputs. In this paper, we analyze such phenomena by investigating hidden $\mathbb{F}_2$-linear relations among the bits of high-dimensional outputs. Accordingly, we report that MT19937 with a specific lag set fails several statistical tests, such as the overlapping collision test, matrix rank test, and Hamming independence test.
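The concatenate-and-divide conversion described in this abstract can be sketched as follows. This is a minimal illustration of one common variant (keep the top 27 bits of the first word and the top 26 bits of the second, then divide by 2^53), which mirrors the genrand_res53 routine in the reference MT19937 C code. The function name is hypothetical, and the abstract does not say which exact variant or lag set the paper analyzes.

```python
def double_from_two_words(a: int, b: int) -> float:
    """Combine two successive 32-bit outputs into a 53-bit double in [0, 1).

    Assumption: the 27 + 26 bit split used by the reference genrand_res53
    routine; other splits are possible.
    """
    if not (0 <= a < 2**32 and 0 <= b < 2**32):
        raise ValueError("inputs must be 32-bit unsigned integers")
    hi = a >> 5  # keep the 27 most significant bits of the first word
    lo = b >> 6  # keep the 26 most significant bits of the second word
    # Concatenate to a 53-bit integer, then scale by 2**-53.
    return (hi * 2**26 + lo) / 2**53


# Smallest and largest representable outputs of this scheme.
smallest = double_from_two_words(0, 0)
largest = double_from_two_words(0xFFFFFFFF, 0xFFFFFFFF)
```

Because the 53-bit integer fits exactly in a double's significand and the divisor is a power of two, the division introduces no rounding, so the result is always strictly below 1.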

[27]  arXiv:1708.06020 (cross-list from cs.LG) [pdf, ps, other]
Title: Improving Deep Learning using Generic Data Augmentation
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

Deep artificial neural networks require a large corpus of training data in order to learn effectively, and collecting such training data is often expensive and laborious. Data augmentation overcomes this issue by artificially inflating the training set with label-preserving transformations. Recently there has been extensive use of generic data augmentation to improve Convolutional Neural Network (CNN) task performance. This study benchmarks various popular data augmentation schemes to allow researchers to make informed decisions as to which training methods are most appropriate for their data sets. Various geometric and photometric schemes are evaluated on a coarse-grained data set using a relatively simple CNN. Experimental results, run using 4-fold cross-validation and reported in terms of Top-1 and Top-5 accuracy, indicate that cropping in geometric augmentation significantly increases CNN task performance.
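To make "inflating the training set with label-preserving transformations" concrete, here is a minimal sketch of random cropping on an image stored as a list of rows. The function names, crop sizes, and seeding are illustrative assumptions; the abstract does not specify how the paper's cropping scheme is implemented.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Return a crop_h x crop_w window taken at a random position."""
    h, w = len(image), len(image[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop size exceeds image size")
    top = rng.randint(0, h - crop_h)    # inclusive bounds
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment_with_crops(image, label, n_copies, crop_h, crop_w, seed=0):
    """Create n_copies cropped variants; the label is carried over unchanged."""
    rng = random.Random(seed)
    return [(random_crop(image, crop_h, crop_w, rng), label)
            for _ in range(n_copies)]


# A 4x4 toy "image" whose pixel values encode their own position.
toy_image = [[4 * r + c for c in range(4)] for r in range(4)]
augmented = augment_with_crops(toy_image, label="cat", n_copies=5,
                               crop_h=3, crop_w=3)
```

Each augmented example keeps the original label, which is what makes the transformation label-preserving.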

[28]  arXiv:1708.06040 (cross-list from cs.AI) [pdf, other]
Title: Neural Block Sampling
Comments: 10 pages
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

Efficient Monte Carlo inference often requires manual construction of model-specific proposals. We propose an approach to automated proposal construction by training neural networks to provide fast approximations to block Gibbs conditionals. The learned proposals generalize to occurrences of common structural motifs both within a given model and across models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We explore several applications including open-universe Gaussian mixture models, in which our learned proposals outperform a hand-tuned sampler, and a real-world named entity recognition task, in which our sampler's ability to escape local modes yields higher final F1 scores than single-site Gibbs.

[29]  arXiv:1708.06243 (cross-list from cs.LG) [pdf]
Title: General Backpropagation Algorithm for Training Second-order Neural Networks
Comments: 5 pages, 7 figures, 19 references
Subjects: Learning (cs.LG); Machine Learning (stat.ML)

The artificial neural network is a popular framework in machine learning. To empower individual neurons, we recently suggested that the current type of neurons could be upgraded to 2nd-order counterparts, in which the linear operation between inputs to a neuron and the associated weights is replaced with a nonlinear quadratic operation. A single 2nd-order neuron already has a strong nonlinear modeling ability, such as implementing basic fuzzy logic operations. In this paper, we develop a general backpropagation (BP) algorithm to train a network consisting of 2nd-order neurons. Numerical studies are performed to verify the generalized BP algorithm.
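The idea of replacing the linear pre-activation with a quadratic one can be sketched as below. This is an illustration under stated assumptions, not the paper's formulation: it uses a generic quadratic form z = x^T A x + w^T x + b with a sigmoid activation, and all names and weight values are hypothetical. The hand-picked weights show a single such neuron computing XOR, which a single linear neuron cannot do, and the gradient function applies the chain rule that backpropagation would use.

```python
import math


def quadratic_neuron(x, A, w, b):
    """Sigmoid of a quadratic pre-activation z = x^T A x + w^T x + b."""
    z = b
    for i, xi in enumerate(x):
        z += w[i] * xi
        for j, xj in enumerate(x):
            z += A[i][j] * xi * xj
    return 1.0 / (1.0 + math.exp(-z))


def parameter_gradients(x, A, w, b):
    """Gradients of the neuron output with respect to A, w and b."""
    y = quadratic_neuron(x, A, w, b)
    delta = y * (1.0 - y)  # derivative of the sigmoid at z
    dA = [[delta * xi * xj for xj in x] for xi in x]
    dw = [delta * xi for xi in x]
    return dA, dw, delta


# Hand-picked (hypothetical) weights: z = 10*x1 + 10*x2 - 20*x1*x2 - 5.
A_xor = [[0.0, -20.0], [0.0, 0.0]]
w_xor = [10.0, 10.0]
b_xor = -5.0

xor_outputs = [round(quadratic_neuron([x1, x2], A_xor, w_xor, b_xor))
               for x1 in (0, 1) for x2 in (0, 1)]
```

The cross term x1*x2 is what separates the input (1, 1) from (1, 0) and (0, 1); a full training procedure would update A, w and b with the gradients returned by `parameter_gradients`.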

[30]  arXiv:1708.06246 (cross-list from cs.AI) [pdf, other]
Title: Comparative Benchmarking of Causal Discovery Techniques
Comments: arXiv admin note: text overlap with arXiv:1506.07669, arXiv:1611.03977 by other authors
Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In this paper we present a comprehensive view of prominent causal discovery algorithms, grouped into two main categories, (1) those assuming acyclicity and no latent variables, and (2) those allowing both cycles and latent variables, along with experimental results comparing them from three perspectives: (a) structural accuracy, (b) standard predictive accuracy, and (c) accuracy of counterfactual inference. For (b) and (c) we train causal Bayesian networks with structures as predicted by each causal discovery technique to carry out counterfactual or standard predictive inference. We compare causal algorithms on two publicly available datasets and one simulated dataset with different sample sizes: small, medium and large. Experiments show that the structural accuracy of a technique does not necessarily correlate with higher accuracy on inference tasks. Further, the surveyed structure learning algorithms do not perform well in terms of structural accuracy on datasets with a large number of variables.

[31]  arXiv:1708.06250 (cross-list from cs.CV) [pdf, other]
Title: Pillar Networks++: Distributed non-parametric deep and wide networks
Comments: arXiv admin note: substantial text overlap with arXiv:1707.06923
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Computation (stat.CO); Machine Learning (stat.ML)

In recent work, it was shown that combining multi-kernel based support vector machines (SVMs) can lead to near state-of-the-art performance on an action recognition dataset (HMDB-51 dataset). This was 0.4% lower than frameworks that used hand-crafted features in addition to the deep convolutional feature extractors. In the present work, we show that combining distributed Gaussian Processes with multi-stream deep convolutional neural networks (CNN) alleviates the need to augment a neural network with hand-crafted features. In contrast to prior work, we treat each deep neural convolutional network as an expert wherein the individual predictions (and their respective uncertainties) are combined into a Product of Experts (PoE) framework.
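Combining predictions and their uncertainties in a Product of Experts can be sketched for the simplest case, in which each expert reports a Gaussian with a mean and a variance. The product of Gaussian densities is again Gaussian, with precision equal to the sum of the experts' precisions and a precision-weighted mean. This is the standard textbook rule; the function name is hypothetical, and the abstract does not state which PoE variant the paper uses.

```python
def product_of_gaussian_experts(means, variances):
    """Fuse independent Gaussian predictions into one Gaussian.

    Returns (mean, variance) of the normalized product of the densities.
    """
    if len(means) != len(variances) or not means:
        raise ValueError("need one variance per mean, and at least one expert")
    precisions = [1.0 / v for v in variances]  # confident experts weigh more
    total_precision = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    return mean, 1.0 / total_precision


# Two experts: one confident prediction at 0, one uncertain prediction at 10.
fused_mean, fused_var = product_of_gaussian_experts([0.0, 10.0], [1.0, 4.0])
```

The fused mean sits closer to the more confident expert, and the fused variance is smaller than either input variance.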

Replacements for Tue, 22 Aug 17

[32]  arXiv:1509.02230 (replaced) [pdf, other]
Title: Properties of the Affine Invariant Ensemble Sampler in high dimensions
Comments: 13 pages, 5 figures
Subjects: Computation (stat.CO); Data Analysis, Statistics and Probability (physics.data-an)
[33]  arXiv:1512.08191 (replaced) [pdf, ps, other]
Title: Estimation of Kullback-Leibler losses for noisy recovery problems within the exponential family
Authors: Charles-Alban Deledalle (IMB)
Subjects: Applications (stat.AP); Methodology (stat.ME)
[34]  arXiv:1605.08285 (replaced) [pdf, other]
Title: Solving Systems of Random Quadratic Equations via Truncated Amplitude Flow
Comments: 37 Pages, 16 figures
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Optimization and Control (math.OC)
[35]  arXiv:1605.08299 (replaced) [pdf, other]
Title: A General Family of Trimmed Estimators for Robust High-dimensional Data Analysis
Comments: 39 pages, 6 figures
Subjects: Machine Learning (stat.ML)
[36]  arXiv:1607.02738 (replaced) [pdf, other]
Title: Magnetic Hamiltonian Monte Carlo
Comments: 34th International Conference on Machine Learning (ICML 2017)
Subjects: Machine Learning (stat.ML)
[37]  arXiv:1611.09588 (replaced) [pdf, other]
Title: Level sets and drift estimation for reflected Brownian motion with drift
Subjects: Statistics Theory (math.ST)
[38]  arXiv:1611.10242 (replaced) [pdf, other]
Title: Likelihood-free inference by ratio estimation
Subjects: Machine Learning (stat.ML); Computation (stat.CO); Methodology (stat.ME)
[39]  arXiv:1701.04889 (replaced) [pdf, other]
Title: Efficient and Adaptive Linear Regression in Semi-Supervised Settings
Comments: 51 pages; Revised version - to appear in The Annals of Statistics
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
[40]  arXiv:1701.05230 (replaced) [pdf, other]
Title: Surrogate Aided Unsupervised Recovery of Sparse Signals in Single Index Models for Binary Outcomes
Comments: 43 pages; Revised version with additional results and discussions
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
[41]  arXiv:1701.05654 (replaced) [pdf, other]
Title: Bayesian Network Learning via Topological Order
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS)
[42]  arXiv:1702.02258 (replaced) [pdf, other]
Title: Generating Multiple Diverse Hypotheses for Human 3D Pose Consistent with 2D Joint Detections
Comments: accepted to ICCV 2017 (PeopleCap)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Machine Learning (stat.ML)
[43]  arXiv:1702.08435 (replaced) [pdf, other]
Title: Statistical Anomaly Detection via Composite Hypothesis Testing for Markov Models
Comments: Preprint submitted to the IEEE Transactions on Signal Processing
Subjects: Systems and Control (cs.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)
[44]  arXiv:1704.07050 (replaced) [pdf, other]
Title: Using Global Constraints and Reranking to Improve Cognates Detection
Comments: 10 pages, 6 figures, 6 tables; published in the Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1983-1992, Vancouver, Canada, July 2017
Journal-ref: In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1983-1992, Vancouver, Canada, July 2017. Association for Computational Linguistics
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
[45]  arXiv:1705.03938 (replaced) [pdf, other]
Title: A three-dimensional statistical model for imaged microstructures of porous polymer films
Subjects: Applications (stat.AP)
[46]  arXiv:1705.07120 (replaced) [pdf, other]
Title: VAE with a VampPrior
Comments: 16 pages, new results (two additional datasets) comparing to the previous version + the text was re-organized and re-written
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[47]  arXiv:1705.08417 (replaced) [pdf, other]
Title: Reinforcement Learning with a Corrupted Reward Channel
Comments: A shorter version of this report was accepted to IJCAI 2017 AI and Autonomy track
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
[48]  arXiv:1706.09152 (replaced) [pdf, other]
Title: Generative Bridging Network in Neural Sequence Prediction
Comments: A submission for AAAI 2018
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
[49]  arXiv:1707.01227 (replaced) [pdf, other]
Title: Exponential random graphs behave like mixtures of stochastic block models
Subjects: Probability (math.PR); Social and Information Networks (cs.SI); Mathematical Physics (math-ph); Combinatorics (math.CO); Statistics Theory (math.ST)
[50]  arXiv:1707.03017 (replaced) [pdf, other]
Title: Learning Visual Reasoning Without Strong Priors
Comments: This work was presented at ICML 2017's Machine Learning in Speech and Language Processing Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
[51]  arXiv:1707.07716 (replaced) [pdf, other]
Title: Stochastic Gradient Descent for Relational Logistic Regression via Partial Network Crawls
Comments: 7 pages, 3 figures, Proceedings of the Seventh International Workshop on Statistical Relational AI (StarAI 2017)
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
[52]  arXiv:1708.01383 (replaced) [pdf, other]
Title: Convergence of Variance-Reduced Stochastic Learning under Random Reshuffling
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[53]  arXiv:1708.01666 (replaced) [pdf]
Title: An Effective Training Method For Deep Convolutional Neural Network
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[54]  arXiv:1708.03272 (replaced) [pdf, other]
Title: Fast and accurate Bayesian model criticism and conflict diagnostics using R-INLA
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Computation (stat.CO)