Statistics
New submissions
New submissions for Tue, 22 Aug 17
 [1] arXiv:1708.05712 [pdf]

Title: Extensions of Morse-Smale Regression with Application to Actuarial Science
Authors: Colleen M. Farrelly
Comments: 14 pages, 10 figures
Subjects: Machine Learning (stat.ML); Applications (stat.AP)
The problem of subgroups is ubiquitous in scientific research (e.g., disease heterogeneity, spatial distributions in ecology), and piecewise regression is one way to deal with this phenomenon. Morse-Smale regression offers a way to partition the regression function based on level sets of a defined function and that function's basins of attraction. This topologically-based piecewise regression algorithm has shown promise in its initial applications, but the current implementation in the literature has been limited to elastic net and generalized linear regression. It is possible that nonparametric methods, such as random forest or conditional inference trees, may provide better prediction and insight through modeling interaction terms and other nonlinear relationships between predictors and a given outcome.
This study explores the use of several machine learning algorithms within a Morse-Smale piecewise regression framework, including boosted regression with linear base-learners, homotopy-based LASSO, conditional inference trees, random forest, and a wide neural network framework called extreme learning machines. Simulations on Tweedie regression problems with varying Tweedie parameter and dispersion suggest that many machine learning approaches to Morse-Smale piecewise regression improve the original algorithm's performance, particularly for outcomes with lower dispersion and linear or a mix of linear and nonlinear predictor relationships. On a real actuarial problem, several of these new algorithms perform as well as or better than the original Morse-Smale regression algorithm, and most provide information on the nature of predictor relationships within each partition to provide insight into differences between dataset partitions.
 [2] arXiv:1708.05715 [pdf, other]

Title: The Stochastic Replica Approach to Machine Learning: Stability and Parameter Optimization
Comments: 30 pages, 42 figures
Subjects: Machine Learning (stat.ML); Statistical Mechanics (cond-mat.stat-mech); Data Analysis, Statistics and Probability (physics.data-an)
We introduce a statistical physics inspired supervised machine learning algorithm for classification and regression problems. The method is based on the invariances or stability of predicted results when known data is represented as expansions in terms of various stochastic functions. The algorithm predicts the classification/regression values of new data by combining (via voting) the outputs of these numerous linear expansions in randomly chosen functions. The few parameters (typically only one parameter is used in all studied examples) that this model has may be automatically optimized. The algorithm has been tested on 10 diverse training data sets of various types and feature space dimensions. It has been shown to consistently exhibit high accuracy and readily allow for optimization of parameters, while simultaneously avoiding pitfalls of existing algorithms such as those associated with class imbalance. We very briefly speculate on whether spatial coordinates in physical theories may be viewed as emergent "features" that enable a robust machine learning type description of data with generic low order smooth functions.
 [3] arXiv:1708.05768 [pdf, other]

Title: Data-Driven Tree Transforms and Metrics
Comments: 16 pages, 5 figures. Accepted to IEEE Transactions on Signal and Information Processing over Networks
Subjects: Machine Learning (stat.ML); Learning (cs.LG); Quantitative Methods (q-bio.QM)
We consider the analysis of high-dimensional data given in the form of a matrix with columns consisting of observations and rows consisting of features. Often the data is such that the observations do not reside on a regular grid, and the given order of the features is arbitrary and does not convey a notion of locality. Therefore, traditional transforms and metrics cannot be used for data organization and analysis. In this paper, our goal is to organize the data by defining an appropriate representation and metric such that they respect the smoothness and structure underlying the data. We also aim to generalize the joint clustering of observations and features in the case the data does not fall into clear disjoint groups. For this purpose, we propose multiscale data-driven transforms and metrics based on trees. Their construction is implemented in an iterative refinement procedure that exploits the co-dependencies between features and observations. Beyond the organization of a single dataset, our approach enables us to transfer the organization learned from one dataset to another and to integrate several datasets together. We present an application to breast cancer gene expression analysis: learning metrics on the genes to cluster the tumor samples into cancer subtypes and validating the joint organization of both the genes and the samples. We demonstrate that using our approach to combine information from multiple gene expression cohorts, acquired by different profiling technologies, improves the clustering of tumor samples.
 [4] arXiv:1708.05789 [pdf, other]

Title: Semi-supervised Conditional GANs
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
We introduce a new model for building conditional generative models in a semi-supervised setting to conditionally generate data given attributes by adapting the GAN framework. The proposed semi-supervised GAN (SS-GAN) model uses a pair of stacked discriminators to learn the marginal distribution of the data, and the conditional distribution of the attributes given the data respectively. In the semi-supervised setting, the marginal distribution (which is often harder to learn) is learned from the labeled + unlabeled data, and the conditional distribution is learned purely from the labeled data. Our experimental results demonstrate that this model performs significantly better compared to existing semi-supervised conditional GAN models.
 [5] arXiv:1708.05836 [pdf, ps, other]

Title: Common change point estimation in panel data from the least squares and maximum likelihood viewpoints
Subjects: Statistics Theory (math.ST)
We establish the convergence rates and asymptotic distributions of the common break change-point estimators, obtained by least squares and maximum likelihood in panel data models and compare their asymptotic variances. Our model assumptions accommodate a variety of commonly encountered probability distributions and, in particular, models of particular interest in econometrics beyond the commonly analyzed Gaussian model, including the zero-inflated Poisson model for count data, and the probit and tobit models. We also provide novel results for time dependent data in the signal-plus-noise model, with emphasis on a wide array of noise processes, including Gaussian process, MA$(\infty)$ and $m$-dependent processes. The obtained results show that maximum likelihood estimation requires a stronger signal-to-noise model identifiability condition compared to its least squares counterpart. Finally, since there are three different asymptotic regimes that depend on the behavior of the norm difference of the model parameters before and after the change point, which cannot be realistically assumed to be known, we develop a novel data-driven adaptive procedure that provides valid confidence intervals for the common break, without requiring a priori knowledge of the asymptotic regime the problem falls in.
 [6] arXiv:1708.05840 [pdf, other]

Title: A Data and Model-Parallel, Distributed and Scalable Framework for Training of Deep Networks in Apache Spark
Comments: 12 pages
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)
Training deep networks is expensive and time-consuming, with the training period increasing with data size and growth in model parameters. In this paper, we provide a framework for distributed training of deep networks over a cluster of CPUs in Apache Spark. The framework implements both Data Parallelism and Model Parallelism, making it suitable for deep networks which require huge training data and model parameters which are too big to fit into the memory of a single machine. It can be scaled easily over a cluster of cheap commodity hardware to attain significant speedup and obtain better results, making it quite economical compared to a farm of GPUs and supercomputers. We have proposed a new algorithm for training of deep networks for the case when the network is partitioned across the machines (Model Parallelism) along with detailed cost analysis and proof of convergence of the same. We have developed implementations for Fully-Connected Feedforward Networks, Convolutional Neural Networks, Recurrent Neural Networks and Long Short-Term Memory architectures. We present the results of extensive simulations demonstrating the speedup and accuracy obtained by our framework for different sizes of the data and model parameters with variation in the number of worker cores/partitions, thereby showing that our proposed framework can achieve significant speedup (up to 11X for CNN) and is also quite scalable.
 [7] arXiv:1708.05879 [pdf, other]

Title: Regularized Estimation and Testing for High-Dimensional Multi-Block Vector-Autoregressive Models
Subjects: Methodology (stat.ME)
Dynamical systems comprising multiple components that can be partitioned into distinct blocks originate in many scientific areas. A pertinent example is the interaction between financial assets and selected macroeconomic indicators, which has been studied at the aggregate level (e.g., a stock index and an employment index) extensively in the macroeconomics literature. A key shortcoming of this approach is that it ignores potential influences from other related components (e.g., Gross Domestic Product) that may exert influence on the system's dynamics and structure and thus produces incorrect results. To mitigate this issue, we consider a multi-block linear dynamical system with Granger-causal ordering between blocks, wherein the blocks' temporal dynamics are described by vector autoregressive processes and are influenced by blocks higher in the system hierarchy. We derive the maximum likelihood estimator for the posited model for Gaussian data in the high-dimensional setting based on appropriate regularization schemes for the parameters of the block components. To optimize the underlying non-convex likelihood function, we develop an iterative algorithm with convergence guarantees. We establish theoretical properties of the maximum likelihood estimates, leveraging the decomposability of the regularizers and a careful analysis of the iterates. Finally, we develop testing procedures for the null hypothesis of whether a block "Granger-causes" another block of variables. The performance of the model and the testing procedures are evaluated on synthetic data, and illustrated on a data set involving log-returns of the US S&P100 component stocks and key macroeconomic variables for the 2001-16 period.
 [8] arXiv:1708.05894 [pdf, other]

Title: An Improved Multi-Output Gaussian Process RNN with Real-Time Validation for Early Sepsis Detection
Authors: Joseph Futoma, Sanjay Hariharan, Mark Sendak, Nathan Brajer, Meredith Clement, Armando Bedoya, Cara O'Brien, Katherine Heller
Comments: Presented at Machine Learning for Healthcare 2017, Boston, MA
Subjects: Machine Learning (stat.ML); Applications (stat.AP); Methodology (stat.ME)
Sepsis is a poorly understood and potentially life-threatening complication that can occur as a result of infection. Early detection and treatment improves patient outcomes, and as such it poses an important challenge in medicine. In this work, we develop a flexible classifier that leverages streaming lab results, vitals, and medications to predict sepsis before it occurs. We model patient clinical time series with multi-output Gaussian processes, maintaining uncertainty about the physiological state of a patient while also imputing missing values. The mean function takes into account the effects of medications administered on the trajectories of the physiological variables. Latent function values from the Gaussian process are then fed into a deep recurrent neural network to classify patient encounters as septic or not, and the overall model is trained end-to-end using backpropagation. We train and validate our model on a large dataset of 18 months of heterogeneous inpatient stays from the Duke University Health System, and develop a new "real-time" validation scheme for simulating the performance of our model as it will actually be used. Our proposed method substantially outperforms clinical baselines, and improves on a previous related model for detecting sepsis. Our model's predictions will be displayed in a real-time analytics dashboard to be used by a sepsis rapid response team to help detect and improve treatment of sepsis.
 [9] arXiv:1708.05895 [pdf, other]

Title: Identifying down and up-regulated chromosome regions using RNA-Seq data
Subjects: Applications (stat.AP)
The number of studies dealing with RNA-Seq data analysis has experienced a fast increase in the past years, making this type of gene expression a strong competitor to the DNA microarrays. This paper proposes a Bayesian model to detect down and up-regulated chromosome regions using RNA-Seq data. The methodology is based on a recent work developed to detect up-regulated regions in the context of microarray data. A hidden Markov model is developed by considering a mixture of Gaussian distributions with ordered means in a way that first and last mixture components are supposed to accommodate the under and over-expressed genes, respectively. The model is flexible enough to efficiently deal with the highly irregular spaced configuration of the data by assuming a hierarchical Markov dependence structure. The analysis of four cancer data sets (breast, lung, ovarian and uterus) is presented. Results indicate that the proposed model is selective in determining the regulation status, robust with respect to prior specifications and provides tools for a global or local search of under and over-expressed chromosome regions.
 [10] arXiv:1708.05917 [pdf, ps, other]

Title: Accelerating Kernel Classifiers Through Borders Mapping
Authors: Peter Mills
Comments: Stuck even deeper in peer-review limbo
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
Support vector machines (SVM) and other kernel techniques represent a family of powerful statistical classification methods with high accuracy and broad applicability. Because they use all or a significant portion of the training data, however, they can be slow, especially for large problems. Piecewise linear classifiers are similarly versatile, yet have the additional advantages of simplicity, ease of interpretation and, if the number of component linear classifiers is not too large, speed. Here we show how a simple, piecewise linear classifier can be trained from a kernel-based classifier in order to improve the classification speed. The method works by finding the root of the difference in conditional probabilities between pairs of opposite classes to build up a representation of the decision boundary. When tested on 17 different datasets, it succeeded in improving the classification speed of an SVM for 9 of them by factors as high as 88 times or more. The method is best suited to problems with continuum features data and smooth probability functions. Because the component linear classifiers are built up individually from an existing classifier, rather than through a simultaneous optimization procedure, the classifier is also fast to train.
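The root-finding step described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: `cond_prob_diff` is a hypothetical stand-in for the difference in conditional probabilities R(x) = P(+1|x) - P(-1|x) that a trained kernel classifier would supply, and plain bisection along the segment joining two points of opposite class is assumed as the root finder.

```python
import math

def cond_prob_diff(x):
    """Hypothetical stand-in for R(x) = P(+1|x) - P(-1|x).

    A real application would query a trained kernel classifier here.
    This toy function is zero exactly on the line x[0] + x[1] = 1.
    """
    return math.tanh(x[0] + x[1] - 1.0)

def border_sample(r, x_neg, x_pos, tol=1e-10, max_iter=200):
    """Bisect along the segment from x_neg (r < 0) to x_pos (r > 0)
    and return a point where r is approximately zero (a border point)."""
    if not (r(x_neg) < 0.0 < r(x_pos)):
        raise ValueError("endpoints must lie in opposite classes")
    lo, hi = 0.0, 1.0  # fractional position along the segment
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        point = [a + mid * (b - a) for a, b in zip(x_neg, x_pos)]
        if r(point) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    t = 0.5 * (lo + hi)
    return [a + t * (b - a) for a, b in zip(x_neg, x_pos)]

# One border sample between a class -1 point and a class +1 point.
border = border_sample(cond_prob_diff, [0.0, 0.0], [2.0, 1.0])
```

Repeating this for many opposite-class pairs would give a set of border points from which a piecewise linear approximation of the decision boundary could be assembled; how the paper does that assembly is not specified in the abstract.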
 [11] arXiv:1708.05931 [pdf]

Title: Innovations orthogonalization: a solution to the major pitfalls of EEG/MEG "leakage correction"
Authors: Roberto D. Pascual-Marqui, Rolando J. Biscay, Jorge Bosch-Bayard, Pascal Faber, Toshihiko Kinoshita, Kieko Kochi, Patricia Milz, Keiichiro Nishida, Masafumi Yoshimura
Comments: preprint, technical report, under license "Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)"
Subjects: Methodology (stat.ME); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)
The problem of interest here is the study of brain functional and effective connectivity based on non-invasive EEG-MEG inverse solution time series. These signals generally have low spatial resolution, such that an estimated signal at any one site is an instantaneous linear mixture of the true, actual, unobserved signals across all cortical sites. False connectivity can result from analysis of these low-resolution signals. Recent efforts toward "unmixing" have been developed, under the name of "leakage correction". One recent noteworthy approach is that by Colclough et al (2015 NeuroImage, 117:439-448), which forces the inverse solution signals to have zero cross-correlation at lag zero. One goal is to show that Colclough's method produces false human connectomes under very broad conditions. The second major goal is to develop a new solution, that appropriately "unmixes" the inverse solution signals, based on innovations orthogonalization. The new method first fits a multivariate autoregression to the inverse solution signals, giving the mixed innovations. Second, the mixed innovations are orthogonalized. Third, the mixed and orthogonalized innovations allow the estimation of the "unmixing" matrix, which is then finally used to "unmix" the inverse solution signals. It is shown that under very broad conditions, the new method produces proper human connectomes, even when the signals are not generated by an autoregressive model.
 [12] arXiv:1708.05932 [pdf, other]

Title: Fundamental Limits of Weak Recovery with Applications to Phase Retrieval
Comments: 46 pages, 3 figures
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT)
In phase retrieval we want to recover an unknown signal $\boldsymbol x\in\mathbb C^d$ from $n$ quadratic measurements of the form $y_i = |\langle\boldsymbol a_i,\boldsymbol x\rangle|^2+w_i$ where $\boldsymbol a_i\in \mathbb C^d$ are known sensing vectors and $w_i$ is measurement noise. We ask the following weak recovery question: what is the minimum number of measurements $n$ needed to produce an estimator $\hat{\boldsymbol x}(\boldsymbol y)$ that is positively correlated with the signal $\boldsymbol x$? We consider the case of Gaussian vectors $\boldsymbol a_i$. We prove that, in the high-dimensional limit, a sharp phase transition takes place, and we locate the threshold in the regime of vanishingly small noise. For $n\le d-o(d)$ no estimator can do significantly better than random and achieve a strictly positive correlation. For $n\ge d+o(d)$ a simple spectral estimator achieves a positive correlation. Surprisingly, numerical simulations with the same spectral estimator demonstrate promising performances with realistic sensing matrices as well. Spectral methods are used to initialize non-convex optimization algorithms in phase retrieval, and our approach can boost performances in this setting as well.
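As a rough illustration of what a spectral estimator for this measurement model can look like, the sketch below takes the leading eigenvector of a weighted empirical covariance matrix $D = \frac{1}{n}\sum_i y_i \boldsymbol a_i \boldsymbol a_i^T$ by power iteration. Several simplifications are assumptions made here and are not taken from the abstract: the signal and sensing vectors are real-valued, the measurements are noise-free, the raw $y_i$ are used as weights (the paper's weighting may differ), and the dimensions are chosen so that $n$ is far above the threshold.

```python
import math
import random

def spectral_estimate(A, y, iters=200):
    """Leading eigenvector of D = (1/n) * sum_i y[i] * a_i a_i^T.

    Assumption: the raw measurements y[i] serve as weights. D is then
    positive semidefinite, so power iteration finds its top eigenvector.
    """
    n, d = len(A), len(A[0])
    D = [[sum(y[i] * A[i][j] * A[i][k] for i in range(n)) / n
          for k in range(d)] for j in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit-norm starting vector
    for _ in range(iters):
        w = [sum(D[j][k] * v[k] for k in range(d)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

random.seed(0)
d, n = 5, 5000                      # illustrative sizes, n much larger than d
x = [0.6, 0.8, 0.0, 0.0, 0.0]       # unit-norm signal chosen for the demo
A = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [sum(a * b for a, b in zip(row, x)) ** 2 for row in A]  # noise-free

x_hat = spectral_estimate(A, y)
# The sign of x cannot be recovered from quadratic measurements,
# so the absolute correlation is the meaningful quantity.
correlation = abs(sum(a * b for a, b in zip(x_hat, x)))
```

The absolute value reflects the global sign (phase) ambiguity inherent in the problem; in the complex case the same role is played by an arbitrary unit-modulus factor.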
Our impossibility result is based on classical information-theory arguments. The spectral algorithm computes the leading eigenvector of a weighted empirical covariance matrix. We obtain a sharp characterization of the spectral properties of this random matrix using tools from free probability and generalizing a recent result by Lu and Li. Both the upper and lower bound generalize beyond phase retrieval to measurements $y_i$ produced according to a generalized linear model.
 [13] arXiv:1708.05963 [pdf, ps, other]

Title: Neural Networks Compression for Language Modeling
Comments: Keywords: LSTM, RNN, language modeling, low-rank factorization, pruning, quantization. Published by Springer in the LNCS series, 7th International Conference on Pattern Recognition and Machine Intelligence, 2017
Subjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this paper, we consider several compression techniques for the language modeling problem based on recurrent neural networks (RNNs). It is known that conventional RNNs, e.g., LSTM-based networks in language modeling, are characterized by either high space complexity or substantial inference time. This problem is especially crucial for mobile applications, in which constant interaction with a remote server is inappropriate. Using the Penn Treebank (PTB) dataset, we compare pruning, quantization, low-rank factorization, and tensor train decomposition for LSTM networks in terms of model size and suitability for fast inference.
 [14] arXiv:1708.06077 [pdf, ps, other]

Title: ExSIS: Extended Sure Independence Screening for Ultrahigh-dimensional Linear Models
Comments: 22 pages (single-column version); 10 figures; submitted for journal publication
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (stat.ML)
Statistical inference can be computationally prohibitive in ultrahigh-dimensional linear models. Correlation-based variable screening, in which one leverages marginal correlations for removal of irrelevant variables from the model prior to statistical inference, can be used to overcome this challenge. Prior works on correlation-based variable screening either impose strong statistical priors on the linear model or assume specific post-screening inference methods. This paper first extends the analysis of correlation-based variable screening to arbitrary linear models and post-screening inference techniques. In particular, ($i$) it shows that a condition, termed the screening condition, is sufficient for successful correlation-based screening of linear models, and ($ii$) it provides insights into the dependence of marginal correlation-based screening on different problem parameters. Numerical experiments confirm that these insights are not mere artifacts of analysis; rather, they are reflective of the challenges associated with marginal correlation-based variable screening. Second, the paper explicitly derives the screening condition for two families of linear models, namely, sub-Gaussian linear models and arbitrary (random or deterministic) linear models. In the process, it establishes that, under appropriate conditions, it is possible to reduce the dimension of an ultrahigh-dimensional, arbitrary linear model to almost the sample size even when the number of active variables scales almost linearly with the sample size.
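The generic idea of correlation-based screening mentioned in this abstract can be sketched as follows: score every predictor by the magnitude of its marginal correlation with the response and keep only the highest-scoring ones before any further inference. This is a minimal sketch, not the paper's ExSIS procedure; the function name, the `keep` parameter, and the toy design matrix are all hypothetical, and columns are assumed to be on a comparable scale.

```python
def screen_by_marginal_correlation(X, y, keep):
    """Return the sorted indices of the `keep` columns of X with the
    largest |<x_j, y>| (marginal correlation with the response).

    X is a list of rows; no column normalization is performed here.
    """
    n, p = len(X), len(X[0])
    scores = [abs(sum(X[i][j] * y[i] for i in range(n))) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

# Toy linear model: the response depends only on columns 0 and 2.
X = [
    [1.0, 0.0, 1.0, 0.0],
    [-1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, -1.0, 0.0],
    [-1.0, 1.0, -1.0, 1.0],
]
beta = [2.0, 0.0, -3.0, 0.0]
y = [sum(b * v for b, v in zip(beta, row)) for row in X]

selected = screen_by_marginal_correlation(X, y, keep=2)
print(selected)  # [0, 2]
```

In this toy case the two active columns receive the largest scores (8 and 12, against 6 and 1 for the inactive ones). Note that inactive column 1 still scores 6 because it is correlated with column 2, which hints at why a condition on the design is needed for screening to succeed in general.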
 [15] arXiv:1708.06152 [pdf, other]

Title: Physiological Gaussian Process Priors for the Hemodynamics in fMRI Analysis
Comments: 15 pages, 12 figures
Subjects: Applications (stat.AP)
Inference from fMRI data faces the challenge that the hemodynamic system, which relates the underlying neural activity to the observed BOLD fMRI signal, is not known. We propose a new Bayesian model for task fMRI data with the following features: (i) joint estimation of brain activity and the underlying hemodynamics, (ii) the hemodynamics is modeled nonparametrically with a Gaussian process (GP) prior guided by physiological information and (iii) the predicted BOLD is not necessarily generated by a linear time-invariant (LTI) system. We place a GP prior directly on the predicted BOLD time series, rather than on the hemodynamic response function as in previous literature. This allows us to incorporate physiological information via the GP prior mean in a flexible way. The prior mean function may be generated from a standard LTI system, based on a canonical hemodynamic response function, or a more elaborate physiological model such as the Balloon model. This gives us the nonparametric flexibility of the GP, but allows the posterior to fall back on the physiologically based prior when the data are weak. Results on simulated data show that even with an erroneous prior for the GP, the proposed model is still able to discriminate between active and non-active voxels in a satisfactory way. The proposed model is also applied to real fMRI data, where our Gaussian process model in several cases finds brain activity where a baseline model with fixed hemodynamics does not.
 [16] arXiv:1708.06160 [pdf]

Title: Economic Design of Memory-Type Control Charts: The Fallacy of the Formula Proposed by Lorenzen and Vance (1986)
Subjects: Applications (stat.AP); Computational Engineering, Finance, and Science (cs.CE); Mathematical Software (cs.MS); Optimization and Control (math.OC); Economics (q-fin.EC)
The memory-type control charts, such as EWMA and CUSUM, are powerful tools for detecting small quality changes in univariate and multivariate processes. Many papers on economic design of these control charts use the formula proposed by Lorenzen and Vance (1986) [Lorenzen, T. J., & Vance, L. C. (1986). The economic design of control charts: A unified approach. Technometrics, 28(1), 3-10, DOI: 10.2307/1269598]. This paper shows that this formula is not correct for memory-type control charts and its values can significantly deviate from the original values even if the ARL values used in this formula are accurately computed. Consequently, the use of this formula can result in charts that are not economically optimal. The formula is corrected for memory-type control charts, but unfortunately the modified formula is not a helpful tool from a computational perspective. We show that simulation-based optimization is a possible alternative method.
 [17] arXiv:1708.06235 [pdf, other]

Title: Deep Convolutional Neural Networks for Massive MIMO Fingerprint-Based Positioning
Comments: Accepted in the IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) 2017
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT)
This paper provides an initial investigation on the application of convolutional neural networks (CNNs) for fingerprint-based positioning using measured massive MIMO channels. When represented in appropriate domains, massive MIMO channels have a sparse structure which can be efficiently learned by CNNs for positioning purposes. We evaluate the positioning accuracy of state-of-the-art CNNs with channel fingerprints generated from a channel model with a rich clustered structure: the COST 2100 channel model. We find that moderately deep CNNs can achieve fractional-wavelength positioning accuracies, provided that a sufficiently representative data set is available for training.
 [18] arXiv:1708.06302 [pdf, other]

Title: A general framework for Vecchia approximations of Gaussian processes
Subjects: Methodology (stat.ME); Computation (stat.CO)
Gaussian processes (GPs) are commonly used as models for functions, time series, and spatial fields, but they are computationally infeasible for large datasets. Focusing on the typical setting of modeling observations as a GP plus an additive nugget or noise term, we propose a generalization of the Vecchia (1988) approach as a framework for GP approximations. We show that our general Vecchia approach contains many popular existing GP approximations as special cases, allowing for comparisons among the different methods within a unified framework. Representing the models by directed acyclic graphs, we determine the sparsity of the matrices necessary for inference, which leads to new insights regarding the computational properties. Based on these results, we propose a novel sparse general Vecchia approximation, which ensures computational feasibility for large datasets but can lead to tremendous improvements in approximation accuracy over Vecchia's original approach. We provide several theoretical results, and conduct numerical comparisons. We conclude with guidelines for the use of Vecchia approximations.
 [19] arXiv:1708.06332 [pdf, other]

Title: Efficient Nonparametric Bayesian Inference For X-Ray Transforms
Comments: 39 pages, 6 figures
Subjects: Statistics Theory (math.ST); Analysis of PDEs (math.AP); Methodology (stat.ME)
We consider the statistical inverse problem of recovering a function $f: M \to \mathbb R$, where $M$ is a smooth compact Riemannian manifold with boundary, from measurements of general $X$-ray transforms $I_a(f)$ of $f$, corrupted by additive Gaussian noise. For $M$ equal to the unit disk with `flat' geometry and $a=0$ this reduces to the standard Radon transform, but our general setting allows for anisotropic media $M$ and can further model local `attenuation' effects, both highly relevant in practical imaging problems such as SPECT tomography. We propose a nonparametric Bayesian inference approach based on standard Gaussian process priors for $f$. The posterior reconstruction of $f$ corresponds to a Tikhonov regulariser with a reproducing kernel Hilbert space norm penalty that does not require the calculation of the singular value decomposition of the forward operator $I_a$. We prove Bernstein-von Mises theorems that entail that posterior-based inferences such as credible sets are valid and optimal from a frequentist point of view for a large family of semi-parametric aspects of $f$. In particular we derive the asymptotic distribution of smooth linear functionals of the Tikhonov regulariser, which is shown to attain the semi-parametric Cramér-Rao information bound. The proofs rely on an invertibility result for the `Fisher information' operator $I_a^*I_a$ between suitable function spaces, a result of independent interest that relies on techniques from microlocal analysis. We illustrate the performance of the proposed method via simulations in various settings.
 [20] arXiv:1708.06337 [pdf, other]

Title: Nonlinear association structures in flexible Bayesian additive joint models
Subjects: Methodology (stat.ME)
Joint models of longitudinal and survival data have become an important tool for modeling associations between longitudinal biomarkers and event processes. This association, which is the effect of the marker on the log-hazard, is assumed to be linear in existing shared random effects models, with this assumption usually remaining unchecked. We present an extended framework of flexible additive joint models that allows the estimation of nonlinear, covariate-specific associations by making use of Bayesian P-splines. The ability to capture truly linear and nonlinear associations is assessed in simulations and illustrated on the widely studied biomedical data on the rare fatal liver disease primary biliary cirrhosis. Our joint models are estimated in a Bayesian framework using structured additive predictors allowing for great flexibility in the specification of smooth nonlinear, time-varying and random effects terms. The model is implemented in the R package bamlss to facilitate the application of this flexible joint model.
Cross-lists for Tue, 22 Aug 17
 [21] arXiv:1708.05757 (cross-list from physics.flu-dyn) [pdf, ps, other]

Title: Identification of individual coherent sets associated with flow trajectories using Coherent Structure Coloring
Comments: In press at Chaos
Subjects: Fluid Dynamics (physics.flu-dyn); Dynamical Systems (math.DS); Machine Learning (stat.ML)
We present a method for identifying the coherent structures associated with individual Lagrangian flow trajectories even where only sparse particle trajectory data is available. The method, based on techniques in spectral graph theory, uses the Coherent Structure Coloring vector and associated eigenvectors to analyze the distance in higher-dimensional eigenspace between a selected reference trajectory and other tracer trajectories in the flow. By analyzing this distance metric in a hierarchical clustering, the coherent structure of which the reference particle is a member can be identified. This algorithm is proven successful in identifying coherent structures of varying complexities in canonical unsteady flows. Additionally, the method is able to assess the relative coherence of the associated structure in comparison to the surrounding flow. Although the method is demonstrated here in the context of fluid flow kinematics, the generality of the approach allows for its potential application to other unsupervised clustering problems in dynamical systems such as neuronal activity, gene expression, or social networks.
 [22] arXiv:1708.05859 (cross-list from math.PR) [pdf, ps, other]

Title: Decomposition of mean-field Gibbs distributions into product measures
Subjects: Probability (math.PR); Mathematical Physics (math-ph); Statistics Theory (math.ST)
We show that under a low complexity condition on the gradient of a Hamiltonian, Gibbs distributions on the Boolean hypercube are approximate mixtures of product measures whose probability vectors are critical points of an associated mean-field functional. This extends a previous work by the first author. As an application, we demonstrate how this framework helps characterize both Ising models satisfying a mean-field condition and the conditional distributions which arise in the emerging theory of nonlinear large deviations.
 [23] arXiv:1708.05866 (cross-list from cs.LG) [pdf, other]

Title: A Brief Survey of Deep Reinforcement Learning
Comments: To appear in IEEE Signal Processing Magazine, Special Issue on Deep Learning for Image Understanding
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep $Q$-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
 [24] arXiv:1708.05929 (cross-list from cs.LG) [pdf, other]

Title: XPACS: eXPlaining Anomalies by Characterizing Subspaces
Comments: 10 pages, 5 figures, 5 tables
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Anomaly detection has numerous critical applications in finance, security, etc. and has been extensively studied. In this paper, we tap into a gap in the literature and consider a complementary problem: anomaly description. Interpretation of anomalies has important implications for decision makers, from being able to troubleshoot and prioritize their actions to making policy changes for prevention. We present a new method called XPACS which "reverse-engineers" the known anomalies in a dataset by identifying a few anomalous patterns that they form along with the characterizing subspace of features that separates them from normal instances. From a descriptive data mining perspective, our solution has five key desired properties. It can unearth anomalous patterns (i) of multiple different types, (ii) hidden in arbitrary subspaces of a high dimensional space, (iii) interpretable by the end-users, (iv) succinct, providing the shortest data description, and finally (v) different from normal patterns of the data. There is no existing work on anomaly description that satisfies all of these desiderata simultaneously. While not our primary goal, the anomalous patterns XPACS finds can further be seen as multiple, interpretable "signatures" and can be used for detection. We show the effectiveness of XPACS in explanation as well as detection tasks on 9 real-world datasets.
 [25] arXiv:1708.05978 (cross-list from cs.LG) [pdf, other]

Title: Stochastic Primal-Dual Proximal ExtraGradient Descent for Compositely Regularized Optimization
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
We consider a wide range of regularized stochastic minimization problems with two regularization terms, one of which is composed with a linear function. This optimization model abstracts a number of important applications in artificial intelligence and machine learning, such as fused Lasso, fused logistic regression, and a class of graph-guided regularized minimization. The computational challenges of this model are twofold. On one hand, the closed-form solution of the proximal mapping associated with the composed regularization term or the expected objective function is not available. On the other hand, the calculation of the full gradient of the expectation in the objective is very expensive when the number of input data samples is considerably large. To address these issues, we propose a stochastic variant of extragradient-type methods, namely Stochastic Primal-Dual Proximal ExtraGradient descent (SPDPEG), and analyze its convergence property for both convex and strongly convex objectives. For general convex objectives, the uniformly averaged iterates generated by SPDPEG converge in expectation with $O(1/\sqrt{t})$ rate. For strongly convex objectives, the uniformly and non-uniformly averaged iterates generated by SPDPEG converge with $O(\log(t)/t)$ and $O(1/t)$ rates, respectively. The order of the rate of the proposed algorithm is known to match the best convergence rate for first-order stochastic algorithms. Experiments on fused logistic regression and graph-guided regularized logistic regression problems show that the proposed algorithm performs very efficiently and consistently outperforms other competing algorithms.
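As a rough illustration of a regularizer "composed with a linear function", the fused Lasso penalty can be written as an L1 norm of the parameters plus an L1 norm of their first differences. The sketch below is not the SPDPEG algorithm. The function names and the weights `lam1` and `lam2` are hypothetical, chosen only for illustration. It also shows soft-thresholding, the closed-form proximal mapping of the plain L1 term; the abstract notes that no such closed form is available for the composed term.

```python
import math
from typing import List


def first_differences(x: List[float]) -> List[float]:
    """Apply the linear map F with (Fx)_i = x[i+1] - x[i]."""
    return [b - a for a, b in zip(x, x[1:])]


def l1_norm(v: List[float]) -> float:
    """Sum of absolute values."""
    return sum(abs(t) for t in v)


def fused_lasso_penalty(x: List[float], lam1: float, lam2: float) -> float:
    """lam1 * ||x||_1 + lam2 * ||Fx||_1 (hypothetical weights lam1, lam2)."""
    return lam1 * l1_norm(x) + lam2 * l1_norm(first_differences(x))


def soft_threshold(v: List[float], tau: float) -> List[float]:
    """Proximal mapping of tau * ||.||_1: shrink each entry toward zero by tau."""
    return [math.copysign(max(abs(t) - tau, 0.0), t) for t in v]


penalty = fused_lasso_penalty([1.0, 1.0, 3.0], lam1=1.0, lam2=1.0)
shrunk = soft_threshold([3.0, -0.5, 1.0], tau=1.0)
```

Here the plain L1 term has a one-line proximal step, whereas the difference term couples neighbouring coordinates through F. This coupling is one motivation for primal-dual treatments of such penalties.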
 [26] arXiv:1708.06018 (cross-list from math.NA) [pdf, ps, other]

Title: Conversion of Mersenne Twister to double-precision floating-point numbers
Authors: Shin Harase
Comments: 14 pages
Subjects: Numerical Analysis (math.NA); Numerical Analysis (cs.NA); Computation (stat.CO)
The 32-bit Mersenne Twister generator MT19937 is a widely used random number generator. To generate numbers with more than 32 bits in bit length, and particularly when converting into 53-bit double-precision floating-point numbers in $[0,1)$ in the IEEE 754 format, the typical implementation concatenates two successive 32-bit integers and divides them by a power of $2$. In this case, the 32-bit MT19937 is optimized in terms of its equidistribution properties (the so-called dimension of equidistribution with $v$-bit accuracy) under the assumption that one will mainly be using 32-bit output values, and hence the concatenation sometimes degrades the dimension of equidistribution compared with the simple use of 32-bit outputs. In this paper, we analyze such phenomena by investigating hidden $\mathbb{F}_2$-linear relations among the bits of high-dimensional outputs. Accordingly, we report that MT19937 with a specific lag set fails several statistical tests, such as the overlapping collision test, matrix rank test, and Hamming independence test.
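The concatenate-and-divide conversion described in the abstract can be sketched as follows. This is one common variant, assumed here for illustration: keep the top 27 bits of the first word and the top 26 bits of the second, then divide by $2^{53}$. The abstract does not state the exact bit split, so the 27/26 choice and the name `to_double53` are assumptions. CPython's `random` module is based on MT19937 and is used only to supply two successive 32-bit outputs.

```python
import random


def to_double53(hi: int, lo: int) -> float:
    """Combine two 32-bit words into a 53-bit double in [0, 1)."""
    a = hi >> 5  # keep the top 27 bits of the first word
    b = lo >> 6  # keep the top 26 bits of the second word
    # a * 2**26 + b is a 53-bit integer; dividing by 2**53 maps it into [0, 1).
    return (a * 67108864 + b) / 9007199254740992


rng = random.Random(2017)  # arbitrary seed, for reproducibility only
hi, lo = rng.getrandbits(32), rng.getrandbits(32)
u = to_double53(hi, lo)
```

Because the 53-bit integer is exactly representable as a double, the division introduces no rounding, and the largest possible result is $(2^{53}-1)/2^{53}$.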
 [27] arXiv:1708.06020 (cross-list from cs.LG) [pdf, ps, other]

Title: Improving Deep Learning using Generic Data Augmentation
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
Deep artificial neural networks require a large corpus of training data in order to effectively learn, where collection of such training data is often expensive and laborious. Data augmentation overcomes this issue by artificially inflating the training set with label-preserving transformations. Recently there has been extensive use of generic data augmentation to improve Convolutional Neural Network (CNN) task performance. This study benchmarks various popular data augmentation schemes to allow researchers to make informed decisions as to which training methods are most appropriate for their data sets. Various geometric and photometric schemes are evaluated on a coarse-grained data set using a relatively simple CNN. Experimental results, run using 4-fold cross-validation and reported in terms of Top-1 and Top-5 accuracy, indicate that cropping in geometric augmentation significantly increases CNN task performance.
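Two geometric, label-preserving transformations of the kind the abstract mentions (flipping and cropping) can be sketched on a toy image stored as a nested list. This is a minimal illustration, not the benchmark's pipeline. The helper names, the 4x4 image, and the crop size are all hypothetical.

```python
import random
from typing import List

Image = List[List[int]]  # rows of pixel intensities


def horizontal_flip(img: Image) -> Image:
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def random_crop(img: Image, height: int, width: int, rng: random.Random) -> Image:
    """Cut a height x width window at a uniformly chosen position."""
    top = rng.randint(0, len(img) - height)
    left = rng.randint(0, len(img[0]) - width)
    return [row[left:left + width] for row in img[top:top + height]]


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 image
flipped = horizontal_flip(image)
crop = random_crop(image, 3, 3, random.Random(0))
```

Each augmented copy keeps the original label, so the training set grows without additional annotation.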
 [28] arXiv:1708.06040 (cross-list from cs.AI) [pdf, other]

Title: Neural Block Sampling
Comments: 10 pages
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
Efficient Monte Carlo inference often requires manual construction of model-specific proposals. We propose an approach to automated proposal construction by training neural networks to provide fast approximations to block Gibbs conditionals. The learned proposals generalize to occurrences of common structural motifs both within a given model and across models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We explore several applications including open-universe Gaussian mixture models, in which our learned proposals outperform a hand-tuned sampler, and a real-world named entity recognition task, in which our sampler's ability to escape local modes yields higher final F1 scores than single-site Gibbs.
 [29] arXiv:1708.06243 (cross-list from cs.LG) [pdf]

Title: General Backpropagation Algorithm for Training Second-order Neural Networks
Comments: 5 pages, 7 figures, 19 references
Subjects: Learning (cs.LG); Machine Learning (stat.ML)
The artificial neural network is a popular framework in machine learning. To empower individual neurons, we recently suggested that the current type of neurons could be upgraded to 2nd-order counterparts, in which the linear operation between inputs to a neuron and the associated weights is replaced with a nonlinear quadratic operation. A single 2nd-order neuron already has a strong nonlinear modeling ability, such as implementing basic fuzzy logic operations. In this paper, we develop a general backpropagation (BP) algorithm to train a network consisting of 2nd-order neurons. Numerical studies are performed to verify the generalized BP algorithm.
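To make the "quadratic operation" concrete, the sketch below uses one possible quadratic form, the product of two affine functions of the input. This form is an assumption for illustration and is not necessarily the neuron defined in the paper. With hand-picked (hypothetical) weights, a single such unit reproduces XOR on binary inputs, which a single linear unit cannot do.

```python
from typing import Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def quadratic_neuron(x, w_a, b_a, w_b, b_b) -> float:
    """Assumed form: (w_a . x + b_a) * (w_b . x + b_b), with no activation."""
    return (dot(w_a, x) + b_a) * (dot(w_b, x) + b_b)


# Hand-picked weights: (x1 - x2) * (x1 - x2) equals XOR when x1, x2 are 0 or 1.
xor_outputs = [
    quadratic_neuron([x1, x2], [1, -1], 0, [1, -1], 0)
    for x1 in (0, 1)
    for x2 in (0, 1)
]
```

The inputs are visited in the order (0,0), (0,1), (1,0), (1,1).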
 [30] arXiv:1708.06246 (cross-list from cs.AI) [pdf, other]

Title: Comparative Benchmarking of Causal Discovery Techniques
Subjects: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In this paper we present a comprehensive view of prominent causal discovery algorithms, categorized into two main categories: (1) assuming acyclicity and no latent variables, and (2) allowing both cycles and latent variables, along with experimental results comparing them from three perspectives: (a) structural accuracy, (b) standard predictive accuracy, and (c) accuracy of counterfactual inference. For (b) and (c) we train causal Bayesian networks with structures as predicted by each causal discovery technique to carry out counterfactual or standard predictive inference. We compare causal algorithms on two publicly available datasets and one simulated dataset having different sample sizes: small, medium and large. Experiments show that structural accuracy of a technique does not necessarily correlate with higher accuracy of inferencing tasks. Further, surveyed structure learning algorithms do not perform well in terms of structural accuracy in the case of datasets having a large number of variables.
 [31] arXiv:1708.06250 (cross-list from cs.CV) [pdf, other]

Title: Pillar Networks++: Distributed nonparametric deep and wide networks
Comments: arXiv admin note: substantial text overlap with arXiv:1707.06923
Subjects: Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Computation (stat.CO); Machine Learning (stat.ML)
In recent work, it was shown that combining multi-kernel based support vector machines (SVMs) can lead to near state-of-the-art performance on an action recognition dataset (HMDB51 dataset). This was 0.4% lower than frameworks that used handcrafted features in addition to the deep convolutional feature extractors. In the present work, we show that combining distributed Gaussian Processes with multi-stream deep convolutional neural networks (CNN) alleviates the need to augment a neural network with handcrafted features. In contrast to prior work, we treat each deep neural convolutional network as an expert wherein the individual predictions (and their respective uncertainties) are combined into a Product of Experts (PoE) framework.
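For Gaussian predictions, the textbook Product of Experts rule weights each expert's mean by its precision (inverse variance). The sketch below shows that standard rule only. The paper's exact combination may differ, and the function name and toy numbers are hypothetical.

```python
from typing import Sequence, Tuple


def product_of_gaussian_experts(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Combine independent Gaussian predictions N(mean_i, variance_i).

    The product of Gaussian densities is proportional to a Gaussian whose
    precision is the sum of the experts' precisions and whose mean is the
    precision-weighted average of the experts' means.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    return mean, 1.0 / total_precision


# Two equally confident experts: the combined mean lies halfway between them.
combined_mean, combined_var = product_of_gaussian_experts([0.0, 2.0], [1.0, 1.0])
```

An expert with a smaller variance pulls the combined mean toward its own prediction, and the combined variance is never larger than that of the most confident expert.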
Replacements for Tue, 22 Aug 17
 [32] arXiv:1509.02230 (replaced) [pdf, other]

Title: Properties of the Affine Invariant Ensemble Sampler in high dimensions
Comments: 13 pages, 5 figures
Subjects: Computation (stat.CO); Data Analysis, Statistics and Probability (physics.data-an)
 [33] arXiv:1512.08191 (replaced) [pdf, ps, other]

Title: Estimation of Kullback-Leibler losses for noisy recovery problems within the exponential family
Authors: Charles-Alban Deledalle (IMB)
Subjects: Applications (stat.AP); Methodology (stat.ME)
 [34] arXiv:1605.08285 (replaced) [pdf, other]

Title: Solving Systems of Random Quadratic Equations via Truncated Amplitude Flow
Comments: 37 Pages, 16 figures
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Optimization and Control (math.OC)
 [35] arXiv:1605.08299 (replaced) [pdf, other]

Title: A General Family of Trimmed Estimators for Robust High-dimensional Data Analysis
Comments: 39 pages, 6 figures
Subjects: Machine Learning (stat.ML)
 [36] arXiv:1607.02738 (replaced) [pdf, other]

Title: Magnetic Hamiltonian Monte Carlo
Comments: 34th International Conference on Machine Learning (ICML 2017)
Subjects: Machine Learning (stat.ML)
 [37] arXiv:1611.09588 (replaced) [pdf, other]

Title: Level sets and drift estimation for reflected Brownian motion with drift
Subjects: Statistics Theory (math.ST)
 [38] arXiv:1611.10242 (replaced) [pdf, other]

Title: Likelihood-free inference by ratio estimation
Subjects: Machine Learning (stat.ML); Computation (stat.CO); Methodology (stat.ME)
 [39] arXiv:1701.04889 (replaced) [pdf, other]

Title: Efficient and Adaptive Linear Regression in Semi-Supervised Settings
Comments: 51 pages; Revised version - to appear in The Annals of Statistics
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [40] arXiv:1701.05230 (replaced) [pdf, other]

Title: Surrogate Aided Unsupervised Recovery of Sparse Signals in Single Index Models for Binary Outcomes
Comments: 43 pages; Revised version with additional results and discussions
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [41] arXiv:1701.05654 (replaced) [pdf, other]

Title: Bayesian Network Learning via Topological Order
Subjects: Machine Learning (stat.ML); Data Structures and Algorithms (cs.DS)
 [42] arXiv:1702.02258 (replaced) [pdf, other]

Title: Generating Multiple Diverse Hypotheses for Human 3D Pose Consistent with 2D Joint Detections
Comments: accepted to ICCV 2017 (PeopleCap)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Machine Learning (stat.ML)
 [43] arXiv:1702.08435 (replaced) [pdf, other]

Title: Statistical Anomaly Detection via Composite Hypothesis Testing for Markov Models
Comments: Preprint submitted to the IEEE Transactions on Signal Processing
Subjects: Systems and Control (cs.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [44] arXiv:1704.07050 (replaced) [pdf, other]

Title: Using Global Constraints and Reranking to Improve Cognates Detection
Comments: 10 pages, 6 figures, 6 tables; published in the Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1983-1992, Vancouver, Canada, July 2017
Journal-ref: In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1983-1992, Vancouver, Canada, July 2017. Association for Computational Linguistics
Subjects: Computation and Language (cs.CL); Learning (cs.LG); Machine Learning (stat.ML)
 [45] arXiv:1705.03938 (replaced) [pdf, other]

Title: A three-dimensional statistical model for imaged microstructures of porous polymer films
Subjects: Applications (stat.AP)
 [46] arXiv:1705.07120 (replaced) [pdf, other]

Title: VAE with a VampPrior
Comments: 16 pages, new results (two additional datasets) comparing to the previous version + the text was reorganized and rewritten
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [47] arXiv:1705.08417 (replaced) [pdf, other]

Title: Reinforcement Learning with a Corrupted Reward Channel
Comments: A shorter version of this report was accepted to IJCAI 2017 AI and Autonomy track
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
 [48] arXiv:1706.09152 (replaced) [pdf, other]

Title: Generative Bridging Network in Neural Sequence Prediction
Comments: A submission for AAAI 2018
Subjects: Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)
 [49] arXiv:1707.01227 (replaced) [pdf, other]

Title: Exponential random graphs behave like mixtures of stochastic block models
Subjects: Probability (math.PR); Social and Information Networks (cs.SI); Mathematical Physics (math-ph); Combinatorics (math.CO); Statistics Theory (math.ST)
 [50] arXiv:1707.03017 (replaced) [pdf, other]

Title: Learning Visual Reasoning Without Strong Priors
Comments: This work was presented at ICML 2017's Machine Learning in Speech and Language Processing Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
 [51] arXiv:1707.07716 (replaced) [pdf, other]

Title: Stochastic Gradient Descent for Relational Logistic Regression via Partial Network Crawls
Comments: 7 pages, 3 figures, Proceedings of the Seventh International Workshop on Statistical Relational AI (StarAI 2017)
Subjects: Machine Learning (stat.ML); Learning (cs.LG)
 [52] arXiv:1708.01383 (replaced) [pdf, other]

Title: Convergence of Variance-Reduced Stochastic Learning under Random Reshuffling
Subjects: Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [53] arXiv:1708.01666 (replaced) [pdf]

Title: An Effective Training Method For Deep Convolutional Neural Network
Subjects: Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [54] arXiv:1708.03272 (replaced) [pdf, other]

Title: Fast and accurate Bayesian model criticism and conflict diagnostics using R-INLA
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); Computation (stat.CO)