Statistics

New submissions

[ total of 131 entries: 1-131 ]

New submissions for Tue, 26 Jan 21

[1]  arXiv:2101.09297 [pdf, other]
Title: Addressing Spatially Structured Interference in Causal Analysis Using Propensity Scores
Comments: 37 pages, 7 figures, being submitted
Subjects: Applications (stat.AP)

Environmental epidemiologists are increasingly interested in establishing causality between exposures and health outcomes. A popular model for causal inference is the Rubin Causal Model (RCM), which typically seeks to estimate the average difference in study units' potential outcomes. An important assumption under RCM is no interference; that is, the potential outcomes of one unit are not affected by the exposure status of other units. The no interference assumption is violated if we expect spillover or diffusion of exposure effects based on units' proximity to other units, in which case several other causal estimands arise. Air pollution epidemiology typically violates this assumption when we expect upwind events to affect downwind or nearby locations. This paper adapts causal assumptions from social network research to address interference and allow estimation of both direct and spillover causal effects. We use propensity score-based methods to estimate these effects when considering the effects of the Environmental Protection Agency's 2005 nonattainment designations for particulate matter with aerodynamic diameter less than 2.5 micrometers (PM2.5) on lung cancer incidence using county-level data obtained from the Surveillance, Epidemiology, and End Results (SEER) Program. We compare these methods in a rigorous simulation study that considers spatially autocorrelated variables, interference, and missing confounders. We find that pruning and matching based on the propensity score produce the highest coverage probability of the true causal effects and lower mean squared error. When applied to the research question, we find protective direct and spillover causal effects.

[2]  arXiv:2101.09304 [pdf, other]
Title: Revisiting Identifying Assumptions for Population Size Estimation
Comments: 48 pages. The material presented in Appendix A previously appeared in an unpublished preprint written by the first author: arXiv:2008.09865
Subjects: Methodology (stat.ME)

The problem of estimating the size of a population based on a subset of individuals observed across multiple data sources is often referred to as capture-recapture or multiple-systems estimation. This is fundamentally a missing data problem, where the number of unobserved individuals represents the missing data. As with any missing data problem, multiple-systems estimation requires users to make an untestable identifying assumption in order to estimate the population size from the observed data. Approaches to multiple-systems estimation often do not emphasize the role of the identifying assumption during model specification, which makes it difficult to decouple the specification of the model for the observed data from the identifying assumption. We present a re-framing of the multiple-systems estimation problem that decouples the specification of the observed-data model from the identifying assumptions, and discuss how log-linear models and the associated no-highest-order interaction assumption fit into this framing. We present an approach to computation in the Bayesian setting which takes advantage of existing software and facilitates various sensitivity analyses. We demonstrate our approach in a case study of estimating the number of civilian casualties in the Kosovo war. Code used to produce this manuscript is available at https://github.com/aleshing/revisiting-identifying-assumptions.
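To make the role of an identifying assumption concrete, here is a minimal sketch for the simplest case of two lists. It is not the authors' Bayesian approach: it assumes independence between the two lists (which is what the no-highest-order interaction assumption reduces to with two lists) and uses the classical Lincoln-Petersen estimate. The function name and the counts in the usage note are hypothetical.

```python
def two_list_population_estimate(n11, n10, n01):
    """Estimate population size from two overlapping lists.

    n11: individuals seen on both lists
    n10: individuals seen only on list 1
    n01: individuals seen only on list 2

    Identifying assumption (untestable from the observed data):
    the lists are independent, so the unobserved cell satisfies
    n00 = n10 * n01 / n11.
    """
    if n11 <= 0:
        raise ValueError("the estimate requires at least one individual on both lists")
    n00_hat = n10 * n01 / n11          # imputed count of never-observed individuals
    return n11 + n10 + n01 + n00_hat   # observed total plus imputed missing cell
```

With hypothetical counts `n11=50, n10=100, n01=150`, the imputed missing cell is 300 and the estimated population size is 600. A different identifying assumption would change only the line that imputes `n00_hat`, which is the decoupling the abstract describes.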

[3]  arXiv:2101.09315 [pdf, ps, other]
Title: Tighter expected generalization error bounds via Wasserstein distance
Comments: 22 pages: 12 of the main text, 2 of references, and 8 of appendices
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)

In this work, we introduce several expected generalization error bounds based on the Wasserstein distance. More precisely, we present full-dataset, single-letter, and random-subset bounds on both the standard setting and the randomized-subsample setting from Steinke and Zakynthinou [2020]. Moreover, we show that, when the loss function is bounded, these bounds recover from below (and thus are tighter than) current bounds based on the relative entropy and, for the standard setting, generate new, non-vacuous bounds also based on the relative entropy. Then, we show how similar bounds featuring the backward channel can be derived with the proposed proof techniques. Finally, we show how various new bounds based on different information measures (e.g., the lautum information or several $f$-divergences) can be derived from the presented bounds.

[4]  arXiv:2101.09418 [pdf, other]
Title: A Geospatial Functional Model For OCO-2 Data with Application on Imputation and Land Fraction Estimation
Subjects: Applications (stat.AP)

Data from NASA's Orbiting Carbon Observatory-2 (OCO-2) satellite are essential to many carbon management strategies. A retrieval algorithm is used to estimate CO2 concentration using the radiance data measured by OCO-2. However, due to factors such as cloud cover and cosmic rays, the spatial coverage of the retrieval algorithm is limited in some areas of critical importance for carbon cycle science. Mixed land/water pixels along the coastline are also not used in the retrieval processing due to the lack of valid ancillary variables, including land fraction. We propose an approach to model spatial spectral data that addresses these two problems through radiance imputation and land fraction estimation. The spectral observations are modeled as spatially indexed functional data with footprint-specific parameters and are reduced to much lower dimensions by functional principal component analysis. The principal component scores are modeled as random fields to account for the spatial dependence, and the missing spectral observations are imputed by kriging the principal component scores. The proposed method is shown to impute spectral radiance with high accuracy for observations over the Pacific Ocean. An unmixing approach based on this model provides much more accurate land fraction estimates in our validation study along the Greek coastline.

[5]  arXiv:2101.09424 [pdf, other]
Title: A Change-Point Based Control Chart for Detecting Sparse Changes in High-Dimensional Heteroscedastic Data
Subjects: Methodology (stat.ME)

Because of the curse of dimensionality, high-dimensional processes present challenges to traditional multivariate statistical process monitoring (SPM) techniques. In addition, the unknown underlying distribution and complicated dependency among variables, such as heteroscedasticity, increase the uncertainty of estimated parameters and decrease the effectiveness of control charts. Moreover, the requirement of sufficient reference samples limits the application of traditional charts in high-dimension, low-sample-size scenarios (small n, large p). More difficulties appear in detecting and diagnosing abnormal behaviors that are caused by a small set of variables, i.e., sparse changes. In this article, we propose a change-point-based control chart to detect sparse shifts in the mean vector of high-dimensional heteroscedastic processes. Our proposed method can start monitoring when the number of observations is much smaller than the dimensionality. The simulation results show its robustness to nonnormality and heteroscedasticity. A real data example is used to illustrate the effectiveness of the proposed control chart in high-dimensional applications. Supplementary material and code are provided online.

[6]  arXiv:2101.09514 [pdf, ps, other]
Title: Efficient Importance Sampling for Large Sums of Independent and Identically Distributed Random Variables
Subjects: Computation (stat.CO)

We aim to estimate the probability that the sum of nonnegative independent and identically distributed random variables falls below a given threshold, i.e., $\mathbb{P}(\sum_{i=1}^{N}{X_i} \leq \gamma)$, via importance sampling (IS). We are particularly interested in the rare event regime when $N$ is large and/or $\gamma$ is small. Exponential twisting is a popular technique that, in most cases, compares favorably to existing estimators. However, it has several limitations: i) it assumes knowledge of the moment generating function of $X_i$ and ii) sampling under the new measure is not straightforward and might be expensive. The aim of this work is to propose an alternative change of measure that yields, in the rare event regime corresponding to large $N$ and/or small $\gamma$, at least the same performance as the exponential twisting technique and, at the same time, does not introduce serious limitations. For distributions whose probability density functions (PDFs) are $\mathcal{O}(x^{d})$ as $x \rightarrow 0$, with $d>-1$, we prove that the Gamma IS PDF with appropriately chosen parameters asymptotically achieves, in the rare event regime, the same performance as the estimator based on the exponential twisting technique. Moreover, in the Log-normal setting, where the PDF at zero vanishes faster than any polynomial, we numerically show that a Gamma IS PDF with optimized parameters clearly outperforms the exponential twisting change of measure. Numerical experiments validate the efficiency of the proposed estimator in delivering a highly accurate estimate in the regime of large $N$ and/or small $\gamma$.
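The idea of a Gamma change of measure can be illustrated with a small sketch. Everything specific here is an assumption made for illustration, not the paper's setup: the summands are taken to be Exp(1) (so the exact answer is available for comparison), and the proposal shape and scale are picked by hand rather than optimized as in the paper. The function names are hypothetical.

```python
import math
import random


def exact_probability(n_terms, gamma, tail_terms=200):
    """P(X_1 + ... + X_N <= gamma) for i.i.d. Exp(1) summands (assumed target).

    The sum is Erlang(N, 1), so the probability equals the Poisson(gamma)
    tail sum_{k >= N} exp(-gamma) * gamma**k / k!, summed term by term.
    """
    term = math.exp(-gamma + n_terms * math.log(gamma) - math.lgamma(n_terms + 1))
    total = 0.0
    for k in range(n_terms, n_terms + tail_terms):
        total += term
        term *= gamma / (k + 1)
    return total


def gamma_is_estimate(n_terms, gamma, shape, scale, n_samples, seed=0):
    """Importance sampling estimate using i.i.d. Gamma(shape, scale) proposals."""
    rng = random.Random(seed)
    log_norm = math.lgamma(shape) + shape * math.log(scale)
    total = 0.0
    for _ in range(n_samples):
        xs = [rng.gammavariate(shape, scale) for _ in range(n_terms)]
        if sum(xs) > gamma:
            continue  # indicator of the event is zero
        # Log likelihood ratio prod_i f(x_i) / g(x_i), with f the Exp(1) density.
        log_w = 0.0
        for x in xs:
            log_f = -x
            log_g = -x / scale - log_norm
            if shape != 1.0:
                log_g += (shape - 1.0) * math.log(x)
            log_w += log_f - log_g
        total += math.exp(log_w)
    return total / n_samples
```

For example, with `N = 10` and `gamma = 2.0`, calling `gamma_is_estimate(10, 2.0, shape=1.0, scale=0.2, n_samples=20000)` places the proposal mean of the sum at the threshold, so roughly half of the draws land in the rare event; crude Monte Carlo with the same budget would expect fewer than one hit.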

[7]  arXiv:2101.09558 [pdf, other]
Title: The Gauss Hypergeometric Covariance Kernel for Modeling Second-Order Stationary Random Fields in Euclidean Spaces: its Compact Support, Properties and Spectral Representation
Comments: 22 pages
Subjects: Statistics Theory (math.ST)

This paper presents a parametric family of compactly-supported positive semidefinite kernels designed to model the covariance structure of second-order stationary isotropic random fields defined in the $d$-dimensional Euclidean space. Both the covariance and its spectral density have an analytic expression involving the hypergeometric functions ${}_2F_1$ and ${}_1F_2$, respectively, and four real-valued parameters related to the correlation range, smoothness and shape of the covariance. The presented hypergeometric kernel family contains, as special cases, the spherical, cubic, penta, Askey, generalized Wendland and truncated power covariances and, as asymptotic cases, the Mat\'ern, Laguerre, Tricomi, incomplete gamma and Gaussian covariances, among others. The parameter space of the univariate hypergeometric kernel is identified and its functional properties -- continuity, smoothness, transitive upscaling (mont\'ee) and downscaling (descente) -- are examined. Several sets of sufficient conditions are also derived to obtain valid stationary bivariate and multivariate covariance kernels, characterized by four matrix-valued parameters. Such kernels turn out to be versatile, insofar as the direct and cross-covariances do not necessarily have the same shapes, correlation ranges or behaviors at short scale, and can thus be associated with vector random fields whose components are cross-correlated but have different spatial structures.

[8]  arXiv:2101.09587 [pdf, other]
Title: Bayesian Edge Regression in Undirected Graphical Models to Characterize Interpatient Heterogeneity in Cancer
Subjects: Applications (stat.AP); Methodology (stat.ME)

Graphical models are commonly used to discover associations within gene or protein networks for complex diseases such as cancer. Most existing methods estimate a single graph for a population, while in many cases, researchers are interested in characterizing the heterogeneity of individual networks across subjects with respect to subject-level covariates. Examples include assessments of how the network varies with patient-specific prognostic scores or comparisons of tumor and normal graphs while accounting for tumor purity as a continuous predictor. In this paper, we propose a novel edge regression model for undirected graphs, which estimates conditional dependencies as a function of subject-level covariates. Bayesian shrinkage algorithms are used to induce sparsity in the underlying graphical models. We assess our model performance through simulation studies focused on comparing tumor and normal graphs while adjusting for tumor purity and a case study assessing how blood protein networks in hepatocellular carcinoma patients vary with severity of disease, measured by HepatoScore, a novel biomarker signature measuring disease severity.

[9]  arXiv:2101.09596 [pdf]
Title: The Role of Distributional Overlap on the Precision Gain of Bounds for Generalization
Authors: Wendy Chan
Subjects: Methodology (stat.ME)

Over the past ten years, propensity score methods have made an important contribution to improving generalizations from studies that do not select samples randomly from a population of inference. However, these methods require assumptions and recent work has considered the role of bounding approaches that provide a range of treatment impact estimates that are consistent with the observable data. An important limitation to bound estimates is that they can be uninformatively wide. This has motivated research on the use of propensity score stratification to narrow bounds. This article assesses the role of distributional overlap in propensity scores on the effectiveness of stratification to tighten bounds. Using the results of two simulation studies and two case studies, I evaluate the relationship between distributional overlap and precision gain and discuss the implications when propensity score stratification is used as a method to improve precision in the bounding framework.

[10]  arXiv:2101.09604 [pdf, other]
Title: UltraNest -- a robust, general purpose Bayesian inference engine
Authors: Johannes Buchner
Comments: Longer version of the paper submitted to JOSS. UltraNest can be found at this https URL
Subjects: Computation (stat.CO); Instrumentation and Methods for Astrophysics (astro-ph.IM)

UltraNest is a general-purpose Bayesian inference package for parameter estimation and model comparison. It allows fitting arbitrary models specified as likelihood functions written in Python, C, C++, Fortran, Julia or R. With a focus on correctness and speed (in that order), UltraNest is especially useful for multi-modal or non-Gaussian parameter spaces, computationally expensive models, and robust pipelines. Parallelisation to computing clusters and resuming of incomplete runs are available.

[11]  arXiv:2101.09605 [pdf, other]
Title: Local linear tie-breaker designs
Subjects: Methodology (stat.ME); Econometrics (econ.EM); Statistics Theory (math.ST)

Tie-breaker experimental designs are hybrids of Randomized Control Trials (RCTs) and Regression Discontinuity Designs (RDDs) in which subjects with moderate scores are placed in an RCT while subjects with extreme scores are deterministically assigned to the treatment or control group. The design maintains the benefits of randomization for causal estimation while avoiding the possibility of excluding the most deserving recipients from the treatment group. The causal effect in a tie-breaker design can be estimated by fitting local linear regressions for both the treatment and control group, as is typically done for RDDs. We study the statistical efficiency of such local linear regression-based causal estimators as a function of $\Delta$, the radius of the interval in which treatment randomization occurs. In particular, we determine the efficiency of the estimator as a function of $\Delta$ for a fixed, arbitrary bandwidth under the assumption of a uniform assignment variable. To generalize beyond uniform assignment variables and asymptotic regimes, we also demonstrate on the Angrist and Lavy (1999) classroom size dataset that, prior to conducting an experiment, an experimental designer can estimate the efficiency for various choices of experimental radius by using Monte Carlo as long as they have access to the distribution of the assignment variable. For both uniform and triangular kernels, we show that increasing the radius of the randomized experiment interval will increase the efficiency until the radius is the size of the local-linear regression bandwidth, after which no additional efficiency benefits are conferred.
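The assignment rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's code: the score is assumed to be centered so that the threshold is 0, randomization inside the interval uses probability 0.5, and the function name `tie_breaker_assign` is hypothetical.

```python
import random


def tie_breaker_assign(score, delta, rng, p_treat=0.5):
    """Return 1 (treatment) or 0 (control) for a centered assignment score.

    Scores above +delta are always treated, scores below -delta are always
    controls, and scores within [-delta, +delta] are randomized.
    """
    if score > delta:
        return 1
    if score < -delta:
        return 0
    return 1 if rng.random() < p_treat else 0
```

Setting `delta = 0` recovers a pure RDD (up to the single point at the threshold), while a `delta` that covers the whole score range recovers a pure RCT. A designer could wrap this rule in a Monte Carlo loop over draws from the assignment-variable distribution to compare candidate radii.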

[12]  arXiv:2101.09675 [pdf, other]
Title: Nested Sampling Methods
Authors: Johannes Buchner
Comments: Comments are welcome. The open-source UltraNest package and astrostatistics tutorials can be found at this https URL
Subjects: Computation (stat.CO); Instrumentation and Methods for Astrophysics (astro-ph.IM)

Nested sampling (NS) computes parameter posterior distributions and makes Bayesian model comparison computationally feasible. Its strengths are the unsupervised navigation of complex, potentially multi-modal posteriors until a well-defined termination point. A systematic literature review of nested sampling algorithms and variants is presented. We focus on complete algorithms, including solutions to likelihood-restricted prior sampling. A new formulation of NS is presented, which casts the parameter space exploration as a search on a tree. Previously published ways of obtaining robust error estimates and dynamic variations of the number of live points are presented as special cases of this formulation.

[13]  arXiv:2101.09711 [pdf, other]
Title: Testing for subsphericity when $n$ and $p$ are of different asymptotic order
Authors: Joni Virta
Comments: 12 pages, 1 figure
Subjects: Statistics Theory (math.ST)

In this short note, we extend a classical test of subsphericity, based on the first two moments of the eigenvalues of the sample covariance matrix, to the high-dimensional regime where the signal eigenvalues of the covariance matrix diverge to infinity and either $p/n \rightarrow 0$ or $p/n \rightarrow \infty$. In the latter case we further require that the divergence of the eigenvalues is suitably fast in a specific sense. Our work can be seen to complement that of Schott (2006) who established equivalent results for the case $p/n \rightarrow \gamma \in (0, \infty)$. Simulations are used to demonstrate the results, providing also evidence that the test might be further extendable to a wider asymptotic regime.

[14]  arXiv:2101.09747 [pdf, ps, other]
Title: Numerical issues in maximum likelihood parameter estimation for Gaussian process regression
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)

This article focuses on numerical issues in maximum likelihood parameter estimation for Gaussian process regression (GPR). It investigates the origin of these issues and provides simple but effective improvement strategies. The problem is a basic one, but a host of studies, particularly in the literature of Bayesian optimization, rely on off-the-shelf GPR implementations. For the conclusions of these studies to be reliable and reproducible, robust GPR implementations are critical.

[15]  arXiv:2101.09756 [pdf, other]
Title: Entropy Partial Transport with Tree Metrics: Theory and Practice
Authors: Tam Le, Truyen Nguyen
Comments: To appear in AISTATS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Optimal transport (OT) theory provides powerful tools to compare probability measures. However, OT is limited to nonnegative measures having the same mass, and suffers from serious computational and statistical drawbacks. This has led to several proposals of regularized variants of OT in the recent literature. In this work, we consider an entropy partial transport (EPT) problem for nonnegative measures on a tree having different masses. The EPT is shown to be equivalent to a standard complete OT problem on a one-node extended tree. We derive its dual formulation, then leverage this to propose a novel regularization for EPT which admits fast computation and negative definiteness. To our knowledge, the proposed regularized EPT is the first approach that yields a closed-form solution among available variants of unbalanced OT. For practical applications without a priori knowledge about the tree structure for measures, we propose tree-sliced variants of the regularized EPT, computed by averaging the regularized EPT between these measures using random tree metrics, built adaptively from support data points. Exploiting the negative definiteness of our regularized EPT, we introduce a positive definite kernel, and evaluate it against other baselines on benchmark tasks such as document classification with word embedding and topological data analysis. In addition, we empirically demonstrate that our regularization also provides effective approximations.

[16]  arXiv:2101.09809 [pdf, other]
Title: NeurT-FDR: Controlling FDR by Incorporating Feature Hierarchy
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Controlling the false discovery rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring possible hierarchy among the covariates. This strategy may not be optimal for complex large-scale problems, where hierarchical information often exists among those test-level covariates. We propose NeurT-FDR, which boosts statistical power and controls FDR for multiple hypothesis testing while leveraging the hierarchy among test-level covariates. Our method parametrizes the test-level covariates as a neural network and adjusts the feature hierarchy through a regression framework, which enables flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR has strong FDR guarantees and makes substantially more discoveries in synthetic and real datasets compared to competitive baselines.

[17]  arXiv:2101.09855 [pdf, other]
Title: Diffusion Asymptotics for Sequential Experiments
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG)

We propose a new diffusion-asymptotic analysis for sequentially randomized experiments. Rather than taking sample size $n$ to infinity while keeping the problem parameters fixed, we let the mean signal level scale to the order $1/\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ gets large. In this regime, we show that the behavior of a class of methods for sequential experimentation converges to a diffusion limit. This connection enables us to make sharp performance predictions and obtain new insights on the behavior of Thompson sampling. Our diffusion asymptotics also help resolve a discrepancy between the $\Theta(\log(n))$ regret predicted by the fixed-parameter, large-sample asymptotics on the one hand, and the $\Theta(\sqrt{n})$ regret from worst-case, finite-sample analysis on the other, suggesting that it is an appropriate asymptotic regime for understanding practical large-scale sequential experiments.

[18]  arXiv:2101.09875 [pdf, other]
Title: Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation
Authors: Xiuyuan Cheng, Nan Wu
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)

This work studies the spectral convergence of the graph Laplacian to the Laplace-Beltrami operator when the graph affinity matrix is constructed from $N$ random samples on a $d$-dimensional manifold embedded in a possibly high dimensional space. By analyzing Dirichlet form convergence and constructing candidate approximate eigenfunctions via convolution with the manifold heat kernel, we prove that, with a Gaussian kernel, one can set the kernel bandwidth parameter $\epsilon \sim (\log N/ N)^{1/(d/2+2)}$ such that the eigenvalue convergence rate is $N^{-1/(d/2+2)}$ and the eigenvector convergence in 2-norm has rate $N^{-1/(d+4)}$; when $\epsilon \sim N^{-1/(d/2+3)}$, both eigenvalue and eigenvector rates are $N^{-1/(d/2+3)}$. These rates hold up to a $\log N$ factor and are proved for finitely many low-lying eigenvalues. The result holds for un-normalized and random-walk graph Laplacians when data are uniformly sampled on the manifold, as well as the density-corrected graph Laplacian (where the affinity matrix is normalized by the degree matrix from both sides) with non-uniformly sampled data. As an intermediate result, we prove new point-wise and Dirichlet form convergence rates for the density-corrected graph Laplacian. Numerical results are provided to verify the theory.

[19]  arXiv:2101.09942 [pdf, other]
Title: An Environmentally-Adaptive Hawkes Process with An Application to COVID-19
Subjects: Applications (stat.AP)

We propose a new generalized model based on the classical Hawkes process with environmental multipliers, which we call the environmentally-adaptive Hawkes (EAH) model. Compared to the classical self-exciting Hawkes process, the EAH model is more flexible in adapting to a macro environment that changes over time, and can model more complex processes by using a dynamic branching matrix. We demonstrate that the EAH model is well defined. A more specific version of this new model is applied to COVID-19 pandemic data through an efficient EM-like algorithm. In this application, the proposed model consistently outperforms the classical Hawkes process.

[20]  arXiv:2101.10005 [pdf, ps, other]
Title: Low incidence rate of COVID-19 undermines confidence in estimation of the vaccine efficacy
Authors: Yasin Memari
Subjects: Methodology (stat.ME); Applications (stat.AP)

Knowing the true effect size of clinical interventions in randomised clinical trials is key to informing public health policies. Vaccine efficacy is defined in terms of the ratio of two risks; however, only approximate methods are available for the variance of the 'risk ratio'. In this article, we show using a probabilistic model that uncertainty in the efficacy rate could be underestimated when the disease risk is low. Factoring in the baseline rate of the disease, we estimate broader confidence intervals for the efficacy rates of the vaccines recently developed for COVID-19. We propose a new method for calculating the sample size in case-control studies where the efficacy is of interest. We further discuss the deleterious effects of classification bias, which is particularly relevant at low disease prevalence.
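For reference, the kind of approximate method the abstract alludes to can be sketched as follows. This is the standard log-scale (delta-method) interval for a risk ratio, not the probabilistic model proposed in the paper; the function name and the trial counts in the usage note are hypothetical and chosen only to show how the interval behaves as case counts shrink.

```python
import math


def vaccine_efficacy_ci(cases_vax, n_vax, cases_ctrl, n_ctrl, z=1.96):
    """Vaccine efficacy VE = 1 - RR with an approximate confidence interval.

    RR is the ratio of attack rates (vaccinated / control). The interval is
    built on the log scale using the usual large-sample standard error
    sqrt(1/a - 1/n1 + 1/c - 1/n2), then mapped back to the VE scale.
    Requires at least one case in each arm.
    """
    rr = (cases_vax / n_vax) / (cases_ctrl / n_ctrl)
    se_log_rr = math.sqrt(1 / cases_vax - 1 / n_vax + 1 / cases_ctrl - 1 / n_ctrl)
    rr_low = math.exp(math.log(rr) - z * se_log_rr)
    rr_high = math.exp(math.log(rr) + z * se_log_rr)
    # A high risk ratio means low efficacy, so the bounds swap.
    return 1 - rr, 1 - rr_high, 1 - rr_low
```

With hypothetical counts of 10 cases among 20,000 vaccinated and 100 cases among 20,000 controls, the point estimate is 0.90 with an approximate 95% interval of about (0.81, 0.95). Ten times as many cases in each arm at the same arm sizes gives the same point estimate with a visibly narrower interval, which is the sense in which low incidence inflates uncertainty.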

[21]  arXiv:2101.10058 [pdf, other]
Title: The EM Perspective of Directional Mean Shift Algorithm
Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)

The directional mean shift (DMS) algorithm is a nonparametric method for pursuing local modes of densities defined by kernel density estimators on the unit hypersphere. In this paper, we show that any DMS iteration can be viewed as a generalized Expectation-Maximization (EM) algorithm; in particular, when the von Mises kernel is applied, it becomes an exact EM algorithm. Under the (generalized) EM framework, we provide a new proof for the ascending property of density estimates and demonstrate the global convergence of directional mean shift sequences. Finally, we give a new insight into the linear convergence of the DMS algorithm.
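A minimal sketch of a DMS iteration with the von Mises kernel follows, assuming the common fixed-point form in which the current point is replaced by the normalized, kernel-weighted sum of the data. The concentration parameter `kappa` (playing the role of an inverse squared bandwidth) and the function names are assumptions made for illustration.

```python
import math


def dms_step(x, points, kappa):
    """One directional mean shift update on the unit hypersphere.

    Each data point p gets weight exp(kappa * <x, p>) (von Mises kernel);
    the update is the weighted sum of the data, projected back to unit norm.
    """
    acc = [0.0] * len(x)
    for p in points:
        w = math.exp(kappa * sum(a * b for a, b in zip(x, p)))
        for j, pj in enumerate(p):
            acc[j] += w * pj
    norm = math.sqrt(sum(a * a for a in acc))
    return [a / norm for a in acc]


def directional_mean_shift(x0, points, kappa, n_iter=100):
    """Iterate dms_step from x0 for a fixed number of steps."""
    x = list(x0)
    for _ in range(n_iter):
        x = dms_step(x, points, kappa)
    return x
```

Read through the EM lens of the abstract, the weights computed inside `dms_step` play the role of the E-step responsibilities and the normalized weighted sum plays the role of the M-step. For data on the unit circle placed symmetrically around angle 0, the iteration started elsewhere on the circle moves toward the point `(1, 0)`.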

[22]  arXiv:2101.10103 [pdf, other]
Title: sensobol: an R package to compute variance-based sensitivity indices
Subjects: Computation (stat.CO); Applications (stat.AP)

The R package "sensobol" provides several functions to conduct variance-based uncertainty and sensitivity analysis, from the estimation of sensitivity indices to the visual representation of the results. It implements several state-of-the-art first and total-order estimators and allows the computation of up to third-order effects, as well as of the approximation error, in a swift and user-friendly way. Its flexibility also makes it appropriate for models with either a scalar or a multivariate output. We illustrate its functionality by conducting a variance-based sensitivity analysis of three classic models: the Sobol' (1998) G function, the logistic population growth model of Verhulst (1845), and the spruce budworm and forest model of Ludwig, Jones and Holling (1976).

Cross-lists for Tue, 26 Jan 21

[23]  arXiv:2101.09351 (cross-list from physics.ao-ph) [pdf]
Title: Hourly evolution of intra-urban temperature variability across the local climate zones. The case of Madrid
Comments: 7 figures, 8 tables, 1 appendix
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Data Analysis, Statistics and Probability (physics.data-an); Applications (stat.AP)

Field measurement campaigns have grown exponentially in recent years, stemming from the need for reliable data to validate urban climate models and obtain a better understanding of urban climate features. Also contributing to this growth is the Local Climate Zone (LCZ) scheme, first developed to enhance the accuracy in the contextualisation of urban measurements, and lately used for characterising urban areas. Due to its relative novelty, researchers are still investigating the potential of LCZs and their indicators for urban temperature variability detection. In this respect, the present study introduces the results of an extensive monitoring campaign carried out in the city of Madrid over a two-year period (2016-2018). The aim of this work is to further examine the relationships between LCZs and air temperature differences, with emphasis on their hourly and seasonal evolution. A graphical and statistical analysis to identify temperature variability trends for each LCZ is performed. Results support the existing evidence suggesting that LCZs are highly effective in capturing the urban heat island (UHI) profile of different urban areas, while underperforming when it comes to capturing diurnal temperature variability. The incorporation of indicators that explain the daytime temperature variation phenomenon into the LCZ scheme is therefore recommended, warranting further research.

[24]  arXiv:2101.09352 (cross-list from q-bio.NC) [pdf, other]
Title: Conex-Connect: Learning Patterns in Extremal Brain Connectivity From Multi-Channel EEG Data
Subjects: Neurons and Cognition (q-bio.NC); Applications (stat.AP); Methodology (stat.ME)

Epilepsy is a chronic neurological disorder affecting more than 50 million people globally. An epileptic seizure acts like a temporary shock to the neuronal system, disrupting normal electrical activity in the brain. Epilepsy is frequently diagnosed with electroencephalograms (EEGs). Current methods study the time-varying spectra and coherence but do not directly model changes in extreme behavior. Thus, we propose a new approach to characterize brain connectivity based on the joint tail behavior of the EEGs. Our proposed method, the conditional extremal dependence for brain connectivity (Conex-Connect), is a pioneering approach that links the association between extreme values of higher oscillations at a reference channel with the other brain network channels. Using the Conex-Connect method, we discover changes in the extremal dependence driven by the activity at the foci of the epileptic seizure. Our model-based approach reveals that, pre-seizure, the dependence is notably stable for all channels when conditioning on extreme values of the focal seizure area. Post-seizure, by contrast, the dependence between channels is weaker, and dependence patterns are more "chaotic". Moreover, in terms of spectral decomposition, we find that high values of the high-frequency Gamma-band are the most relevant features to explain the conditional extremal dependence of brain connectivity.

[25]  arXiv:2101.09382 (cross-list from math.OC) [pdf, other]
Title: A measure of the importance of roads based on topography and traffic intensity
Comments: 35 pages, 7 figures
Subjects: Optimization and Control (math.OC); Graphics (cs.GR); Applications (stat.AP)

Mathematical models of street traffic that allow the importance of individual segments for the functionality of the street system to be assessed are considered. Based on methods of cooperative games and reliability theory, a suitable measure is constructed. The main goal is to analyze methods for assessing the importance (rank) of road fragments, including their functions. The relevance of these elements for effective accessibility of the entire system is considered.

[26]  arXiv:2101.09394 (cross-list from econ.EM) [pdf, other]
Title: Predicting Recession Probabilities Using Term Spreads: New Evidence from a Machine Learning Approach
Subjects: Econometrics (econ.EM); Machine Learning (stat.ML)

The literature on using yield curves to forecast recessions typically measures the term spread as the difference between the 10-year and the three-month Treasury rates. Furthermore, using the term spread constrains the long- and short-term interest rates to have the same absolute effect on the recession probability. In this study, we adopt a machine learning method to investigate whether the predictive ability of interest rates can be improved. The machine learning algorithm identifies the best maturity pair, separating the effects of interest rates from those of the term spread. Our comprehensive empirical exercise shows that, despite the likelihood gain, the machine learning approach does not significantly improve the predictive accuracy, owing to the estimation error. Our finding supports the conventional use of the 10-year--three-month Treasury yield spread. This is robust to the forecasting horizon, control variable, sample period, and oversampling of the recession observations.

[27]  arXiv:2101.09395 (cross-list from q-fin.ST) [pdf, other]
Title: Unraveling S&P500 stock volatility and networks -- An encoding and decoding approach
Subjects: Statistical Finance (q-fin.ST); Applications (stat.AP)

We extend the Hierarchical Factor Segmentation (HFS) algorithm to discover the multiple-volatility-state process hidden within each individual S&P500 stock's return time series. We then develop an associative measure to link stocks into directed networks at various scales of association. Such networks shed light on which stocks would likely stimulate or even promote, if not cause, volatility in other linked stocks. Our computation starts by encoding events of large return on the original time axis, transforming the original return time series into a recurrence-time process on a discrete time axis. By adopting BIC and clustering analysis, we identify potential multiple volatility states, and then apply the extended HFS algorithm to the recurrence time series to discover its underlying volatility state process. Our decoding approach compares favorably with Viterbi's in experiments involving both light- and heavy-tailed distributions. After mapping the recovered volatility state process back to the original time axis, we decode and represent the dynamics of each stock. Association is measured through overlapping concurrent volatility states within a chosen window. Consequently, we establish data-driven associative networks for S&P500 stocks to discover their global dependency relational groupings with respect to various strengths of links.
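The encoding step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how a "large return" is defined, so the absolute-value threshold and the names `recurrence_times` and `threshold` are assumptions made here.

```python
def recurrence_times(returns, threshold):
    """Gaps (in time steps) between successive large-return events.

    Assumption: an event is any time step whose absolute return exceeds
    `threshold`; the abstract does not specify the actual rule.
    """
    # Mark the time indices at which a large-return event occurs.
    event_idx = [t for t, r in enumerate(returns) if abs(r) > threshold]
    # The recurrence-time process is the sequence of gaps between events.
    return [later - earlier for earlier, later in zip(event_idx, event_idx[1:])]


# Toy daily returns (illustrative numbers, not real data).
returns = [0.001, -0.030, 0.002, 0.004, 0.025, -0.001, -0.040, 0.003]
gaps = recurrence_times(returns, threshold=0.02)
print(gaps)  # events at steps 1, 4, 6 give gaps [3, 2]
```

Short gaps indicate clustered large returns (a high-volatility regime), long gaps a calm one, which is what makes the recurrence-time series a natural input for a state-segmentation algorithm.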

[28]  arXiv:2101.09398 (cross-list from econ.EM) [pdf, other]
Title: A Design-Based Perspective on Synthetic Control Methods
Subjects: Econometrics (econ.EM); Methodology (stat.ME)

Since their introduction in Abadie and Gardeazabal (2003), Synthetic Control (SC) methods have quickly become one of the leading methods for estimating causal effects in observational studies with panel data. Formal discussions often motivate SC methods by the assumption that the potential outcomes were generated by a factor model. Here we study SC methods from a design-based perspective, assuming a model for the selection of the treated unit(s), e.g., random selection as guaranteed in a randomized experiment. We show that SC methods offer benefits even in settings with randomized assignment, and that the design perspective offers new insights into SC methods for observational data. A first insight is that the standard SC estimator is not unbiased under random assignment. We propose a simple modification of the SC estimator that guarantees unbiasedness in this setting and derive its exact, randomization-based, finite sample variance. We also propose an unbiased estimator for this variance. We show in settings with real data that under random assignment this Modified Unbiased Synthetic Control (MUSC) estimator can have a root mean-squared error (RMSE) that is substantially lower than that of the difference-in-means estimator. We show that such an improvement is weakly guaranteed if the treated period is similar to the other periods, for example, if the treated period was randomly selected. The improvement is most likely to be substantial if the number of pre-treatment periods is large relative to the number of control units.

[29]  arXiv:2101.09436 (cross-list from cs.LG) [pdf, other]
Title: Hierarchical Domain Invariant Variational Auto-Encoding with weak domain supervision
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

We address the task of domain generalization, where the goal is to train a predictive model based on a number of domains such that it is able to generalize to a new, previously unseen domain. We choose a generative approach within the framework of variational autoencoders and propose a weakly supervised algorithm that is able to account for incomplete and hierarchical domain information. We show that our method is able to learn representations that disentangle domain-specific information from class-label specific information even in complex settings where an unobserved substructure is present in domains. Our interpretable method outperforms previously proposed generative algorithms for domain generalization and achieves competitive performance compared to state-of-the-art approaches, which are based on complex image-processing steps, on the standard domain generalization benchmark dataset PACS.

[30]  arXiv:2101.09438 (cross-list from cs.LG) [pdf, other]
Title: An Optimal Reduction of TV-Denoising to Adaptive Online Learning
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

We consider the problem of estimating a function from $n$ noisy samples whose discrete Total Variation (TV) is bounded by $C_n$. We reveal a deep connection to the seemingly disparate problem of Strongly Adaptive online learning (Daniely et al., 2015) and provide an $O(n \log n)$ time algorithm that attains the near minimax optimal rate of $\tilde O (n^{1/3}C_n^{2/3})$ under squared error loss. The resulting algorithm runs online and optimally adapts to the unknown smoothness parameter $C_n$. This leads to a new and more versatile alternative to wavelet-based methods for (1) adaptively estimating TV bounded functions; (2) online forecasting of TV bounded trends in time series.
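To make the constraint concrete, here is a small sketch of the discrete total variation of a sequence, assuming the common unnormalised definition (sum of absolute successive differences). The abstract does not spell out its normalisation, and the helper name `discrete_tv` is hypothetical.

```python
def discrete_tv(theta):
    """Discrete total variation: sum of |theta[i+1] - theta[i]|.

    Assumption: unnormalised definition; the paper may scale it differently.
    """
    return sum(abs(b - a) for a, b in zip(theta, theta[1:]))


# A single jump has small TV even though the signal is discontinuous ...
one_jump = [0, 0, 0, 1, 1, 1]
# ... while a rapidly oscillating signal of the same length has large TV.
oscillating = [0, 1, 0, 1, 0, 1]

tv_jump = discrete_tv(one_jump)
tv_osc = discrete_tv(oscillating)
print(tv_jump, tv_osc)  # 1 5
```

A bound $C_n$ on this quantity therefore permits a few sharp jumps but rules out persistent oscillation, which is why TV-bounded classes suit trends with abrupt changes.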

[31]  arXiv:2101.09446 (cross-list from cs.LG) [pdf, other]
Title: Unlabeled Principal Component Analysis
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We consider the problem of principal component analysis from a data matrix where the entries of each column have undergone some unknown permutation, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that for generic enough data, and up to a permutation of the coordinates of the ambient space, there is a unique subspace of minimal dimension that explains the data. We show that a permutation-invariant system of polynomial equations has finitely many solutions, with each solution corresponding to a row permutation of the ground-truth data matrix. Allowing for missing entries on top of permutations leads to the problem of unlabeled matrix completion, for which we give theoretical results of similar flavor. We also propose a two-stage algorithmic pipeline for UPCA suitable for the practically relevant case where only a fraction of the data has been permuted. Stage-I of this pipeline employs robust-PCA methods to estimate the ground-truth column-space. Equipped with the column-space, stage-II applies methods for linear regression without correspondences to restore the permuted data. A computational study reveals encouraging findings, including the ability of UPCA to handle face images from the Extended Yale-B database with arbitrarily permuted patches of arbitrary size in $0.3$ seconds on a standard desktop computer.

[32]  arXiv:2101.09460 (cross-list from cs.LG) [pdf, other]
Title: Feature Selection Using Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

With the decreasing cost of data collection, the space of variables or features that can be used to characterize a particular predictor of interest continues to grow exponentially. Therefore, identifying the most characterizing features that minimize the variance without jeopardizing the bias of our models is critical to successfully training a machine learning model. In addition, identifying such features is critical for interpretability, prediction accuracy and optimal computation cost. While statistical methods such as subset selection, shrinkage, and dimensionality reduction have been applied to select the best set of features, other approaches in the literature have treated feature selection as a search problem in which each state in the search space is a possible feature subset. In this paper, we solved the feature selection problem using Reinforcement Learning. Formulating the state space as a Markov Decision Process (MDP), we used the Temporal Difference (TD) algorithm to select the best subset of features. Each state was evaluated using a robust and low-cost classifier algorithm that could handle any non-linearities in the dataset.

[33]  arXiv:2101.09512 (cross-list from cs.LG) [pdf, other]
Title: Unsupervised clustering of series using dynamic programming
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We are interested in clustering parts of a given single multi-variate series in an unsupervised manner. We would like to segment and cluster the series such that the resulting blocks present in each cluster are coherent with respect to a known model (e.g., a physics model). Data points are said to be coherent if they can be described using this model with the same parameters. We have designed an algorithm based on dynamic programming with constraints on the number of clusters, the number of transitions, and the minimal size of a block, such that the clusters are coherent with this process. We present a use case: clustering of petrophysical series using the Waxman-Smits equation.

[34]  arXiv:2101.09577 (cross-list from cs.LG) [pdf, other]
Title: ReliefE: Feature Ranking in High-dimensional Spaces via Manifold Embeddings
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Feature ranking has been widely adopted in machine learning applications such as high-throughput biology and social sciences. The approaches of the popular Relief family of algorithms assign importances to features by iteratively accounting for nearest relevant and irrelevant instances. Despite their high utility, these algorithms can be computationally expensive and not well suited for high-dimensional sparse input spaces. In contrast, recent embedding-based methods learn compact, low-dimensional representations, potentially facilitating down-stream learning capabilities of conventional learners. This paper explores how the Relief branch of algorithms can be adapted to benefit from (Riemannian) manifold-based embeddings of instance and target spaces, where a given embedding's dimensionality is intrinsic to the dimensionality of the considered data set. The developed ReliefE algorithm is faster and can result in better feature rankings, as shown by our evaluation on 20 real-life data sets for multi-class and multi-label classification tasks. The utility of ReliefE for high-dimensional data sets is ensured by its implementation that utilizes sparse matrix algebraic operations. Finally, the relation of ReliefE to other ranking algorithms is studied via the Fuzzy Jaccard Index.

[35]  arXiv:2101.09611 (cross-list from cs.SI) [pdf, other]
Title: Hypergraph clustering: from blockmodels to modularity
Comments: 25 pages + 5 pages of supplementary information, 3 tables, 4 figures
Subjects: Social and Information Networks (cs.SI); Discrete Mathematics (cs.DM); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)

Hypergraphs are a natural modeling paradigm for a wide range of complex relational systems with multibody interactions. A standard analysis task is to identify clusters of closely related or densely interconnected nodes. While many probabilistic generative models for graph clustering have been proposed, there are relatively few such models for hypergraphs. We propose a Poisson degree-corrected hypergraph stochastic blockmodel (DCHSBM), an expressive generative model of clustered hypergraphs with heterogeneous node degrees and edge sizes. Maximum-likelihood inference in the DCHSBM naturally leads to a clustering objective that generalizes the popular modularity objective for graphs. We derive a general Louvain-type algorithm for this objective, as well as a faster, specialized "All-Or-Nothing" (AON) variant in which edges are expected to lie fully within clusters. This special case encompasses a recent proposal for modularity in hypergraphs, while also incorporating flexible resolution and edge-size parameters. We show that hypergraph Louvain is highly scalable, including as an example an experiment on a synthetic hypergraph of one million nodes. We also demonstrate through synthetic experiments that the detectability regimes for hypergraph community detection differ from methods based on dyadic graph projections. In particular, there are regimes in which hypergraph methods can recover planted partitions even though graph-based methods necessarily fail due to information-theoretic limits. We use our model to analyze different patterns of higher-order structure in school contact networks, U.S. congressional bill cosponsorship, U.S. congressional committees, product categories in co-purchasing behavior, and hotel locations from web browsing sessions, and find that it is able to recover ground truth clusters in empirical data sets exhibiting the corresponding higher-order structure.

[36]  arXiv:2101.09612 (cross-list from cs.LG) [pdf, ps, other]
Title: On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths
Authors: Quynh Nguyen
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper studies the global convergence of gradient descent for deep ReLU networks under the square loss. For this setting, the current state-of-the-art results show that gradient descent converges to a global optimum if the widths of all the hidden layers scale at least as $\Omega(N^8)$ ($N$ being the number of training samples). In this paper, we discuss a simple proof framework which allows us to improve the existing over-parameterization condition to linear, quadratic and cubic widths (depending on the type of initialization scheme and/or the depth of the network).

[37]  arXiv:2101.09645 (cross-list from cs.LG) [pdf, other]
Title: Multi-Task Time Series Forecasting With Shared Attention
Comments: Accepted by ICDMW 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Time series forecasting is a key component in many industrial and business decision processes, and recurrent neural network (RNN) based models have achieved impressive progress on various time series forecasting tasks. However, most of the existing methods focus on single-task forecasting problems by learning separately based on limited supervised objectives, which often suffer from insufficient training instances. As the Transformer architecture and other attention-based models have demonstrated their great capability of capturing long-term dependency, we propose two self-attention based sharing schemes for multi-task time series forecasting which can train jointly across multiple tasks. We augment a sequence of paralleled Transformer encoders with an external public multi-head attention function, which is updated by all data of all tasks. Experiments on a number of real-world multi-task time series forecasting tasks show that our proposed architectures can not only outperform the state-of-the-art single-task forecasting baselines but also outperform the RNN-based multi-task forecasting method.

[38]  arXiv:2101.09682 (cross-list from q-fin.PR) [pdf, ps, other]
Title: Solving optimal stopping problems with Deep Q-Learning
Comments: 17 pages
Subjects: Pricing of Securities (q-fin.PR); Machine Learning (stat.ML)

We propose a reinforcement learning (RL) approach to model optimal exercise strategies for option-type products. We pursue the RL avenue in order to learn the optimal action-value function of the underlying stopping problem. In addition to retrieving the optimal Q-function at any time step, one can also price the contract at inception. We first discuss the standard setting with one exercise right, and later extend this framework to the case of multiple stopping opportunities in the presence of constraints. We propose to approximate the Q-function with a deep neural network, which does not require the specification of basis functions as in the least-squares Monte Carlo framework and is scalable to higher dimensions. We derive a lower bound on the option price obtained from the trained neural network and an upper bound from the dual formulation of the stopping problem, which can also be expressed in terms of the Q-function. Our methodology is illustrated with examples covering the pricing of swing options.

[39]  arXiv:2101.09689 (cross-list from cs.IT) [pdf, ps, other]
Title: A Linear Reduction Method for Local Differential Privacy and Log-lift
Subjects: Information Theory (cs.IT); Applications (stat.AP)

This paper considers the problem of publishing data $X$ while protecting correlated sensitive information $S$. We propose a linear method to generate the sanitized data $Y$ with the same alphabet $\mathcal{Y} = \mathcal{X}$ that attains local differential privacy (LDP) and log-lift at the same time. It is revealed that both LDP and log-lift are inversely proportional to the statistical distance between the conditional probability $P_{Y|S}(x|s)$ and the marginal probability $P_{Y}(x)$: the closer the two probabilities are, the more private $Y$ is. Specifying $P_{Y|S}(x|s)$ that linearly reduces this distance $|P_{Y|S}(x|s) - P_Y(x)| = (1-\alpha)|P_{X|S}(x|s) - P_X(x)|,\forall s,x$ for some $\alpha \in (0,1]$, we study the problem of how to generate $Y$ from the original data $S$ and $X$. The Markov randomization/sanitization scheme $P_{Y|X}(x|x') = P_{Y|S,X}(x|s,x')$ is obtained by solving linear equations. The optimal non-Markov sanitization, the transition probability $P_{Y|S,X}(x|s,x')$ that depends on $S$, can be determined by maximizing the data utility subject to linear equality constraints. We compute the solution for two linear utility functions: the expected distance and total variance distance. It is shown that the non-Markov randomization significantly improves data utility and the marginal probability $P_X(x)$ remains the same after the linear sanitization method: $P_Y(x) = P_X(x), \forall x \in \mathcal{X}$.
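The distance-reduction relation above can be checked numerically on a toy example. The sketch below assumes one simple target that satisfies it, the mixture $P_{Y|S}(x|s) = (1-\alpha)P_{X|S}(x|s) + \alpha P_X(x)$; this is an illustrative choice, not necessarily the paper's construction, and it does not build the sanitization kernel $P_{Y|S,X}$ itself. All names and numbers are hypothetical.

```python
def marginal(p_s, p_given_s):
    """Marginal over x: sum_s P_S(s) * P(x | s)."""
    n_x = len(p_given_s[0])
    return [sum(p_s[s] * p_given_s[s][x] for s in range(len(p_s)))
            for x in range(n_x)]


def linear_reduction(p_s, p_x_given_s, alpha):
    """Assumed target: P_{Y|S} = (1 - alpha) * P_{X|S} + alpha * P_X."""
    p_x = marginal(p_s, p_x_given_s)
    return [[(1 - alpha) * row[x] + alpha * p_x[x] for x in range(len(p_x))]
            for row in p_x_given_s]


# Toy distributions (illustrative only): two sensitive values, binary data.
p_s = [0.5, 0.5]
p_x_given_s = [[0.8, 0.2],
               [0.4, 0.6]]
alpha = 0.5

p_x = marginal(p_s, p_x_given_s)
p_y_given_s = linear_reduction(p_s, p_x_given_s, alpha)
p_y = marginal(p_s, p_y_given_s)

# Compare |P_{Y|S} - P_Y| with (1 - alpha) * |P_{X|S} - P_X| entry by entry.
for s in range(len(p_s)):
    for x in range(len(p_x)):
        lhs = abs(p_y_given_s[s][x] - p_y[x])
        rhs = (1 - alpha) * abs(p_x_given_s[s][x] - p_x[x])
        print(s, x, round(lhs, 6), round(rhs, 6))
```

Under this mixture the marginal is unchanged, since averaging over $S$ gives $(1-\alpha)P_X + \alpha P_X = P_X$, which matches the abstract's closing statement that $P_Y(x) = P_X(x)$.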

[40]  arXiv:2101.09763 (cross-list from cs.LG) [pdf, other]
Title: Analysing the Noise Model Error for Realistic Noisy Label Data
Comments: Accepted at AAAI 2021, additional material at this https URL
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

Distant and weak supervision make it possible to obtain large amounts of labeled training data quickly and cheaply, but these automatic annotations tend to contain a large number of errors. A popular technique to overcome the negative effects of these noisy labels is noise modelling, where the underlying noise process is modelled. In this work, we study the quality of these estimated noise models from the theoretical side by deriving the expected error of the noise model. Apart from evaluating the theoretical results on commonly used synthetic noise, we also publish NoisyNER, a new noisy label dataset from the NLP domain that was obtained through a realistic distant supervision technique. It provides seven sets of labels with differing noise patterns to evaluate different noise levels on the same instances. Parallel, clean labels are available, making it possible to study scenarios where a small amount of gold-standard data can be leveraged. Our theoretical results and the corresponding experiments give insights into the factors that influence the noise model estimation, such as the noise distribution and the sampling technique.

[41]  arXiv:2101.09844 (cross-list from physics.data-an) [pdf, other]
Title: Pattern Ensembling for Spatial Trajectory Reconstruction
Comments: 11 pages, 5 figures
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (cs.LG); Machine Learning (stat.ML)

Digital sensing provides an unprecedented opportunity to assess and understand mobility. However, incompleteness, missing information, possible inaccuracies, and temporal heterogeneity in the geolocation data can undermine its applicability. As mobility patterns are often repeated, we propose a method to use similar trajectory patterns from the local vicinity and probabilistically ensemble them to robustly reconstruct missing or unreliable observations. We evaluate the proposed approach in comparison with traditional functional trajectory interpolation using a case of sea vessel trajectory data provided by the Automatic Identification System (AIS). By effectively leveraging the similarities in real-world trajectories, our pattern ensembling method helps to reconstruct missing trajectory segments of extended length and complex geometry. It can be used for locating mobile objects when temporarily unobserved as well as for creating an evenly sampled trajectory interpolation useful for further trajectory mining.

[42]  arXiv:2101.09957 (cross-list from cs.LG) [pdf, other]
Title: Activation Functions in Artificial Neural Networks: A Systematic Overview
Authors: Johannes Lederer
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Activation functions shape the outputs of artificial neurons and, therefore, are integral parts of neural networks in general and deep learning in particular. Some activation functions, such as logistic and ReLU, have been used for many decades. But with deep learning becoming a mainstream research topic, new activation functions have mushroomed, leading to confusion in both theory and practice. This paper provides an analytic yet up-to-date overview of popular activation functions and their properties, which makes it a timely resource for anyone who studies or applies neural networks.

[43]  arXiv:2101.09960 (cross-list from econ.GN) [pdf, other]
Title: Political Regime and COVID 19 death rate: efficient, biasing or simply different autocracies ?
Subjects: General Economics (econ.GN); Applications (stat.AP)

The difference in COVID 19 death rates across political regimes has caught a lot of attention. The "efficient autocracy" view suggests that autocracies may be more efficient at putting in place policies that contain COVID 19 spread. On the other hand, the "biasing autocracy" view underlines that autocracies may be underreporting their COVID 19 data. We use fixed effect panel regression methods to discriminate between the two sides of the debate. Our results show that a third view may in fact be prevailing: once pre-determined characteristics of countries are accounted for, COVID 19 death rates equalize across political regimes. The difference in death rates across political regimes therefore seems to be primarily due to omitted variable bias.

[44]  arXiv:2101.09973 (cross-list from cs.LG) [pdf, ps, other]
Title: Approximating Probability Distributions by ReLU Networks
Comments: Longer version of a paper accepted for presentation at the ITW 2020
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

How many neurons are needed to approximate a target probability distribution using a neural network with a given input distribution and approximation error? This paper examines this question for the case when the input distribution is uniform, and the target distribution belongs to the class of histogram distributions. We obtain a new upper bound on the number of required neurons, which is strictly better than previously existing upper bounds. The key ingredient in this improvement is an efficient construction of the neural nets representing piecewise linear functions. We also obtain a lower bound on the minimum number of neurons needed to approximate the histogram distributions.

[45]  arXiv:2101.10037 (cross-list from cs.LG) [pdf, other]
Title: Optimizing Convergence for Iterative Learning of ARIMA for Stationary Time Series
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Forecasting of time series in continuous systems is becoming an increasingly relevant task due to recent developments in IoT and 5G. The popular forecasting model ARIMA has been applied to a large variety of applications for decades. An online variant of ARIMA applies the Online Newton Step in order to learn the underlying process of the time series. This optimization method has pitfalls concerning the computational complexity and convergence. Thus, this work focuses on the computationally less expensive Online Gradient Descent optimization method, which became popular for learning of neural networks in recent years. For the iterative training of such models, we propose a new approach combining different Online Gradient Descent learners (such as Adam, AMSGrad, Adagrad, Nesterov) to achieve fast convergence. The evaluation on synthetic data and experimental datasets shows that the proposed approach outperforms the existing methods, resulting in an overall lower prediction error.
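As a point of reference for the online setting described above, the sketch below runs plain online gradient descent on the coefficients of a pure autoregressive model under squared loss. It is a deliberately simplified stand-in: it omits the differencing and moving-average parts of ARIMA and does not implement the paper's combination of learners. The function name, learning rate, and test signal are assumptions chosen for illustration.

```python
import math


def online_ar_ogd(series, p=2, lr=0.05):
    """Online gradient descent for AR(p) coefficients under squared loss.

    Assumption: a plain OGD baseline, not the combined-learner scheme
    proposed in the paper.
    """
    w = [0.0] * p
    sq_errors = []
    for t in range(p, len(series)):
        lags = series[t - p:t][::-1]          # most recent value first
        pred = sum(wi * xi for wi, xi in zip(w, lags))
        err = pred - series[t]                # prediction error at step t
        sq_errors.append(err * err)
        # Gradient of err**2 with respect to w is 2 * err * lags.
        w = [wi - lr * 2.0 * err * xi for wi, xi in zip(w, lags)]
    return w, sq_errors


# Noise-free test signal: cos(t) satisfies x_t = 2*cos(1)*x_{t-1} - x_{t-2}.
series = [math.cos(t) for t in range(2000)]
w, sq_errors = online_ar_ogd(series, p=2, lr=0.05)
print(w, 2 * math.cos(1.0))
```

Each update costs O(p), in contrast to second-order methods such as the Online Newton Step, which maintain a p-by-p matrix; this is the computational motivation the abstract mentions.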

[46]  arXiv:2101.10102 (cross-list from cs.LG) [pdf, ps, other]
Title: Probabilistic Robustness Analysis for DNNs based on PAC Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

This paper proposes a black-box approach for analysing deep neural networks (DNNs). We view a DNN as a function $\boldsymbol{f}$ from inputs to outputs, and consider the local robustness property for a given input. Based on the scenario optimization technique from robust control design, we learn the score difference function $f_i-f_\ell$ with respect to the target label $\ell$ and attacking label $i$. We use a linear template over the input pixels, and learn the corresponding coefficients of the score difference function, based on a reduction to a linear programming (LP) problem. To make it scalable, we propose optimizations including components based learning and focused learning. The learned function offers a probably approximately correct (PAC) guarantee for the robustness property. Since the score difference function is an approximation of the local behaviour of the DNN, it can be used to generate potential adversarial examples, and the original network can be used to check whether they are spurious or not. Finally, we focus on the input pixels with large absolute coefficients, and use them to explain the attacking scenario. We have implemented our approach in a prototypical tool DeepPAC. Our experimental results show that our framework can handle very large neural networks like ResNet152 with $6.5$M neurons, and often generates adversarial examples which are very close to the decision boundary.

[47]  arXiv:2101.10123 (cross-list from cs.LG) [pdf, other]
Title: Conditional Generative Models for Counterfactual Explanations
Comments: 12 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Counterfactual instances offer human-interpretable insight into the local behaviour of machine learning models. We propose a general framework to generate sparse, in-distribution counterfactual model explanations which match a desired target prediction with a conditional generative model, allowing batches of counterfactual instances to be generated with a single forward pass. The method is flexible with respect to the type of generative model used as well as the task of the underlying predictive model. This allows straightforward application of the framework to different modalities such as images, time series or tabular data as well as generative model paradigms such as GANs or autoencoders and predictive tasks like classification or regression. We illustrate the effectiveness of our method on image (CelebA), time series (ECG) and mixed-type tabular (Adult Census) data.

[48]  arXiv:2101.10160 (cross-list from cs.LG) [pdf, other]
Title: Measuring Dependence with Matrix-based Entropy Functional
Comments: Accepted at AAAI-21. An interpretable and differentiable dependence (or independence) measure that can be used to 1) train deep network under covariate shift and non-Gaussian noise; 2) implement a deep deterministic information bottleneck; and 3) understand the dynamics of learning of CNN. Code available at this https URL
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

Measuring the dependence of data plays a central role in statistics and machine learning. In this work, we summarize and generalize the main idea of existing information-theoretic dependence measures into a higher-level perspective via Shearer's inequality. Based on our generalization, we then propose two measures, namely the matrix-based normalized total correlation ($T_\alpha^*$) and the matrix-based normalized dual total correlation ($D_\alpha^*$), to quantify the dependence of multiple variables in arbitrary dimensional space, without explicit estimation of the underlying data distributions. We show that our measures are differentiable and statistically more powerful than prevalent ones. We also show the impact of our measures in four different machine learning problems, namely gene regulatory network inference, robust machine learning under covariate shift and non-Gaussian noise, subspace outlier detection, and understanding the learning dynamics of convolutional neural networks (CNNs), to demonstrate their utility, advantages, and implications for those problems. Code for our dependence measures is available at: https://bit.ly/AAAI-dependence

[49]  arXiv:2101.10189 (cross-list from math.OC) [pdf, other]
Title: Surrogate Models for Optimization of Dynamical Systems
Subjects: Optimization and Control (math.OC); Machine Learning (stat.ML)

Driven by the increased complexity of dynamical systems, the solution of systems of differential equations through numerical simulation in optimization problems has become computationally expensive. This paper provides a smart data-driven mechanism to construct low-dimensional surrogate models. These surrogate models reduce the computational time for solving complex optimization problems by using training instances derived from evaluations of the true objective functions. The surrogate models are constructed using a combination of proper orthogonal decomposition and radial basis functions and provide system responses by simple matrix multiplication. Using relative maximum absolute error as the measure of approximation accuracy, it is shown that surrogate models with Latin hypercube sampling and spline radial basis functions dominate variable order methods in computational time of optimization, while preserving the accuracy. These surrogate models also show robustness in the presence of model non-linearities. Therefore, these computationally efficient predictive surrogate models are applicable in various fields, specifically to solve inverse problems and optimal control problems, some examples of which are demonstrated in this paper.

[50]  arXiv:2101.10229 (cross-list from cs.LG) [pdf, other]
Title: Universal Approximation Properties for ODENet and ResNet
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Classical Analysis and ODEs (math.CA); Numerical Analysis (math.NA); Machine Learning (stat.ML)

We prove a universal approximation property (UAP) for a class of ODENet and a class of ResNet, which are used in many deep learning algorithms. The UAP can be stated as follows. Let $n$ and $m$ be the dimensions of the input and output data, and assume $m\leq n$. Then we show that an ODENet of width $n+m$ with any non-polynomial continuous activation function can approximate any continuous function on a compact subset of $\mathbb{R}^n$. We also show that ResNet has the same property as the depth tends to infinity. Furthermore, we derive explicitly the gradient of a loss function with respect to a certain tuning variable. We use this to construct a learning algorithm for ODENet. To demonstrate the usefulness of this algorithm, we apply it to a regression problem, a binary classification, and a multinomial classification in MNIST.
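
The link between ResNet depth and an ODENet can be illustrated with the standard observation that a residual update $x_{k+1} = x_k + h\,f(x_k)$ is a forward Euler step for $\dot{x} = f(x)$. The sketch below is not the construction used in the paper: the shared weights `W`, `b`, the tanh vector field, and the function names are assumptions chosen only to show the outputs settling as depth grows.

```python
import numpy as np

def vector_field(x, W, b):
    """Right-hand side f(x) = tanh(W x + b), a non-polynomial activation."""
    return np.tanh(W @ x + b)

def resnet_forward(x0, W, b, depth, horizon=1.0):
    """Stack `depth` residual blocks with shared weights and step h = horizon / depth.

    Each block computes x <- x + h * f(x), i.e. one forward Euler step
    for the ODE dx/dt = f(x) on the interval [0, horizon].
    """
    h = horizon / depth
    x = np.asarray(x0, dtype=float)
    for _ in range(depth):
        x = x + h * vector_field(x, W, b)
    return x

# Hypothetical weights and input, chosen only for illustration.
W = np.array([[0.0, -1.0], [1.0, 0.0]])
b = np.zeros(2)
x0 = np.array([1.0, 0.0])

reference = resnet_forward(x0, W, b, depth=5000)  # proxy for the ODE flow at t = 1
gaps = {d: np.linalg.norm(resnet_forward(x0, W, b, d) - reference) for d in (5, 50, 500)}
```

Here `gaps` records how far each finite-depth network is from the very deep reference. Forward Euler is first-order accurate, so the gap shrinks roughly in proportion to `1 / depth`.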

[51]  arXiv:2101.10255 (cross-list from econ.EM) [pdf, ps, other]
Title: Consistent specification testing under spatial dependence
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST)

We propose a series-based nonparametric specification test for a regression function when data are spatially dependent, the `space' being of a general economic or social nature. Dependence can be parametric, parametric with increasing dimension, semiparametric or any combination thereof, thus covering a vast variety of settings. These include spatial error models of varying types and levels of complexity. Under a new smooth spatial dependence condition, our test statistic is asymptotically standard normal. To prove the latter property, we establish a central limit theorem for quadratic forms in linear processes in an increasing dimension setting. Finite sample performance is investigated in a simulation study and empirical examples illustrate the test with real-world data.

[52]  arXiv:2101.10261 (cross-list from cs.LG) [pdf, other]
Title: Discrete Choice Analysis with Machine Learning Capabilities
Subjects: Machine Learning (cs.LG); Econometrics (econ.EM); Methodology (stat.ME)

This paper discusses capabilities that are essential to models applied in policy analysis settings and the limitations of direct applications of off-the-shelf machine learning methodologies to such settings. Traditional econometric methodologies for building discrete choice models for policy analysis involve combining data with modeling assumptions guided by subject-matter considerations. Such considerations are typically most useful in specifying the systematic component of random utility discrete choice models but are typically of limited aid in determining the form of the random component. We identify an area where machine learning paradigms can be leveraged, namely in specifying and systematically selecting the best specification of the random component of the utility equations. We review two recent novel applications where mixed-integer optimization and cross-validation are used to algorithmically select optimal specifications for the random utility components of nested logit and logit mixture models subject to interpretability constraints.

[53]  arXiv:2101.10266 (cross-list from cs.LG) [pdf, other]
Title: COVID-19 Outbreak Prediction and Analysis using Self Reported Symptoms
Comments: 14 pages, 16 Figures
Subjects: Machine Learning (cs.LG); Applications (stat.AP)

The COVID-19 pandemic has challenged scientists and policy-makers internationally to develop novel approaches to public health policy. Furthermore, it has also been observed that the prevalence and spread of COVID-19 vary across space, time, and demographics. Despite ramping up testing, we still are not at the required level in most parts of the globe. Therefore, we utilize self-reported symptoms survey data to understand trends in the spread of COVID-19. The aim of this study is to segment populations that are highly susceptible. In order to understand such populations, we perform exploratory data analysis, outbreak prediction, and time-series forecasting using public health and policy datasets. From our studies, we try to predict the likely percentage of the population that tested positive for COVID-19 based on self-reported symptoms. Our findings reaffirm the predictive value of symptoms such as anosmia and ageusia. We forecast the percentage of the population having COVID-19-like illness (CLI) and the percentage testing positive with absolute errors of 0.15% and 1.14%, respectively. These findings could aid faster development of public health policy, particularly in areas with low levels of testing and a greater reliance on self-reported symptoms. Our analysis sheds light on identifying clinical attributes of interest across different demographics. We also provide insights into the effects of various policy enactments on COVID-19 prevalence.

Replacements for Tue, 26 Jan 21

[54]  arXiv:1709.08238 (replaced) [pdf, other]
Title: Counterparty Credit Limits: The Impact of a Risk-Mitigation Measure on Everyday Trading
Subjects: Trading and Market Microstructure (q-fin.TR); Econometrics (econ.EM); Probability (math.PR); Applications (stat.AP); Computation (stat.CO)
[55]  arXiv:1809.02383 (replaced) [pdf, other]
Title: Group-based Learning of Disentangled Representations with Generalizability for Novel Contents
Authors: Haruo Hosoya
Journal-ref: published in IJCAI 2019
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[56]  arXiv:1902.00097 (replaced) [pdf, other]
Title: Ensembling methods for countrywide short term forecasting of gas demand
Journal-ref: Int. J. Oil, Gas and Coal Technology, Vol. 26, No. 2, pp.184-201 (2021)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[57]  arXiv:1902.01289 (replaced) [pdf, other]
Title: Diagnostics for Stochastic Gaussian Process Emulators
Subjects: Methodology (stat.ME)
[58]  arXiv:1905.01435 (replaced) [pdf, ps, other]
Title: Tight Regret Bounds for Infinite-armed Linear Contextual Bandits
Comments: 10 pages, AISTATS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[59]  arXiv:1905.03337 (replaced) [pdf, other]
Title: Optimal Rerandomization via a Criterion that Provides Insurance Against Failed Experiments
Comments: 27 pages, 5 figures, 2 tables, 2 algorithms
Subjects: Methodology (stat.ME)
[60]  arXiv:1905.09780 (replaced) [pdf, other]
Title: Bayesian Optimization with Approximate Set Kernels
Comments: 18 pages, 7 figures, 5 tables, accepted for publication in Machine Learning Journal
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[61]  arXiv:1906.02314 (replaced) [pdf, ps, other]
Title: A Tunable Loss Function for Robust Classification: Calibration, Landscape, and Generalization
Comments: Submitted to T-IT. Many new theoretical and experimental results
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[62]  arXiv:1906.06463 (replaced) [pdf, other]
Title: Linear Aggregation in Tree-based Estimators
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)
[63]  arXiv:1906.10470 (replaced) [pdf, other]
Title: An Unsupervised Bayesian Neural Network for Truth Discovery in Social Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[64]  arXiv:1907.02109 (replaced) [pdf, other]
Title: A unified approach to mixed-integer optimization problems with logical constraints
Comments: Revised version (including title change). The old title was "A unified approach to mixed-integer optimization: Nonlinear formulations and scalable algorithms"
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[65]  arXiv:1907.02579 (replaced) [pdf, other]
Title: Particularities and commonalities of singular spectrum analysis as a method of time series analysis and signal processing
Authors: Nina Golyandina
Journal-ref: WIREs Computational Statistics, 2020, Vol.12, No 4., e1487, 39pp
Subjects: Methodology (stat.ME)
[66]  arXiv:1907.03025 (replaced) [pdf, other]
Title: Improving Lasso for model selection and prediction
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
[67]  arXiv:1907.05911 (replaced) [pdf, other]
Title: Vector Quantized Bayesian Neural Network Inference for Data Streams
Comments: AAAI 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[68]  arXiv:1908.05365 (replaced) [pdf, other]
Title: End-to-End Learning from Complex Multigraphs with Latent-Graph Convolutional Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[69]  arXiv:1908.08783 (replaced) [pdf, other]
Title: Learning Fitness Functions for Machine Programming
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Machine Learning (stat.ML)
[70]  arXiv:1909.11651 (replaced) [pdf, other]
Title: Matching Embeddings for Domain Adaptation
Comments: 12 pages, 3 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[71]  arXiv:1909.11926 (replaced) [pdf, other]
Title: Hierarchical Neural Architecture Search via Operator Clustering
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[72]  arXiv:1910.02386 (replaced) [src]
Title: A New Graphical Device and Related Tests for the Shape of Non-parametric Regression Function
Comments: There were errors in mathematical proofs of Theorem 1 and related lemmas. Major revisions were needed
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
[73]  arXiv:1910.07643 (replaced) [pdf, other]
Title: Dynamic Graph Convolutional Networks Using the Tensor M-Product
Comments: Accepted to SIAM International Conference on Data Mining (SDM) 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[74]  arXiv:1910.13603 (replaced) [pdf, other]
Title: When MAML Can Adapt Fast and How to Assist When It Cannot
Comments: Accepted at AISTATS 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[75]  arXiv:1911.00677 (replaced) [pdf, other]
Title: Fairness Violations and Mitigation under Covariate Shift
Comments: 11 pages main and 7 pages supplementary, To appear at ACM FAccT '21, Previous arXiv version arXiv:1911.00677v1 was presented at Workshop on Fair ML for Health '19
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
[76]  arXiv:1911.01702 (replaced) [pdf, other]
Title: DocParser: Hierarchical Structure Parsing of Document Renderings
Comments: AAAI 2021
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[77]  arXiv:1911.02160 (replaced) [pdf, other]
Title: Shrinkage with shrunken shoulders: inference via geometrically / uniformly ergodic Gibbs sampler
Comments: 23 pages, 8 figures, (18 pages, 3 figures of Supplement). Code available from this https URL
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
[78]  arXiv:1911.09049 (replaced) [pdf, other]
Title: Sharp hypotheses and bispatial inference
Comments: Corrected, rewritten and extended. *Final version*
Subjects: Other Statistics (stat.OT); Methodology (stat.ME)
[79]  arXiv:1912.00682 (replaced) [pdf, other]
Title: GeoTrackNet-A Maritime Anomaly Detector using Probabilistic Neural Network Representation of AIS Tracks and A Contrario Detection
Comments: IEEE Transactions on Intelligent Transportation Systems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[80]  arXiv:2001.02522 (replaced) [pdf]
Title: On Interpretability of Artificial Neural Networks: A Survey
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[81]  arXiv:2001.07859 (replaced) [pdf, other]
Title: A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis
Comments: 30 pages; 12 figures; accepted for publication in Psychometrika
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
[82]  arXiv:2002.05474 (replaced) [pdf, ps, other]
Title: Metric-Free Individual Fairness in Online Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[83]  arXiv:2002.06247 (replaced) [pdf, other]
Title: Robust Policies For Proactive ICU Transfers
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
[84]  arXiv:2002.09402 (replaced) [pdf, other]
Title: Addressing Some Limitations of Transformers with Feedback Memory
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
[85]  arXiv:2002.09843 (replaced) [pdf, other]
Title: Computation-efficient Deep Model Training for Ciphertext-based Cross-silo Federated Learning
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[86]  arXiv:2002.11809 (replaced) [pdf, ps, other]
Title: Assortativity and bidegree distributions on Bernoulli random graph superpositions
Comments: 26 pages
Subjects: Probability (math.PR); Social and Information Networks (cs.SI); Statistics Theory (math.ST)
[87]  arXiv:2003.00401 (replaced) [pdf, other]
Title: A flexible Bayesian framework to estimate age- and cause-specific child mortality over time from sample registration data
Comments: 16 pages, 4 figures, submitted to The Annals of Applied Statistics
Subjects: Applications (stat.AP)
[88]  arXiv:2004.04668 (replaced) [pdf, other]
Title: Test-Time Adaptable Neural Networks for Robust Medical Image Segmentation
Comments: Published in Medical Image Analysis journal: this https URL
Journal-ref: Medical Image Analysis, Volume 68, 2021, 101907, ISSN 1361-8415. http://www.sciencedirect.com/science/article/pii/S1361841520302711
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[89]  arXiv:2004.06496 (replaced) [pdf, other]
Title: Certifiable Robustness to Adversarial State Uncertainty in Deep Reinforcement Learning
Comments: arXiv admin note: text overlap with arXiv:1910.12908
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[90]  arXiv:2004.08492 (replaced) [pdf, other]
Title: Orbit: Probabilistic Forecast with Exponential Smoothing
Comments: arXiv admin note: text overlap with arXiv:1909.13316 by other authors
Subjects: Computation (stat.CO); Methodology (stat.ME)
[91]  arXiv:2004.11497 (replaced) [pdf, other]
Title: Causal Modeling with Stochastic Confounders
Comments: AISTATS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[92]  arXiv:2005.07606 (replaced) [src]
Title: Initializing Perturbations in Multiple Directions for Fast Adversarial Training
Comments: has no contribution
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[93]  arXiv:2006.00266 (replaced) [pdf, other]
Title: Functional additive models for optimizing individualized treatment rules
Comments: 24 pages, 7 figures, 2 tables
Subjects: Methodology (stat.ME)
[94]  arXiv:2006.01862 (replaced) [pdf, other]
Title: Consistent Estimators for Learning to Defer to an Expert
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
[95]  arXiv:2006.07185 (replaced) [pdf, other]
Title: Grounding Language to Autonomously-Acquired Skills via Goal Generation
Comments: Published at ICLR 2021
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
[96]  arXiv:2006.09275 (replaced) [pdf, other]
Title: Hierarchical, rotation-equivariant neural networks to select structural models of protein complexes
Comments: 11 pages, 5 figures + SI: Updated based on the published version in PROTEINS. Presented at NeurIPS 2019 workshop Learning Meaningful Representations of Life
Subjects: Biomolecules (q-bio.BM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[97]  arXiv:2006.09985 (replaced) [pdf, other]
Title: An Efficient Spiking Neural Network for Recognizing Gestures with a DVS Camera on the Loihi Neuromorphic Processor
Comments: Accepted for publication at the 2020 International Joint Conference on Neural Networks (IJCNN)
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Machine Learning (stat.ML)
[98]  arXiv:2006.12655 (replaced) [pdf, other]
Title: Perceptual Adversarial Robustness: Defense Against Unseen Threat Models
Comments: Published in ICLR 2021. Code and data are available at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[99]  arXiv:2006.14448 (replaced) [pdf, other]
Title: Learning Task-General Representations with Generative Neuro-Symbolic Modeling
Journal-ref: International Conference on Learning Representations (ICLR 2021)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
[100]  arXiv:2007.00140 (replaced) [pdf, other]
Title: RE-MIMO: Recurrent and Permutation Equivariant Neural MIMO Detection
Comments: copyright 2020 IEEE TSP. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Signal Processing (eess.SP); Machine Learning (stat.ML)
[101]  arXiv:2007.01659 (replaced) [pdf, other]
Title: Diagnostic Uncertainty Calibration: Towards Reliable Machine Predictions in Medical Domain
Comments: 31 pages, 6 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[102]  arXiv:2007.03746 (replaced) [pdf, ps, other]
Title: Transfer Learning for Motor Imagery Based Brain-Computer Interfaces: A Complete Pipeline
Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[103]  arXiv:2007.07617 (replaced) [pdf, other]
Title: SpaceNet: Make Free Space For Continual Learning
Comments: Accepted in Neurocomputing Journal
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[104]  arXiv:2007.10455 (replaced) [pdf, other]
Title: The multilayer random dot product graph
Comments: 45 pages, 15 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[105]  arXiv:2007.10976 (replaced) [pdf, ps, other]
Title: Interactive Inference under Information Constraints
Comments: Adding a section on information losses; improving presentation
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Machine Learning (cs.LG); Statistics Theory (math.ST)
[106]  arXiv:2007.11202 (replaced) [pdf, other]
Title: MathNet: Haar-Like Wavelet Multiresolution-Analysis for Graph Representation and Learning
Comments: 32 pages, 6 figures, 6 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[107]  arXiv:2007.11573 (replaced) [pdf, other]
Title: Autonomous Tracking and State Estimation with Generalised Group Lasso
Comments: 14 pages, 10 figures
Subjects: Methodology (stat.ME); Optimization and Control (math.OC)
[108]  arXiv:2008.01304 (replaced) [pdf, ps, other]
Title: The Exact Asymptotic Form of Bayesian Generalization Error in Latent Dirichlet Allocation
Authors: Naoki Hayashi
Comments: 20 pages, 3 figures, 2 tables. Accepted at Neural Networks (Elsevier)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[109]  arXiv:2008.02545 (replaced) [pdf, other]
Title: Deep neural networks adapt to intrinsic dimensionality beyond the target domain
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[110]  arXiv:2008.06529 (replaced) [pdf, other]
Title: Three Variants of Differential Privacy: Lossless Conversion and Applications
Comments: To appear in IEEE Journal on Selected Areas in Information Theory, Special Issue on Privacy and Security of Information Systems. arXiv admin note: text overlap with arXiv:2001.05990
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[111]  arXiv:2008.11435 (replaced) [pdf, other]
Title: Comparison of classical and Bayesian imaging in radio interferometry
Comments: 23 pages, 16 figures, 4 tables, data published at this https URL
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Applications (stat.AP)
[112]  arXiv:2008.11809 (replaced) [pdf, ps, other]
Title: Unlabeled Data Help in Graph-Based Semi-Supervised Learning: A Bayesian Nonparametrics Perspective
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
[113]  arXiv:2009.00437 (replaced) [pdf, other]
Title: NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size
Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021. arXiv admin note: substantial text overlap with arXiv:2001.00326
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[114]  arXiv:2009.08886 (replaced) [pdf, other]
Title: BNAS-v2: Memory-efficient and Performance-collapse-prevented Broad Neural Architecture Search
Comments: 12 pages, 11 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[115]  arXiv:2009.10622 (replaced) [pdf, ps, other]
Title: An $l_1$-oracle inequality for the Lasso in mixture-of-experts regression models
Comments: Corrected typos. Added new Section 4. Discussion and comparisons
Subjects: Statistics Theory (math.ST); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
[116]  arXiv:2010.04592 (replaced) [pdf, other]
Title: Contrastive Learning with Hard Negative Samples
Comments: Published as a conference paper at ICLR 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[117]  arXiv:2010.05273 (replaced) [pdf, other]
Title: Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms
Comments: ICLR 2021. Code: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[118]  arXiv:2010.11385 (replaced) [pdf, other]
Title: Dirichlet Process Mixture Models with Shrinkage Prior
Subjects: Methodology (stat.ME); Applications (stat.AP)
[119]  arXiv:2011.00417 (replaced) [pdf, other]
Title: DebiNet: Debiasing Linear Models with Nonlinear Overparameterized Neural Networks
Authors: Shiyun Xu, Zhiqi Bu
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[120]  arXiv:2011.03140 (replaced) [pdf, other]
Title: Prediction of Future Failures for Heterogeneous Reliability Field Data
Subjects: Methodology (stat.ME); Applications (stat.AP)
[121]  arXiv:2011.03789 (replaced) [pdf, ps, other]
Title: Estimation of smooth functionals in high-dimensional models: bootstrap chains and Gaussian approximation
Subjects: Statistics Theory (math.ST)
[122]  arXiv:2011.07466 (replaced) [pdf, other]
Title: CcGAN: Continuous Conditional Generative Adversarial Networks for Image Generation
Journal-ref: International Conference on Learning Representations 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[123]  arXiv:2011.14317 (replaced) [pdf, other]
Title: FROCC: Fast Random projection-based One-Class Classification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[124]  arXiv:2012.04798 (replaced) [pdf, other]
Title: Coupled Markov chain Monte Carlo for high-dimensional regression with Half-t priors
Comments: 37 pages, 11 figures
Subjects: Methodology (stat.ME); Computation (stat.CO)
[125]  arXiv:2012.13326 (replaced) [pdf, ps, other]
Title: A Tight Lower Bound for Uniformly Stable Algorithms
Authors: Qinghua Liu, Zhou Lu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[126]  arXiv:2101.03446 (replaced) [pdf, other]
Title: The shifted ODE method for underdamped Langevin MCMC
Subjects: Numerical Analysis (math.NA); Probability (math.PR); Statistics Theory (math.ST)
[127]  arXiv:2101.04789 (replaced) [pdf, ps, other]
Title: Improving Classification Accuracy with Graph Filtering
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[128]  arXiv:2101.06369 (replaced) [pdf, ps, other]
Title: Non-convex weakly smooth Langevin Monte Carlo using regularization
Subjects: Computation (stat.CO)
[129]  arXiv:2101.07987 (replaced) [pdf, other]
Title: matrixdist: An R Package for Inhomogeneous Phase-Type Distributions
Subjects: Computation (stat.CO)
[130]  arXiv:2101.08007 (replaced) [pdf, other]
Title: On the Non-Monotonicity of a Non-Differentially Mismeasured Binary Confounder
Authors: Jose M. Peña
Comments: arXiv admin note: text overlap with arXiv:2005.13245
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
[131]  arXiv:2101.09022 (replaced) [pdf]
Title: A heavy-tailed and overdispersed collective risk model
Subjects: Applications (stat.AP)