Statistics
New submissions
New submissions for Tue, 26 Jan 21
 [1] arXiv:2101.09297 [pdf, other]

Title: Addressing Spatially Structured Interference in Causal Analysis Using Propensity Scores
Comments: 37 pages, 7 figures, being submitted
Subjects: Applications (stat.AP)
Environmental epidemiologists are increasingly interested in establishing causality between exposures and health outcomes. A popular model for causal inference is the Rubin Causal Model (RCM), which typically seeks to estimate the average difference in study units' potential outcomes. An important assumption under RCM is no interference; that is, the potential outcomes of one unit are not affected by the exposure status of other units. The no interference assumption is violated if we expect spillover or diffusion of exposure effects based on units' proximity to other units, and several other causal estimands arise. Air pollution epidemiology typically violates this assumption when we expect upwind events to affect downwind or nearby locations. This paper adapts causal assumptions from social network research to address interference and allow estimation of both direct and spillover causal effects. We use propensity score-based methods to estimate these effects when considering the effects of the Environmental Protection Agency's 2005 nonattainment designations for particulate matter with aerodynamic diameter less than 2.5 micrometers (PM2.5) on lung cancer incidence using county-level data obtained from the Surveillance, Epidemiology, and End Results (SEER) Program. We compare these methods in a rigorous simulation study that considers spatially autocorrelated variables, interference, and missing confounders. We find that pruning and matching based on the propensity score produces the highest probability coverage of the true causal effects and lower mean squared error. When applied to the research question, we found protective direct and spillover causal effects.
 [2] arXiv:2101.09304 [pdf, other]

Title: Revisiting Identifying Assumptions for Population Size Estimation
Comments: 48 pages. The material presented in Appendix A previously appeared in an unpublished preprint written by the first author: arXiv:2008.09865
Subjects: Methodology (stat.ME)
The problem of estimating the size of a population based on a subset of individuals observed across multiple data sources is often referred to as capture-recapture or multiple-systems estimation. This is fundamentally a missing data problem, where the number of unobserved individuals represents the missing data. As with any missing data problem, multiple-systems estimation requires users to make an untestable identifying assumption in order to estimate the population size from the observed data. Approaches to multiple-systems estimation often do not emphasize the role of the identifying assumption during model specification, which makes it difficult to decouple the specification of the model for the observed data from the identifying assumption. We present a reframing of the multiple-systems estimation problem that decouples the specification of the observed-data model from the identifying assumptions, and discuss how log-linear models and the associated no-highest-order interaction assumption fit into this framing. We present an approach to computation in the Bayesian setting which takes advantage of existing software and facilitates various sensitivity analyses. We demonstrate our approach in a case study of estimating the number of civilian casualties in the Kosovo war. Code used to produce this manuscript is available at https://github.com/aleshing/revisitingidentifyingassumptions.
 [3] arXiv:2101.09315 [pdf, ps, other]

Title: Tighter expected generalization error bounds via Wasserstein distance
Comments: 22 pages: 12 of the main text, 2 of references, and 8 of appendices
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
In this work, we introduce several expected generalization error bounds based on the Wasserstein distance. More precisely, we present full-dataset, single-letter, and random-subset bounds on both the standard setting and the randomized-subsample setting from Steinke and Zakynthinou [2020]. Moreover, we show that, when the loss function is bounded, these bounds recover from below (and thus are tighter than) current bounds based on the relative entropy and, for the standard setting, generate new, non-vacuous bounds also based on the relative entropy. Then, we show how similar bounds featuring the backward channel can be derived with the proposed proof techniques. Finally, we show how various new bounds based on different information measures (e.g., the lautum information or several $f$-divergences) can be derived from the presented bounds.
 [4] arXiv:2101.09418 [pdf, other]

Title: A Geospatial Functional Model For OCO-2 Data with Application on Imputation and Land Fraction Estimation
Subjects: Applications (stat.AP)
Data from NASA's Orbiting Carbon Observatory-2 (OCO-2) satellite is essential to many carbon management strategies. A retrieval algorithm is used to estimate CO2 concentration using the radiance data measured by OCO-2. However, due to factors such as cloud cover and cosmic rays, the spatial coverage of the retrieval algorithm is limited in some areas of critical importance for carbon cycle science. Mixed land/water pixels along the coastline are also not used in the retrieval processing due to the lack of valid ancillary variables including land fraction. We propose an approach to model spatial spectral data to solve these two problems by radiance imputation and land fraction estimation. The spectral observations are modeled as spatially indexed functional data with footprint-specific parameters and are reduced to much lower dimensions by functional principal component analysis. The principal component scores are modeled as random fields to account for the spatial dependence, and the missing spectral observations are imputed by kriging the principal component scores. The proposed method is shown to impute spectral radiance with high accuracy for observations over the Pacific Ocean. An unmixing approach based on this model provides much more accurate land fraction estimates in our validation study along the coastlines of Greece.
 [5] arXiv:2101.09424 [pdf, other]

Title: A Change-Point Based Control Chart for Detecting Sparse Changes in High-Dimensional Heteroscedastic Data
Subjects: Methodology (stat.ME)
Because of the curse of dimensionality, high-dimensional processes present challenges to traditional multivariate statistical process monitoring (SPM) techniques. In addition, the unknown underlying distribution and complicated dependency among variables, such as heteroscedasticity, increase the uncertainty of estimated parameters and decrease the effectiveness of control charts. Moreover, the requirement of sufficient reference samples limits the application of traditional charts in high-dimension, low-sample-size scenarios (small n, large p). More difficulties appear in detecting and diagnosing abnormal behaviors that are caused by a small set of variables, i.e., sparse changes. In this article, we propose a change-point based control chart to detect sparse shifts in the mean vector of high-dimensional heteroscedastic processes. Our proposed method can start monitoring when the number of observations is much smaller than the dimensionality. The simulation results show its robustness to non-normality and heteroscedasticity. A real data example is used to illustrate the effectiveness of the proposed control chart in high-dimensional applications. Supplementary material and code are provided online.
 [6] arXiv:2101.09514 [pdf, ps, other]

Title: Efficient Importance Sampling for Large Sums of Independent and Identically Distributed Random Variables
Subjects: Computation (stat.CO)
We aim to estimate the probability that the sum of nonnegative independent and identically distributed random variables falls below a given threshold, i.e., $\mathbb{P}(\sum_{i=1}^{N}{X_i} \leq \gamma)$, via importance sampling (IS). We are particularly interested in the rare event regime when $N$ is large and/or $\gamma$ is small. The exponential twisting is a popular technique that, in most of the cases, compares favorably to existing estimators. However, it has several limitations: i) it assumes the knowledge of the moment generating function of $X_i$ and ii) sampling under the new measure is not straightforward and might be expensive. The aim of this work is to propose an alternative change of measure that yields, in the rare event regime corresponding to large $N$ and/or small $\gamma$, at least the same performance as the exponential twisting technique and, at the same time, does not introduce serious limitations. For distributions whose probability density functions (PDFs) are $\mathcal{O}(x^{d})$, as $x \rightarrow 0$ and $d>-1$, we prove that the Gamma IS PDF with appropriately chosen parameters retrieves asymptotically, in the rare event regime, the same performance of the estimator based on the use of the exponential twisting technique. Moreover, in the Log-normal setting, where the PDF at zero vanishes faster than any polynomial, we numerically show that a Gamma IS PDF with optimized parameters clearly outperforms the exponential twisting change of measure. Numerical experiments validate the efficiency of the proposed estimator in delivering a highly accurate estimate in the regime of large $N$ and/or small $\gamma$.
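As a rough illustration of the idea in this abstract, the following Python sketch estimates $\mathbb{P}(\sum_{i=1}^{N} X_i \leq \gamma)$ by importance sampling with a Gamma proposal. It is not the authors' implementation: the target distribution (Exp(1) components, chosen because the exact answer is then available from the Erlang CDF), the function names, and the proposal parameters (shape 1, scale $\gamma/N$) are all assumptions made for the example, and the parameters are not the optimized ones discussed in the paper.

```python
import math
import random


def exact_prob(n, gamma):
    """Exact P(sum of n Exp(1) variables <= gamma), from the Erlang CDF."""
    partial = sum(gamma ** k / math.factorial(k) for k in range(n))
    return 1.0 - math.exp(-gamma) * partial


def gamma_is_estimate(n, gamma, shape, scale, num_samples, rng):
    """Importance-sampling estimate of P(X_1 + ... + X_n <= gamma), X_i ~ Exp(1).

    Each X_i is drawn from a Gamma(shape, scale) proposal (an assumed choice),
    and each sample that hits the event is weighted by target/proposal.
    """
    # Log normalising constant of the Gamma(shape, scale) density.
    log_norm = math.lgamma(shape) + shape * math.log(scale)
    total = 0.0
    for _ in range(num_samples):
        xs = [rng.gammavariate(shape, scale) for _ in range(n)]
        if sum(xs) <= gamma:
            log_weight = 0.0
            for x in xs:
                log_target = -x  # log density of Exp(1)
                log_proposal = -x / scale - log_norm
                if shape != 1.0:
                    log_proposal += (shape - 1.0) * math.log(x)
                log_weight += log_target - log_proposal
            total += math.exp(log_weight)
    return total / num_samples


if __name__ == "__main__":
    n, gamma = 10, 2.0
    rng = random.Random(0)
    estimate = gamma_is_estimate(n, gamma, 1.0, gamma / n, 20000, rng)
    print("IS estimate:", estimate, "exact:", exact_prob(n, gamma))
```

Setting the proposal mean to $\gamma/N$ places roughly half of the proposal samples inside the rare event, which is why a modest number of samples suffices here; a crude Monte Carlo estimate would need far more draws to see the event at all.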
 [7] arXiv:2101.09558 [pdf, other]

Title: The Gauss Hypergeometric Covariance Kernel for Modeling Second-Order Stationary Random Fields in Euclidean Spaces: its Compact Support, Properties and Spectral Representation
Comments: 22 pages
Subjects: Statistics Theory (math.ST)
This paper presents a parametric family of compactly-supported positive semidefinite kernels aimed to model the covariance structure of second-order stationary isotropic random fields defined in the $d$-dimensional Euclidean space. Both the covariance and its spectral density have an analytic expression involving the hypergeometric functions ${}_2F_1$ and ${}_1F_2$, respectively, and four real-valued parameters related to the correlation range, smoothness and shape of the covariance. The presented hypergeometric kernel family contains, as special cases, the spherical, cubic, penta, Askey, generalized Wendland and truncated power covariances and, as asymptotic cases, the Mat\'ern, Laguerre, Tricomi, incomplete gamma and Gaussian covariances, among others. The parameter space of the univariate hypergeometric kernel is identified and its functional properties -- continuity, smoothness, transitive upscaling (mont\'ee) and downscaling (descente) -- are examined. Several sets of sufficient conditions are also derived to obtain valid stationary bivariate and multivariate covariance kernels, characterized by four matrix-valued parameters. Such kernels turn out to be versatile, insofar as the direct and cross-covariances do not necessarily have the same shapes, correlation ranges or behaviors at short scale, thus associated with vector random fields whose components are cross-correlated but have different spatial structures.
 [8] arXiv:2101.09587 [pdf, other]

Title: Bayesian Edge Regression in Undirected Graphical Models to Characterize Interpatient Heterogeneity in Cancer
Authors: Zeya Wang, Veera Baladandayuthapan, Ahmed O. Kaseb, Hesham M. Amin, Manal M. Hassan, Wenyi Wang, Jeffrey S. Morris
Subjects: Applications (stat.AP); Methodology (stat.ME)
Graphical models are commonly used to discover associations within gene or protein networks for complex diseases such as cancer. Most existing methods estimate a single graph for a population, while in many cases, researchers are interested in characterizing the heterogeneity of individual networks across subjects with respect to subject-level covariates. Examples include assessments of how the network varies with patient-specific prognostic scores or comparisons of tumor and normal graphs while accounting for tumor purity as a continuous predictor. In this paper, we propose a novel edge regression model for undirected graphs, which estimates conditional dependencies as a function of subject-level covariates. Bayesian shrinkage algorithms are used to induce sparsity in the underlying graphical models. We assess our model performance through simulation studies focused on comparing tumor and normal graphs while adjusting for tumor purity and a case study assessing how blood protein networks in hepatocellular carcinoma patients vary with severity of disease, measured by HepatoScore, a novel biomarker signature measuring disease severity.
 [9] arXiv:2101.09596 [pdf]

Title: The Role of Distributional Overlap on the Precision Gain of Bounds for Generalization
Authors: Wendy Chan
Subjects: Methodology (stat.ME)
Over the past ten years, propensity score methods have made an important contribution to improving generalizations from studies that do not select samples randomly from a population of inference. However, these methods require assumptions and recent work has considered the role of bounding approaches that provide a range of treatment impact estimates that are consistent with the observable data. An important limitation to bound estimates is that they can be uninformatively wide. This has motivated research on the use of propensity score stratification to narrow bounds. This article assesses the role of distributional overlap in propensity scores on the effectiveness of stratification to tighten bounds. Using the results of two simulation studies and two case studies, I evaluate the relationship between distributional overlap and precision gain and discuss the implications when propensity score stratification is used as a method to improve precision in the bounding framework.
 [10] arXiv:2101.09604 [pdf, other]

Title: UltraNest -- a robust, general purpose Bayesian inference engine
Authors: Johannes Buchner
Comments: Longer version of the paper submitted to JOSS. UltraNest can be found at this https URL
Subjects: Computation (stat.CO); Instrumentation and Methods for Astrophysics (astro-ph.IM)
UltraNest is a general-purpose Bayesian inference package for parameter estimation and model comparison. It allows fitting arbitrary models specified as likelihood functions written in Python, C, C++, Fortran, Julia or R. With a focus on correctness and speed (in that order), UltraNest is especially useful for multi-modal or non-Gaussian parameter spaces, computationally expensive models, and robust pipelines. Parallelisation to computing clusters and resuming incomplete runs are available.
 [11] arXiv:2101.09605 [pdf, other]

Title: Local linear tie-breaker designs
Subjects: Methodology (stat.ME); Econometrics (econ.EM); Statistics Theory (math.ST)
Tie-breaker experimental designs are hybrids of Randomized Control Trials (RCTs) and Regression Discontinuity Designs (RDDs) in which subjects with moderate scores are placed in an RCT while subjects with extreme scores are deterministically assigned to the treatment or control group. The design maintains the benefits of randomization for causal estimation while avoiding the possibility of excluding the most deserving recipients from the treatment group. The causal effect estimator for a tie-breaker design can be estimated by fitting local linear regressions for both the treatment and control group, as is typically done for RDDs. We study the statistical efficiency of such local linear regression-based causal estimators as a function of $\Delta$, the radius of the interval in which treatment randomization occurs. In particular, we determine the efficiency of the estimator as a function of $\Delta$ for a fixed, arbitrary bandwidth under the assumption of a uniform assignment variable. To generalize beyond uniform assignment variables and asymptotic regimes, we also demonstrate on the Angrist and Lavy (1999) classroom size dataset that prior to conducting an experiment, an experimental designer can estimate the efficiency for various experimental radii choices by using Monte Carlo as long as they have access to the distribution of the assignment variable. For both uniform and triangular kernels, we show that increasing the radius of randomized experiment interval will increase the efficiency until the radius is the size of the local-linear regression bandwidth, after which no additional efficiency benefits are conferred.
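The assignment rule described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: scores are taken to be centered so that the cutoff is 0, high scores are the ones deterministically treated, the randomization probability inside the interval is 1/2, and the function name `tie_breaker_assign` is hypothetical.

```python
import random


def tie_breaker_assign(score, delta, rng, p_treat=0.5):
    """Return 1 (treatment) or 0 (control) under a tie-breaker design.

    Scores above +delta are always treated, scores below -delta are never
    treated, and scores inside [-delta, delta] are randomized.
    delta = 0 behaves like a sharp regression discontinuity design, and a
    delta covering the whole score range behaves like a plain RCT.
    """
    if score > delta:
        return 1
    if score < -delta:
        return 0
    return 1 if rng.random() < p_treat else 0


if __name__ == "__main__":
    rng = random.Random(1)
    scores = [rng.uniform(-1.0, 1.0) for _ in range(10)]
    for s in scores:
        print(round(s, 3), tie_breaker_assign(s, 0.3, rng))
```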
 [12] arXiv:2101.09675 [pdf, other]

Title: Nested Sampling Methods
Authors: Johannes Buchner
Comments: Comments are welcome. The open-source UltraNest package and astrostatistics tutorials can be found at this https URL
Subjects: Computation (stat.CO); Instrumentation and Methods for Astrophysics (astro-ph.IM)
Nested sampling (NS) computes parameter posterior distributions and makes Bayesian model comparison computationally feasible. Its strengths are the unsupervised navigation of complex, potentially multi-modal posteriors until a well-defined termination point. A systematic literature review of nested sampling algorithms and variants is presented. We focus on complete algorithms, including solutions to likelihood-restricted prior sampling. A new formulation of NS is presented, which casts the parameter space exploration as a search on a tree. Previously published ways of obtaining robust error estimates and dynamic variations of the number of live points are presented as special cases of this formulation.
 [13] arXiv:2101.09711 [pdf, other]

Title: Testing for subsphericity when $n$ and $p$ are of different asymptotic order
Authors: Joni Virta
Comments: 12 pages, 1 figure
Subjects: Statistics Theory (math.ST)
In this short note, we extend a classical test of subsphericity, based on the first two moments of the eigenvalues of the sample covariance matrix, to the high-dimensional regime where the signal eigenvalues of the covariance matrix diverge to infinity and either $p/n \rightarrow 0$ or $p/n \rightarrow \infty$. In the latter case we further require that the divergence of the eigenvalues is suitably fast in a specific sense. Our work can be seen to complement that of Schott (2006) who established equivalent results for the case $p/n \rightarrow \gamma \in (0, \infty)$. Simulations are used to demonstrate the results, providing also evidence that the test might be further extendable to a wider asymptotic regime.
 [14] arXiv:2101.09747 [pdf, ps, other]

Title: Numerical issues in maximum likelihood parameter estimation for Gaussian process regression
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
This article focuses on numerical issues in maximum likelihood parameter estimation for Gaussian process regression (GPR). This article investigates the origin of the numerical issues and provides simple but effective improvement strategies. This work targets a basic problem but a host of studies, particularly in the literature of Bayesian optimization, rely on off-the-shelf GPR implementations. For the conclusions of these studies to be reliable and reproducible, robust GPR implementations are critical.
 [15] arXiv:2101.09756 [pdf, other]

Title: Entropy Partial Transport with Tree Metrics: Theory and Practice
Comments: To appear in AISTATS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Optimal transport (OT) theory provides powerful tools to compare probability measures. However, OT is limited to nonnegative measures having the same mass, and suffers serious drawbacks about its computation and statistics. This leads to several proposals of regularized variants of OT in the recent literature. In this work, we consider an \textit{entropy partial transport} (EPT) problem for nonnegative measures on a tree having different masses. The EPT is shown to be equivalent to a standard complete OT problem on a one-node extended tree. We derive its dual formulation, then leverage this to propose a novel regularization for EPT which admits fast computation and negative definiteness. To our knowledge, the proposed regularized EPT is the first approach that yields a \textit{closed-form} solution among available variants of unbalanced OT. For practical applications without a priori knowledge about the tree structure for measures, we propose tree-sliced variants of the regularized EPT, computed by averaging the regularized EPT between these measures using random tree metrics, built adaptively from support data points. Exploiting the negative definiteness of our regularized EPT, we introduce a positive definite kernel, and evaluate it against other baselines on benchmark tasks such as document classification with word embedding and topological data analysis. In addition, we empirically demonstrate that our regularization also provides effective approximations.
 [16] arXiv:2101.09809 [pdf, other]

Title: NeurT-FDR: Controlling FDR by Incorporating Feature Hierarchy
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Controlling false discovery rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring possible hierarchy among the covariates. This strategy may not be optimal for complex large-scale problems, where hierarchical information often exists among those test-level covariates. We propose NeurT-FDR which boosts statistical power and controls FDR for multiple hypothesis testing while leveraging the hierarchy among test-level covariates. Our method parametrizes the test-level covariates as a neural network and adjusts the feature hierarchy through a regression framework, which enables flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR has strong FDR guarantees and makes substantially more discoveries in synthetic and real datasets compared to competitive baselines.
 [17] arXiv:2101.09855 [pdf, other]

Title: Diffusion Asymptotics for Sequential Experiments
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG)
We propose a new diffusion-asymptotic analysis for sequentially randomized experiments. Rather than taking sample size $n$ to infinity while keeping the problem parameters fixed, we let the mean signal level scale to the order $1/\sqrt{n}$ so as to preserve the difficulty of the learning task as $n$ gets large. In this regime, we show that the behavior of a class of methods for sequential experimentation converges to a diffusion limit. This connection enables us to make sharp performance predictions and obtain new insights on the behavior of Thompson sampling. Our diffusion asymptotics also help resolve a discrepancy between the $\Theta(\log(n))$ regret predicted by the fixed-parameter, large-sample asymptotics on the one hand, and the $\Theta(\sqrt{n})$ regret from worst-case, finite-sample analysis on the other, suggesting that it is an appropriate asymptotic regime for understanding practical large-scale sequential experiments.
 [18] arXiv:2101.09875 [pdf, other]

Title: Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
This work studies the spectral convergence of graph Laplacian to the Laplace-Beltrami operator when the graph affinity matrix is constructed from $N$ random samples on a $d$-dimensional manifold embedded in a possibly high dimensional space. By analyzing Dirichlet form convergence and constructing candidate approximate eigenfunctions via convolution with manifold heat kernel, we prove that, with Gaussian kernel, one can set the kernel bandwidth parameter $\epsilon \sim (\log N/ N)^{1/(d/2+2)}$ such that the eigenvalue convergence rate is $N^{-1/(d/2+2)}$ and the eigenvector convergence in 2-norm has rate $N^{-1/(d+4)}$; when $\epsilon \sim N^{-1/(d/2+3)}$, both eigenvalue and eigenvector rates are $N^{-1/(d/2+3)}$. These rates are up to a $\log N$ factor and proved for finitely many low-lying eigenvalues. The result holds for un-normalized and random-walk graph Laplacians when data are uniformly sampled on the manifold, as well as the density-corrected graph Laplacian (where the affinity matrix is normalized by the degree matrix from both sides) with non-uniformly sampled data. As an intermediate result, we prove new point-wise and Dirichlet form convergence rates for the density-corrected graph Laplacian. Numerical results are provided to verify the theory.
 [19] arXiv:2101.09942 [pdf, other]

Title: An Environmentally-Adaptive Hawkes Process with An Application to COVID-19
Subjects: Applications (stat.AP)
We propose a new generalized model based on the classical Hawkes process with environmental multipliers, which is called an environmentally-adaptive Hawkes (EAH) model. Compared to the classical self-exciting Hawkes process, the EAH model exhibits more flexibility in a macro environmentally temporal sense, and can model more complex processes by using a dynamic branching matrix. We demonstrate the well-definedness of this EAH model. A more specified version of this new model is applied to model COVID-19 pandemic data through an efficient EM-like algorithm. Consequently, the proposed model consistently outperforms the classical Hawkes process.
 [20] arXiv:2101.10005 [pdf, ps, other]

Title: Low incidence rate of COVID-19 undermines confidence in estimation of the vaccine efficacy
Authors: Yasin Memari
Subjects: Methodology (stat.ME); Applications (stat.AP)
Knowing the true effect size of clinical interventions in randomised clinical trials is key to informing public health policies. Vaccine efficacy is defined in terms of the ratio of two risks; however, only approximate methods are available for the variance of the 'risk ratio'. In this article, we show using a probabilistic model that uncertainty in the efficacy rate could be underestimated when the disease risk is low. Factoring in the baseline rate of the disease, we estimate broader confidence intervals for the efficacy rates of the vaccines recently developed for COVID-19. We propose a new method for calculating the sample size in case-control studies where the efficacy is of interest. We further discuss the deleterious effects of classification bias, which is particularly relevant at low disease prevalence.
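For readers unfamiliar with the quantities in this abstract, the Python sketch below computes vaccine efficacy as one minus the risk ratio, together with a confidence interval from the standard log-transform (Katz) approximation for the risk ratio. This is the textbook approximation, not the probabilistic model proposed in the paper, and the case counts are made-up numbers chosen only to show how the interval widens when incidence is low.

```python
import math


def vaccine_efficacy_ci(cases_vax, n_vax, cases_placebo, n_placebo, z=1.96):
    """Return (efficacy, lower, upper) using the log risk-ratio approximation.

    Efficacy is 1 - RR, where RR is the ratio of attack rates in the two
    arms. Both case counts must be positive for the log method to apply.
    """
    rr = (cases_vax / n_vax) / (cases_placebo / n_placebo)
    se_log_rr = math.sqrt(
        1.0 / cases_vax - 1.0 / n_vax + 1.0 / cases_placebo - 1.0 / n_placebo
    )
    rr_low = rr * math.exp(-z * se_log_rr)
    rr_high = rr * math.exp(z * se_log_rr)
    # A high risk ratio corresponds to a low efficacy, so the bounds swap.
    return 1.0 - rr, 1.0 - rr_high, 1.0 - rr_low


if __name__ == "__main__":
    # Hypothetical trials with the same risk ratio but different incidence.
    print("higher incidence:", vaccine_efficacy_ci(80, 20000, 1600, 20000))
    print("lower incidence: ", vaccine_efficacy_ci(8, 20000, 160, 20000))
```

Both hypothetical trials have the same point estimate, but the interval for the low-incidence trial is noticeably wider because the standard error is driven by the small number of observed cases.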
 [21] arXiv:2101.10058 [pdf, other]

Title: The EM Perspective of Directional Mean Shift Algorithm
Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
The directional mean shift (DMS) algorithm is a nonparametric method for pursuing local modes of densities defined by kernel density estimators on the unit hypersphere. In this paper, we show that any DMS iteration can be viewed as a generalized Expectation-Maximization (EM) algorithm; in particular, when the von Mises kernel is applied, it becomes an exact EM algorithm. Under the (generalized) EM framework, we provide a new proof for the ascending property of density estimates and demonstrate the global convergence of directional mean shift sequences. Finally, we give a new insight into the linear convergence of the DMS algorithm.
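A minimal Python sketch of a directional mean shift iteration with the von Mises kernel may help make the abstract concrete. It assumes the common fixed-point form in which the current point is replaced by the normalized, kernel-weighted mean of the data, with weights $\exp(-(1 - x_i^\top y)/h^2)$; the function names, the bandwidth, and the toy data on the unit circle are all choices made for this example rather than anything taken from the paper.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def kde_value(y, data, h):
    """Unnormalized directional KDE at unit vector y (von Mises kernel)."""
    return sum(math.exp(-(1.0 - dot(x, y)) / h ** 2) for x in data)


def dms_step(y, data, h):
    """One mean shift step: weighted mean of the data, projected to the sphere."""
    weights = [math.exp(-(1.0 - dot(x, y)) / h ** 2) for x in data]
    mean = [sum(w * x[k] for w, x in zip(weights, data)) for k in range(len(y))]
    norm = math.sqrt(dot(mean, mean))
    return [c / norm for c in mean]


def dms(y0, data, h, num_iters=50):
    """Run a fixed number of steps and return the whole trajectory."""
    path = [y0]
    for _ in range(num_iters):
        path.append(dms_step(path[-1], data, h))
    return path


if __name__ == "__main__":
    # Toy data: a cluster of directions around angle 0.5 plus one outlier.
    angles = [0.4, 0.45, 0.5, 0.55, 0.6, 2.5]
    data = [[math.cos(a), math.sin(a)] for a in angles]
    path = dms([math.cos(0.3), math.sin(0.3)], data, h=0.3)
    mode = path[-1]
    print("estimated mode angle:", math.atan2(mode[1], mode[0]))
```

Evaluating `kde_value` along the returned trajectory gives a quick numerical check of the ascending property mentioned in the abstract: the density estimate should not decrease from one iterate to the next.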
 [22] arXiv:2101.10103 [pdf, other]

Title: sensobol: an R package to compute variance-based sensitivity indices
Subjects: Computation (stat.CO); Applications (stat.AP)
The R package "sensobol" provides several functions to conduct variance-based uncertainty and sensitivity analysis, from the estimation of sensitivity indices to the visual representation of the results. It implements several state-of-the-art first- and total-order estimators and allows the computation of up to third-order effects, as well as of the approximation error, in a swift and user-friendly way. Its flexibility makes it also appropriate for models with either a scalar or a multivariate output. We illustrate its functionality by conducting a variance-based sensitivity analysis of three classic models: the Sobol' (1998) G function, the logistic population growth model of Verhulst (1845), and the spruce budworm and forest model of Ludwig, Jones and Holling (1976).
Cross-lists for Tue, 26 Jan 21
 [23] arXiv:2101.09351 (cross-list from physics.ao-ph) [pdf]

Title: Hourly evolution of intra-urban temperature variability across the local climate zones. The case of Madrid
Comments: 7 figures, 8 tables, 1 appendix
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Data Analysis, Statistics and Probability (physics.data-an); Applications (stat.AP)
Field measurement campaigns have grown exponentially in recent years, stemming from the need for reliable data to validate urban climate models and obtain a better understanding of urban climate features. Also contributing to this growth is the Local Climate Zone (LCZ) scheme, firstly developed to enhance the accuracy in the contextualisation of urban measurements, and lately used for characterising urban areas. Due to its relative novelty, researchers are still investigating the potential of LCZs and its indicators for urban temperature variability detection. In this respect, the present study introduces the results of an extensive monitoring campaign carried out in the city of Madrid over a two-year period (2016-2018). The aim of this work is to further examine the relationships between LCZs and air temperature differences, with emphasis on their hourly and seasonal evolution. A graphical and statistical analysis to identify temperature variability trends for each LCZ is performed. Results support the existing evidence suggesting a high level of effectiveness in capturing the heat island (UHI) profile of different urban areas, while underperforming when it comes to capturing diurnal temperature variability. The incorporation of indicators that explain the daytime temperature variation phenomenon into the LCZ scheme is therefore recommended, warranting further research.
 [24] arXiv:2101.09352 (cross-list from q-bio.NC) [pdf, other]

Title: Conex-Connect: Learning Patterns in Extremal Brain Connectivity From Multi-Channel EEG Data
Subjects: Neurons and Cognition (q-bio.NC); Applications (stat.AP); Methodology (stat.ME)
Epilepsy is a chronic neurological disorder affecting more than 50 million people globally. An epileptic seizure acts like a temporary shock to the neuronal system, disrupting normal electrical activity in the brain. Epilepsy is frequently diagnosed with electroencephalograms (EEGs). Current methods study the time-varying spectra and coherence but do not directly model changes in extreme behavior. Thus, we propose a new approach to characterize brain connectivity based on the joint tail behavior of the EEGs. Our proposed method, the conditional extremal dependence for brain connectivity (Conex-Connect), is a pioneering approach that links the association between extreme values of higher oscillations at a reference channel with the other brain network channels. Using the Conex-Connect method, we discover changes in the extremal dependence driven by the activity at the foci of the epileptic seizure. Our model-based approach reveals that, pre-seizure, the dependence is notably stable for all channels when conditioning on extreme values of the focal seizure area. Post-seizure, by contrast, the dependence between channels is weaker, and dependence patterns are more "chaotic". Moreover, in terms of spectral decomposition, we find that high values of the high-frequency Gamma-band are the most relevant features to explain the conditional extremal dependence of brain connectivity.
 [25] arXiv:2101.09382 (cross-list from math.OC) [pdf, other]

Title: A measure of the importance of roads based on topography and traffic intensity
Comments: 35 pages, 7 figures
Subjects: Optimization and Control (math.OC); Graphics (cs.GR); Applications (stat.AP)
Mathematical models of street traffic that allow assessment of the importance of individual segments for the functionality of the street system are considered. Based on methods of cooperative games and reliability theory, a suitable measure is constructed. The main goal is to analyze methods for assessing the importance (rank) of road fragments, including their functions. The relevance of these elements for effective accessibility of the entire system is also considered.
 [26] arXiv:2101.09394 (cross-list from econ.EM) [pdf, other]

Title: Predicting Recession Probabilities Using Term Spreads: New Evidence from a Machine Learning Approach
Subjects: Econometrics (econ.EM); Machine Learning (stat.ML)
The literature on using yield curves to forecast recessions typically measures the term spread as the difference between the 10-year and the three-month Treasury rates. Furthermore, using the term spread constrains the long- and short-term interest rates to have the same absolute effect on the recession probability. In this study, we adopt a machine learning method to investigate whether the predictive ability of interest rates can be improved. The machine learning algorithm identifies the best maturity pair, separating the effects of interest rates from those of the term spread. Our comprehensive empirical exercise shows that, despite the likelihood gain, the machine learning approach does not significantly improve the predictive accuracy, owing to the estimation error. Our finding supports the conventional use of the spread between the 10-year and three-month Treasury yields. This is robust to the forecasting horizon, control variable, sample period, and oversampling of the recession observations.
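The constraint described in this abstract can be illustrated with a minimal sketch. It assumes a probit link and made-up coefficient values, neither of which is taken from the paper: a model driven by the term spread alone is the special case of a two-rate model in which the long- and short-rate coefficients are equal in size and opposite in sign.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_from_spread(long_rate, short_rate, intercept, beta):
    """Probit recession probability driven by the term spread only."""
    return normal_cdf(intercept + beta * (long_rate - short_rate))


def prob_from_rates(long_rate, short_rate, intercept, beta_long, beta_short):
    """Probit recession probability with a separate coefficient per rate."""
    return normal_cdf(intercept + beta_long * long_rate + beta_short * short_rate)


# Hypothetical rates (in percent) and coefficients, for illustration only.
p_spread = prob_from_spread(2.5, 1.0, intercept=-0.5, beta=-0.8)
p_rates = prob_from_rates(2.5, 1.0, intercept=-0.5, beta_long=-0.8, beta_short=0.8)
```

With `beta_long = -beta_short` the two probabilities coincide; letting the two coefficients differ is what relaxes the equal-absolute-effect restriction.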
 [27] arXiv:2101.09395 (cross-list from q-fin.ST) [pdf, other]

Title: Unraveling S&P500 stock volatility and networks - An encoding and decoding approach
Subjects: Statistical Finance (q-fin.ST); Applications (stat.AP)
We extend the Hierarchical Factor Segmentation (HFS) algorithm to discover the multiple-volatility-state process hidden within each individual S&P500 stock's return time series. We then develop an associative measure to link stocks into directed networks at various scales of association. Such networks shed light on which stocks would likely stimulate or even promote, if not cause, volatility in other linked stocks. Our computation starts by encoding events of large return on the original time axis, which transforms the original return time series into a recurrence-time process on a discrete time axis. By adopting BIC and clustering analysis, we identify potential multiple volatility states, and then apply the extended HFS algorithm to the recurrence time series to discover its underlying volatility state process. In experiments involving both light- and heavy-tailed distributions, our decoding approach compares favorably with Viterbi's. After mapping the recovered volatility state process back to the original time axis, we decode and represent the dynamics of each stock. Association is measured through overlapping concurrent volatility states within a chosen window. Consequently, we establish data-driven associative networks for S&P500 stocks to discover their global dependency relational groupings with respect to various strengths of links.
 [28] arXiv:2101.09398 (cross-list from econ.EM) [pdf, other]

Title: A Design-Based Perspective on Synthetic Control Methods
Subjects: Econometrics (econ.EM); Methodology (stat.ME)
Since their introduction in Abadie and Gardeazabal (2003), Synthetic Control (SC) methods have quickly become one of the leading methods for estimating causal effects in observational studies with panel data. Formal discussions often motivate SC methods by the assumption that the potential outcomes were generated by a factor model. Here we study SC methods from a design-based perspective, assuming a model for the selection of the treated unit(s), e.g., random selection as guaranteed in a randomized experiment. We show that SC methods offer benefits even in settings with randomized assignment, and that the design perspective offers new insights into SC methods for observational data. A first insight is that the standard SC estimator is not unbiased under random assignment. We propose a simple modification of the SC estimator that guarantees unbiasedness in this setting and derive its exact, randomization-based, finite sample variance. We also propose an unbiased estimator for this variance. We show in settings with real data that under random assignment this Modified Unbiased Synthetic Control (MUSC) estimator can have a root mean-squared error (RMSE) that is substantially lower than that of the difference-in-means estimator. We show that such an improvement is weakly guaranteed if the treated period is similar to the other periods, for example, if the treated period was randomly selected. The improvement is most likely to be substantial if the number of pre-treatment periods is large relative to the number of control units.
 [29] arXiv:2101.09436 (cross-list from cs.LG) [pdf, other]

Title: Hierarchical Domain Invariant Variational Auto-Encoding with weak domain supervision
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We address the task of domain generalization, where the goal is to train a predictive model based on a number of domains such that it is able to generalize to a new, previously unseen domain. We choose a generative approach within the framework of variational autoencoders and propose a weakly supervised algorithm that is able to account for incomplete and hierarchical domain information. We show that our method is able to learn representations that disentangle domain-specific information from class-label specific information even in complex settings where an unobserved substructure is present in domains. Our interpretable method outperforms previously proposed generative algorithms for domain generalization and achieves competitive performance compared to state-of-the-art approaches, which are based on complex image-processing steps, on the standard domain generalization benchmark dataset PACS.
 [30] arXiv:2101.09438 (cross-list from cs.LG) [pdf, other]

Title: An Optimal Reduction of TV-Denoising to Adaptive Online Learning
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
We consider the problem of estimating a function from $n$ noisy samples whose discrete Total Variation (TV) is bounded by $C_n$. We reveal a deep connection to the seemingly disparate problem of Strongly Adaptive online learning (Daniely et al., 2015) and provide an $O(n \log n)$ time algorithm that attains the near minimax optimal rate of $\tilde O (n^{1/3}C_n^{2/3})$ under squared error loss. The resulting algorithm runs online and optimally adapts to the unknown smoothness parameter $C_n$. This leads to a new and more versatile alternative to wavelets-based methods for (1) adaptively estimating TV bounded functions; (2) online forecasting of TV bounded trends in time series.
 [31] arXiv:2101.09446 (cross-list from cs.LG) [pdf, other]

Title: Unlabeled Principal Component Analysis
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We consider the problem of principal component analysis from a data matrix where the entries of each column have undergone some unknown permutation, termed Unlabeled Principal Component Analysis (UPCA). Using algebraic geometry, we establish that for generic enough data, and up to a permutation of the coordinates of the ambient space, there is a unique subspace of minimal dimension that explains the data. We show that a permutation-invariant system of polynomial equations has finitely many solutions, with each solution corresponding to a row permutation of the ground-truth data matrix. Allowing for missing entries on top of permutations leads to the problem of unlabeled matrix completion, for which we give theoretical results of similar flavor. We also propose a two-stage algorithmic pipeline for UPCA suitable for the practically relevant case where only a fraction of the data has been permuted. Stage-I of this pipeline employs robust-PCA methods to estimate the ground-truth column-space. Equipped with the column-space, stage-II applies methods for linear regression without correspondences to restore the permuted data. A computational study reveals encouraging findings, including the ability of UPCA to handle face images from the Extended Yale-B database with arbitrarily permuted patches of arbitrary size in $0.3$ seconds on a standard desktop computer.
 [32] arXiv:2101.09460 (cross-list from cs.LG) [pdf, other]

Title: Feature Selection Using Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
With the decreasing cost of data collection, the space of variables or features that can be used to characterize a particular predictor of interest continues to grow exponentially. Therefore, identifying the most characteristic features, those that minimize the variance without jeopardizing the bias of our models, is critical to successfully training a machine learning model. In addition, identifying such features is critical for interpretability, prediction accuracy and optimal computation cost. While statistical methods such as subset selection, shrinkage, and dimensionality reduction have been applied in selecting the best set of features, other approaches in the literature have treated the feature selection task as a search problem where each state in the search space is a possible feature subset. In this paper, we solved the feature selection problem using Reinforcement Learning. Formulating the state space as a Markov Decision Process (MDP), we used the Temporal Difference (TD) algorithm to select the best subset of features. Each state was evaluated using a robust and low-cost classifier algorithm which could handle any non-linearities in the dataset.
 [33] arXiv:2101.09512 (cross-list from cs.LG) [pdf, other]

Title: Unsupervised clustering of series using dynamic programming
Authors: Karthigan Sinnathamby, Chang-Yu Hou, Lalitha Venkataramanan, Vasileios-Marios Gkortsas, François Fleuret
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We are interested in clustering parts of a given single multivariate series in an unsupervised manner. We would like to segment and cluster the series such that the resulting blocks present in each cluster are coherent with respect to a known model (e.g. physics model). Data points are said to be coherent if they can be described using this model with the same parameters. We have designed an algorithm based on dynamic programming with constraints on the number of clusters, the number of transitions as well as the minimal size of a block such that the clusters are coherent with this process. We present a use case: clustering of petrophysical series using the Waxman-Smits equation.
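A much simplified sketch of the dynamic-programming idea follows. It is not the authors' algorithm: it only segments a univariate series into a fixed number of contiguous blocks with a minimal block size, and it assumes a piecewise-constant model (squared error around the block mean) as a stand-in for the "known model". The function name `segment_series` and its parameters are hypothetical.

```python
import numpy as np


def segment_series(values, num_blocks, min_size=1):
    """Split a 1-D series into contiguous blocks that minimise the squared
    error around each block mean; returns the list of block start indices."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    # Prefix sums give the cost of any block in constant time.
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def cost(i, j):
        # Squared error of x[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of covering x[:j] with exactly k blocks.
    best = [[inf] * (n + 1) for _ in range(num_blocks + 1)]
    prev = [[-1] * (n + 1) for _ in range(num_blocks + 1)]
    best[0][0] = 0.0
    for k in range(1, num_blocks + 1):
        for j in range(k * min_size, n + 1):
            for i in range((k - 1) * min_size, j - min_size + 1):
                if best[k - 1][i] == inf:
                    continue
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    prev[k][j] = i
    if best[num_blocks][n] == inf:
        raise ValueError("no feasible segmentation")
    # Walk the back-pointers from the end of the series to recover the blocks.
    starts, j = [], n
    for k in range(num_blocks, 0, -1):
        j = prev[k][j]
        starts.append(j)
    return starts[::-1]


starts = segment_series([0.1, 0.0, -0.1, 5.0, 5.1, 4.9, 9.0, 9.2],
                        num_blocks=3, min_size=2)
```

Replacing `cost` with the fitting error of a parametric model, and adding a cluster label per block, would bring the sketch closer to the setting described in the abstract.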
 [34] arXiv:2101.09577 (cross-list from cs.LG) [pdf, other]

Title: ReliefE: Feature Ranking in High-dimensional Spaces via Manifold Embeddings
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Feature ranking has been widely adopted in machine learning applications such as high-throughput biology and social sciences. The approaches of the popular Relief family of algorithms assign importances to features by iteratively accounting for nearest relevant and irrelevant instances. Despite their high utility, these algorithms can be computationally expensive and not well suited for high-dimensional sparse input spaces. In contrast, recent embedding-based methods learn compact, low-dimensional representations, potentially facilitating downstream learning capabilities of conventional learners. This paper explores how the Relief branch of algorithms can be adapted to benefit from (Riemannian) manifold-based embeddings of instance and target spaces, where a given embedding's dimensionality is intrinsic to the dimensionality of the considered data set. The developed ReliefE algorithm is faster and can result in better feature rankings, as shown by our evaluation on 20 real-life data sets for multi-class and multi-label classification tasks. The utility of ReliefE for high-dimensional data sets is ensured by its implementation that utilizes sparse matrix algebraic operations. Finally, the relation of ReliefE to other ranking algorithms is studied via the Fuzzy Jaccard Index.
 [35] arXiv:2101.09611 (cross-list from cs.SI) [pdf, other]

Title: Hypergraph clustering: from blockmodels to modularity
Comments: 25 pages + 5 pages of supplementary information, 3 tables, 4 figures
Subjects: Social and Information Networks (cs.SI); Discrete Mathematics (cs.DM); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)
Hypergraphs are a natural modeling paradigm for a wide range of complex relational systems with multi-body interactions. A standard analysis task is to identify clusters of closely related or densely interconnected nodes. While many probabilistic generative models for graph clustering have been proposed, there are relatively few such models for hypergraphs. We propose a Poisson degree-corrected hypergraph stochastic blockmodel (DCHSBM), an expressive generative model of clustered hypergraphs with heterogeneous node degrees and edge sizes. Maximum-likelihood inference in the DCHSBM naturally leads to a clustering objective that generalizes the popular modularity objective for graphs. We derive a general Louvain-type algorithm for this objective, as well as a faster, specialized "All-Or-Nothing" (AON) variant in which edges are expected to lie fully within clusters. This special case encompasses a recent proposal for modularity in hypergraphs, while also incorporating flexible resolution and edge-size parameters. We show that hypergraph Louvain is highly scalable, including as an example an experiment on a synthetic hypergraph of one million nodes. We also demonstrate through synthetic experiments that the detectability regimes for hypergraph community detection differ from those of methods based on dyadic graph projections. In particular, there are regimes in which hypergraph methods can recover planted partitions even though graph-based methods necessarily fail due to information-theoretic limits. We use our model to analyze different patterns of higher-order structure in school contact networks, U.S. congressional bill cosponsorship, U.S. congressional committees, product categories in co-purchasing behavior, and hotel locations from web browsing sessions, and show that it is able to recover ground truth clusters in empirical data sets exhibiting the corresponding higher-order structure.
 [36] arXiv:2101.09612 (cross-list from cs.LG) [pdf, ps, other]

Title: On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths
Authors: Quynh Nguyen
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper studies the global convergence of gradient descent for deep ReLU networks under the square loss. For this setting, the current state-of-the-art results show that gradient descent converges to a global optimum if the widths of all the hidden layers scale at least as $\Omega(N^8)$ ($N$ being the number of training samples). In this paper, we discuss a simple proof framework which allows us to improve the existing over-parameterization condition to linear, quadratic and cubic widths (depending on the type of initialization scheme and/or the depth of the network).
 [37] arXiv:2101.09645 (cross-list from cs.LG) [pdf, other]

Title: Multi-Task Time Series Forecasting With Shared Attention
Comments: Accepted by ICDMW 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Time series forecasting is a key component in many industrial and business decision processes, and recurrent neural network (RNN) based models have achieved impressive progress on various time series forecasting tasks. However, most of the existing methods focus on single-task forecasting problems by learning separately based on limited supervised objectives, which often suffer from insufficient training instances. As the Transformer architecture and other attention-based models have demonstrated their great capability of capturing long-term dependency, we propose two self-attention based sharing schemes for multi-task time series forecasting which can train jointly across multiple tasks. We augment a sequence of paralleled Transformer encoders with an external public multi-head attention function, which is updated by all data of all tasks. Experiments on a number of real-world multi-task time series forecasting tasks show that our proposed architectures can not only outperform the state-of-the-art single-task forecasting baselines but also outperform the RNN-based multi-task forecasting method.
 [38] arXiv:2101.09682 (cross-list from q-fin.PR) [pdf, ps, other]

Title: Solving optimal stopping problems with Deep Q-Learning
Comments: 17 pages
Subjects: Pricing of Securities (q-fin.PR); Machine Learning (stat.ML)
We propose a reinforcement learning (RL) approach to model optimal exercise strategies for option-type products. We pursue the RL avenue in order to learn the optimal action-value function of the underlying stopping problem. In addition to retrieving the optimal Q-function at any time step, one can also price the contract at inception. We first discuss the standard setting with one exercise right, and later extend this framework to the case of multiple stopping opportunities in the presence of constraints. We propose to approximate the Q-function with a deep neural network, which does not require the specification of basis functions as in the least-squares Monte Carlo framework and is scalable to higher dimensions. We derive a lower bound on the option price obtained from the trained neural network and an upper bound from the dual formulation of the stopping problem, which can also be expressed in terms of the Q-function. Our methodology is illustrated with examples covering the pricing of swing options.
 [39] arXiv:2101.09689 (cross-list from cs.IT) [pdf, ps, other]

Title: A Linear Reduction Method for Local Differential Privacy and Log-lift
Subjects: Information Theory (cs.IT); Applications (stat.AP)
This paper considers the problem of publishing data $X$ while protecting correlated sensitive information $S$. We propose a linear method to generate the sanitized data $Y$ with the same alphabet $\mathcal{Y} = \mathcal{X}$ that attains local differential privacy (LDP) and log-lift at the same time. It is revealed that both LDP and log-lift are inversely proportional to the statistical distance between conditional probability $P_{Y|S}(x|s)$ and marginal probability $P_{Y}(x)$: the closer the two probabilities are, the more private $Y$ is. Specifying $P_{Y|S}(x|s)$ that linearly reduces this distance $|P_{Y|S}(x|s) - P_Y(x)| = (1-\alpha)|P_{X|S}(x|s) - P_X(x)|,\forall s,x$ for some $\alpha \in (0,1]$, we study the problem of how to generate $Y$ from the original data $S$ and $X$. The Markov randomization/sanitization scheme $P_{Y|X}(x|x') = P_{Y|S,X}(x|s,x')$ is obtained by solving linear equations. The optimal non-Markov sanitization, the transition probability $P_{Y|S,X}(x|s,x')$ that depends on $S$, can be determined by maximizing the data utility subject to linear equality constraints. We compute the solution for two linear utility functions: the expected distance and total variance distance. It is shown that the non-Markov randomization significantly improves data utility and the marginal probability $P_X(x)$ remains the same after the linear sanitization method: $P_Y(x) = P_X(x), \forall x \in \mathcal{X}$.
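The linear reduction can be checked numerically with a small sketch. It assumes the signed form of the rule, in which the conditional distribution of $Y$ given $S$ is obtained by shrinking the conditional distribution of $X$ given $S$ toward the marginal $P_X$ by the factor $(1-\alpha)$. It only builds this target conditional; it does not compute the randomization scheme that the paper obtains by solving linear equations. The function name and the toy distributions are hypothetical.

```python
import numpy as np


def linear_target_conditional(p_x_given_s, p_s, alpha):
    """Return the conditional of Y given S obtained by shrinking the
    conditional of X given S toward the marginal P_X by (1 - alpha).

    p_x_given_s: array of shape (num_s, num_x); row s is the law of X given S = s.
    p_s:         array of shape (num_s,), the distribution of S.
    alpha:       reduction factor in (0, 1].
    """
    p_x = p_s @ p_x_given_s  # marginal P_X
    return p_x + (1.0 - alpha) * (p_x_given_s - p_x)


# Toy distributions, chosen for illustration only.
p_s = np.array([0.4, 0.6])
p_x_given_s = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]])
alpha = 0.5

p_y_given_s = linear_target_conditional(p_x_given_s, p_s, alpha)
p_x = p_s @ p_x_given_s
p_y = p_s @ p_y_given_s
```

Each row of `p_y_given_s` is a convex combination of a row of `p_x_given_s` and `p_x`, so it is a valid distribution; averaging over `p_s` gives `p_y == p_x`, and every gap between conditional and marginal is scaled by `1 - alpha`.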
 [40] arXiv:2101.09763 (cross-list from cs.LG) [pdf, other]

Title: Analysing the Noise Model Error for Realistic Noisy Label Data
Comments: Accepted at AAAI 2021, additional material at this https URL
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Distant and weak supervision make it possible to obtain large amounts of labeled training data quickly and cheaply, but these automatic annotations tend to contain a large number of errors. A popular technique to overcome the negative effects of these noisy labels is noise modelling, where the underlying noise process is modelled. In this work, we study the quality of these estimated noise models from the theoretical side by deriving the expected error of the noise model. Apart from evaluating the theoretical results on commonly used synthetic noise, we also publish NoisyNER, a new noisy label dataset from the NLP domain that was obtained through a realistic distant supervision technique. It provides seven sets of labels with differing noise patterns to evaluate different noise levels on the same instances. Parallel, clean labels are available, making it possible to study scenarios where a small amount of gold-standard data can be leveraged. Our theoretical results and the corresponding experiments give insights into the factors that influence the noise model estimation, like the noise distribution and the sampling technique.
 [41] arXiv:2101.09844 (cross-list from physics.data-an) [pdf, other]

Title: Pattern Ensembling for Spatial Trajectory Reconstruction
Comments: 11 pages, 5 figures
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (cs.LG); Machine Learning (stat.ML)
Digital sensing provides an unprecedented opportunity to assess and understand mobility. However, incompleteness, missing information, possible inaccuracies, and temporal heterogeneity in the geolocation data can undermine its applicability. As mobility patterns are often repeated, we propose a method to use similar trajectory patterns from the local vicinity and probabilistically ensemble them to robustly reconstruct missing or unreliable observations. We evaluate the proposed approach in comparison with traditional functional trajectory interpolation using a case of sea vessel trajectory data provided by the Automatic Identification System (AIS). By effectively leveraging the similarities in real-world trajectories, our pattern ensembling method helps to reconstruct missing trajectory segments of extended length and complex geometry. It can be used for locating mobile objects when temporarily unobserved as well as for creating an evenly sampled trajectory interpolation useful for further trajectory mining.
 [42] arXiv:2101.09957 (cross-list from cs.LG) [pdf, other]

Title: Activation Functions in Artificial Neural Networks: A Systematic Overview
Authors: Johannes Lederer
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Activation functions shape the outputs of artificial neurons and, therefore, are integral parts of neural networks in general and deep learning in particular. Some activation functions, such as logistic and ReLU, have been used for many decades. But with deep learning becoming a mainstream research topic, new activation functions have mushroomed, leading to confusion in both theory and practice. This paper provides an analytic yet up-to-date overview of popular activation functions and their properties, which makes it a timely resource for anyone who studies or applies neural networks.
 [43] arXiv:2101.09960 (cross-list from econ.GN) [pdf, other]

Title: Political Regime and COVID 19 death rate: efficient, biasing or simply different autocracies ?
Subjects: General Economics (econ.GN); Applications (stat.AP)
The difference in COVID 19 death rates across political regimes has attracted a lot of attention. The "efficient autocracy" view suggests that autocracies may be more efficient at putting in place policies that contain COVID 19 spread. On the other hand, the "biasing autocracy" view underlines that autocracies may be underreporting their COVID 19 data. We use fixed-effect panel regression methods to discriminate between the two sides of the debate. Our results show that a third view may in fact be prevailing: once predetermined characteristics of countries are accounted for, COVID 19 death rates equalize across political regimes. The difference in death rates across political regimes therefore seems to be primarily due to omitted variable bias.
 [44] arXiv:2101.09973 (cross-list from cs.LG) [pdf, ps, other]

Title: Approximating Probability Distributions by ReLU Networks
Comments: Longer version of a paper accepted for presentation at the ITW 2020
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
How many neurons are needed to approximate a target probability distribution using a neural network with a given input distribution and approximation error? This paper examines this question for the case when the input distribution is uniform, and the target distribution belongs to the class of histogram distributions. We obtain a new upper bound on the number of required neurons, which is strictly better than previously existing upper bounds. The key ingredient in this improvement is an efficient construction of the neural nets representing piecewise linear functions. We also obtain a lower bound on the minimum number of neurons needed to approximate the histogram distributions.
 [45] arXiv:2101.10037 (cross-list from cs.LG) [pdf, other]

Title: Optimizing Convergence for Iterative Learning of ARIMA for Stationary Time Series
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Forecasting of time series in continuous systems is becoming an increasingly relevant task due to recent developments in IoT and 5G. The popular forecasting model ARIMA has been applied to a large variety of applications for decades. An online variant of ARIMA applies the Online Newton Step in order to learn the underlying process of the time series. This optimization method has pitfalls concerning computational complexity and convergence. Thus, this work focuses on the computationally less expensive Online Gradient Descent optimization method, which has become popular for training neural networks in recent years. For the iterative training of such models, we propose a new approach combining different Online Gradient Descent learners (such as Adam, AMSGrad, Adagrad, Nesterov) to achieve fast convergence. The evaluation on synthetic data and experimental datasets shows that the proposed approach outperforms the existing methods, resulting in an overall lower prediction error.
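To make the online-learning setting concrete, here is a minimal sketch of plain Online Gradient Descent on a purely autoregressive model, which is the simplest stationary special case of ARIMA (no differencing, no moving-average terms). It is not the combined-learner method proposed in the paper; the function name `ogd_ar_fit`, the step-size schedule, and the simulated data are assumptions made for illustration.

```python
import numpy as np


def ogd_ar_fit(series, order, base_lr=0.1):
    """Fit AR(order) coefficients with one online gradient step per observation.

    The step size base_lr / sqrt(t) is a common OGD schedule, used here for
    illustration only.
    """
    x = np.asarray(series, dtype=float)
    w = np.zeros(order)
    for t in range(order, len(x)):
        lags = x[t - order:t][::-1]        # most recent value first
        error = w @ lags - x[t]            # one-step-ahead prediction error
        grad = 2.0 * error * lags          # gradient of the squared error
        w -= base_lr / np.sqrt(t - order + 1) * grad
    return w


# Simulated AR(1) series with true coefficient 0.5 and unit-variance noise.
rng = np.random.default_rng(0)
x = np.zeros(20000)
for t in range(1, len(x)):
    x[t] = 0.5 * x[t - 1] + rng.normal()

w = ogd_ar_fit(x, order=1)
```

Each observation is used once and then discarded, so the cost per update is linear in the model order; a Newton-type update would additionally maintain and update a matrix of second-order information.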
 [46] arXiv:2101.10102 (cross-list from cs.LG) [pdf, ps, other]

Title: Probabilistic Robustness Analysis for DNNs based on PAC Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
This paper proposes a black-box approach for analysing deep neural networks (DNNs). We view a DNN as a function $\boldsymbol{f}$ from inputs to outputs, and consider the local robustness property for a given input. Based on the scenario optimization technique in robust control design, we learn the score difference function $f_i-f_\ell$ with respect to the target label $\ell$ and attacking label $i$. We use a linear template over the input pixels, and learn the corresponding coefficients of the score difference function, based on a reduction to a linear programming (LP) problem. To make it scalable, we propose optimizations including component-based learning and focused learning. The learned function offers a probably approximately correct (PAC) guarantee for the robustness property. Since the score difference function is an approximation of the local behaviour of the DNN, it can be used to generate potential adversarial examples, and the original network can be used to check whether they are spurious or not. Finally, we focus on the input pixels with large absolute coefficients, and use them to explain the attacking scenario. We have implemented our approach in a prototypical tool DeepPAC. Our experimental results show that our framework can handle very large neural networks like ResNet-152 with $6.5$M neurons, and often generates adversarial examples which are very close to the decision boundary.
 [47] arXiv:2101.10123 (cross-list from cs.LG) [pdf, other]

Title: Conditional Generative Models for Counterfactual Explanations
Comments: 12 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Counterfactual instances offer human-interpretable insight into the local behaviour of machine learning models. We propose a general framework to generate sparse, in-distribution counterfactual model explanations which match a desired target prediction with a conditional generative model, allowing batches of counterfactual instances to be generated with a single forward pass. The method is flexible with respect to the type of generative model used as well as the task of the underlying predictive model. This allows straightforward application of the framework to different modalities such as images, time series or tabular data as well as generative model paradigms such as GANs or autoencoders and predictive tasks like classification or regression. We illustrate the effectiveness of our method on image (CelebA), time series (ECG) and mixed-type tabular (Adult Census) data.
 [48] arXiv:2101.10160 (cross-list from cs.LG) [pdf, other]

Title: Measuring Dependence with Matrix-based Entropy Functional
Comments: Accepted at AAAI-21. An interpretable and differentiable dependence (or independence) measure that can be used to 1) train deep network under covariate shift and non-Gaussian noise; 2) implement a deep deterministic information bottleneck; and 3) understand the dynamics of learning of CNN. Code available at this https URL
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
Measuring the dependence of data plays a central role in statistics and machine learning. In this work, we summarize and generalize the main idea of existing information-theoretic dependence measures into a higher-level perspective through Shearer's inequality. Based on our generalization, we then propose two measures, namely the matrix-based normalized total correlation ($T_\alpha^*$) and the matrix-based normalized dual total correlation ($D_\alpha^*$), to quantify the dependence of multiple variables in arbitrary dimensional space, without explicit estimation of the underlying data distributions. We show that our measures are differentiable and statistically more powerful than prevalent ones. We also show the impact of our measures in four different machine learning problems, namely gene regulatory network inference, robust machine learning under covariate shift and non-Gaussian noise, subspace outlier detection, and the understanding of the learning dynamics of convolutional neural networks (CNNs), to demonstrate their utility, advantages, and implications for those problems. Code for our dependence measure is available at: https://bit.ly/AAAIdependence
 [49] arXiv:2101.10189 (cross-list from math.OC) [pdf, other]

Title: Surrogate Models for Optimization of Dynamical Systems
Subjects: Optimization and Control (math.OC); Machine Learning (stat.ML)
Driven by the increased complexity of dynamical systems, solving systems of differential equations through numerical simulation in optimization problems has become computationally expensive. This paper provides a smart data-driven mechanism to construct low-dimensional surrogate models. These surrogate models reduce the computational time for solving complex optimization problems by using training instances derived from evaluations of the true objective functions. The surrogate models are constructed using a combination of proper orthogonal decomposition and radial basis functions and provide system responses by simple matrix multiplication. Using relative maximum absolute error as the measure of approximation accuracy, it is shown that surrogate models with Latin hypercube sampling and spline radial basis functions dominate variable order methods in computational time of optimization while preserving accuracy. These surrogate models also show robustness in the presence of model nonlinearities. Therefore, these computationally efficient predictive surrogate models are applicable in various fields, specifically to solve inverse problems and optimal control problems, some examples of which are demonstrated in this paper.
 [50] arXiv:2101.10229 (cross-list from cs.LG) [pdf, other]

Title: Universal Approximation Properties for ODENet and ResNet
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Classical Analysis and ODEs (math.CA); Numerical Analysis (math.NA); Machine Learning (stat.ML)
We prove a universal approximation property (UAP) for a class of ODENet and a class of ResNet, which are used in many deep learning algorithms. The UAP can be stated as follows. Let $n$ and $m$ be the dimension of input and output data, and assume $m\leq n$. Then we show that ODENet width $n+m$ with any nonpolynomial continuous activation function can approximate any continuous function on a compact subset on $\mathbb{R}^n$. We also show that ResNet has the same property as the depth tends to infinity. Furthermore, we derive explicitly the gradient of a loss function with respect to a certain tuning variable. We use this to construct a learning algorithm for ODENet. To demonstrate the usefulness of this algorithm, we apply it to a regression problem, a binary classification, and a multinomial classification in MNIST.
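The abstract does not spell out how ResNet relates to ODENet. One common reading, assumed here rather than taken from the paper, is that a residual block x + h f(x) is a forward Euler step of the ODE dx/dt = f(x), so a deep ResNet approaches the ODE flow as the depth grows. The sketch below illustrates only that assumption; `resnet_forward` and its arguments are hypothetical names.

```python
import numpy as np

def resnet_forward(x, f, depth, horizon=1.0):
    """Apply `depth` residual blocks x <- x + h * f(x) with h = horizon / depth.

    Under the assumed reading, this is forward Euler integration of
    dx/dt = f(x) from t = 0 to t = horizon.
    """
    h = horizon / depth
    for _ in range(depth):
        x = x + h * f(x)   # one residual block = one Euler step
    return x

# Illustration with f(x) = -x, whose exact flow at time 1 is x0 * exp(-1).
x0 = np.array([1.0, 2.0])
exact = x0 * np.exp(-1.0)
shallow_error = np.max(np.abs(resnet_forward(x0, lambda x: -x, depth=10) - exact))
deep_error = np.max(np.abs(resnet_forward(x0, lambda x: -x, depth=1000) - exact))
```

In this toy case `deep_error` is much smaller than `shallow_error`, which matches the qualitative statement that the property holds as the depth tends to infinity.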
 [51] arXiv:2101.10255 (cross-list from econ.EM) [pdf, ps, other]

Title: Consistent specification testing under spatial dependence
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST)
We propose a series-based nonparametric specification test for a regression function when data are spatially dependent, the `space' being of a general economic or social nature. Dependence can be parametric, parametric with increasing dimension, semiparametric or any combination thereof, thus covering a vast variety of settings. These include spatial error models of varying types and levels of complexity. Under a new smooth spatial dependence condition, our test statistic is asymptotically standard normal. To prove the latter property, we establish a central limit theorem for quadratic forms in linear processes in an increasing dimension setting. Finite sample performance is investigated in a simulation study and empirical examples illustrate the test with real-world data.
 [52] arXiv:2101.10261 (cross-list from cs.LG) [pdf, other]

Title: Discrete Choice Analysis with Machine Learning Capabilities
Subjects: Machine Learning (cs.LG); Econometrics (econ.EM); Methodology (stat.ME)
This paper discusses capabilities that are essential to models applied in policy analysis settings and the limitations of direct applications of off-the-shelf machine learning methodologies to such settings. Traditional econometric methodologies for building discrete choice models for policy analysis involve combining data with modeling assumptions guided by subject-matter considerations. Such considerations are typically most useful in specifying the systematic component of random utility discrete choice models but are typically of limited aid in determining the form of the random component. We identify an area where machine learning paradigms can be leveraged, namely in specifying and systematically selecting the best specification of the random component of the utility equations. We review two recent novel applications where mixed-integer optimization and cross-validation are used to algorithmically select optimal specifications for the random utility components of nested logit and logit mixture models subject to interpretability constraints.
 [53] arXiv:2101.10266 (cross-list from cs.LG) [pdf, other]

Title: COVID-19 Outbreak Prediction and Analysis using Self Reported Symptoms
Authors: Rohan Sukumaran, Parth Patwa, T V Sethuraman, Sheshank Shankar, Rishank Kanaparti, Joseph Bae, Yash Mathur, Abhishek Singh, Ayush Chopra, Myungsun Kang, Priya Ramaswamy, Ramesh Raskar
Comments: 14 pages, 16 Figures
Subjects: Machine Learning (cs.LG); Applications (stat.AP)
The COVID-19 pandemic has challenged scientists and policymakers internationally to develop novel approaches to public health policy. Furthermore, it has been observed that the prevalence and spread of COVID-19 vary across space, time, and demographic groups. Despite ramped-up testing, we are still not at the required level in most parts of the globe. Therefore, we utilize self-reported symptoms survey data to understand trends in the spread of COVID-19. The aim of this study is to segment populations that are highly susceptible. In order to understand such populations, we perform exploratory data analysis, outbreak prediction, and time-series forecasting using public health and policy datasets. From our studies, we try to predict the likely percentage of the population that tested positive for COVID-19 based on self-reported symptoms. Our findings reaffirm the predictive value of symptoms such as anosmia and ageusia. We forecast the percentage of the population having COVID-19-like illness (CLI) and the percentage that tested positive with absolute errors of 0.15% and 1.14%, respectively. These findings could aid faster development of public health policy, particularly in areas with low levels of testing and a greater reliance on self-reported symptoms. Our analysis sheds light on identifying clinical attributes of interest across different demographics. We also provide insights into the effects of various policy enactments on COVID-19 prevalence.
Replacements for Tue, 26 Jan 21
 [54] arXiv:1709.08238 (replaced) [pdf, other]

Title: Counterparty Credit Limits: The Impact of a Risk-Mitigation Measure on Everyday Trading
Subjects: Trading and Market Microstructure (q-fin.TR); Econometrics (econ.EM); Probability (math.PR); Applications (stat.AP); Computation (stat.CO)
 [55] arXiv:1809.02383 (replaced) [pdf, other]

Title: Group-based Learning of Disentangled Representations with Generalizability for Novel Contents
Authors: Haruo Hosoya
Journal-ref: published in IJCAI 2019
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [56] arXiv:1902.00097 (replaced) [pdf, other]

Title: Ensembling methods for countrywide short term forecasting of gas demand
Journal-ref: Int. J. Oil, Gas and Coal Technology, Vol. 26, No. 2, pp.184-201 (2021)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [57] arXiv:1902.01289 (replaced) [pdf, other]

Title: Diagnostics for Stochastic Gaussian Process Emulators
Subjects: Methodology (stat.ME)
 [58] arXiv:1905.01435 (replaced) [pdf, ps, other]

Title: Tight Regret Bounds for Infinite-armed Linear Contextual Bandits
Comments: 10 pages, AISTATS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [59] arXiv:1905.03337 (replaced) [pdf, other]

Title: Optimal Rerandomization via a Criterion that Provides Insurance Against Failed Experiments
Comments: 27 pages, 5 figures, 2 tables, 2 algorithms
Subjects: Methodology (stat.ME)
 [60] arXiv:1905.09780 (replaced) [pdf, other]

Title: Bayesian Optimization with Approximate Set Kernels
Comments: 18 pages, 7 figures, 5 tables, accepted for publication in Machine Learning Journal
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [61] arXiv:1906.02314 (replaced) [pdf, ps, other]

Title: A Tunable Loss Function for Robust Classification: Calibration, Landscape, and Generalization
Authors: Tyler Sypherd, Mario Diaz, John Kevin Cava, Gautam Dasarathy, Peter Kairouz, Lalitha Sankar
Comments: Submitted to TIT. Many new theoretical and experimental results
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [62] arXiv:1906.06463 (replaced) [pdf, other]

Title: Linear Aggregation in Tree-based Estimators
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)
 [63] arXiv:1906.10470 (replaced) [pdf, other]

Title: An Unsupervised Bayesian Neural Network for Truth Discovery in Social Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [64] arXiv:1907.02109 (replaced) [pdf, other]

Title: A unified approach to mixed-integer optimization problems with logical constraints
Comments: Revised version (including title change). The old title was "A unified approach to mixed-integer optimization: Nonlinear formulations and scalable algorithms"
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [65] arXiv:1907.02579 (replaced) [pdf, other]

Title: Particularities and commonalities of singular spectrum analysis as a method of time series analysis and signal processing
Authors: Nina Golyandina
Journal-ref: WIREs Computational Statistics, 2020, Vol.12, No 4., e1487, 39pp
Subjects: Methodology (stat.ME)
 [66] arXiv:1907.03025 (replaced) [pdf, other]

Title: Improving Lasso for model selection and prediction
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
 [67] arXiv:1907.05911 (replaced) [pdf, other]

Title: Vector Quantized Bayesian Neural Network Inference for Data Streams
Comments: AAAI 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [68] arXiv:1908.05365 (replaced) [pdf, other]

Title: End-to-End Learning from Complex Multigraphs with Latent-Graph Convolutional Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
 [69] arXiv:1908.08783 (replaced) [pdf, other]

Title: Learning Fitness Functions for Machine Programming
Authors: Shantanu Mandal, Todd A. Anderson, Javier S. Turek, Justin Gottschlich, Shengtian Zhou, Abdullah Muzahid
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [70] arXiv:1909.11651 (replaced) [pdf, other]

Title: Matching Embeddings for Domain Adaptation
Authors: Manuel Pérez-Carrasco, Guillermo Cabrera-Vives, Pavlos Protopapas, Nicolás Astorga, Marouan Belhaj
Comments: 12 pages, 3 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [71] arXiv:1909.11926 (replaced) [pdf, other]

Title: Hierarchical Neural Architecture Search via Operator Clustering
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [72] arXiv:1910.02386 (replaced) [src]

Title: A New Graphical Device and Related Tests for the Shape of Nonparametric Regression Function
Comments: There were errors in mathematical proofs of Theorem 1 and related lemmas. Major revisions were needed
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
 [73] arXiv:1910.07643 (replaced) [pdf, other]

Title: Dynamic Graph Convolutional Networks Using the Tensor M-Product
Comments: Accepted to SIAM International Conference on Data Mining (SDM) 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [74] arXiv:1910.13603 (replaced) [pdf, other]

Title: When MAML Can Adapt Fast and How to Assist When It Cannot
Comments: Accepted at AISTATS 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [75] arXiv:1911.00677 (replaced) [pdf, other]

Title: Fairness Violations and Mitigation under Covariate Shift
Comments: 11 pages main and 7 pages supplementary, To appear at ACM FAccT '21, Previous arXiv version arXiv:1911.00677v1 was presented at Workshop on Fair ML for Health '19
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
 [76] arXiv:1911.01702 (replaced) [pdf, other]

Title: DocParser: Hierarchical Structure Parsing of Document Renderings
Comments: AAAI 2021
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [77] arXiv:1911.02160 (replaced) [pdf, other]

Title: Shrinkage with shrunken shoulders: inference via geometrically / uniformly ergodic Gibbs sampler
Comments: 23 pages, 8 figures, (18 pages, 3 figures of Supplement). Code available from this https URL
Subjects: Methodology (stat.ME); Statistics Theory (math.ST)
 [78] arXiv:1911.09049 (replaced) [pdf, other]

Title: Sharp hypotheses and bispatial inference
Authors: Russell J. Bowater
Comments: Corrected, rewritten and extended. *Final version*
Subjects: Other Statistics (stat.OT); Methodology (stat.ME)
 [79] arXiv:1912.00682 (replaced) [pdf, other]

Title: GeoTrackNet-A Maritime Anomaly Detector using Probabilistic Neural Network Representation of AIS Tracks and A Contrario Detection
Comments: IEEE Transactions on Intelligent Transportation Systems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [80] arXiv:2001.02522 (replaced) [pdf]

Title: On Interpretability of Artificial Neural Networks: A Survey
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [81] arXiv:2001.07859 (replaced) [pdf, other]

Title: A Deep Learning Algorithm for High-Dimensional Exploratory Item Factor Analysis
Comments: 30 pages; 12 figures; accepted for publication in Psychometrika
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [82] arXiv:2002.05474 (replaced) [pdf, ps, other]

Title: Metric-Free Individual Fairness in Online Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [83] arXiv:2002.06247 (replaced) [pdf, other]

Title: Robust Policies For Proactive ICU Transfers
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
 [84] arXiv:2002.09402 (replaced) [pdf, other]

Title: Addressing Some Limitations of Transformers with Feedback Memory
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
 [85] arXiv:2002.09843 (replaced) [pdf, other]

Title: Computation-efficient Deep Model Training for Ciphertext-based Cross-silo Federated Learning
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [86] arXiv:2002.11809 (replaced) [pdf, ps, other]

Title: Assortativity and bidegree distributions on Bernoulli random graph superpositions
Comments: 26 pages
Subjects: Probability (math.PR); Social and Information Networks (cs.SI); Statistics Theory (math.ST)
 [87] arXiv:2003.00401 (replaced) [pdf, other]

Title: A flexible Bayesian framework to estimate age- and cause-specific child mortality over time from sample registration data
Authors: Austin E Schumacher, Tyler H McCormick, Jon Wakefield, Yue Chu, Jamie Perin, Francisco Villavicencio, Noah Simon, Li Liu
Comments: 16 pages, 4 figures, submitted to The Annals of Applied Statistics
Subjects: Applications (stat.AP)
 [88] arXiv:2004.04668 (replaced) [pdf, other]

Title: Test-Time Adaptable Neural Networks for Robust Medical Image Segmentation
Comments: Published in Medical Image Analysis journal: this https URL
Journal-ref: Medical Image Analysis, Volume 68, 2021, 101907, ISSN 1361-8415. http://www.sciencedirect.com/science/article/pii/S1361841520302711
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [89] arXiv:2004.06496 (replaced) [pdf, other]

Title: Certifiable Robustness to Adversarial State Uncertainty in Deep Reinforcement Learning
Comments: arXiv admin note: text overlap with arXiv:1910.12908
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
 [90] arXiv:2004.08492 (replaced) [pdf, other]

Title: Orbit: Probabilistic Forecast with Exponential Smoothing
Comments: arXiv admin note: text overlap with arXiv:1909.13316 by other authors
Subjects: Computation (stat.CO); Methodology (stat.ME)
 [91] arXiv:2004.11497 (replaced) [pdf, other]

Title: Causal Modeling with Stochastic Confounders
Comments: AISTATS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [92] arXiv:2005.07606 (replaced) [src]

Title: Initializing Perturbations in Multiple Directions for Fast Adversarial Training
Comments: has no contribution
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [93] arXiv:2006.00266 (replaced) [pdf, other]

Title: Functional additive models for optimizing individualized treatment rules
Comments: 24 pages, 7 figures, 2 tables
Subjects: Methodology (stat.ME)
 [94] arXiv:2006.01862 (replaced) [pdf, other]

Title: Consistent Estimators for Learning to Defer to an Expert
Comments: ICML 2020
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
 [95] arXiv:2006.07185 (replaced) [pdf, other]

Title: Grounding Language to Autonomously-Acquired Skills via Goal Generation
Comments: Published at ICLR 2021
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [96] arXiv:2006.09275 (replaced) [pdf, other]

Title: Hierarchical, rotation-equivariant neural networks to select structural models of protein complexes
Authors: Stephan Eismann, Raphael J.L. Townshend, Nathaniel Thomas, Milind Jagota, Bowen Jing, Ron O. Dror
Comments: 11 pages, 5 figures + SI: Updated based on the published version in PROTEINS. Presented at NeurIPS 2019 workshop Learning Meaningful Representations of Life
Subjects: Biomolecules (q-bio.BM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [97] arXiv:2006.09985 (replaced) [pdf, other]

Title: An Efficient Spiking Neural Network for Recognizing Gestures with a DVS Camera on the Loihi Neuromorphic Processor
Comments: Accepted for publication at the 2020 International Joint Conference on Neural Networks (IJCNN)
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [98] arXiv:2006.12655 (replaced) [pdf, other]

Title: Perceptual Adversarial Robustness: Defense Against Unseen Threat Models
Comments: Published in ICLR 2021. Code and data are available at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [99] arXiv:2006.14448 (replaced) [pdf, other]

Title: Learning Task-General Representations with Generative Neuro-Symbolic Modeling
Journal-ref: International Conference on Learning Representations (ICLR 2021)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [100] arXiv:2007.00140 (replaced) [pdf, other]

Title: RE-MIMO: Recurrent and Permutation Equivariant Neural MIMO Detection
Comments: copyright 2020 IEEE TSP. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Signal Processing (eess.SP); Machine Learning (stat.ML)
 [101] arXiv:2007.01659 (replaced) [pdf, other]

Title: Diagnostic Uncertainty Calibration: Towards Reliable Machine Predictions in Medical Domain
Comments: 31 pages, 6 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [102] arXiv:2007.03746 (replaced) [pdf, ps, other]

Title: Transfer Learning for Motor Imagery Based Brain-Computer Interfaces: A Complete Pipeline
Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [103] arXiv:2007.07617 (replaced) [pdf, other]

Title: SpaceNet: Make Free Space For Continual Learning
Comments: Accepted in Neurocomputing Journal
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [104] arXiv:2007.10455 (replaced) [pdf, other]

Title: The multilayer random dot product graph
Comments: 45 pages, 15 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [105] arXiv:2007.10976 (replaced) [pdf, ps, other]

Title: Interactive Inference under Information Constraints
Comments: Adding a section on information losses; improving presentation
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [106] arXiv:2007.11202 (replaced) [pdf, other]

Title: MathNet: Haar-Like Wavelet Multiresolution-Analysis for Graph Representation and Learning
Comments: 32 pages, 6 figures, 6 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [107] arXiv:2007.11573 (replaced) [pdf, other]

Title: Autonomous Tracking and State Estimation with Generalised Group Lasso
Comments: 14 pages, 10 figures
Subjects: Methodology (stat.ME); Optimization and Control (math.OC)
 [108] arXiv:2008.01304 (replaced) [pdf, ps, other]

Title: The Exact Asymptotic Form of Bayesian Generalization Error in Latent Dirichlet Allocation
Authors: Naoki Hayashi
Comments: 20 pages, 3 figures, 2 tables. Accepted at Neural Networks (Elsevier)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [109] arXiv:2008.02545 (replaced) [pdf, other]

Title: Deep neural networks adapt to intrinsic dimensionality beyond the target domain
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [110] arXiv:2008.06529 (replaced) [pdf, other]

Title: Three Variants of Differential Privacy: Lossless Conversion and Applications
Comments: To appear in IEEE Journal on Selected Areas in Information Theory, Special Issue on Privacy and Security of Information Systems. arXiv admin note: text overlap with arXiv:2001.05990
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
 [111] arXiv:2008.11435 (replaced) [pdf, other]

Title: Comparison of classical and Bayesian imaging in radio interferometry
Authors: Philipp Arras, Hertzog L. Bester, Richard A. Perley, Reimar Leike, Oleg Smirnov, Rüdiger Westermann, Torsten A. Enßlin
Comments: 23 pages, 16 figures, 4 tables, data published at this https URL
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Applications (stat.AP)
 [112] arXiv:2008.11809 (replaced) [pdf, ps, other]

Title: Unlabeled Data Help in Graph-Based Semi-Supervised Learning: A Bayesian Nonparametrics Perspective
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
 [113] arXiv:2009.00437 (replaced) [pdf, other]

Title: NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size
Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021. arXiv admin note: substantial text overlap with arXiv:2001.00326
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [114] arXiv:2009.08886 (replaced) [pdf, other]

Title: BNAS-v2: Memory-efficient and Performance-collapse-prevented Broad Neural Architecture Search
Comments: 12 pages, 11 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [115] arXiv:2009.10622 (replaced) [pdf, ps, other]

Title: An $l_1$-oracle inequality for the Lasso in mixture-of-experts regression models
Comments: Corrected typos. Added new Section 4. Discussion and comparisons
Subjects: Statistics Theory (math.ST); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
 [116] arXiv:2010.04592 (replaced) [pdf, other]

Title: Contrastive Learning with Hard Negative Samples
Comments: Published as a conference paper at ICLR 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [117] arXiv:2010.05273 (replaced) [pdf, other]

Title: Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms
Comments: ICLR 2021. Code: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [118] arXiv:2010.11385 (replaced) [pdf, other]

Title: Dirichlet Process Mixture Models with Shrinkage Prior
Subjects: Methodology (stat.ME); Applications (stat.AP)
 [119] arXiv:2011.00417 (replaced) [pdf, other]
 [120] arXiv:2011.03140 (replaced) [pdf, other]

Title: Prediction of Future Failures for Heterogeneous Reliability Field Data
Subjects: Methodology (stat.ME); Applications (stat.AP)
 [121] arXiv:2011.03789 (replaced) [pdf, ps, other]

Title: Estimation of smooth functionals in high-dimensional models: bootstrap chains and Gaussian approximation
Authors: Vladimir Koltchinskii
Subjects: Statistics Theory (math.ST)
 [122] arXiv:2011.07466 (replaced) [pdf, other]

Title: CcGAN: Continuous Conditional Generative Adversarial Networks for Image Generation
Journal-ref: International Conference on Learning Representations 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [123] arXiv:2011.14317 (replaced) [pdf, other]

Title: FROCC: Fast Random projection-based One-Class Classification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [124] arXiv:2012.04798 (replaced) [pdf, other]

Title: Coupled Markov chain Monte Carlo for high-dimensional regression with Half-t priors
Comments: 37 pages, 11 figures
Subjects: Methodology (stat.ME); Computation (stat.CO)
 [125] arXiv:2012.13326 (replaced) [pdf, ps, other]

Title: A Tight Lower Bound for Uniformly Stable Algorithms
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [126] arXiv:2101.03446 (replaced) [pdf, other]

Title: The shifted ODE method for underdamped Langevin MCMC
Subjects: Numerical Analysis (math.NA); Probability (math.PR); Statistics Theory (math.ST)
 [127] arXiv:2101.04789 (replaced) [pdf, ps, other]

Title: Improving Classification Accuracy with Graph Filtering
Authors: Mounia Hamidouche, Carlos Lassance, Yuqing Hu, Lucas Drumetz, Bastien Pasdeloup, Vincent Gripon
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [128] arXiv:2101.06369 (replaced) [pdf, ps, other]

Title: Nonconvex weakly smooth Langevin Monte Carlo using regularization
Subjects: Computation (stat.CO)
 [129] arXiv:2101.07987 (replaced) [pdf, other]

Title: matrixdist: An R Package for Inhomogeneous Phase-Type Distributions
Subjects: Computation (stat.CO)
 [130] arXiv:2101.08007 (replaced) [pdf, other]

Title: On the Non-Monotonicity of a Non-Differentially Mismeasured Binary Confounder
Authors: Jose M. Peña
Comments: arXiv admin note: text overlap with arXiv:2005.13245
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [131] arXiv:2101.09022 (replaced) [pdf]

Title: A heavy-tailed and overdispersed collective risk model
Subjects: Applications (stat.AP)