

New submissions


New submissions for Tue, 7 Feb 23

[1]  arXiv:2302.01952 [pdf, other]
Title: On a continuous time model of gradient descent dynamics and instability in deep learning
Comments: Transactions of Machine Learning Research, 2023
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent, however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent, we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian, the PF sheds light on the recently observed edge of stability phenomena in deep learning. Using our new understanding of instability, we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.
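
The divergent and oscillatory regimes mentioned in this abstract can be seen on the simplest possible example. The sketch below is not the principal flow and is not taken from the paper; it only runs plain gradient descent on the one-dimensional quadratic f(x) = 0.5 * curvature * x^2, where each step multiplies the iterate by (1 - learning_rate * curvature). The function name and the chosen values are illustrative assumptions.

```python
def gradient_descent_quadratic(curvature, learning_rate, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2; return all iterates."""
    xs = [x0]
    for _ in range(steps):
        grad = curvature * xs[-1]          # f'(x) = curvature * x
        xs.append(xs[-1] - learning_rate * grad)
    return xs


# learning_rate * curvature < 1: monotone convergence to the minimum at 0
stable = gradient_descent_quadratic(curvature=1.0, learning_rate=0.5)

# 1 < learning_rate * curvature < 2: the sign flips every step, but |x| still shrinks
oscillating = gradient_descent_quadratic(curvature=1.0, learning_rate=1.5)

# learning_rate * curvature > 2: the sign flips every step and |x| grows without bound
divergent = gradient_descent_quadratic(curvature=1.0, learning_rate=2.5)

print(abs(stable[-1]), abs(oscillating[-1]), abs(divergent[-1]))
```

The threshold learning_rate = 2 / curvature is the one-dimensional analogue of the stability boundary set by the largest Hessian eigenvalue, which is why the eigendecomposition of the Hessian is the natural object for studying instability.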

[2]  arXiv:2302.01974 [pdf, other]
Title: Characterization and estimation of high dimensional sparse regression parameters under linear inequality constraints
Comments: 25 pages, 12 figures
Subjects: Methodology (stat.ME)

Modern statistical problems often involve linear inequality constraints on model parameters. Ignoring natural parameter constraints usually results in less efficient statistical procedures. To this end, we define a notion of `sparsity' for such restricted sets using lower-dimensional features. We allow our framework to be flexible so that the number of restrictions may be higher than the number of parameters. One such situation arises in the estimation of a monotone curve using a nonparametric approach, e.g., splines. We show that the proposed notion of sparsity agrees with the usual notion of sparsity in the unrestricted case and prove the validity of the proposed definition as a measure of sparsity. The proposed sparsity measure also allows us to generalize popular priors for sparse vector estimation to the constrained case.

[3]  arXiv:2302.01982 [pdf, other]
Title: multi-GPA-Tree: Statistical Approach for Pleiotropy Informed and Functional Annotation Tree Guided Prioritization of GWAS Results
Comments: 25 pages, 6 figures, 1 table
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)

Genome-wide association studies (GWAS) have successfully identified over two hundred thousand genotype-trait associations. Yet some challenges remain. First, complex traits are often associated with many single nucleotide polymorphisms (SNPs), most with small or moderate effect sizes, making them difficult to detect. Second, many complex traits share a common genetic basis due to `pleiotropy', and, though few methods consider it, leveraging pleiotropy can improve statistical power to detect genotype-trait associations with weaker effect sizes. Third, currently available statistical methods are limited in explaining the functional mechanisms through which genetic variants are associated with specific or multiple traits. We propose multi-GPA-Tree to address these challenges. The multi-GPA-Tree approach can identify risk SNPs associated with single as well as multiple traits while also identifying the combinations of functional annotations that can explain the mechanisms through which risk-associated SNPs are linked with the traits.
First, we implemented simulation studies to evaluate the proposed multi-GPA-Tree method and compared its performance with an existing statistical approach. The results indicate that multi-GPA-Tree outperforms the existing statistical approach in detecting risk-associated SNPs for multiple traits. Second, we applied multi-GPA-Tree to a systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) GWAS, and to a Crohn's disease (CD) and ulcerative colitis (UC) GWAS, together with functional annotation data including GenoSkyline and GenoSkylinePlus. Our results demonstrate that multi-GPA-Tree can be a powerful tool that improves association mapping while facilitating understanding of the underlying genetic architecture of complex traits and potential mechanisms linking risk-associated SNPs with complex traits.

[4]  arXiv:2302.02015 [pdf, other]
Title: Non-greedy Tree-based Learning for Estimating Global Optimal Dynamic Treatment Decision Rules with Continuous Treatment Dosage
Authors: Chang Wang, Lu Wang
Subjects: Methodology (stat.ME)

A dynamic treatment regime (DTR) plays a critical role in precision medicine when assigning patient-specific treatments at multiple stages and optimizing a long-term clinical outcome. However, most existing work on DTRs has focused on categorical treatment scenarios rather than continuous treatment options. Also, regular black-box machine learning methods and regular tree learning methods lack interpretability and global optimality, respectively. In this paper, we propose a non-greedy global optimization method for dose search, namely the Global Optimal Dosage Tree-based learning method (GoDoTree), which combines a robust estimation of the counterfactual outcome mean with an interpretable and non-greedy decision tree for estimating the global optimal dynamic dosage treatment regime in a multiple-stage setting. GoDoTree-Learning recursively estimates how the counterfactual outcome mean depends on a continuous treatment dosage using doubly robust estimators at each stage, and optimizes the stage-specific decision tree in a non-greedy way. We conduct simulation studies to evaluate the finite sample performance of the proposed method and apply it to a real data application for optimal warfarin dose finding.

[5]  arXiv:2302.02024 [pdf, other]
Title: A Simple Approach for Local and Global Variable Importance in Nonlinear Regression Models
Subjects: Methodology (stat.ME)

The ability to interpret machine learning models has become increasingly important as their usage in data science continues to rise. Most current interpretability methods are optimized to work on either (i) a global scale, where the goal is to rank features based on their contributions to overall variation in an observed population, or (ii) the local level, which aims to detail how important a feature is to a particular individual in the dataset. In this work, we present the "GlObal And Local Score" (GOALS) operator: a simple post hoc approach to simultaneously assess local and global feature variable importance in nonlinear models. Motivated by problems in statistical genetics, we demonstrate our approach using Gaussian process regression where understanding how genetic markers affect trait architecture both among individuals and across populations is of high interest. With detailed simulations and real data analyses, we illustrate the flexible and efficient utility of GOALS over state-of-the-art variable importance strategies.

[6]  arXiv:2302.02033 [pdf, ps, other]
Title: An Asymptotically Optimal Algorithm for the One-Dimensional Convex Hull Feasibility Problem
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

This work studies the pure-exploration setting for the convex hull feasibility (CHF) problem where one aims to efficiently and accurately determine if a given point lies in the convex hull of means of a finite set of distributions. We give a complete characterization of the sample complexity of the CHF problem in the one-dimensional setting. We present the first asymptotically optimal algorithm called Thompson-CHF, whose modular design consists of a stopping rule and a sampling rule. In addition, we provide an extension of the algorithm that generalizes several important problems in the multi-armed bandit literature. Finally, we further investigate the Gaussian bandit case with unknown variances and address how the Thompson-CHF algorithm can be adjusted to be asymptotically optimal in this setting.

[7]  arXiv:2302.02043 [pdf, other]
Title: mixdistreg: An R Package for Fitting Mixture of Experts Distributional Regression with Adaptive First-order Methods
Authors: David Rügamer
Subjects: Computation (stat.CO)

This paper presents a high-level description of the R software package mixdistreg to fit mixture of experts distributional regression models. The proposed framework is implemented in R using the deepregression software template, which is based on TensorFlow and follows the neural structured additive learning principle. The software comprises various approaches as special cases, including mixture density networks and mixture regression approaches. Various code examples are given to demonstrate the package's functionality.

[8]  arXiv:2302.02053 [pdf, other]
Title: Model-based Smoothing with Integrated Wiener Processes and Overlapping Splines
Subjects: Methodology (stat.ME)

In many applications that involve the inference of an unknown smooth function, the inference of its derivatives will often be just as important as that of the function itself. To make joint inferences of the function and its derivatives, a class of Gaussian processes called the $p^{\text{th}}$ order Integrated Wiener Process (IWP) is considered. Methods for constructing a finite element (FEM) approximation of an IWP exist but have focused only on the order $p = 2$ case, which does not allow appropriate inference for derivatives, and their computational feasibility relies on additional approximation to the FEM itself. In this article, we propose an alternative FEM approximation, called overlapping splines (O-spline), which pursues computational feasibility directly through the choice of test functions, and mirrors the construction of an IWP as the O-spline results from the multiple integrations of these same test functions. The O-spline approximation applies for any order $p \in \mathbb{Z}^+$, is computationally efficient and provides consistent inference for all derivatives up to order $p-1$. It is shown both theoretically, and empirically through simulation, that the O-spline approximation converges to the true IWP as the number of knots increases. We further provide a unified and interpretable way to define priors for the smoothing parameter based on the notion of predictive standard deviation (PSD), which is invariant to the order $p$ and the placement of the knots. Finally, we demonstrate the practical use of the O-spline approximation through simulation studies and an analysis of COVID death rates, where the inference is carried out on both the function and its derivatives and the latter have an important interpretation in terms of the course of the pandemic.

[9]  arXiv:2302.02110 [pdf, other]
Title: A Scalar-on-Quantile-Function Approach for Estimating Short-term Health Effects of Environmental Exposures
Subjects: Applications (stat.AP)

Environmental epidemiologic studies routinely utilize aggregate health outcomes to estimate effects of short-term (e.g., daily) exposures that are available at increasingly fine spatial resolutions. However, areal averages are typically used to derive population-level exposure, which cannot capture the spatial variation and individual heterogeneity in exposures that may occur within the spatial and temporal unit of interest (e.g., within day or ZIP code). We propose a general modeling approach to incorporate within-unit exposure heterogeneity in health analyses via exposure quantile functions. Furthermore, by viewing the exposure quantile function as a functional covariate, our approach provides additional flexibility in characterizing associations at different quantile levels. We apply the proposed approach to an analysis of air pollution and emergency department (ED) visits in Atlanta over four years. The analysis utilizes daily ZIP code-level distributions of personal exposures to four traffic-related ambient air pollutants simulated from the Stochastic Human Exposure and Dose Simulator. Our analyses find that effects of carbon monoxide on respiratory and cardiovascular disease ED visits are more pronounced with changes in lower quantiles of the population-level exposure. Software for implementation is provided in the R package nbRegQF.

[10]  arXiv:2302.02156 [pdf, other]
Title: Challenges of cellwise outliers
Subjects: Methodology (stat.ME); Computation (stat.CO)

It is well-known that real data often contain outliers. The term outlier typically refers to a case, that is, a row of the $n \times d$ data matrix. In recent times a different type has come into focus, the cellwise outliers. These are suspicious cells (entries) that can occur anywhere in the data matrix. Even a relatively small proportion of outlying cells can contaminate over half the rows, which is a problem for rowwise robust methods. In this article we discuss the challenges posed by cellwise outliers, and some methods developed so far to deal with them. We obtain new results on cellwise breakdown values for location, covariance and regression. We also propose a cellwise robust method for correspondence analysis, with real data illustrations. The paper concludes by formulating some points for debate.
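
The claim that a small proportion of outlying cells can contaminate over half the rows follows from simple arithmetic under one simplifying assumption, made here only for illustration: each cell is contaminated independently with the same probability. A row with n_columns cells is then clean with probability (1 - cell_rate) ** n_columns. The function names below are hypothetical, not from the paper.

```python
def expected_contaminated_row_fraction(cell_rate, n_columns):
    """Expected fraction of rows with at least one outlying cell,
    assuming cells are contaminated independently with probability cell_rate."""
    return 1.0 - (1.0 - cell_rate) ** n_columns


def min_columns_to_exceed(cell_rate, row_fraction=0.5):
    """Smallest number of columns at which the expected contaminated-row
    fraction exceeds row_fraction."""
    if not (0.0 < cell_rate < 1.0 and 0.0 <= row_fraction < 1.0):
        raise ValueError("need 0 < cell_rate < 1 and 0 <= row_fraction < 1")
    d = 1
    while expected_contaminated_row_fraction(cell_rate, d) <= row_fraction:
        d += 1
    return d


for d in (5, 14, 50):
    frac = expected_contaminated_row_fraction(0.05, d)
    print(f"5% outlying cells, d = {d:2d}: {frac:.1%} of rows contaminated")

print(min_columns_to_exceed(0.05))
```

With 5% outlying cells, 14 columns are already enough for more than half the rows to contain at least one bad cell, which is beyond what a rowwise robust method with a 50% breakdown value can tolerate.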

[11]  arXiv:2302.02228 [pdf, other]
Title: Counterfactual Identifiability of Bijective Causal Models
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We study counterfactual identifiability in causal models with bijective generation mechanisms (BGM), a class that generalizes several widely-used causal models in the literature. We establish their counterfactual identifiability for three common causal structures with unobserved confounding, and propose a practical learning method that casts learning a BGM as structured generative modeling. Learned BGMs enable efficient counterfactual estimation and can be obtained using a variety of deep conditional generative models. We evaluate our techniques in a visual task and demonstrate their application in a real-world video streaming simulation task.

[12]  arXiv:2302.02247 [pdf, ps, other]
Title: Spectral Density Estimation of Function-Valued Spatial Processes
Comments: 84 pages, 0 figures
Subjects: Statistics Theory (math.ST)

The spectral density function describes the second-order properties of a stationary stochastic process on $\mathbb{R}^d$. This paper considers the nonparametric estimation of the spectral density of a continuous-time stochastic process taking values in a separable Hilbert space. Our estimator is based on kernel smoothing and can be applied to a wide variety of spatial sampling schemes including those in which data are observed at irregular spatial locations. Thus, it finds immediate applications in Spatial Statistics, where irregularly sampled data naturally arise. The rates for the bias and variance of the estimator are obtained under general conditions in a mixed-domain asymptotic setting. When the data are observed on a regular grid, the optimal rate of the estimator matches the minimax rate for the class of covariance functions that decay according to a power law. The asymptotic normality of the spectral density estimator is also established under general conditions for Gaussian Hilbert-space valued processes. Finally, with a view towards practical applications the asymptotic results are specialized to the case of discretely-sampled functional data in a reproducing kernel Hilbert space.

[13]  arXiv:2302.02254 [pdf, other]
Title: Getting to "rate-optimal" in ranking & selection
Journal-ref: Proceedings of the 2021 Winter Simulation Conference
Subjects: Computation (stat.CO); Statistics Theory (math.ST)

In their seminal 2004 paper, Glynn and Juneja formally and precisely established the rate-optimal, probability-of-incorrect-selection, replication allocation scheme for selecting the best of k simulated systems. In the case of independent, normally distributed outputs this allocation has a simple form that depends in an intuitively appealing way on the true means and variances. Of course the means and (typically) variances are unknown, but the rate-optimal allocation provides a target for implementable, dynamic, data-driven policies to achieve. In this paper we compare the empirical behavior of four related replication-allocation policies: mCEI from Chen and Ryzhov and our new gCEI policy, both of which converge to the Glynn and Juneja allocation; AOMAP from Peng and Fu, which converges to the OCBA optimal allocation; and TTTS from Russo, which targets the rate of convergence of the posterior probability of incorrect selection. We find that these policies have distinctly different behavior in some settings.

[14]  arXiv:2302.02286 [pdf, other]
Title: Optimal subsampling for the Cox proportional hazards model with massive survival data
Subjects: Computation (stat.CO)

The use of massive survival data has become common in survival analysis. In this study, a subsampling algorithm is proposed for the Cox proportional hazards model with time-dependent covariates when the sample is extraordinarily large but computing resources are relatively limited. A subsample estimator is developed by maximizing the weighted partial likelihood; it is shown to have consistency and asymptotic normality. By minimizing the asymptotic mean squared error of the subsample estimator, the optimal subsampling probabilities are formulated with explicit expressions. Simulation studies show that the proposed method can satisfactorily approximate the estimator of the full dataset. The proposed method is then applied to corporate loan and breast cancer datasets, with different censoring rates, and the outcomes confirm its practical advantages.

[15]  arXiv:2302.02288 [pdf, other]
Title: Efficient Adaptive Sobel and Joint Significance Tests for Mediation Effects
Authors: Haixiang Zhang
Subjects: Methodology (stat.ME); Applications (stat.AP)

Mediation analysis is an important statistical tool in many research fields. In particular, the Sobel test and the joint significance test are two popular methods for testing mediation effects in practice. However, both methods suffer from a conservative type I error, which reduces their power and restricts their popularity and usefulness. This limitation is long-standing for both methods in the mediation analysis literature. To deal with this issue, we propose the adaptive Sobel test and adaptive joint significance test for mediation effects, which offer significant improvements over the traditional Sobel and joint significance tests. Meanwhile, our method is user-friendly and intelligible, and does not involve more complicated procedures. The explicit expressions for sizes and powers are derived, which ensure the theoretical rationality of our method. Furthermore, we extend the proposed adaptive Sobel and joint significance tests to multiple mediators with family-wise error rate control. Extensive simulations are conducted to evaluate the performance of our mediation testing procedure. Finally, we illustrate the usefulness of our method by analysing three real-world datasets with continuous, binary and time-to-event outcomes, respectively.
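
For orientation, the two classical tests that this abstract builds on can be sketched in a few lines. This is the traditional (non-adaptive) Sobel test and joint significance test, not the adaptive procedure proposed in the paper. The inputs a and b are the estimated exposure-to-mediator and mediator-to-outcome coefficients with standard errors se_a and se_b; the function names and the numbers in the usage lines are illustrative assumptions, and a normal reference distribution is assumed throughout.

```python
import math


def two_sided_normal_p(z):
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def sobel_test(a, se_a, b, se_b):
    """Classical Sobel test of the indirect effect a * b.

    Uses the first-order delta-method standard error
    sqrt(a^2 * se_b^2 + b^2 * se_a^2); undefined when a = b = 0.
    """
    z = (a * b) / math.sqrt(a**2 * se_b**2 + b**2 * se_a**2)
    return z, two_sided_normal_p(z)


def joint_significance_p(a, se_a, b, se_b):
    """Joint significance (MaxP) test: both paths must be significant,
    so the p-value is the larger of the two path-wise p-values."""
    return max(two_sided_normal_p(a / se_a), two_sided_normal_p(b / se_b))


# Hypothetical estimates, for illustration only
z, p_sobel = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
p_joint = joint_significance_p(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"Sobel z = {z:.3f}, p = {p_sobel:.5f}; joint significance p = {p_joint:.6f}")
```

On these inputs the Sobel p-value is larger than the joint significance p-value, which is consistent with the Sobel test being the more conservative of the two.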

[16]  arXiv:2302.02304 [pdf, other]
Title: Crowdsourcing Utilizing Subgroup Structure of Latent Factor Modeling
Subjects: Methodology (stat.ME)

Crowdsourcing has emerged as an alternative solution for collecting large scale labels. However, the majority of recruited workers are not domain experts, so their contributed labels could be noisy. In this paper, we propose a two-stage model to predict the true labels for multicategory classification tasks in crowdsourcing. In the first stage, we fit the observed labels with a latent factor model and incorporate subgroup structures for both tasks and workers through a multi-centroid grouping penalty. Group-specific rotations are introduced to align workers with different task categories to solve multicategory crowdsourcing tasks. In the second stage, we propose a concordance-based approach to identify high-quality worker subgroups who are relied upon to assign labels to tasks. In theory, we show the estimation consistency of the latent factors and the prediction consistency of the proposed method. The simulation studies show that the proposed method outperforms the existing competitive methods, assuming the subgroup structures within tasks and workers. We also demonstrate the application of the proposed method to real world problems and show its superiority.

[17]  arXiv:2302.02310 [pdf, other]
Title: $\ell_1$-penalized Multinomial Regression: Estimation, inference, and prediction, with an application to risk factor identification for different dementia subtypes
Comments: 23 pages, 3 figures, 20 tables
Subjects: Methodology (stat.ME); Applications (stat.AP)

High-dimensional multinomial regression models are very useful in practice but receive less research attention than logistic regression models, especially from the perspective of statistical inference. In this work, we analyze the estimation and prediction error of the contrast-based $\ell_1$-penalized multinomial regression model and extend the debiasing method to the multinomial case, which provides a valid confidence interval for each coefficient and $p$-value of the individual hypothesis test. We apply the debiasing method to identify some important predictors in the progression into dementia of different subtypes. Results of intensive simulations show the superiority of the debiasing method compared to some other inference methods.

[18]  arXiv:2302.02406 [pdf, other]
Title: Pre-screening breast cancer with machine learning and deep learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We suggest that deep learning can be used for pre-screening cancer by analyzing demographic and anthropometric information of patients, as well as biological markers obtained from routine blood samples and relative risks obtained from meta-analysis and international databases. We applied feature selection algorithms to a database of 116 women, including 52 healthy women and 64 women diagnosed with breast cancer, to identify the best pre-screening predictors of cancer. We utilized the best predictors to perform k-fold Monte Carlo cross-validation experiments that compare deep learning against traditional machine learning algorithms. Our results indicate that a deep learning model with an input-layer architecture that is fine-tuned using feature selection can effectively distinguish between patients with and without cancer. Additionally, compared to machine learning, deep learning has the lowest uncertainty in its predictions. These findings suggest that deep learning algorithms applied to cancer pre-screening offer a radiation-free, non-invasive, and affordable complement to screening methods based on imagery. The implementation of deep learning algorithms in cancer pre-screening offers opportunities to identify individuals who may require imaging-based screening, can encourage self-examination, and can decrease the psychological externalities associated with false positives in cancer screening. The integration of deep learning algorithms for both screening and pre-screening will ultimately lead to earlier detection of malignancy, reducing the healthcare and societal burden associated with cancer treatment.

[19]  arXiv:2302.02415 [pdf, ps, other]
Title: On Kronecker Separability of Multiway Covariance
Comments: 15 pages
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)

Multiway data analysis is aimed at inferring patterns from data represented as a multi-dimensional array. Estimating covariance from multiway data is a fundamental statistical task; however, the intrinsic high dimensionality poses significant statistical and computational challenges. Recently, several factorized covariance models, paired with estimation algorithms, have been proposed to circumvent these obstacles. Despite several promising results on the algorithmic front, it remains under-explored whether and when such a model is valid. To address this question, we define the notion of Kronecker-separable multiway covariance, which can be written as a sum of $r$ tensor products of mode-wise covariances. The question of whether a given covariance can be represented as a separable multiway covariance is then reduced to an equivalent question about separability of quantum states. Using this equivalence, it follows directly that a generic multiway covariance tends to be non-separable (even if $r \to \infty$), and moreover, finding its best separable approximation is NP-hard. These observations imply that factorized covariance models are restrictive and should be used only when there is a compelling rationale for such a model.

[20]  arXiv:2302.02432 [pdf, other]
Title: Tighter Information-Theoretic Generalization Bounds from Supersamples
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)

We present a variety of novel information-theoretic generalization bounds for learning algorithms, from the supersample setting of Steinke & Zakynthinou (2020), the setting of the "conditional mutual information" framework. Our development exploits projecting the loss pair (obtained from a training instance and a testing instance) down to a single number and correlating loss values with a Rademacher sequence (and its shifted variants). The presented bounds include square-root bounds, fast-rate bounds (including those based on variance and sharpness), and bounds for interpolating algorithms. We show theoretically or empirically that these bounds are tighter than all information-theoretic bounds known to date on the same supersample setting.

[21]  arXiv:2302.02455 [pdf, other]
Title: ODEWS: The Overdraft Early Warning System
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG)

When a customer overdraws their account and their balance is negative, they are assessed an overdraft fee. Americans pay approximately \$15 billion in unnecessary overdraft fees a year, often in \$35 increments; users of the Mint personal finance app in particular pay approximately \$250 million in fees a year. These overdraft fees are an excessive financial burden and lead to cascading overdraft fees, trapping customers in financial hardship. To address this problem, we have created an ML-driven overdraft early warning system (ODEWS) that assesses a customer's risk of overdrafting within the next week using their banking and transaction data in the Mint app. At-risk customers are sent an alert so they can take steps to avoid the fee, ultimately changing their behavior and financial habits. The deployed system resulted in \$3 million in savings in overdraft fees for Mint customers compared to a control group. Moreover, the methodology outlined here can be generalized to provide ML-driven personalized financial advice for many different personal finance goals--increase credit score, build emergency savings fund, pay down debt, allocate capital for investment.

[22]  arXiv:2302.02457 [pdf, other]
Title: Scalable inference in functional linear regression with streaming data
Subjects: Methodology (stat.ME)

Traditional static functional data analysis is facing new challenges due to streaming data, where data constantly flow in. A major challenge is that storing such an ever-increasing amount of data in memory is nearly impossible. In addition, existing inferential tools in online learning are mainly developed for finite-dimensional problems, while inference methods for functional data are focused on the batch learning setting. In this paper, we tackle these issues by developing functional stochastic gradient descent algorithms and proposing an online bootstrap resampling procedure to systematically study the inference problem for functional linear regression. In particular, the proposed estimation and inference procedures use only one pass over the data; thus they are easy to implement and suitable to the situation where data arrive in a streaming manner. Furthermore, we establish the convergence rate as well as the asymptotic distribution of the proposed estimator. Meanwhile, the proposed perturbed estimator from the bootstrap procedure is shown to enjoy the same theoretical properties, which provide the theoretical justification for our online inference tool. As far as we know, this is the first inference result on the functional linear regression model with streaming data. Simulation studies are conducted to investigate the finite-sample performance of the proposed procedure. An application is illustrated with the Beijing multi-site air-quality data.

[23]  arXiv:2302.02468 [pdf, other]
Title: Circular and spherical projected Cauchy distributions
Comments: Preprint
Subjects: Methodology (stat.ME)

Two new distributions are proposed: the circular projected and the spherical projected Cauchy distributions. A special case of the circular projected Cauchy coincides with the wrapped Cauchy distribution, and for this, a generalization is suggested that offers better fit via the inclusion of an extra parameter. For the spherical case, by imposing two conditions on the scatter matrix we end up with an elliptically symmetric distribution. All distributions allow for a closed-form normalizing constant and straightforward random value generation, while their parameters can be estimated via maximum likelihood. The bias of the estimated parameters is assessed via numerical studies, while illustrations using real data compare them further with some existing models, indicating better fits.

[24]  arXiv:2302.02476 [pdf, other]
Title: Estimating Time-Varying Networks for High-Dimensional Time Series
Subjects: Methodology (stat.ME); Econometrics (econ.EM)

We explore time-varying networks for high-dimensional locally stationary time series, using the large VAR model framework with both the transition and (error) precision matrices evolving smoothly over time. Two types of time-varying graphs are investigated: one containing directed edges of Granger causality linkages, and the other containing undirected edges of partial correlation linkages. Under the sparse structural assumption, we propose a penalised local linear method with time-varying weighted group LASSO to jointly estimate the transition matrices and identify their significant entries, and a time-varying CLIME method to estimate the precision matrices. The estimated transition and precision matrices are then used to determine the time-varying network structures. Under some mild conditions, we derive the theoretical properties of the proposed estimates including the consistency and oracle properties. In addition, we extend the methodology and theory to cover highly-correlated large-scale time series, for which the sparsity assumption becomes invalid and we allow for common factors before estimating the factor-adjusted time-varying networks. We provide extensive simulation studies and an empirical application to a large U.S. macroeconomic dataset to illustrate the finite-sample performance of our methods.

[25]  arXiv:2302.02482 [pdf, other]
Title: Continuously Indexed Graphical Models
Subjects: Statistics Theory (math.ST); Probability (math.PR); Methodology (stat.ME)

Let $X = \{X_{u}\}_{u \in U}$ be a real-valued Gaussian process indexed by a set $U$. It can be thought of as an undirected graphical model with every random variable $X_{u}$ serving as a vertex. We characterize this graph in terms of the covariance of $X$ through its reproducing kernel property. Unlike other characterizations in the literature, our characterization does not restrict the index set $U$ to be finite or countable, and hence can be used to model the intrinsic dependence structure of stochastic processes in continuous time/space. Consequently, the said characterization is not (and apparently cannot be) of the inverse-zero type. This poses novel challenges for the problem of recovery of the dependence structure from a sample of independent realizations of $X$, also known as structure estimation. We propose a methodology that circumvents these issues, by targeting the recovery of the underlying graph up to a finite resolution, which can be arbitrarily fine and is limited only by the available sample size. The recovery is shown to be consistent so long as the graph is sufficiently regular in an appropriate sense, and convergence rates are provided. Our methodology is illustrated by simulation and two data analyses.

[26]  arXiv:2302.02486 [pdf, other]
Title: The Difference-of-Log-Normals Distribution: Properties, Estimation, and Growth
Authors: Robert Parham
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); General Finance (q-fin.GN)

This paper describes the Difference-of-Log-Normals (DLN) distribution. A companion paper makes the case that the DLN is a fundamental distribution in nature, and shows how a simple application of the CLT gives rise to the DLN in many disparate phenomena. Here, I characterize its PDF, CDF, moments, and parameter estimators; generalize it to N-dimensions using spherical distribution theory; describe methods to deal with its signature ``double-exponential'' nature; and use it to generalize growth measurement to possibly-negative variates distributing DLN. I also conduct Monte-Carlo experiments to establish some properties of the estimators and measures described.
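
As a small illustration of the object described in this abstract, the sketch below draws from a difference of two log-normal variates and gives the closed-form mean. The parameter names (mu_p, sigma_p, mu_n, sigma_n) and the independence of the two components are assumptions made for this example only; the paper's own parameterization and estimators may differ.

```python
import math
import random


def dln_mean(mu_p, sigma_p, mu_n, sigma_n):
    """Mean of exp(A) - exp(B), A ~ N(mu_p, sigma_p^2), B ~ N(mu_n, sigma_n^2)."""
    # E[exp(N(mu, sigma^2))] = exp(mu + sigma^2 / 2); linearity of expectation
    # then gives the mean of the difference (no independence needed for this).
    return math.exp(mu_p + sigma_p ** 2 / 2) - math.exp(mu_n + sigma_n ** 2 / 2)


def sample_dln(n, mu_p, sigma_p, mu_n, sigma_n, seed=0):
    """Draw n variates; the two log-normal components are ASSUMED independent."""
    rng = random.Random(seed)
    return [
        rng.lognormvariate(mu_p, sigma_p) - rng.lognormvariate(mu_n, sigma_n)
        for _ in range(n)
    ]


if __name__ == "__main__":
    draws = sample_dln(20000, 0.0, 0.5, 0.0, 0.5)
    print("sample mean:", sum(draws) / len(draws))
    print("closed-form mean:", dln_mean(0.0, 0.5, 0.0, 0.5))
```

Unlike a single log-normal, the draws take both signs, which is what makes the distribution usable for possibly-negative variates.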

[27]  arXiv:2302.02488 [pdf, other]
Title: A three-state coupled Markov switching model for COVID-19 outbreaks across Quebec based on hospital admissions
Subjects: Applications (stat.AP); Populations and Evolution (q-bio.PE)

Recurrent COVID-19 outbreaks have placed immense strain on the hospital system in Quebec. We develop a Bayesian three-state coupled Markov switching model to analyze COVID-19 outbreaks across Quebec based on admissions in the 30 largest hospitals. Within each catchment area we assume the existence of three states for the disease: absence (a new state meant to account for many zeroes in some of the smaller areas), endemic, and outbreak. Then we assume the disease switches between the three states in each area through a series of coupled nonhomogeneous hidden Markov chains. Unlike previous approaches, the transition probabilities may depend on covariates and the occurrence of outbreaks in neighboring areas, to account for geographical outbreak spread. Additionally, to prevent rapid switching between endemic and outbreak periods we introduce clone states into the model which enforce minimum endemic and outbreak durations. We find, for example, that mobility in retail and recreation venues had a strong positive association with the development and persistence of new COVID-19 outbreaks in Quebec. Based on model comparison our contributions show promise in improving state estimation retrospectively and in real-time, especially when there are smaller areas and highly spatially synchronized outbreaks, and they offer new and interesting epidemiological interpretations.

[28]  arXiv:2302.02497 [pdf, other]
Title: High-dimensional Location Estimation via Norm Concentration for Subgamma Vectors
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)

In location estimation, we are given $n$ samples from a known distribution $f$ shifted by an unknown translation $\lambda$, and want to estimate $\lambda$ as precisely as possible. Asymptotically, the maximum likelihood estimate achieves the Cram\'er-Rao bound of error $\mathcal N(0, \frac{1}{n\mathcal I})$, where $\mathcal I$ is the Fisher information of $f$. However, the $n$ required for convergence depends on $f$, and may be arbitrarily large. We build on the theory using \emph{smoothed} estimators to bound the error for finite $n$ in terms of $\mathcal I_r$, the Fisher information of the $r$-smoothed distribution. As $n \to \infty$, $r \to 0$ at an explicit rate and this converges to the Cram\'er-Rao bound. We (1) improve the prior work for 1-dimensional $f$ to converge for constant failure probability in addition to high probability, and (2) extend the theory to high-dimensional distributions. In the process, we prove a new bound on the norm of a high-dimensional random variable whose 1-dimensional projections are subgamma, which may be of independent interest.

[29]  arXiv:2302.02544 [pdf, other]
Title: Sequential change detection via backward confidence sequences
Comments: 24 pages, 10 figures
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

We present a simple reduction from sequential estimation to sequential changepoint detection (SCD). In short, suppose we are interested in detecting changepoints in some parameter or functional $\theta$ of the underlying distribution. We demonstrate that if we can construct a confidence sequence (CS) for $\theta$, then we can also successfully perform SCD for $\theta$. This is accomplished by checking if two CSs -- one forwards and the other backwards -- ever fail to intersect. Since the literature on CSs has been rapidly evolving recently, the reduction provided in this paper immediately solves several old and new change detection problems. Further, our "backward CS", constructed by reversing time, is new and potentially of independent interest. We provide strong nonasymptotic guarantees on the frequency of false alarms and detection delay, and demonstrate numerical effectiveness on several problems.
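
The idea of raising an alarm when a forward and a backward confidence sequence fail to intersect can be illustrated with a deliberately simplified sketch. Everything specific here is an assumption for illustration and not the paper's construction: observations are taken to lie in [0, 1], the parameter is the mean, and the confidence sequence is a conservative Hoeffding bound with a union bound over sample sizes.

```python
import math


def radius(k, alpha):
    """Hoeffding radius for the mean of k observations in [0, 1].

    Spending alpha / (k * (k + 1)) at sample size k and taking a union bound
    over k = 1, 2, ... yields a conservative time-uniform confidence sequence.
    """
    return math.sqrt(math.log(2 * k * (k + 1) / alpha) / (2 * k))


def detect_change(xs, alpha=0.05):
    """Return the first 1-based time at which an alarm is raised, or None."""
    prefix = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
    for t in range(1, len(xs) + 1):
        fwd_mean = prefix[t] / t
        fwd_lo = fwd_mean - radius(t, alpha)
        fwd_hi = fwd_mean + radius(t, alpha)
        # Backward intervals use x_t, x_{t-1}, ..., i.e. the data in reversed time.
        for k in range(1, t):
            bwd_mean = (prefix[t] - prefix[t - k]) / k
            r = radius(k, alpha)
            if bwd_mean - r > fwd_hi or bwd_mean + r < fwd_lo:
                return t  # forward and backward intervals are disjoint
    return None


if __name__ == "__main__":
    stream = [0.0] * 200 + [1.0] * 50
    print("alarm at time:", detect_change(stream))
```

This naive version recomputes every backward interval at each step, so it costs quadratic time; it is meant only to show the intersection check.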

[30]  arXiv:2302.02590 [pdf, ps, other]
Title: Consensus dynamics and coherence in hierarchical small-world networks
Subjects: Methodology (stat.ME); Discrete Mathematics (cs.DM)

The hierarchical small-world network is a real-world network. It models well the benefit transmission web of pyramid selling in China and many other countries. In this paper, by applying spectral graph theory, we study three important aspects of the consensus problem in the hierarchical small-world network: convergence speed, communication time-delay robustness, and network coherence. First, we explicitly determine the Laplacian eigenvalues of the hierarchical small-world network by making use of its treelike structure. Second, we find that the consensus algorithm on the hierarchical small-world network converges faster than that on some well-studied sparse networks, but is less robust to time delay. Closed-form expressions for the first-order and the second-order network coherence are also derived. Our result shows that the hierarchical small-world network has an optimal structure for noisy consensus dynamics. Therefore, we provide a positive answer to two open questions of Yi \emph{et al}. Finally, we argue that some network structure characteristics, such as large maximum degree, small average path length, and large vertex and edge connectivity, are responsible for the strong robustness with respect to external perturbations.
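
For readers unfamiliar with consensus dynamics, the following minimal sketch runs the standard discrete-time Laplacian iteration x <- x - step * L x on a small tree. The 7-node binary tree, the step size, and the function names are hypothetical choices for illustration; this is not the hierarchical small-world network studied in the paper.

```python
def laplacian(n, edges):
    """Graph Laplacian L = D - A of an undirected graph on nodes 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L


def consensus(x, L, step, iters):
    """Iterate x <- x - step * L x; converges when step < 2 / lambda_max(L)."""
    n = len(x)
    for _ in range(iters):
        x = [x[i] - step * sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


if __name__ == "__main__":
    # Toy binary tree with maximum degree 3, so lambda_max <= 6 and step = 0.2 is safe.
    edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
    L = laplacian(7, edges)
    print(consensus([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0], L, step=0.2, iters=2000))
```

The iteration preserves the average of the initial states, and its convergence speed is governed by the smallest nonzero Laplacian eigenvalue, which is why explicit eigenvalues are useful for this kind of analysis.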

[31]  arXiv:2302.02613 [pdf, ps, other]
Title: An asymptotic behavior of a finite-section of the optimal causal filter
Authors: Junho Yang
Subjects: Statistics Theory (math.ST)

We derive an $L_1$-bound between the coefficients of the optimal causal filter applied to the data-generating process and its approximation based on finite sample observations. Here, we assume that the data-generating process is second-order stationary with either short or long memory autocovariances. To obtain the $L_1$-bound, we first provide an exact expression of the causal filter coefficients and their approximation in terms of the absolutely convergent series of the multistep ahead infinite and finite predictor coefficients, respectively. Then, we prove a so-called uniform-type Baxter's inequality to obtain a bound for the difference between the two multistep ahead predictor coefficients (under both short and long memory time series). The $L_1$-approximation error bound of the causal filter coefficients can be used to evaluate the quality of the predictions of time series through the mean squared error criterion.

[32]  arXiv:2302.02670 [pdf, other]
Title: Random Forests for time-fixed and time-dependent predictors: The DynForest R package
Authors: Anthony Devaux (BPH), Cécile Proust-Lima (BPH), Robin Genuer (BPH)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The R package DynForest implements random forests for predicting a categorical or a (multiple causes) time-to-event outcome based on time-fixed and time-dependent predictors. Through the random forests, the time-dependent predictors can be measured with error at subject-specific times, and they can be endogenous (i.e., impacted by the outcome process). They are modeled internally using flexible linear mixed models (thanks to the lcmm package) with time-associations pre-specified by the user. DynForest computes dynamic predictions that take into account all the information from time-fixed and time-dependent predictors. DynForest also provides information about the most predictive variables using variable importance and minimal depth. Variable importance can also be computed on groups of variables. To display the results, several functions are available such as summary and plot functions. This paper aims to guide the user with a step-by-step example of the different functions for fitting random forests within DynForest.

[33]  arXiv:2302.02672 [pdf, other]
Title: Identifiability of latent-variable and structural-equation models: from linear to nonlinear
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

An old problem in multivariate statistics is that linear Gaussian models are often unidentifiable, i.e. some parameters cannot be uniquely estimated. In factor analysis, an orthogonal rotation of the factors is unidentifiable, while in linear regression, the direction of effect cannot be identified. For such linear models, non-Gaussianity of the (latent) variables has been shown to provide identifiability. In the case of factor analysis, this leads to independent component analysis, while in the case of the direction of effect, non-Gaussian versions of structural equation modelling solve the problem. More recently, we have shown how even general nonparametric nonlinear versions of such models can be estimated. Non-Gaussianity is not enough in this case, but assuming we have time series, or that the distributions are suitably modulated by some observed auxiliary variables, the models are identifiable. This paper reviews the identifiability theory for the linear and nonlinear cases, considering both factor analytic models and structural equation models.

[34]  arXiv:2302.02718 [pdf, other]
Title: A Log-Linear Non-Parametric Online Changepoint Detection Algorithm based on Functional Pruning
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)

Online changepoint detection aims to detect anomalies and changes in real-time in high-frequency data streams, sometimes with limited available computational resources. This is an important task that is rooted in many real-world applications, including but not limited to cybersecurity, medicine and astrophysics. While fast and efficient online algorithms have been recently introduced, these rely on parametric assumptions which are often violated in practical applications. Motivated by data streams from the telecommunications sector, we build a flexible nonparametric approach to detect a change in the distribution of a sequence. Our procedure, NP-FOCuS, builds a sequential likelihood ratio test for a change in a set of points of the empirical cumulative distribution function of our data. This is achieved by keeping track of the number of observations above or below those points. Thanks to functional pruning ideas, NP-FOCuS has a computational cost that is log-linear in the number of observations and is suitable for high-frequency data streams. In terms of detection power, NP-FOCuS is seen to outperform current nonparametric online changepoint techniques in a variety of settings. We demonstrate the utility of the procedure on both simulated and real data.
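
The counting idea in this abstract can be made concrete for a single threshold: once each observation is reduced to "at or below q" versus "above q", a change in the distribution at q becomes a change in a Bernoulli probability. The sketch below computes the corresponding likelihood ratio statistic by brute force over candidate changepoints. It is an assumption-laden illustration, not NP-FOCuS itself: there is no functional pruning, only one threshold is used, and the function names are hypothetical.

```python
import math


def _bern_loglik(successes, n):
    """Maximised Bernoulli log-likelihood for `successes` out of `n` trials."""
    if n == 0 or successes == 0 or successes == n:
        return 0.0  # the fitted probability is 0 or 1, so the likelihood is 1
    p = successes / n
    return successes * math.log(p) + (n - successes) * math.log(1 - p)


def change_statistic(xs, q):
    """Max over changepoints of the likelihood ratio statistic for a change
    in P(X <= q), using only running counts of observations at or below q."""
    below = [0]
    for x in xs:
        below.append(below[-1] + (1 if x <= q else 0))
    n = len(xs)
    null = _bern_loglik(below[n], n)
    best = 0.0
    for tau in range(1, n):  # naive scan over every candidate changepoint
        alt = _bern_loglik(below[tau], tau) + _bern_loglik(below[n] - below[tau], n - tau)
        best = max(best, 2 * (alt - null))
    return best


if __name__ == "__main__":
    print(change_statistic([0.0] * 50 + [1.0] * 50, q=0.5))
```

Rerunning this scan at every new observation would cost quadratic time overall, which is the cost that the pruning ideas mentioned in the abstract are designed to avoid.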

[35]  arXiv:2302.02766 [pdf, other]
Title: Generalization Bounds with Data-dependent Fractal Dimensions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Providing generalization guarantees for modern neural networks has been a crucial task in statistical learning. Recently, several studies have attempted to analyze the generalization error in such settings by using tools from fractal geometry. While these works have successfully introduced new mathematical tools to apprehend generalization, they heavily rely on a Lipschitz continuity assumption, which in general does not hold for neural networks and might make the bounds vacuous. In this work, we address this issue and prove fractal geometry-based generalization bounds without requiring any Lipschitz assumption. To achieve this goal, we build upon a classical covering argument in learning theory and introduce a data-dependent fractal dimension. Despite introducing significant technical complications, this new notion lets us control the generalization error (over either fixed or random hypothesis spaces) along with certain mutual information (MI) terms. To provide a clearer interpretation of the newly introduced MI terms, as a next step, we introduce a notion of "geometric stability" and link our bounds to the prior art. Finally, we make a rigorous connection between the proposed data-dependent dimension and topological data analysis tools, which then enables us to compute the dimension in a numerically efficient way. We support our theory with experiments conducted on various settings.

[36]  arXiv:2302.02768 [pdf, other]
Title: Network Autoregression for Incomplete Matrix-Valued Time Series
Subjects: Methodology (stat.ME)

We study the dynamics of matrix-valued time series with observed network structures by proposing a matrix network autoregression model with row and column networks of the subjects. We incorporate covariate information and a low rank intercept matrix. We allow incomplete observations in the matrices and the missing mechanism can be covariate dependent. To estimate the model, a two-step estimation procedure is proposed. The first step aims to estimate the network autoregression coefficients, and the second step aims to estimate the regression parameters, which are matrices themselves. Theoretically, we first separately establish the asymptotic properties of the autoregression coefficients and the error bounds of the regression parameters. Subsequently, a bias reduction procedure is proposed to reduce the asymptotic bias and the theoretical property of the debiased estimator is studied. Lastly, we illustrate the usefulness of the proposed method through a number of numerical studies and an analysis of a Yelp data set.

[37]  arXiv:2302.02774 [pdf, other]
Title: The SSL Interplay: Augmentations, Inductive Bias, and Generalization
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST)

Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision. Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architecture, and training algorithm. We study such an interplay with a precise analysis of generalization performance on both pretraining and downstream tasks in a theory-friendly setup, and highlight several insights for SSL practitioners that arise from our theory.

[38]  arXiv:2302.02859 [pdf, other]
Title: A Fast Bootstrap Algorithm for Causal Inference with Large Data
Comments: 46 pages
Subjects: Methodology (stat.ME); Applications (stat.AP); Machine Learning (stat.ML)

Estimating causal effects from large experimental and observational data has become increasingly prevalent in both industry and research. The bootstrap is an intuitive and powerful technique used to construct standard errors and confidence intervals of estimators. Its application however can be prohibitively demanding in settings involving large data. In addition, modern causal inference estimators based on machine learning and optimization techniques exacerbate the computational burden of the bootstrap. The bag of little bootstraps has been proposed in non-causal settings for large data but has not yet been applied to evaluate the properties of estimators of causal effects. In this paper, we introduce a new bootstrap algorithm called causal bag of little bootstraps for causal inference with large data. The new algorithm significantly improves the computational efficiency of the traditional bootstrap while providing consistent estimates and desirable confidence interval coverage. We describe its properties, provide practical considerations, and evaluate the performance of the proposed algorithm in terms of bias, coverage of the true 95% confidence intervals, and computational time in a simulation study. We apply it in the evaluation of the effect of hormone therapy on the average time to coronary heart disease using a large observational data set from the Women's Health Initiative.
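
For context, the (non-causal) bag of little bootstraps that this abstract builds on can be sketched in a few lines: take small subsets, resample each subset up to the full sample size, and average the per-subset uncertainty estimates. The function name, the default subset exponent gamma = 0.7, and the use of a plain mean as the estimator are assumptions for illustration; the causal variant proposed in the paper is not reproduced here.

```python
import random
import statistics


def blb_standard_error(data, estimator, n_subsets=5, n_resamples=50, gamma=0.7, seed=0):
    """Bag-of-little-bootstraps estimate of the standard error of `estimator`."""
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # subset size b = n^gamma, much smaller than n
    per_subset = []
    for _ in range(n_subsets):
        subset = rng.sample(data, b)  # small subset drawn without replacement
        estimates = []
        for _ in range(n_resamples):
            # Resample n points from the b-point subset so each replicate mimics
            # a full-size sample. Done explicitly here for clarity; practical
            # implementations use multinomial weights to avoid materialising it.
            resample = rng.choices(subset, k=n)
            estimates.append(estimator(resample))
        per_subset.append(statistics.stdev(estimates))
    return statistics.fmean(per_subset)  # average the per-subset assessments


if __name__ == "__main__":
    gen = random.Random(1)
    data = [gen.gauss(0.0, 1.0) for _ in range(2000)]
    print("BLB standard error of the mean:", blb_standard_error(data, statistics.fmean))
    print("theoretical value:", 1 / 2000 ** 0.5)
```

The computational gain in practice comes from each replicate touching only b distinct data points, which matters most when the estimator itself is expensive.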

[39]  arXiv:2302.02923 [pdf, other]
Title: In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Econometrics (econ.EM)

Personalized treatment effect estimates are often of interest in high-stakes applications -- thus, before deploying a model estimating such effects in practice, one needs to be sure that the best candidate from the ever-growing machine learning toolbox for this task was chosen. Unfortunately, due to the absence of counterfactual information in practice, it is usually not possible to rely on standard validation metrics for doing so, leading to a well-known model selection dilemma in the treatment effect estimation literature. While some solutions have recently been investigated, systematic understanding of the strengths and weaknesses of different model selection criteria is still lacking. In this paper, instead of attempting to declare a global `winner', we therefore empirically investigate success- and failure modes of different selection criteria. We highlight that there is a complex interplay between selection strategies, candidate estimators and the DGP used for testing, and provide interesting insights into the relative (dis)advantages of different criteria alongside desiderata for the design of further illuminating empirical studies in this context.

[40]  arXiv:2302.02942 [pdf, other]
Title: Empirical quantification of predictive uncertainty due to model discrepancy by training with an ensemble of experimental designs: an application to ion channel kinetics
Comments: 23 pages, 9 figures
Subjects: Computation (stat.CO); Dynamical Systems (math.DS); Optimization and Control (math.OC); Quantitative Methods (q-bio.QM)

When mathematical biology models are used to make quantitative predictions for clinical or industrial use, it is important that these predictions come with a reliable estimate of their accuracy (uncertainty quantification). Because models of complex biological systems are always large simplifications, model discrepancy arises - where a mathematical model fails to recapitulate the true data generating process. This presents a particular challenge for making accurate predictions, and especially for making accurate estimates of uncertainty in these predictions. Experimentalists and modellers must choose which experimental procedures (protocols) are used to produce data to train their models. We propose to characterise uncertainty owing to model discrepancy with an ensemble of parameter sets, each of which results from training to data from a different protocol. The variability in predictions from this ensemble provides an empirical estimate of predictive uncertainty owing to model discrepancy, even for unseen protocols. We use the example of electrophysiology experiments, which are used to investigate the kinetics of the hERG potassium ion channel. Here, `information-rich' protocols allow mathematical models to be trained using numerous short experiments performed on the same cell. Typically, assuming independent observational errors and training a model to an individual experiment results in parameter estimates with very little dependence on observational noise. Moreover, parameter sets arising from the same model applied to different experiments often conflict - indicative of model discrepancy. Our methods will help select more suitable mathematical models of hERG for future studies, and will be widely applicable to a range of biological modelling problems.

[41]  arXiv:2302.02950 [pdf, other]
Title: A multidimensional objective prior distribution from a scoring rule
Comments: 20 pages, 5 figures, 10 tables
Subjects: Methodology (stat.ME)

The construction of objective priors is, at best, challenging for multidimensional parameter spaces. A common practice is to assume independence and set up the joint prior as the product of marginal distributions obtained via "standard" objective methods, such as Jeffreys or reference priors. However, the assumption of independence a priori is not always reasonable, and whether it can be viewed as strictly objective is still open to discussion. In this paper, by extending a previously proposed objective approach based on scoring rules for the one dimensional case, we propose a novel objective prior for multidimensional parameter spaces which yields a dependence structure. The proposed prior has the appealing property of being proper and does not depend on the chosen model; only on the parameter space considered.

[42]  arXiv:2302.02954 [pdf, other]
Title: Maximum likelihood estimator for skew Brownian motion: the convergence rate
Subjects: Statistics Theory (math.ST); Probability (math.PR)

We give a thorough description of the asymptotic property of the maximum likelihood estimator (MLE) of the skewness parameter of a Skew Brownian Motion (SBM). Thanks to recent results on the Central Limit Theorem of the rate of convergence of estimators for the SBM, we prove a conjecture left open that the MLE has asymptotically a mixed normal distribution involving the local time with a rate of convergence of order $1/4$. We also give a series expansion of the MLE and study the asymptotic behavior of the score and its derivatives, as well as their variation with the skewness parameter. In particular, we exhibit a specific behavior when the SBM is actually a Brownian motion, and quantify the explosion of the coefficients of the expansion when the skewness parameter is close to $-1$ or $1$.

[43]  arXiv:2302.03026 [pdf, other]
Title: Sampling-Based Accuracy Testing of Posterior Estimators for General Inference
Comments: 15 pages
Subjects: Machine Learning (stat.ML); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Methodology (stat.ME)

Parameter inference, i.e. inferring the posterior distribution of the parameters of a statistical model given some data, is a central problem to many scientific disciplines. Posterior inference with generative models is an alternative to methods such as Markov Chain Monte Carlo, both for likelihood-based and simulation-based inference. However, assessing the accuracy of posteriors encoded in generative models is not straightforward. In this paper, we introduce `distance to random point' (DRP) coverage testing as a method to estimate coverage probabilities of generative posterior estimators. Our method differs from previously-existing coverage-based methods, which require posterior evaluations. We prove that our approach is necessary and sufficient to show that a posterior estimator is optimal. We demonstrate the method on a variety of synthetic examples, and show that DRP can be used to test the results of posterior inference analyses in high-dimensional spaces. We also show that our method can detect non-optimal inferences in cases where existing methods fail.

Cross-lists for Tue, 7 Feb 23

[44]  arXiv:2302.02009 (cross-list from cs.LG) [pdf, other]
Title: Domain Adaptation via Rebalanced Sub-domain Alignment
Comments: 20 pages, 6 figures, 4 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Unsupervised domain adaptation (UDA) is a technique used to transfer knowledge from a labeled source domain to a different but related unlabeled target domain. While many UDA methods have shown success in the past, they often assume that the source and target domains must have identical class label distributions, which can limit their effectiveness in real-world scenarios. To address this limitation, we propose a novel generalization bound that reweights source classification error by aligning source and target sub-domains. We prove that our proposed generalization bound is at least as strong as existing bounds under realistic assumptions, and we empirically show that it is much stronger on real-world data. We then propose an algorithm to minimize this novel generalization bound. We demonstrate by numerical experiments that this approach improves performance in shifted class distribution scenarios compared to state-of-the-art methods.

[45]  arXiv:2302.02056 (cross-list from cs.DS) [pdf, other]
Title: Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting
Comments: 28 pages, 5 figures
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Computation (stat.CO)

Data sketching is a critical tool for distinct counting, enabling multisets to be represented by compact summaries that admit fast cardinality estimates. Because sketches may be merged to summarize multiset unions, they are a basic building block in data warehouses. Although many practical sketches for cardinality estimation exist, none provide privacy when merging. We propose the first practical cardinality sketches that are simultaneously mergeable, differentially private (DP), and have low empirical errors. These introduce a novel randomized algorithm for performing logical operations on noisy bits, a tight privacy analysis, and provably optimal estimation. Our sketches dramatically outperform existing theoretical solutions in simulations and on real-world data.
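
To make "mergeable" concrete, here is a minimal non-private sketch in the linear-counting style: items are hashed to bits of a fixed-size bitmap, two sketches are merged with a bitwise OR, and the cardinality is estimated from the fraction of zero bits. The bitmap size M and the function names are hypothetical, and this example offers no differential privacy; it is not the construction proposed in the paper.

```python
import hashlib
import math

M = 1024  # number of bits in the sketch (hypothetical size for this example)


def _bucket(item):
    """Map an item to one of M bit positions with a deterministic hash."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % M


def sketch(items):
    """Bitmap sketch stored as a Python int; duplicates set the same bit."""
    bits = 0
    for item in items:
        bits |= 1 << _bucket(item)
    return bits


def merge(a, b):
    """The sketch of a union is the bitwise OR of the two sketches."""
    return a | b


def estimate(bits):
    """Linear-counting estimate: -M * ln(fraction of zero bits)."""
    zeros = M - bin(bits).count("1")
    if zeros == 0:
        return float("inf")  # sketch saturated; M is too small for this data
    return -M * math.log(zeros / M)


if __name__ == "__main__":
    left, right = sketch(range(0, 300)), sketch(range(200, 500))
    print("estimated distinct count of the union:", estimate(merge(left, right)))
```

Merging by OR is exact with respect to the union's sketch, so the overlap between the two inputs is not double counted.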

[46]  arXiv:2302.02061 (cross-list from cs.LG) [pdf, other]
Title: Reinforcement Learning with History-Dependent Dynamic Contexts
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Machine Learning (stat.ML)

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.

[47]  arXiv:2302.02092 (cross-list from cs.LG) [pdf, other]
Title: Interpolation for Robust Learning: Data Augmentation on Geodesics
Comments: 33 pages, 3 figures, 18 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose to study and promote the robustness of a model as per its performance through the interpolation of training data distributions. Specifically, (1) we augment the data by finding the worst-case Wasserstein barycenter on the geodesic connecting subpopulation distributions of different categories. (2) We regularize the model for smoother performance on the continuous geodesic path connecting subpopulation distributions. (3) Additionally, we provide a theoretical guarantee of robustness improvement and investigate how the geodesic location and the sample size contribute, respectively. Experimental validations of the proposed strategy on four datasets, including CIFAR-100 and ImageNet, establish the efficacy of our method, e.g., our method improves the baselines' certifiable robustness on CIFAR-10 up to $7.7\%$, with $16.8\%$ on empirical robustness on CIFAR-100. Our work provides a new perspective of model robustness through the lens of Wasserstein geodesic-based interpolation with a practical off-the-shelf strategy that can be combined with existing robust training methods.

[48]  arXiv:2302.02139 (cross-list from cs.LG) [pdf, other]
Title: Structural Explanations for Graph Neural Networks using HSIC
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Graph neural networks (GNNs) are a type of neural model that tackle graphical tasks in an end-to-end manner. Recently, GNNs have been receiving increased attention in machine learning and data mining communities because of the higher performance they achieve in various tasks, including graph classification, link prediction, and recommendation. However, the complicated dynamics of GNNs make it difficult to understand which parts of the graph features contribute more strongly to the predictions. To handle the interpretability issues, recently, various GNN explanation methods have been proposed. In this study, a flexible model agnostic explanation method is proposed to detect significant structures in graphs using the Hilbert-Schmidt independence criterion (HSIC), which captures the nonlinear dependency between two variables through kernels. More specifically, we extend the GraphLIME method for node explanation with a group lasso and a fused lasso-based node explanation method. The group and fused regularization with GraphLIME enables the interpretation of GNNs in substructure units. Then, we show that the proposed approach can be used for the explanation of sequential graph classification tasks. Through experiments, it is demonstrated that our method can identify crucial structures in a target graph in various settings.

[49]  arXiv:2302.02155 (cross-list from cs.LG) [pdf, other]
Title: Guaranteed Tensor Recovery Fused Low-rankness and Smoothness
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

The tensor data recovery task has attracted much research attention in recent years. Solving such an ill-posed problem generally requires exploring intrinsic prior structures underlying tensor data, and formulating them as certain forms of regularization terms for guiding a sound estimate of the restored tensor. Recent research has made significant progress by adopting two insightful tensor priors, i.e., global low-rankness (L) and local smoothness (S) across different tensor modes, which are always encoded as a sum of two separate regularization terms in the recovery models. However, unlike the primary theoretical developments on low-rank tensor recovery, these joint L+S models have no theoretical exact-recovery guarantees yet, making the methods lack reliability in real practice. To address this crucial issue, in this work, we build a unique regularization term, which essentially encodes both L and S priors of a tensor simultaneously. In particular, by equipping the recovery models with this single regularizer, we can rigorously prove the exact recovery guarantees for two typical tensor recovery tasks, i.e., tensor completion (TC) and tensor robust principal component analysis (TRPCA). To the best of our knowledge, these should be the first exact-recovery results among all related L+S methods for tensor recovery. Significant recovery accuracy improvements over many other SOTA methods in several TC and TRPCA tasks with various kinds of visual tensor data are observed in extensive experiments. Typically, our method achieves a workable performance when the missing rate is extremely large, e.g., 99.5%, for the color image inpainting task, while all its peers totally fail in such a challenging case.

[50]  arXiv:2302.02200 (cross-list from math.CO) [pdf, other]
Title: Rank-based linkage I: triplet comparisons and oriented simplicial complexes
Comments: 37 pages, 12 figures
Subjects: Combinatorics (math.CO); Statistics Theory (math.ST)

Rank-based linkage is a new tool for summarizing a collection $S$ of objects according to their relationships. These objects are not mapped to vectors, and ``similarity'' between objects need be neither numerical nor symmetrical. All an object needs to do is rank nearby objects by similarity to itself, using a Comparator which is transitive, but need not be consistent with any metric on the whole set. Call this a ranking system on $S$. Rank-based linkage is applied to the $K$-nearest neighbor digraph derived from a ranking system. Computations occur on a 2-dimensional abstract oriented simplicial complex whose faces are among the points, edges, and triangles of the line graph of the undirected $K$-nearest neighbor graph on $S$. In $|S| K^2$ steps it builds an edge-weighted linkage graph $(S, \mathcal{L}, \sigma)$ where $\sigma(\{x, y\})$ is called the in-sway between objects $x$ and $y$. Take $\mathcal{L}_t$ to be the links whose in-sway is at least $t$, and partition $S$ into components of the graph $(S, \mathcal{L}_t)$, for varying $t$. Rank-based linkage is a functor from a category of out-ordered digraphs to a category of partitioned sets, with the practical consequence that augmenting the set of objects in a rank-respectful way gives a fresh clustering which does not ``rip apart'' the previous one. The same holds for single linkage clustering in the metric space context, but not for typical optimization-based methods. Open combinatorial problems are presented in the last section.

[51]  arXiv:2302.02224 (cross-list from cs.LG) [pdf, other]
Title: TAP: The Attention Patch for Cross-Modal Knowledge Transfer from Unlabeled Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This work investigates the intersection of cross-modal learning and semi-supervised learning, where we aim to improve the supervised learning performance of the primary modality by borrowing missing information from an unlabeled modality. We investigate this problem from a Nadaraya-Watson (NW) kernel regression perspective and show that this formulation implicitly leads to a kernelized cross-attention module. To this end, we propose The Attention Patch (TAP), a simple neural network plugin that allows data-level knowledge transfer from the unlabeled modality. We provide numerical simulations on three real-world datasets to examine each aspect of TAP and show that a TAP integration in a neural network can improve generalization performance using the unlabeled modality.
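
The link between Nadaraya-Watson regression and attention mentioned in this abstract can be illustrated with the classical estimator itself. The sketch below is not TAP: it only shows that an NW estimate with a Gaussian kernel is a softmax-weighted average of values. The function name `nw_cross_attention` and the `bandwidth` parameter are illustrative assumptions, not names from the paper.

```python
import math


def nw_cross_attention(query, keys, values, bandwidth=1.0):
    """Nadaraya-Watson estimate at `query` with a Gaussian kernel.

    `keys` and `values` are equal-length lists of float vectors (hypothetical
    stand-ins for reference-modality points and their associated values).
    """
    # Kernel logits: -||q - k||^2 / (2 h^2); exponentiating gives Gaussian weights.
    logits = [
        -sum((q - k) ** 2 for q, k in zip(query, key)) / (2.0 * bandwidth ** 2)
        for key in keys
    ]
    # Normalising the kernel weights is exactly a softmax over the logits.
    top = max(logits)
    weights = [math.exp(logit - top) for logit in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    # The output is the attention-weighted average of the value vectors.
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
```

A query equidistant from two keys receives the plain average of their values, and shrinking `bandwidth` makes the estimate concentrate on the nearest key, which mirrors how a sharper softmax behaves in attention.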

[52]  arXiv:2302.02252 (cross-list from cs.LG) [pdf, other]
Title: Reinforcement Learning in Low-Rank MDPs with Density Features
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

MDPs with low-rank transitions -- that is, the transition matrix can be factored into the product of two matrices, left and right -- are a highly representative structure that enables tractable learning. The left matrix enables expressive function approximation for value-based learning and has been studied extensively. In this work, we instead investigate sample-efficient learning with density features, i.e., the right matrix, which induce powerful models for state-occupancy distributions. This setting not only sheds light on leveraging unsupervised learning in RL, but also enables plug-in solutions for convex RL. In the offline setting, we propose an algorithm for off-policy estimation of occupancies that can handle non-exploratory data. Using this as a subroutine, we further devise an online algorithm that constructs exploratory data distributions in a level-by-level manner. As a central technical challenge, the additive error of occupancy estimation is incompatible with the multiplicative definition of data coverage. In the absence of strong assumptions like reachability, this incompatibility easily leads to exponential error blow-up, which we overcome via novel technical tools. Our results also readily extend to the representation learning setting, when the density features are unknown and must be learned from an exponentially large candidate set.

[53]  arXiv:2302.02277 (cross-list from cs.LG) [pdf, other]
Title: SE(3) diffusion model with application to protein backbone generation
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

The design of novel protein structures remains a challenge in protein engineering for applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there exists no principled methodological framework for diffusion on SE(3), the space of orientation-preserving rigid motions in $\mathbb{R}^3$, that operates on frames and confers the group invariance. We address these shortcomings by developing theoretical foundations of SE(3)-invariant diffusion models on multiple frames followed by a novel framework, FrameDiff, for learning the SE(3)-equivariant score over multiple frames. We apply FrameDiff to monomer backbone generation and find it can generate designable monomers up to 500 amino acids without relying on a pretrained protein structure prediction network that has been integral to previous methods. We find our samples are capable of generalizing beyond any known protein structure.

[54]  arXiv:2302.02323 (cross-list from cs.LG) [pdf, other]
Title: Improving Fair Training under Correlation Shifts
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Model fairness is an essential element for Trustworthy AI. While many techniques for model fairness have been proposed, most of them assume that the training and deployment data distributions are identical, which is often not true in practice. In particular, when the bias between labels and sensitive groups changes, the fairness of the trained model is directly influenced and can worsen. We make two contributions for solving this problem. First, we analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness. We introduce the notion of correlation shifts, which can explicitly capture the change of the above bias. Second, we propose a novel pre-processing step that samples the input data to reduce correlation shifts and thus enables the in-processing approaches to overcome their limitations. We formulate an optimization problem for adjusting the data ratio among labels and sensitive groups to reflect the shifted correlation. A key benefit of our approach lies in decoupling the roles of pre- and in-processing approaches: correlation adjustment via pre-processing and unfairness mitigation on the processed data via in-processing. Experiments show that our framework effectively improves existing in-processing fair algorithms w.r.t. accuracy and fairness, both on synthetic and real datasets.

[55]  arXiv:2302.02392 (cross-list from cs.LG) [pdf, ps, other]
Title: Refined Value-Based Offline RL under Realizability and Partial Coverage
Comments: Under review
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In offline reinforcement learning (RL) we have no opportunity to explore, so we must assume that the data is sufficient to guide picking a good policy; such assumptions take the form of some coverage, realizability, Bellman completeness, and/or hard margin (gap). In this work we propose value-based algorithms for offline RL with PAC guarantees under just partial coverage, specifically, coverage of just a single comparator policy, and realizability of the soft (entropy-regularized) Q-function of the single policy and a related function defined as a saddle point of a certain minimax optimization problem. This offers refined and generally more lax conditions for offline RL. We further show an analogous result for vanilla Q-functions under a soft margin condition. To attain these guarantees, we leverage novel minimax learning algorithms to accurately estimate soft or vanilla Q-functions with $L^2$-convergence guarantees. Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.

[56]  arXiv:2302.02420 (cross-list from cs.LG) [pdf, other]
Title: Direct Uncertainty Quantification
Comments: 21 pages, 16 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Traditional neural networks are simple to train but they produce overconfident predictions, while Bayesian neural networks provide good uncertainty quantification but optimizing them is time consuming. This paper introduces a new approach, direct uncertainty quantification (DirectUQ), that combines their advantages: the neural network directly models uncertainty in output space and captures both aleatoric and epistemic uncertainty. DirectUQ can be derived as an alternative variational lower bound, and hence benefits from collapsed variational inference that provides improved regularizers. On the other hand, like non-probabilistic models, DirectUQ enjoys simple training and one can use Rademacher complexity to provide risk bounds for the model. Experiments show that DirectUQ and ensembles of DirectUQ provide a good tradeoff in terms of run time and uncertainty quantification, especially for out-of-distribution data.

[57]  arXiv:2302.02460 (cross-list from cs.LG) [pdf, other]
Title: Nonparametric Density Estimation under Distribution Drift
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study nonparametric density estimation in non-stationary drift settings. Given a sequence of independent samples taken from a distribution that gradually changes in time, the goal is to compute the best estimate for the current distribution. We prove tight minimax risk bounds for both discrete and continuous smooth densities, where the minimum is over all possible estimates and the maximum is over all possible distributions that satisfy the drift constraints. Our technique handles a broad class of drift models, and generalizes previous results on agnostic learning under drift.

[58]  arXiv:2302.02526 (cross-list from cs.LG) [pdf, other]
Title: On Private and Robust Bandits
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

We study private and robust multi-armed bandits (MABs), where the agent receives Huber's contaminated heavy-tailed rewards and meanwhile needs to ensure differential privacy. We first present its minimax lower bound, characterizing the information-theoretic limit of regret with respect to privacy budget, contamination level and heavy-tailedness. Then, we propose a meta-algorithm that builds on a private and robust mean estimation sub-routine \texttt{PRM} that essentially relies on reward truncation and the Laplace mechanism only. For two different heavy-tailed settings, we give specific schemes of \texttt{PRM}, which enable us to achieve nearly-optimal regret. As by-products of our main results, we also give the first minimax lower bound for private heavy-tailed MABs (i.e., without contamination). Moreover, our two proposed truncation-based \texttt{PRM} achieve the optimal trade-off between estimation accuracy, privacy and robustness. Finally, we support our theoretical results with experimental studies.
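
To make "reward truncation and the Laplace mechanism" concrete, here is a minimal sketch of a private mean estimate built from those two ingredients. It is not the paper's \texttt{PRM}: the function name, the choice of clipping to $[-M, M]$ as the form of truncation, and the parameter names `threshold` and `epsilon` are all assumptions made for illustration.

```python
import random


def private_truncated_mean(rewards, threshold, epsilon, rng):
    """Clip rewards to [-threshold, threshold], average, then add Laplace noise.

    Hypothetical helper: replacing one reward changes the clipped mean by at
    most 2 * threshold / n, so Laplace noise with scale sensitivity / epsilon
    gives epsilon-differential privacy for the released mean.
    """
    clipped = [max(-threshold, min(threshold, r)) for r in rewards]
    n = len(clipped)
    sensitivity = 2.0 * threshold / n
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(clipped) / n + noise


rng = random.Random(0)
estimate = private_truncated_mean([1.0, 2.0, 100.0], threshold=5.0, epsilon=1.0, rng=rng)
```

Clipping bounds the influence of heavy-tailed or contaminated rewards (here the outlier 100.0 contributes at most 5.0), and the same bound is what fixes the sensitivity used to calibrate the noise.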

[59]  arXiv:2302.02552 (cross-list from cs.LG) [pdf, other]
Title: Adapting to Continuous Covariate Shift via Online Density Ratio Estimation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Dealing with distribution shifts is one of the central challenges for modern machine learning. One fundamental situation is the \emph{covariate shift}, where the input distributions of data change from training to testing stages while the input-conditional output distribution remains unchanged. In this paper, we initiate the study of a more challenging scenario -- \emph{continuous} covariate shift -- in which the test data appear sequentially, and their distributions can shift continuously. Our goal is to adaptively train the predictor such that its prediction risk accumulated over time can be minimized. Starting with the importance-weighted learning, we show the method works effectively if the time-varying density ratios of test and train inputs can be accurately estimated. However, existing density ratio estimation methods would fail due to data scarcity at each time step. To this end, we propose an online method that can appropriately reuse historical information. Our density ratio estimation method is proven to perform well by enjoying a dynamic regret bound, which finally leads to an excess risk guarantee for the predictor. Empirical results also validate the effectiveness.

[60]  arXiv:2302.02560 (cross-list from cs.LG) [pdf, other]
Title: Causal Shift-Response Functions with Neural Networks: The Health Benefits of Lowering Air Quality Standards in the US
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

Policymakers are required to evaluate the health benefits of reducing the National Ambient Air Quality Standards (NAAQS; i.e., the safety standards) for fine particulate matter PM 2.5 before implementing new policies. We formulate this objective as a shift-response function (SRF) and develop methods to analyze the problem using methods for causal inference, specifically under the stochastic interventions framework. SRFs model the average change in an outcome of interest resulting from a hypothetical shift in the observed exposure distribution. We propose a new broadly applicable doubly-robust method to learn SRFs using targeted regularization with neural networks. We evaluate our proposed method under various benchmarks specific for marginal estimates as a function of continuous exposure. Finally, we implement our estimator in the motivating application that considers the potential reduction in deaths from lowering the NAAQS from the current level of 12 $\mu g/m^3$ to levels that are recently proposed by the Environmental Protection Agency in the US (10, 9, and 8 $\mu g/m^3$).

[61]  arXiv:2302.02570 (cross-list from cs.AI) [pdf, other]
Title: Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

We consider the task of evaluating policies of algorithmic resource allocation through randomized controlled trials (RCTs). Such policies are tasked with optimizing the utilization of limited intervention resources, with the goal of maximizing the benefits derived. Evaluation of such allocation policies through RCTs proves difficult, notwithstanding the scale of the trial, because the individuals' outcomes are inextricably interlinked through resource constraints controlling the policy decisions. Our key contribution is to present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT. We identify conditions under which such reassignments are permissible and can be leveraged to construct counterfactual trials, whose outcomes can be accurately ascertained, for free. We prove theoretically that such an estimator is more accurate than common estimators based on sample means -- we show that it returns an unbiased estimate and simultaneously reduces variance. We demonstrate the value of our approach through empirical experiments on synthetic, semi-synthetic as well as real case study data and show improved estimation accuracy across the board.

[62]  arXiv:2302.02571 (cross-list from cs.LG) [pdf, other]
Title: Offline Learning in Markov Games with General Function Approximation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)

We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium -- such as Nash equilibrium and (Coarse) Correlated Equilibrium -- from an offline dataset pre-collected from the game. Existing works consider relatively restricted tabular or linear models and handle each equilibrium separately. In this work, we provide the first framework for sample-efficient offline learning in Markov games under general function approximation, handling all 3 equilibria in a unified manner. By using Bellman-consistent pessimism, we obtain interval estimation for policies' returns, and use both the upper and the lower bounds to obtain a relaxation on the gap of a candidate policy, which becomes our optimization objective. Our results generalize prior works and provide several additional insights. Importantly, we require a data coverage condition that improves over the recently proposed "unilateral concentrability". Our condition allows selective coverage of deviation policies that optimally trade off between their greediness (as approximate best responses) and coverage, and we show scenarios where this leads to significantly better guarantees. As a new connection, we also show how our algorithmic framework can subsume seemingly different solution concepts designed for the special case of two-player zero-sum games.

[63]  arXiv:2302.02589 (cross-list from cs.LG) [pdf, other]
Title: $z$-SignFedAvg: A Unified Stochastic Sign-based Compression for Federated Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Federated Learning (FL) is a promising privacy-preserving distributed learning paradigm but suffers from high communication cost when training large-scale machine learning models. Sign-based methods, such as SignSGD (Bernstein et al., 2018), have been proposed as a biased gradient compression technique for reducing the communication cost. However, sign-based algorithms could diverge under heterogeneous data, which motivated the development of advanced techniques, such as the error-feedback method and stochastic sign-based compression, to fix this issue. Nevertheless, these methods still suffer from slower convergence rates. Besides, none of them allows multiple local SGD updates like FedAvg (McMahan et al., 2017). In this paper, we propose a novel noisy perturbation scheme with a general symmetric noise distribution for sign-based compression, which not only allows one to flexibly control the tradeoff between gradient bias and convergence performance, but also provides a unified viewpoint on existing stochastic sign-based methods. More importantly, the unified noisy perturbation scheme enables the development of the very first sign-based FedAvg algorithm ($z$-SignFedAvg) to accelerate the convergence. Theoretically, we show that $z$-SignFedAvg achieves a faster convergence rate than existing sign-based methods and, under uniformly distributed noise, can enjoy the same convergence rate as its uncompressed counterpart. Extensive experiments are conducted to demonstrate that $z$-SignFedAvg can achieve competitive empirical performance on real datasets and outperforms existing schemes.
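
The idea of perturbing a gradient with symmetric noise before taking its sign can be sketched for the uniform case. This is not $z$-SignFedAvg: it omits local SGD steps and covers only uniform noise, and it assumes every gradient coordinate is bounded in magnitude by `noise_scale`. All function and parameter names are hypothetical.

```python
import random


def stochastic_sign(grad, noise_scale, rng):
    """One-bit compression: sign of each coordinate after adding uniform noise."""
    return [
        1.0 if g + rng.uniform(-noise_scale, noise_scale) >= 0.0 else -1.0
        for g in grad
    ]


def aggregate_signs(client_grads, noise_scale, rng):
    """Server-side average of the clients' sign vectors, rescaled by noise_scale.

    If |g| <= noise_scale, then E[sign(g + u)] = g / noise_scale for
    u ~ Uniform[-noise_scale, noise_scale], so the rescaled average is an
    unbiased estimate of the mean gradient.
    """
    n = len(client_grads)
    dim = len(client_grads[0])
    totals = [0.0] * dim
    for grad in client_grads:
        for j, s in enumerate(stochastic_sign(grad, noise_scale, rng)):
            totals[j] += s
    return [noise_scale * t / n for t in totals]
```

Each client sends one bit per coordinate, and the noise is what removes the bias of a plain sign: with no noise, a coordinate of 0.3 and one of 0.9 would be compressed to the same value.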

[64]  arXiv:2302.02605 (cross-list from cs.LG) [pdf, other]
Title: Toward Large Kernel Models
Comments: Code is available at github.com/EigenPro/EigenPro3
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Recent studies indicate that kernel machines can often perform similarly to or better than deep neural networks (DNNs) on small datasets. The interest in kernel machines has been additionally bolstered by the discovery of their equivalence to wide neural networks in certain regimes. However, a key feature of DNNs is their ability to scale the model size and training data size independently, whereas in traditional kernel machines model size is tied to data size. Because of this coupling, scaling kernel machines to large data has been computationally challenging. In this paper, we provide a way forward for constructing large-scale general kernel models, which are a generalization of kernel machines that decouples the model and data, allowing training on large datasets. Specifically, we introduce EigenPro 3.0, an algorithm based on projected dual preconditioned SGD, and show scaling to model and data sizes which have not been possible with existing kernel methods.

[65]  arXiv:2302.02622 (cross-list from cs.CV) [pdf, other]
Title: Uncertainty Calibration and its Application to Object Detection
Authors: Fabian Küppers
Comments: PhD thesis at University of Wuppertal, cite by: 'Fabian K\"uppers. "Uncertainty Calibration and its Application to Object Detection." PhD Thesis, University of Wuppertal, January 2023'
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)

Image-based environment perception is an important component, especially for driver assistance systems or autonomous driving. In this scope, modern neural networks are used to identify multiple objects as well as the corresponding position and size information within a single frame. The performance of such an object detection model is important for the overall performance of the whole system. However, a detection model might also predict these objects under a certain degree of uncertainty. [...]
In this work, we examine the semantic uncertainty (which object type?) as well as the spatial uncertainty (where is the object and how large is it?). We evaluate if the predicted uncertainties of an object detection model match with the observed error that is achieved on real-world data. In the first part of this work, we introduce the definition for confidence calibration of the semantic uncertainty in the context of object detection, instance segmentation, and semantic segmentation. We integrate additional position information in our examinations to evaluate the effect of the object's position on the semantic calibration properties. Besides measuring calibration, it is also possible to perform a post-hoc recalibration of semantic uncertainty that might have turned out to be miscalibrated. [...]
The second part of this work deals with the spatial uncertainty obtained by a probabilistic detection model. [...] We review and extend common calibration methods so that it is possible to obtain parametric uncertainty distributions for the position information in a more flexible way.
In the last part, we demonstrate a possible use-case for our derived calibration methods in the context of object tracking. [...] We integrate our previously proposed calibration techniques and demonstrate the usefulness of semantic and spatial uncertainty calibration in a subsequent process. [...]

[66]  arXiv:2302.02648 (cross-list from cs.HC) [pdf]
Title: First steps towards quantum machine learning applied to the classification of event-related potentials
Authors: Grégoire Cattan, Alexandre Quemy (PUT), Anton Andreev (GIPSA-Services)
Comments: in French language
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)

Low information transfer rate is a major bottleneck for brain-computer interfaces based on non-invasive electroencephalography (EEG) for clinical applications. This led to the development of more robust and accurate classifiers. In this study, we investigate the performance of quantum-enhanced support vector classifier (QSVC). Training (predicting) balanced accuracy of QSVC was 83.17 (50.25) %. This result shows that the classifier was able to learn from EEG data, but that more research is required to obtain higher predicting accuracy. This could be achieved by a better configuration of the classifier, such as increasing the number of shots.

[67]  arXiv:2302.02747 (cross-list from econ.EM) [pdf, other]
Title: Testing Quantile Forecast Optimality
Subjects: Econometrics (econ.EM); Methodology (stat.ME)

Quantile forecasts made across multiple horizons have become an important output of many financial institutions, central banks and international organisations. This paper proposes misspecification tests for such quantile forecasts that assess optimality over a set of multiple forecast horizons and/or quantiles. The tests build on multiple Mincer-Zarnowitz quantile regressions cast in a moment equality framework. Our main test is for the null hypothesis of autocalibration, a concept which assesses optimality with respect to the information contained in the forecasts themselves. We provide an extension that allows testing for optimality with respect to larger information sets and a multivariate extension. Importantly, our tests do not just inform about general violations of optimality, but may also provide useful insights into specific forms of sub-optimality. A simulation study investigates the finite sample performance of our tests, and two empirical applications to financial returns and U.S. macroeconomic series illustrate that our tests can yield interesting insights into quantile forecast sub-optimality and its causes.

[68]  arXiv:2302.02865 (cross-list from cs.LG) [pdf, other]
Title: Probabilistic Contrastive Learning Recovers the Correct Aleatoric Uncertainty of Ambiguous Inputs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Contrastively trained encoders have recently been proven to invert the data-generating process: they encode each input, e.g., an image, into the true latent vector that generated the image (Zimmermann et al., 2021). However, real-world observations often have inherent ambiguities. For instance, images may be blurred or only show a 2D view of a 3D object, so multiple latents could have generated them. This makes the true posterior for the latent vector probabilistic with heteroscedastic uncertainty. In this setup, we extend the common InfoNCE objective and encoders to predict latent distributions instead of points. We prove that these distributions recover the correct posteriors of the data-generating process, including its level of aleatoric uncertainty, up to a rotation of the latent space. In addition to providing calibrated uncertainty estimates, these posteriors allow the computation of credible intervals in image retrieval. They comprise images with the same latent as a given query, subject to its uncertainty.

[69]  arXiv:2302.02876 (cross-list from cs.LG) [pdf, other]
Title: Variational Information Pursuit for Interpretable Predictions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

There is a growing interest in the machine learning community in developing predictive algorithms that are "interpretable by design". Towards this end, recent work proposes to make interpretable decisions by sequentially asking interpretable queries about data until a prediction can be made with high confidence based on the answers obtained (the history). To promote short query-answer chains, a greedy procedure called Information Pursuit (IP) is used, which adaptively chooses queries in order of information gain. Generative models are employed to learn the distribution of query-answers and labels, which is in turn used to estimate the most informative query. However, learning and inference with a full generative model of the data is often intractable for complex tasks. In this work, we propose Variational Information Pursuit (V-IP), a variational characterization of IP which bypasses the need for learning generative models. V-IP is based on finding a query selection strategy and a classifier that minimizes the expected cross-entropy between true and predicted labels. We then demonstrate that the IP strategy is the optimal solution to this problem. Therefore, instead of learning generative models, we can use our optimal strategy to directly pick the most informative query given any history. We then develop a practical algorithm by defining a finite-dimensional parameterization of our strategy and classifier using deep networks and train them end-to-end using our objective. Empirically, V-IP is 10-100x faster than IP on different Vision and NLP tasks with competitive performance. Moreover, V-IP finds much shorter query chains when compared to reinforcement learning which is typically used in sequential-decision-making problems. Finally, we demonstrate the utility of V-IP on challenging tasks like medical diagnosis where the performance is far superior to the generative modelling approach.

[70]  arXiv:2302.02895 (cross-list from cs.CG) [pdf, other]
Title: Flexible and Probabilistic Topology Tracking with Partial Optimal Transport
Subjects: Computational Geometry (cs.CG); Applications (stat.AP)

In this paper, we present a flexible and probabilistic framework for tracking topological features in time-varying scalar fields using merge trees and partial optimal transport. Merge trees are topological descriptors that record the evolution of connected components in the sublevel sets of scalar fields. We present a new technique for modeling and comparing merge trees using tools from partial optimal transport. In particular, we model a merge tree as a measure network, that is, a network equipped with a probability distribution, and define a notion of distance on the space of merge trees inspired by partial optimal transport. Such a distance offers a new and flexible perspective for encoding intrinsic and extrinsic information in the comparative measures of merge trees. More importantly, it gives rise to a partial matching between topological features in time-varying data, thus enabling flexible topology tracking for scientific simulations. Furthermore, such partial matching may be interpreted as probabilistic coupling between features at adjacent time steps, which gives rise to probabilistic tracking graphs. We derive a stability result for our distance and provide numerous experiments indicating the efficacy of our distance in extracting meaningful feature tracks.

[71]  arXiv:2302.02941 (cross-list from cs.LG) [pdf, other]
Title: On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology
Comments: 24 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)

Message Passing Neural Networks (MPNNs) are instances of Graph Neural Networks that leverage the graph to send messages over the edges. This inductive bias leads to a phenomenon known as over-squashing, where a node feature is insensitive to information contained at distant nodes. Despite recent methods introduced to mitigate this issue, an understanding of the causes of over-squashing and of possible solutions is lacking. In this theoretical work, we prove that: (i) Neural network width can mitigate over-squashing, but at the cost of making the whole network more sensitive; (ii) Conversely, depth cannot help mitigate over-squashing: increasing the number of layers leads to over-squashing being dominated by vanishing gradients; (iii) The graph topology plays the greatest role, since over-squashing occurs between nodes at high commute (access) time. Our analysis provides a unified framework to study different recent methods introduced to cope with over-squashing and serves as a justification for a class of methods that fall under `graph rewiring'.

[72]  arXiv:2302.02951 (cross-list from cond-mat.stat-mech) [pdf, other]
Title: Noise-cleaning the precision matrix of fMRI time series
Comments: 15 pages, 12 figures (of which 12 pages, 3 figures in the main text)
Subjects: Statistical Mechanics (cond-mat.stat-mech); Machine Learning (stat.ML)

We present a comparison between various algorithms of inference of covariance and precision matrices in small datasets of real vectors, of the typical length and dimension of human brain activity time series retrieved by functional Magnetic Resonance Imaging (fMRI). Assuming a Gaussian model underlying the neural activity, the problem consists in denoising the empirically observed matrices in order to obtain a better estimator of the true precision and covariance matrices. We consider several standard noise-cleaning algorithms and compare them on two types of datasets. The first type consists of time series of fMRI brain activity of human subjects at rest. The second type consists of synthetic time series sampled from a generative Gaussian model of which we can vary the fraction of dimensions per sample q = N/T and the strength of off-diagonal correlations. The reliability of each algorithm is assessed in terms of test-set likelihood and, in the case of synthetic data, of the distance from the true precision matrix. We observe that the so-called Optimal Rotationally Invariant Estimator, based on Random Matrix Theory, leads to a significantly lower distance from the true precision matrix in synthetic data, and higher test likelihood in natural fMRI data. We propose a variant of the Optimal Rotationally Invariant Estimator in which one of its parameters is optimised by cross-validation. In the severe undersampling regime (large q) typical of fMRI series, it outperforms all the other estimators. We furthermore propose a simple algorithm based on an iterative likelihood gradient ascent, providing an accurate estimation for weakly correlated datasets.

[73]  arXiv:2302.02971 (cross-list from cs.LG) [pdf, other]
Title: U-Clip: On-Average Unbiased Stochastic Gradient Clipping
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

U-Clip is a simple amendment to gradient clipping that can be applied to any iterative gradient optimization algorithm. Like regular clipping, U-Clip involves using gradients that are clipped to a prescribed size (e.g. with component-wise or norm-based clipping), but instead of discarding the clipped portion of the gradient, U-Clip maintains a buffer of these values that is added to the gradients on the next iteration (before clipping). We show that the cumulative bias of the U-Clip updates is bounded by a constant. This implies that the clipped updates are unbiased on average. Convergence follows via a lemma that guarantees convergence with updates $u_i$ as long as $\sum_{i=1}^t (u_i - g_i) = o(t)$ where $g_i$ are the gradients. Extensive experimental exploration is performed on CIFAR10 with further validation given on ImageNet.
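
The buffering idea described in this abstract can be illustrated with a minimal sketch. It assumes component-wise clipping on plain Python lists; the names `u_clip_step`, `run_u_clip`, and `bound` are hypothetical and are not taken from the paper, and this is not the authors' implementation.

```python
def clip(value, bound):
    """Clip a scalar to the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


def u_clip_step(grads, buffer, bound):
    """One clipped update that carries the clipped-off part forward.

    grads  : gradient components for this iteration
    buffer : amounts removed by clipping in earlier iterations
    bound  : clipping threshold (hypothetical parameter name)
    Returns (update, new_buffer).
    """
    # Add the carried buffer to the fresh gradient before clipping.
    combined = [g + b for g, b in zip(grads, buffer)]
    update = [clip(c, bound) for c in combined]
    # Whatever clipping removed is kept for the next iteration.
    new_buffer = [c - u for c, u in zip(combined, update)]
    return update, new_buffer


def run_u_clip(gradient_stream, bound):
    """Apply u_clip_step over a sequence of gradient vectors."""
    buffer = [0.0] * len(gradient_stream[0])
    updates = []
    for grads in gradient_stream:
        update, buffer = u_clip_step(grads, buffer, bound)
        updates.append(update)
    return updates, buffer


updates, leftover = run_u_clip([[3.0], [0.0], [0.0], [-0.5]], bound=1.0)
```

In this sketch the sum of the updates plus the final buffer always equals the sum of the gradients, so the cumulative difference between updates and gradients stays bounded whenever the buffer does.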

[74]  arXiv:2302.02988 (cross-list from cs.LG) [pdf, other]
Title: Asymptotically Minimax Optimal Fixed-Budget Best Arm Identification for Expected Simple Regret Minimization
Subjects: Machine Learning (cs.LG); Econometrics (econ.EM); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)

We investigate fixed-budget best arm identification (BAI) for expected simple regret minimization. In each round of an adaptive experiment, a decision maker draws one of multiple treatment arms based on past observations and subsequently observes the outcomes of the chosen arm. After the experiment, the decision maker recommends a treatment arm with the highest projected outcome. We evaluate this decision in terms of the expected simple regret, the difference between the expected outcomes of the best and recommended treatment arms. Due to the inherent uncertainty, we evaluate the regret using the minimax criterion. For distributions with fixed variances (location-shift models), such as Gaussian distributions, we derive asymptotic lower bounds for the worst-case expected simple regret. Then, we show that the Random Sampling (RS)-Augmented Inverse Probability Weighting (AIPW) strategy proposed by Kato et al. (2022) is asymptotically minimax optimal in the sense that the leading factor of its worst-case expected simple regret asymptotically matches our derived worst-case lower bound. Our result indicates that, for location-shift models, the optimal RS-AIPW strategy draws treatment arms with varying probabilities based on their variances. This result contrasts with the results of Bubeck et al. (2011), which show that drawing each treatment arm with an equal ratio is minimax optimal in a bounded outcome setting.
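
The expected simple regret defined in this abstract can be made concrete with a small numerical sketch. It assumes the true arm means and the probability of recommending each arm are known, which is never the case in a real experiment; the names `means` and `recommend_probs` are hypothetical.

```python
def expected_simple_regret(means, recommend_probs):
    """Expected gap between the best arm's mean and the recommended arm's mean.

    means           : true expected outcome of each treatment arm (assumed known)
    recommend_probs : probability that each arm is the one recommended
    """
    best = max(means)
    # Weight each arm's gap to the best arm by how often it is recommended.
    return sum(p * (best - m) for m, p in zip(means, recommend_probs))


# Two arms: the better one is recommended 80% of the time.
regret = expected_simple_regret([1.0, 0.5], [0.8, 0.2])
```

Recommending the best arm with probability one gives zero regret; any probability mass on a worse arm contributes that arm's gap in proportion.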

[75]  arXiv:2302.02991 (cross-list from eess.IV) [pdf, other]
Title: Optimal Transport Guided Unsupervised Learning for Enhancing low-quality Retinal Images
Comments: Accepted as a conference paper to 20th IEEE International Symposium on Biomedical Imaging (ISBI 2023)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Real-world non-mydriatic retinal fundus photography is prone to artifacts, imperfections, and low quality when certain ocular or systemic co-morbidities exist. Artifacts may result in inaccuracy or ambiguity in clinical diagnoses. In this paper, we proposed a simple but effective end-to-end framework for enhancing poor-quality retinal fundus images. Leveraging optimal transport theory, we proposed an unpaired image-to-image translation scheme for transporting low-quality images to their high-quality counterparts. We theoretically proved that a Generative Adversarial Network (GAN) model with a generator and discriminator is sufficient for this task. Furthermore, to mitigate the inconsistency of information between the low-quality images and their enhancements, an information consistency mechanism was proposed to maximally maintain structural consistency (optical discs, blood vessels, lesions) between the source and enhanced domains. Extensive experiments were conducted on the EyeQ dataset to demonstrate the superiority of our proposed method perceptually and quantitatively.

[76]  arXiv:2302.03003 (cross-list from eess.IV) [pdf, other]
Title: OTRE: Where Optimal Transport Guided Unpaired Image-to-Image Translation Meets Regularization by Enhancing
Comments: Accepted as a conference paper to The 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Non-mydriatic retinal color fundus photography (CFP) is widely available because it does not require pupillary dilation; however, it is prone to poor quality due to operators, systemic imperfections, or patient-related causes. Optimal retinal image quality is mandated for accurate medical diagnoses and automated analyses. Herein, we leveraged Optimal Transport (OT) theory to propose an unpaired image-to-image translation scheme for mapping low-quality retinal CFPs to high-quality counterparts. Furthermore, to improve the flexibility, robustness, and applicability of our image enhancement pipeline in clinical practice, we generalized a state-of-the-art model-based image reconstruction method, regularization by denoising, by plugging in priors learned by our OT-guided image-to-image translation network. We named it regularization by enhancing (RE). We validated the integrated framework, OTRE, on three publicly available retinal image datasets by assessing the quality after enhancement and their performance on various downstream tasks, including diabetic retinopathy grading, vessel segmentation, and diabetic lesion segmentation. The experimental results demonstrated the superiority of our proposed framework over some state-of-the-art unsupervised competitors and a state-of-the-art supervised method.

[77]  arXiv:2302.03020 (cross-list from cs.LG) [pdf, other]
Title: RLSbench: Domain Adaptation Under Relaxed Label Shift
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Despite the emergence of principled methods for domain adaptation under label shift, the sensitivity of these methods to minor shifts in the class conditional distributions remains precariously underexplored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with shifts in label proportions. While several papers attempt to adapt these heuristics to accommodate shifts in label proportions, inconsistencies in evaluation criteria, datasets, and baselines make it hard to assess the state of the art. In this paper, we introduce RLSbench, a large-scale relaxed label shift benchmark, consisting of >500 distribution shift pairs that draw on 14 datasets across vision, tabular, and language modalities and compose them with varying label proportions. First, we evaluate 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than were previously known. Next, we develop an effective two-step meta-algorithm that is compatible with most deep domain adaptation heuristics: (i) pseudo-balance the data at each epoch; and (ii) adjust the final classifier with (an estimate of) target label distribution. The meta-algorithm improves existing domain adaptation heuristics often by 2-10% accuracy points under extreme label proportion shifts and has little (i.e., <0.5%) effect when label proportions do not shift. We hope that these findings and the availability of RLSbench will encourage researchers to rigorously evaluate proposed methods in relaxed label shift settings. Code is publicly available at https://github.com/acmi-lab/RLSbench.
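
Step (ii) of the meta-algorithm, adjusting a classifier with an estimate of the target label distribution, can be sketched as follows. This uses the standard prior-ratio reweighting of predicted class probabilities, which is an assumption about what the adjustment looks like and is not confirmed by the abstract; `adjust_posterior`, `source_prior`, and `target_prior` are hypothetical names, and the RLSbench repository should be consulted for the actual procedure.

```python
def adjust_posterior(probs, source_prior, target_prior):
    """Reweight predicted class probabilities for a shifted label distribution.

    probs        : classifier output p_source(y | x), one entry per class
    source_prior : label proportions the classifier was trained under
    target_prior : (estimated) label proportions in the target domain
    All source_prior entries are assumed to be strictly positive.
    """
    # Scale each class by the ratio of target to source label proportion.
    weighted = [p * t / s for p, s, t in zip(probs, source_prior, target_prior)]
    total = sum(weighted)
    # Renormalize so the adjusted probabilities sum to one.
    return [w / total for w in weighted]


# An undecided prediction moves toward the class that dominates the target domain.
adjusted = adjust_posterior([0.5, 0.5], [0.5, 0.5], [0.9, 0.1])
```

When the target and source proportions coincide, the weights are all one and the prediction is unchanged, which is at least consistent with the small effect the abstract reports when label proportions do not shift.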

Replacements for Tue, 7 Feb 23

[78]  arXiv:1809.02727 (replaced) [pdf, ps, other]
Title: Decentralized Differentially Private Without-Replacement Stochastic Gradient Descent
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[79]  arXiv:1908.07521 (replaced) [pdf, other]
Title: Distributed Hypothesis Testing over a Noisy Channel: Error-exponents Trade-off
Subjects: Other Statistics (stat.OT); Information Theory (cs.IT)
[80]  arXiv:1909.06094 (replaced) [pdf, other]
Title: Estimating Fisher Information Matrix in Latent Variable Models based on the Score Function
Authors: Maud Delattre (MIA-Paris), Estelle Kuhn (MaIAGE)
Subjects: Methodology (stat.ME)
[81]  arXiv:1911.00648 (replaced) [pdf, other]
Title: salmon: A Symbolic Linear Regression Package for Python
Comments: Accepted in the Journal of Statistical Software
Subjects: Computation (stat.CO)
[82]  arXiv:2002.01444 (replaced) [pdf, other]
Title: Proper Learning of Linear Dynamical Systems as a Non-Commutative Polynomial Optimisation Problem
Comments: 27 pages, 6 figures, with additional experiments exploiting sparsity
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
[83]  arXiv:2004.04464 (replaced) [pdf, other]
Title: A Characteristic Function for Shapley-Value-Based Attribution of Anomaly Scores
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[84]  arXiv:2005.01026 (replaced) [pdf, other]
Title: Multi-Center Federated Learning: Clients Clustering for Better Personalization
Comments: This paper has two duplicated versions: 2005.01026 and 2108.08647. The first one 2005.01026 is the right one, and the second one 2108.08647 should be deleted because it always causes misoperating
Journal-ref: World Wide Web,26,(2003),481-500
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
[85]  arXiv:2009.07703 (replaced) [pdf, other]
Title: Efficient Variational Bayes Learning of Graphical Models with Smooth Structural Changes
Journal-ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (2023)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[86]  arXiv:2009.12682 (replaced) [pdf, other]
Title: Decision-Aware Conditional GANs for Time Series Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[87]  arXiv:2101.02094 (replaced) [pdf, ps, other]
Title: Bernstein-Type Bounds for Beta Distribution
Authors: Maciej Skorski
Comments: major revision - fixed a mistake in the proof
Subjects: Probability (math.PR); Statistics Theory (math.ST); Applications (stat.AP)
[88]  arXiv:2102.11050 (replaced) [pdf, other]
Title: Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization
Authors: Rad Niazadeh (1), Negin Golrezaei (2), Joshua Wang (3), Fransisca Susan (4), Ashwinkumar Badanidiyuru (3) ((1) Chicago Booth School of Business, Operations Management, (2) MIT Sloan School of Management, Operations Management, (3) Google Research Mountain View, (4) MIT Operations Research Center)
Comments: 87 pages, 2 figures. Management Science (2022)
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC); Machine Learning (stat.ML)
[89]  arXiv:2105.03529 (replaced) [pdf, other]
Title: Precise Unbiased Estimation in Randomized Experiments using Auxiliary Observational Data
Subjects: Applications (stat.AP)
[90]  arXiv:2105.10590 (replaced) [pdf, other]
Title: Parallelizing Contextual Bandits
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM)
[91]  arXiv:2106.01128 (replaced) [pdf, other]
Title: Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[92]  arXiv:2106.14045 (replaced) [pdf, other]
Title: The mbsts package: Multivariate Bayesian Structural Time Series Models in R
Authors: Ning Ning, Jinwen Qiu
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Mathematical Software (cs.MS); Applications (stat.AP); Computation (stat.CO)
[93]  arXiv:2107.00371 (replaced) [pdf, other]
Title: Sparse GCA and Thresholded Gradient Descent
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[94]  arXiv:2108.12600 (replaced) [pdf, other]
Title: A robust fusion-extraction procedure with summary statistics in the presence of biased sources
Subjects: Methodology (stat.ME)
[95]  arXiv:2109.02959 (replaced) [pdf, other]
Title: Fast approximations of pseudo-observations in the context of right-censoring and interval-censoring
Authors: Olivier Bouaziz (MAP5 - UMR 8145)
Subjects: Statistics Theory (math.ST)
[96]  arXiv:2109.09367 (replaced) [pdf, other]
Title: Extending Bootstrap AMG for Clustering of Attributed Graphs
Comments: 32 pages, 12 figures, preprint
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Statistics Theory (math.ST)
[97]  arXiv:2111.03289 (replaced) [pdf, ps, other]
Title: Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs
Comments: accepted to neurips'22
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[98]  arXiv:2111.04964 (replaced) [pdf, other]
Title: On Representation Knowledge Distillation for Graph Neural Networks
Comments: IEEE Transactions on Neural Networks and Learning Systems (TNNLS), Special Issue on Deep Neural Networks for Graphs: Theory, Models, Algorithms and Applications
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[99]  arXiv:2112.09036 (replaced) [pdf, other]
Title: The Dual PC Algorithm and the Role of Gaussianity for Structure Learning of Bayesian Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
[100]  arXiv:2112.10753 (replaced) [pdf, other]
Title: Strong Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Markov Jump Linear Systems
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)
[101]  arXiv:2201.12064 (replaced) [pdf, other]
Title: Multiscale Graph Comparison via the Embedded Laplacian Discrepancy
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[102]  arXiv:2202.01666 (replaced) [pdf, other]
Title: Proportional Fairness in Federated Learning
Comments: Accepted at TMLR 2023, typos fixed
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
[103]  arXiv:2202.03835 (replaced) [pdf, other]
Title: A covariant, discrete time-frequency representation tailored for zero-based signal detection
Comments: Accepted for publication in IEEE Transactions on Signal Processing on May 26, 2022
Subjects: Signal Processing (eess.SP); Statistics Theory (math.ST); Methodology (stat.ME)
[104]  arXiv:2202.04912 (replaced) [pdf, other]
Title: Random Forest Weighted Local Fréchet Regression
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[105]  arXiv:2202.07098 (replaced) [pdf, ps, other]
Title: Statistical Inference After Adaptive Sampling for Longitudinal Data
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
[106]  arXiv:2203.00614 (replaced) [pdf, other]
Title: Side Effects of Learning from Low-dimensional Data Embedded in a Euclidean Space
Comments: 53 pages (11 pages for Appendix), 24 figures
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
[107]  arXiv:2203.10563 (replaced) [pdf, other]
Title: Application of de-shape synchrosqueezing to estimate gait cadence from a single-sensor accelerometer placed in different body locations
Subjects: Applications (stat.AP)
[108]  arXiv:2204.00180 (replaced) [pdf, other]
Title: Measuring Diagnostic Test Performance Using Imperfect Reference Tests: A Partial Identification Approach
Authors: Filip Obradović
Comments: For associated code, see: this https URL
Subjects: Applications (stat.AP); Econometrics (econ.EM)
[109]  arXiv:2204.07879 (replaced) [pdf, ps, other]
Title: Polynomial-time sparse measure recovery
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[110]  arXiv:2204.08964 (replaced) [pdf, other]
Title: Adaptive measurement filter: efficient strategy for optimal estimation of quantum Markov chains
Comments: 25 pages, 7 figures
Subjects: Quantum Physics (quant-ph); Mathematical Physics (math-ph); Statistics Theory (math.ST)
[111]  arXiv:2205.04107 (replaced) [pdf, other]
Title: Inference of multivariate exponential Hawkes processes with inhibition and application to neuronal activity
Authors: Anna Bonnet (LPSM (UMR_8001)), Miguel Martinez Herrera (LPSM (UMR_8001)), Maxime Sangnier (LPSM (UMR_8001))
Subjects: Methodology (stat.ME)
[112]  arXiv:2205.13496 (replaced) [pdf, other]
Title: Censored Quantile Regression Neural Networks for Distribution-Free Survival Analysis
Comments: Published in NeurIPS 2022
Journal-ref: NeurIPS 2022
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[113]  arXiv:2205.14504 (replaced) [pdf, other]
Title: Bayesian prediction via nonparametric transformation models
Comments: The corresponding R package BuLTM is available on GitHub this https URL
Subjects: Methodology (stat.ME)
[114]  arXiv:2206.02617 (replaced) [pdf, other]
Title: Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
[115]  arXiv:2206.02659 (replaced) [pdf, other]
Title: Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees
Comments: 36 pages, 5 figures, 8 tables (Fixed typos). ICML 2022
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST); Machine Learning (stat.ML)
[116]  arXiv:2206.12680 (replaced) [pdf, other]
Title: Topology-aware Generalization of Decentralized SGD
Comments: Accepted for publication in the 39th International Conference on Machine Learning (ICML 2022)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[117]  arXiv:2206.14275 (replaced) [pdf, other]
Title: Dynamic CoVaR Modeling
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST); Risk Management (q-fin.RM); Methodology (stat.ME)
[118]  arXiv:2207.00357 (replaced) [pdf, other]
Title: Efficient parameter estimation for parabolic SPDEs based on a log-linear model for realized volatilities
Subjects: Statistics Theory (math.ST)
[119]  arXiv:2207.02287 (replaced) [pdf, other]
Title: Branching Processes in Random Environments with Thresholds
Comments: 47 pages, 3 figures, 5 tables
Subjects: Probability (math.PR); Statistics Theory (math.ST)
[120]  arXiv:2207.08038 (replaced) [pdf, other]
Title: A Singular Woodbury and Pseudo-Determinant Matrix Identities and Application to Gaussian Process Regression
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Numerical Analysis (math.NA); Computation (stat.CO)
[121]  arXiv:2207.14088 (replaced) [pdf, other]
Title: On the Sequential Probability Ratio Test in Hidden Markov Models
Comments: 28 pages, 10 figures, submitted to CONCUR 2022
Subjects: Probability (math.PR); Logic in Computer Science (cs.LO); Statistics Theory (math.ST)
[122]  arXiv:2208.00959 (replaced) [pdf, other]
Title: HUG model: an interaction point process for Bayesian detection of multiple sources in groundwaters from hydrochemical data
Authors: Christophe Reype (IECL, PASTA), Radu S. Stoica (IECL, PASTA), Antonin Richard, Madalina Deaconu (IECL, PASTA)
Subjects: Applications (stat.AP); Statistics Theory (math.ST); Methodology (stat.ME)
[123]  arXiv:2208.07132 (replaced) [pdf, other]
Title: Intuitive Joint Priors for Bayesian Linear Multilevel Models: The R2D2M2 prior
Comments: 61 pages, 21 figures, 9 tables
Subjects: Methodology (stat.ME); Computation (stat.CO)
[124]  arXiv:2209.05153 (replaced) [pdf, other]
Title: The test of exponentiality based on the mean residual life function revisited
Authors: Bruno Ebner
Comments: 16 pages, 1 figure, 5 tables
Subjects: Statistics Theory (math.ST)
[125]  arXiv:2209.07791 (replaced) [pdf, ps, other]
Title: Maximum likelihood estimation and prediction error for a Matérn model on the circle
Authors: Sébastien Petit (L2S, LNE)
Subjects: Statistics Theory (math.ST)
[126]  arXiv:2209.12651 (replaced) [pdf, other]
Title: Learning Variational Models with Unrolling and Bilevel Optimization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[127]  arXiv:2209.13570 (replaced) [pdf, other]
Title: Hierarchical Sliced Wasserstein Distance
Comments: Accepted to ICLR 2023, 29 pages, 8 figures, 3 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[128]  arXiv:2209.15414 (replaced) [pdf, other]
Title: Predicting the power grid frequency of European islands
Comments: 17 pages
Subjects: Applications (stat.AP); Machine Learning (cs.LG); Systems and Control (eess.SY); Data Analysis, Statistics and Probability (physics.data-an)
[129]  arXiv:2210.00635 (replaced) [pdf, other]
Title: Robust Empirical Risk Minimization with Tolerance
Comments: 22 pages, 1 figure, To appear at ALT'23
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[130]  arXiv:2210.00895 (replaced) [pdf, other]
Title: On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits
Authors: Antoine Barrier (UMPA-ENSL, LMO, CELESTE), Aurélien Garivier (UMPA-ENSL, LIP), Gilles Stoltz (LMO, CELESTE)
Journal-ref: ALT 2023 - The 34th International Conference on Algorithmic Learning Theory, Feb 2023, Singapore, Singapore
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
[131]  arXiv:2210.06819 (replaced) [pdf, other]
Title: Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence
Comments: 14 pages in main text; 51 pages including bibliography and appendix. Published in Transactions on Machine Learning Research (TMLR), 2023. this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[132]  arXiv:2210.14860 (replaced) [pdf, other]
Title: Dependence matters: Statistical models to identify the drivers of tie formation in economic networks
Subjects: Methodology (stat.ME); Applications (stat.AP)
[133]  arXiv:2211.08572 (replaced) [pdf, other]
Title: Bayesian Fixed-Budget Best-Arm Identification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[134]  arXiv:2211.09259 (replaced) [pdf, other]
Title: The Missing Indicator Method: From Low to High Dimensions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[135]  arXiv:2211.09403 (replaced) [pdf, other]
Title: Learning Mixtures of Markov Chains and MDPs
Comments: 51 pages (13 page paper, 38 page appendix). Paper restructured and refined, corrections made to proofs, experiments added
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[136]  arXiv:2211.10747 (replaced) [pdf, other]
Title: Exploring validation metrics for offline model-based optimisation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[137]  arXiv:2211.13793 (replaced) [pdf, other]
Title: Tensor Decomposition of Large-scale Clinical EEGs Reveals Interpretable Patterns of Brain Physiology
Comments: 4 pages, 3 Figures, 2 Tables; Accepted at IEEE NER 2023
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Applications (stat.AP)
[138]  arXiv:2211.14296 (replaced) [pdf, other]
Title: A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation
Comments: Accepted at ICLR 2023 (notable-top-25%), Website: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
[139]  arXiv:2211.14908 (replaced) [pdf, other]
Title: A Permutation-free Kernel Two-Sample Test
Comments: Published at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS), with an oral presentation
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[140]  arXiv:2212.09178 (replaced) [pdf, ps, other]
Title: Support Vector Regression: Risk Quadrangle Framework
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[141]  arXiv:2212.09844 (replaced) [pdf, other]
Title: Counterfactual Risk Assessments under Unmeasured Confounding
Subjects: Econometrics (econ.EM); Computers and Society (cs.CY); Machine Learning (cs.LG); Methodology (stat.ME)
[142]  arXiv:2212.12749 (replaced) [pdf, other]
Title: Deep Latent State Space Models for Time-Series Generation
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[143]  arXiv:2301.07067 (replaced) [pdf, other]
Title: Transformers as Algorithms: Generalization and Stability in In-context Learning
Comments: Revised version significantly improves the stability guarantees and provides new experiments
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
[144]  arXiv:2301.09479 (replaced) [pdf, other]
Title: Modality-Agnostic Variational Compression of Implicit Neural Representations
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[145]  arXiv:2301.09861 (replaced) [pdf]
Title: A convolutional neural network of low complexity for tumor anomaly detection
Comments: This article has been accepted for publication in the 8th International Congress on Information and Communication Technology (ICICT 2023, Springer)
Subjects: Image and Video Processing (eess.IV); Applications (stat.AP)
[146]  arXiv:2301.12003 (replaced) [pdf, other]
Title: Minimizing Trajectory Curvature of ODE-based Generative Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[147]  arXiv:2301.13112 (replaced) [pdf, other]
Title: Benchmarking optimality of time series classification methods in distinguishing diffusions
Comments: 21 pages, 8 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[148]  arXiv:2301.13857 (replaced) [pdf, other]
Title: Learning in POMDPs is Sample-Efficient with Hindsight Observability
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[149]  arXiv:2302.00704 (replaced) [pdf, other]
Title: Pathologies of Predictive Diversity in Deep Ensembles
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[150]  arXiv:2302.00814 (replaced) [pdf, other]
Title: Stochastic Contextual Bandits with Long Horizon Rewards
Comments: 47 pages, to appear at AAAI 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[151]  arXiv:2302.01425 (replaced) [pdf, other]
Title: Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
Comments: 23 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)