Statistics
New submissions
New submissions for Tue, 7 Feb 23
 [1] arXiv:2302.01952 [pdf, other]

Title: On a continuous time model of gradient descent dynamics and instability in deep learning
Comments: Transactions of Machine Learning Research, 2023
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent, however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent, we propose the principal flow (PF), a continuous-time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian, the PF sheds light on the recently observed edge of stability phenomena in deep learning. Using our new understanding of instability, we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.
 [2] arXiv:2302.01974 [pdf, other]

Title: Characterization and estimation of high dimensional sparse regression parameters under linear inequality constraints
Comments: 25 pages, 12 figures
Subjects: Methodology (stat.ME)
Modern statistical problems often involve linear inequality constraints on model parameters. Ignoring natural parameter constraints usually results in less efficient statistical procedures. To this end, we define a notion of "sparsity" for such restricted sets using lower-dimensional features. Our framework is flexible enough that the number of restrictions may exceed the number of parameters. One such situation arises in the estimation of a monotone curve using a nonparametric approach, e.g., splines. We show that the proposed notion of sparsity agrees with the usual notion of sparsity in the unrestricted case, which supports the validity of the proposed definition as a measure of sparsity. The proposed sparsity measure also allows us to generalize popular priors for sparse vector estimation to the constrained case.
 [3] arXiv:2302.01982 [pdf, other]

Title: multiGPATree: Statistical Approach for Pleiotropy Informed and Functional Annotation Tree Guided Prioritization of GWAS Results
Comments: 25 pages, 6 figures, 1 table
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO)
Genome-wide association studies (GWAS) have successfully identified over two hundred thousand genotype-trait associations. Yet some challenges remain. First, complex traits are often associated with many single nucleotide polymorphisms (SNPs), most with small or moderate effect sizes, making them difficult to detect. Second, many complex traits share a common genetic basis due to pleiotropy, and although few methods consider it, leveraging pleiotropy can improve statistical power to detect genotype-trait associations with weaker effect sizes. Third, currently available statistical methods are limited in explaining the functional mechanisms through which genetic variants are associated with specific or multiple traits. We propose multiGPATree to address these challenges. The multiGPATree approach can identify risk SNPs associated with single as well as multiple traits while also identifying the combinations of functional annotations that can explain the mechanisms through which risk-associated SNPs are linked with the traits.
First, we conducted simulation studies to evaluate the proposed multiGPATree method and compared its performance with an existing statistical approach. The results indicate that multiGPATree outperforms the existing approach in detecting risk-associated SNPs for multiple traits. Second, we applied multiGPATree to systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) GWAS data and to Crohn's disease (CD) and ulcerative colitis (UC) GWAS data, together with functional annotation data including GenoSkyline and GenoSkylinePlus. Our results demonstrate that multiGPATree can be a powerful tool that improves association mapping while facilitating understanding of the underlying genetic architecture of complex traits and of potential mechanisms linking risk-associated SNPs with complex traits.
 [4] arXiv:2302.02015 [pdf, other]

Title: Non-greedy Tree-based Learning for Estimating Global Optimal Dynamic Treatment Decision Rules with Continuous Treatment Dosage
Subjects: Methodology (stat.ME)
Dynamic treatment regimes (DTRs) play a critical role in precision medicine when assigning patient-specific treatments at multiple stages and optimizing a long-term clinical outcome. However, most existing work on DTRs has focused on categorical treatment scenarios rather than continuous treatment options. Moreover, standard black-box machine learning methods lack interpretability, while standard tree-learning methods lack global optimality. In this paper, we propose a non-greedy global optimization method for dose search, namely the Global Optimal Dosage Tree-based learning method (GoDoTree), which combines a robust estimation of the counterfactual outcome mean with an interpretable and non-greedy decision tree for estimating the global optimal dynamic dosage treatment regime in a multiple-stage setting. GoDoTree-Learning recursively estimates how the counterfactual outcome mean depends on a continuous treatment dosage using doubly robust estimators at each stage, and optimizes the stage-specific decision tree in a non-greedy way. We conduct simulation studies to evaluate the finite-sample performance of the proposed method and apply it to real data for optimal warfarin dose finding.
 [5] arXiv:2302.02024 [pdf, other]

Title: A Simple Approach for Local and Global Variable Importance in Nonlinear Regression Models
Subjects: Methodology (stat.ME)
The ability to interpret machine learning models has become increasingly important as their usage in data science continues to rise. Most current interpretability methods are optimized to work on either (i) a global scale, where the goal is to rank features based on their contributions to overall variation in an observed population, or (ii) the local level, which aims to detail how important a feature is to a particular individual in the dataset. In this work, we present the "GlObal And Local Score" (GOALS) operator: a simple post hoc approach to simultaneously assess local and global feature variable importance in nonlinear models. Motivated by problems in statistical genetics, we demonstrate our approach using Gaussian process regression, where understanding how genetic markers affect trait architecture both among individuals and across populations is of high interest. With detailed simulations and real data analyses, we illustrate the flexibility and efficiency of GOALS relative to state-of-the-art variable importance strategies.
 [6] arXiv:2302.02033 [pdf, ps, other]

Title: An Asymptotically Optimal Algorithm for the One-Dimensional Convex Hull Feasibility Problem
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
This work studies the pure-exploration setting for the convex hull feasibility (CHF) problem, where one aims to efficiently and accurately determine whether a given point lies in the convex hull of the means of a finite set of distributions. We give a complete characterization of the sample complexity of the CHF problem in the one-dimensional setting. We present the first asymptotically optimal algorithm, called ThompsonCHF, whose modular design consists of a stopping rule and a sampling rule. In addition, we provide an extension of the algorithm that generalizes several important problems in the multi-armed bandit literature. Finally, we further investigate the Gaussian bandit case with unknown variances and address how the ThompsonCHF algorithm can be adjusted to be asymptotically optimal in this setting.
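In one dimension the feasibility question itself is simple once the means are known: the convex hull of finitely many real numbers is the closed interval between the smallest and largest of them. The hypothetical helper below shows only this known-means check; it is not ThompsonCHF, and the hard part studied in the paper (deciding feasibility from noisy samples with few draws) is not addressed here.

```python
def in_convex_hull_1d(point, means):
    """Return True if `point` lies in the convex hull of the given real means.

    In one dimension the convex hull of finitely many numbers is the closed
    interval [min(means), max(means)], so feasibility reduces to two comparisons.
    """
    if not means:
        raise ValueError("need at least one mean")
    return min(means) <= point <= max(means)


# Known-means illustration with hypothetical values.
print(in_convex_hull_1d(0.5, [0.1, 0.9, 0.4]))  # True
print(in_convex_hull_1d(1.2, [0.1, 0.9, 0.4]))  # False
```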
 [7] arXiv:2302.02043 [pdf, other]

Title: mixdistreg: An R Package for Fitting Mixture of Experts Distributional Regression with Adaptive First-order Methods
Authors: David Rügamer
Subjects: Computation (stat.CO)
This paper presents a high-level description of the R software package mixdistreg for fitting mixture of experts distributional regression models. The proposed framework is implemented in R using the deepregression software template, which is based on TensorFlow and follows the neural structured additive learning principle. The software comprises various approaches as special cases, including mixture density networks and mixture regression approaches. Various code examples are given to demonstrate the package's functionality.
 [8] arXiv:2302.02053 [pdf, other]

Title: Model-based Smoothing with Integrated Wiener Processes and Overlapping Splines
Subjects: Methodology (stat.ME)
In many applications that involve the inference of an unknown smooth function, the inference of its derivatives will often be just as important as that of the function itself. To make joint inferences of the function and its derivatives, a class of Gaussian processes called the $p$th-order integrated Wiener process (IWP) is considered. Methods for constructing a finite element method (FEM) approximation of an IWP exist but have focused only on the order $p = 2$ case, which does not allow appropriate inference for derivatives, and their computational feasibility relies on an additional approximation to the FEM itself. In this article, we propose an alternative FEM approximation, called overlapping splines (O-spline), which pursues computational feasibility directly through the choice of test functions, and mirrors the construction of an IWP, as the O-spline results from multiple integrations of these same test functions. The O-spline approximation applies for any order $p \in \mathbb{Z}^+$, is computationally efficient and provides consistent inference for all derivatives up to order $p-1$. It is shown both theoretically and empirically through simulation that the O-spline approximation converges to the true IWP as the number of knots increases. We further provide a unified and interpretable way to define priors for the smoothing parameter based on the notion of predictive standard deviation (PSD), which is invariant to the order $p$ and the placement of the knots. Finally, we demonstrate the practical use of the O-spline approximation through simulation studies and an analysis of COVID death rates, in which inference is carried out on both the function and its derivatives, the latter having an important interpretation in terms of the course of the pandemic.
 [9] arXiv:2302.02110 [pdf, other]

Title: A Scalar-on-Quantile-Function Approach for Estimating Short-term Health Effects of Environmental Exposures
Subjects: Applications (stat.AP)
Environmental epidemiologic studies routinely utilize aggregate health outcomes to estimate effects of short-term (e.g., daily) exposures that are available at increasingly fine spatial resolutions. However, areal averages are typically used to derive population-level exposure, which cannot capture the spatial variation and individual heterogeneity in exposures that may occur within the spatial and temporal unit of interest (e.g., within a day or ZIP code). We propose a general modeling approach to incorporate within-unit exposure heterogeneity in health analyses via exposure quantile functions. Furthermore, by viewing the exposure quantile function as a functional covariate, our approach provides additional flexibility in characterizing associations at different quantile levels. We apply the proposed approach to an analysis of air pollution and emergency department (ED) visits in Atlanta over four years. The analysis utilizes daily ZIP code-level distributions of personal exposures to four traffic-related ambient air pollutants simulated from the Stochastic Human Exposure and Dose Simulator. Our analyses find that effects of carbon monoxide on respiratory and cardiovascular disease ED visits are more pronounced with changes in lower quantiles of the population-level exposure. Software implementing the approach is provided in the R package nbRegQF.
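As a rough sketch of the data representation only (the paper's estimation procedure lives in the R package nbRegQF and is not reproduced here), each spatio-temporal unit's individual exposures can be summarized by an empirical quantile function evaluated on a fixed grid of levels, giving one functional covariate per unit. The function name, grid, unit labels and simulated log-normal exposures below are all hypothetical.

```python
import numpy as np


def exposure_quantile_functions(exposures_by_unit, levels):
    """Evaluate each unit's empirical exposure quantile function on a grid.

    exposures_by_unit maps a unit label (e.g. a ZIP code-day) to a 1-D array of
    individual-level exposures; the result has one row per unit, one column per level.
    """
    units = sorted(exposures_by_unit)
    q = np.vstack([np.quantile(exposures_by_unit[u], levels) for u in units])
    return units, q


rng = np.random.default_rng(3)
levels = np.linspace(0.05, 0.95, 19)  # hypothetical grid of quantile levels
exposures = {
    "unit_A": rng.lognormal(0.0, 0.5, 500),  # hypothetical simulated exposures
    "unit_B": rng.lognormal(0.3, 0.8, 500),
}
units, Q = exposure_quantile_functions(exposures, levels)
print(units, Q.shape)  # ['unit_A', 'unit_B'] (2, 19)
```

Each row of `Q` could then enter a regression as a discretized functional covariate, which is what allows associations to differ across quantile levels.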
 [10] arXiv:2302.02156 [pdf, other]

Title: Challenges of cellwise outliers
Subjects: Methodology (stat.ME); Computation (stat.CO)
It is well-known that real data often contain outliers. The term outlier typically refers to a case, that is, a row of the $n \times d$ data matrix. In recent times a different type has come into focus: cellwise outliers. These are suspicious cells (entries) that can occur anywhere in the data matrix. Even a relatively small proportion of outlying cells can contaminate over half the rows, which is a problem for row-wise robust methods. In this article we discuss the challenges posed by cellwise outliers and some methods developed so far to deal with them. We obtain new results on cellwise breakdown values for location, covariance and regression. We also propose a cellwise robust method for correspondence analysis, with real data illustrations. The paper concludes by formulating some points for debate.
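A short calculation shows how a small fraction of outlying cells can contaminate most rows. Assuming, for illustration only, that each cell is outlying independently with probability $\varepsilon$, a row of $d$ cells contains at least one outlying cell with probability $1 - (1 - \varepsilon)^d$.

```python
def contaminated_row_fraction(eps, d):
    """Expected fraction of rows with at least one outlying cell.

    Assumes each of the d cells in a row is outlying independently with
    probability eps (an illustrative model, not one taken from the paper).
    """
    return 1.0 - (1.0 - eps) ** d


# With 5% outlying cells and d = 20 variables, about 64% of rows are affected.
print(round(contaminated_row_fraction(0.05, 20), 3))
```

So even when only one cell in twenty is outlying, a method that needs a majority of clean rows would already be past its breakdown point.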
 [11] arXiv:2302.02228 [pdf, other]

Title: Counterfactual Identifiability of Bijective Causal Models
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We study counterfactual identifiability in causal models with bijective generation mechanisms (BGM), a class that generalizes several widely used causal models in the literature. We establish their counterfactual identifiability for three common causal structures with unobserved confounding, and propose a practical learning method that casts learning a BGM as structured generative modeling. Learned BGMs enable efficient counterfactual estimation and can be obtained using a variety of deep conditional generative models. We evaluate our techniques in a visual task and demonstrate their application in a real-world video streaming simulation task.
 [12] arXiv:2302.02247 [pdf, ps, other]

Title: Spectral Density Estimation of Function-Valued Spatial Processes
Comments: 84 pages, 0 figures
Subjects: Statistics Theory (math.ST)
The spectral density function describes the second-order properties of a stationary stochastic process on $\mathbb{R}^d$. This paper considers the nonparametric estimation of the spectral density of a continuous-time stochastic process taking values in a separable Hilbert space. Our estimator is based on kernel smoothing and can be applied to a wide variety of spatial sampling schemes, including those in which data are observed at irregular spatial locations. Thus, it finds immediate applications in spatial statistics, where irregularly sampled data naturally arise. The rates for the bias and variance of the estimator are obtained under general conditions in a mixed-domain asymptotic setting. When the data are observed on a regular grid, the optimal rate of the estimator matches the minimax rate for the class of covariance functions that decay according to a power law. The asymptotic normality of the spectral density estimator is also established under general conditions for Gaussian Hilbert-space-valued processes. Finally, with a view towards practical applications, the asymptotic results are specialized to the case of discretely sampled functional data in a reproducing kernel Hilbert space.
 [13] arXiv:2302.02254 [pdf, other]

Title: Getting to "rate-optimal" in ranking & selection
Journal-ref: Proceedings of the 2021 Winter Simulation Conference
Subjects: Computation (stat.CO); Statistics Theory (math.ST)
In their seminal 2004 paper, Glynn and Juneja formally and precisely established the rate-optimal replication-allocation scheme, with respect to the probability of incorrect selection, for selecting the best of k simulated systems. In the case of independent, normally distributed outputs, this allocation has a simple form that depends in an intuitively appealing way on the true means and variances. Of course, the means and (typically) variances are unknown, but the rate-optimal allocation provides a target for implementable, dynamic, data-driven policies to achieve. In this paper we compare the empirical behavior of four related replication-allocation policies: mCEI from Chen and Ryzhov and our new gCEI policy, which both converge to the Glynn and Juneja allocation; AOMAP from Peng and Fu, which converges to the OCBA optimal allocation; and TTTS from Russo, which targets the rate of convergence of the posterior probability of incorrect selection. We find that these policies have distinctly different behavior in some settings.
 [14] arXiv:2302.02286 [pdf, other]

Title: Optimal subsampling for the Cox proportional hazards model with massive survival data
Subjects: Computation (stat.CO)
The use of massive survival data has become common in survival analysis. In this study, a subsampling algorithm is proposed for the Cox proportional hazards model with time-dependent covariates when the sample is extraordinarily large but computing resources are relatively limited. A subsample estimator is developed by maximizing the weighted partial likelihood; it is shown to be consistent and asymptotically normal. By minimizing the asymptotic mean squared error of the subsample estimator, the optimal subsampling probabilities are obtained in explicit form. Simulation studies show that the proposed method can satisfactorily approximate the estimator based on the full dataset. The proposed method is then applied to corporate loan and breast cancer datasets with different censoring rates, and the outcomes confirm its practical advantages.
 [15] arXiv:2302.02288 [pdf, other]

Title: Efficient Adaptive Sobel and Joint Significance Tests for Mediation Effects
Authors: Haixiang Zhang
Subjects: Methodology (stat.ME); Applications (stat.AP)
Mediation analysis is an important statistical tool in many research fields. In particular, the Sobel test and the joint significance test are two popular methods for testing mediation effects in practice. However, both tests suffer from conservative type I error rates, which reduce their power and limit their popularity and usefulness. This limitation is long-standing for both methods in the mediation analysis literature. To deal with this issue, we propose the adaptive Sobel test and the adaptive joint significance test for mediation effects, which offer significant improvements over the traditional Sobel and joint significance tests. Meanwhile, our method is user-friendly and intelligible, without involving more complicated procedures. Explicit expressions for the sizes and powers are derived, which provide theoretical justification for our method. Furthermore, we extend the proposed adaptive Sobel and joint significance tests to multiple mediators with family-wise error rate control. Extensive simulations are conducted to evaluate the performance of our mediation testing procedure. Finally, we illustrate the usefulness of our method by analysing three real-world datasets with continuous, binary and time-to-event outcomes, respectively.
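For reference, the classical (non-adaptive) versions of the two tests fit in a few lines. Given estimates $\hat a$ (exposure to mediator) and $\hat b$ (mediator to outcome) with standard errors $s_a$ and $s_b$, the Sobel statistic is $z = \hat a \hat b / \sqrt{\hat b^2 s_a^2 + \hat a^2 s_b^2}$, and the joint significance test rejects when both paths are individually significant. This sketch does not implement the adaptive tests proposed in the paper; the function names and input values are hypothetical.

```python
import math


def two_sided_normal_pvalue(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def sobel_test(a, se_a, b, se_b):
    """Classical Sobel test of H0: a*b = 0; returns (z, p-value)."""
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = a * b / se_ab
    return z, two_sided_normal_pvalue(z)


def joint_significance_test(a, se_a, b, se_b, alpha=0.05):
    """Classical joint significance test: reject H0 only if both paths are significant."""
    p_a = two_sided_normal_pvalue(a / se_a)
    p_b = two_sided_normal_pvalue(b / se_b)
    return max(p_a, p_b) < alpha


z, p = sobel_test(0.5, 0.1, 0.4, 0.1)
print(z, p)
print(joint_significance_test(0.5, 0.1, 0.4, 0.1))
```

Both classical tests use null distributions calibrated for the least favourable case, which is one way to see why they tend to be conservative when $a = b = 0$.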
 [16] arXiv:2302.02304 [pdf, other]

Title: Crowdsourcing Utilizing Subgroup Structure of Latent Factor Modeling
Subjects: Methodology (stat.ME)
Crowdsourcing has emerged as an alternative solution for collecting large-scale labels. However, the majority of recruited workers are not domain experts, so their contributed labels could be noisy. In this paper, we propose a two-stage model to predict the true labels for multi-category classification tasks in crowdsourcing. In the first stage, we fit the observed labels with a latent factor model and incorporate subgroup structures for both tasks and workers through a multi-centroid grouping penalty. Group-specific rotations are introduced to align workers with different task categories to solve multi-category crowdsourcing tasks. In the second stage, we propose a concordance-based approach to identify high-quality worker subgroups who are relied upon to assign labels to tasks. In theory, we show the estimation consistency of the latent factors and the prediction consistency of the proposed method. Simulation studies show that the proposed method outperforms existing competing methods when subgroup structures exist within tasks and workers. We also demonstrate the application of the proposed method to real-world problems and show its superiority.
 [17] arXiv:2302.02310 [pdf, other]

Title: $\ell_1$-penalized Multinomial Regression: Estimation, inference, and prediction, with an application to risk factor identification for different dementia subtypes
Comments: 23 pages, 3 figures, 20 tables
Subjects: Methodology (stat.ME); Applications (stat.AP)
High-dimensional multinomial regression models are very useful in practice but have received less research attention than logistic regression models, especially from the perspective of statistical inference. In this work, we analyze the estimation and prediction error of the contrast-based $\ell_1$-penalized multinomial regression model and extend the debiasing method to the multinomial case, which provides a valid confidence interval for each coefficient and a $p$-value for the individual hypothesis test. We apply the debiasing method to identify important predictors of progression into dementia of different subtypes. Results of extensive simulations show the superiority of the debiasing method compared to some other inference methods.
 [18] arXiv:2302.02406 [pdf, other]

Title: Prescreening breast cancer with machine learning and deep learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We suggest that deep learning can be used for prescreening cancer by analyzing demographic and anthropometric information of patients, as well as biological markers obtained from routine blood samples and relative risks obtained from meta-analysis and international databases. We applied feature selection algorithms to a database of 116 women, including 52 healthy women and 64 women diagnosed with breast cancer, to identify the best prescreening predictors of cancer. We utilized the best predictors to perform k-fold Monte Carlo cross-validation experiments that compare deep learning against traditional machine learning algorithms. Our results indicate that a deep learning model with an input-layer architecture that is fine-tuned using feature selection can effectively distinguish between patients with and without cancer. Additionally, compared to the traditional machine learning algorithms, deep learning has the lowest uncertainty in its predictions. These findings suggest that deep learning algorithms applied to cancer prescreening offer a radiation-free, non-invasive, and affordable complement to screening methods based on imagery. The implementation of deep learning algorithms in cancer prescreening offers opportunities to identify individuals who may require imaging-based screening, can encourage self-examination, and can decrease the psychological externalities associated with false positives in cancer screening. The integration of deep learning algorithms for both screening and prescreening will ultimately lead to earlier detection of malignancy, reducing the healthcare and societal burden associated with cancer treatment.
 [19] arXiv:2302.02415 [pdf, ps, other]

Title: On Kronecker Separability of Multiway Covariance
Comments: 15 pages
Subjects: Statistics Theory (math.ST); Methodology (stat.ME)
Multiway data analysis aims to infer patterns from data represented as a multidimensional array. Estimating covariance from multiway data is a fundamental statistical task; however, the intrinsic high dimensionality poses significant statistical and computational challenges. Recently, several factorized covariance models, paired with estimation algorithms, have been proposed to circumvent these obstacles. Despite several promising results on the algorithmic front, it remains underexplored whether and when such a model is valid. To address this question, we define the notion of Kronecker-separable multiway covariance, which can be written as a sum of $r$ tensor products of modewise covariances. The question of whether a given covariance can be represented as a separable multiway covariance is then reduced to an equivalent question about separability of quantum states. Using this equivalence, it follows directly that a generic multiway covariance tends to be non-separable (even if $r \to \infty$), and moreover, finding its best separable approximation is NP-hard. These observations imply that factorized covariance models are restrictive and should be used only when there is a compelling rationale for such a model.
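For a two-way (matrix-valued) array with $p \times q$ entries, a Kronecker-separable covariance of rank $r$ has the form $\Sigma = \sum_{k=1}^{r} A_k \otimes B_k$ with modewise covariances $A_k$ ($p \times p$) and $B_k$ ($q \times q$). The NumPy sketch below only constructs such a covariance from given factors, with arbitrary illustrative factor matrices; it does not test whether a given covariance is separable, which is the hard problem the paper studies.

```python
import numpy as np


def random_covariance(dim, rng):
    """Random symmetric positive definite matrix (illustrative only)."""
    m = rng.standard_normal((dim, dim))
    return m @ m.T + dim * np.eye(dim)


def kronecker_separable_covariance(row_covs, col_covs):
    """Sum of Kronecker products A_k (x) B_k of modewise covariances."""
    if len(row_covs) != len(col_covs):
        raise ValueError("need the same number of row and column factors")
    return sum(np.kron(a, b) for a, b in zip(row_covs, col_covs))


rng = np.random.default_rng(0)
r, p, q = 3, 4, 5  # rank r and mode sizes p, q (hypothetical values)
A = [random_covariance(p, rng) for _ in range(r)]
B = [random_covariance(q, rng) for _ in range(r)]
sigma = kronecker_separable_covariance(A, B)
print(sigma.shape)  # (20, 20)
```

Because Kronecker products of positive semidefinite matrices are positive semidefinite, any such sum is a valid covariance; the paper's point is that the converse fails for a generic covariance.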
 [20] arXiv:2302.02432 [pdf, other]

Title: Tighter Information-Theoretic Generalization Bounds from Supersamples
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
We present a variety of novel information-theoretic generalization bounds for learning algorithms, from the supersample setting of Steinke & Zakynthinou (2020), i.e., the setting of the "conditional mutual information" framework. Our development exploits projecting the loss pair (obtained from a training instance and a testing instance) down to a single number and correlating loss values with a Rademacher sequence (and its shifted variants). The presented bounds include square-root bounds, fast-rate bounds (including those based on variance and sharpness), and bounds for interpolating algorithms. We show theoretically or empirically that these bounds are tighter than all information-theoretic bounds known to date on the same supersample setting.
 [21] arXiv:2302.02455 [pdf, other]

Title: ODEWS: The Overdraft Early Warning System
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG)
When a customer overdraws their account and their balance is negative, they are assessed an overdraft fee. Americans pay approximately \$15 billion in unnecessary overdraft fees a year, often in \$35 increments; users of the Mint personal finance app alone pay approximately \$250 million in fees a year. These overdraft fees are an excessive financial burden and can lead to cascading overdraft fees that trap customers in financial hardship. To address this problem, we have created an ML-driven overdraft early warning system (ODEWS) that assesses a customer's risk of overdrafting within the next week using their banking and transaction data in the Mint app. At-risk customers are sent an alert so they can take steps to avoid the fee, ultimately changing their behavior and financial habits. The deployed system resulted in \$3 million in savings in overdraft fees for Mint customers compared to a control group. Moreover, the methodology outlined here can be generalized to provide ML-driven personalized financial advice for many different personal finance goals: increasing one's credit score, building an emergency savings fund, paying down debt, and allocating capital for investment.
 [22] arXiv:2302.02457 [pdf, other]

Title: Scalable inference in functional linear regression with streaming data
Subjects: Methodology (stat.ME)
Traditional static functional data analysis is facing new challenges due to streaming data, where data constantly flow in. A major challenge is that storing such an ever-increasing amount of data in memory is nearly impossible. In addition, existing inferential tools in online learning are mainly developed for finite-dimensional problems, while inference methods for functional data are focused on the batch learning setting. In this paper, we tackle these issues by developing functional stochastic gradient descent algorithms and proposing an online bootstrap resampling procedure to systematically study the inference problem for functional linear regression. In particular, the proposed estimation and inference procedures use only one pass over the data; thus they are easy to implement and suitable for situations where data arrive in a streaming manner. Furthermore, we establish the convergence rate as well as the asymptotic distribution of the proposed estimator. Meanwhile, the perturbed estimator from the proposed bootstrap procedure is shown to enjoy the same theoretical properties, which provides the theoretical justification for our online inference tool. As far as we know, this is the first inference result on the functional linear regression model with streaming data. Simulation studies are conducted to investigate the finite-sample performance of the proposed procedure. An application is illustrated with the Beijing multi-site air-quality data.
 [23] arXiv:2302.02468 [pdf, other]

Title: Circular and spherical projected Cauchy distributions
Comments: Preprint
Subjects: Methodology (stat.ME)
Two new distributions are proposed: the circular projected and the spherical projected Cauchy distributions. A special case of the circular projected Cauchy coincides with the wrapped Cauchy distribution, and for this case a generalization is suggested that offers a better fit via the inclusion of an extra parameter. For the spherical case, imposing two conditions on the scatter matrix yields an elliptically symmetric distribution. All distributions have a closed-form normalizing constant and allow straightforward random value generation, while their parameters can be estimated via maximum likelihood. The bias of the estimated parameters is assessed via numerical studies, while illustrations using real data compare the distributions with some existing models and indicate better fits.
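Since a special case of the circular projected Cauchy coincides with the wrapped Cauchy distribution, the standard wrapped Cauchy density $f(\theta) = \frac{1-\rho^2}{2\pi\,(1+\rho^2-2\rho\cos(\theta-\mu))}$ with $0 < \rho < 1$ is a useful reference point. The sketch below evaluates this density and draws samples by wrapping a Cauchy variate with scale $\gamma = -\log\rho$ onto the circle. It covers only the classical wrapped Cauchy case, not the paper's generalization or the spherical distributions, and the parameter values are hypothetical.

```python
import numpy as np


def wrapped_cauchy_pdf(theta, mu, rho):
    """Wrapped Cauchy density on [0, 2*pi) with location mu and 0 < rho < 1."""
    return (1.0 - rho**2) / (2.0 * np.pi * (1.0 + rho**2 - 2.0 * rho * np.cos(theta - mu)))


def wrapped_cauchy_sample(n, mu, rho, rng):
    """Draw n angles by wrapping a Cauchy(mu, gamma) variate, gamma = -log(rho)."""
    gamma = -np.log(rho)
    x = mu + gamma * rng.standard_cauchy(n)
    return np.mod(x, 2.0 * np.pi)


rng = np.random.default_rng(1)
angles = wrapped_cauchy_sample(10_000, mu=1.0, rho=0.6, rng=rng)
grid = np.linspace(0.0, 2.0 * np.pi, 20_001)
# A Riemann sum of the density over one full period should be close to 1.
integral = np.sum(wrapped_cauchy_pdf(grid[:-1], 1.0, 0.6)) * (grid[1] - grid[0])
print(integral)
```

Here $\rho$ equals the mean resultant length, so the sample value of $|\overline{e^{i(\theta-\mu)}}|$ offers a quick sanity check on the sampler.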
 [24] arXiv:2302.02476 [pdf, other]

Title: Estimating Time-Varying Networks for High-Dimensional Time Series
Subjects: Methodology (stat.ME); Econometrics (econ.EM)
We explore time-varying networks for high-dimensional locally stationary time series, using the large VAR model framework with both the transition and (error) precision matrices evolving smoothly over time. Two types of time-varying graphs are investigated: one containing directed edges of Granger causality linkages, and the other containing undirected edges of partial correlation linkages. Under the sparse structural assumption, we propose a penalised local linear method with time-varying weighted group LASSO to jointly estimate the transition matrices and identify their significant entries, and a time-varying CLIME method to estimate the precision matrices. The estimated transition and precision matrices are then used to determine the time-varying network structures. Under some mild conditions, we derive the theoretical properties of the proposed estimates, including consistency and oracle properties. In addition, we extend the methodology and theory to cover highly correlated large-scale time series, for which the sparsity assumption becomes invalid, by allowing for common factors before estimating the factor-adjusted time-varying networks. We provide extensive simulation studies and an empirical application to a large U.S. macroeconomic dataset to illustrate the finite-sample performance of our methods.
 [25] arXiv:2302.02482 [pdf, other]

Title: Continuously Indexed Graphical Models
Subjects: Statistics Theory (math.ST); Probability (math.PR); Methodology (stat.ME)
Let $X = \{X_{u}\}_{u \in U}$ be a real-valued Gaussian process indexed by a set $U$. It can be thought of as an undirected graphical model with every random variable $X_{u}$ serving as a vertex. We characterize this graph in terms of the covariance of $X$ through its reproducing kernel property. Unlike other characterizations in the literature, our characterization does not restrict the index set $U$ to be finite or countable, and hence can be used to model the intrinsic dependence structure of stochastic processes in continuous time/space. Consequently, the said characterization is not (and apparently cannot be) of the inverse-zero type. This poses novel challenges for the problem of recovering the dependence structure from a sample of independent realizations of $X$, also known as structure estimation. We propose a methodology that circumvents these issues by targeting the recovery of the underlying graph up to a finite resolution, which can be arbitrarily fine and is limited only by the available sample size. The recovery is shown to be consistent so long as the graph is sufficiently regular in an appropriate sense, and convergence rates are provided. Our methodology is illustrated by simulation and two data analyses.
 [26] arXiv:2302.02486 [pdf, other]

Title: The Difference-of-Log-Normals Distribution: Properties, Estimation, and Growth
Authors: Robert Parham
Subjects: Methodology (stat.ME); Statistics Theory (math.ST); General Finance (q-fin.GN)
This paper describes the Difference-of-Log-Normals (DLN) distribution. A companion paper makes the case that the DLN is a fundamental distribution in nature, and shows how a simple application of the CLT gives rise to the DLN in many disparate phenomena. Here, I characterize its PDF, CDF, moments, and parameter estimators; generalize it to N dimensions using spherical distribution theory; describe methods to deal with its signature ``double-exponential'' nature; and use it to generalize growth measurement to possibly-negative variates that are DLN-distributed. I also conduct Monte Carlo experiments to establish some properties of the estimators and measures described.
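A Monte Carlo sketch of a DLN variate, $\exp(Y_1) - \exp(Y_2)$, in the spirit of the experiments mentioned above. It assumes $Y_1$ and $Y_2$ are independent normals, which may be narrower than the paper's parameterization. The sample mean is checked against the difference of the two lognormal means, $e^{\mu_1 + \sigma_1^2/2} - e^{\mu_2 + \sigma_2^2/2}$. The draws can be negative, which is the "possibly-negative" feature the growth measures must handle.

```python
import numpy as np

def sample_dln(mu1, mu2, s1, s2, n, rng):
    """n draws of exp(Y1) - exp(Y2), assuming independent Y1 ~ N(mu1, s1^2), Y2 ~ N(mu2, s2^2)."""
    y1 = rng.normal(mu1, s1, n)
    y2 = rng.normal(mu2, s2, n)
    return np.exp(y1) - np.exp(y2)

def dln_mean(mu1, mu2, s1, s2):
    # Difference of the two lognormal means
    return np.exp(mu1 + s1**2 / 2) - np.exp(mu2 + s2**2 / 2)

rng = np.random.default_rng(1)
draws = sample_dln(0.5, 0.0, 0.4, 0.6, 200_000, rng)   # hypothetical parameter values
mc_mean, exact_mean = draws.mean(), dln_mean(0.5, 0.0, 0.4, 0.6)
frac_negative = (draws < 0).mean()                     # share of negative variates
```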
 [27] arXiv:2302.02488 [pdf, other]

Title: A three-state coupled Markov switching model for COVID-19 outbreaks across Quebec based on hospital admissions
Subjects: Applications (stat.AP); Populations and Evolution (q-bio.PE)
Recurrent COVID-19 outbreaks have placed immense strain on the hospital system in Quebec. We develop a Bayesian three-state coupled Markov switching model to analyze COVID-19 outbreaks across Quebec based on admissions in the 30 largest hospitals. Within each catchment area we assume the existence of three states for the disease: absence (a new state meant to account for the many zeroes in some of the smaller areas), endemic, and outbreak. We then assume the disease switches between the three states in each area through a series of coupled non-homogeneous hidden Markov chains. Unlike previous approaches, the transition probabilities may depend on covariates and on the occurrence of outbreaks in neighboring areas, to account for geographical outbreak spread. Additionally, to prevent rapid switching between endemic and outbreak periods, we introduce clone states into the model which enforce minimum endemic and outbreak durations. Among our findings, mobility in retail and recreation venues had a strong positive association with the development and persistence of new COVID-19 outbreaks in Quebec. Based on model comparison, our contributions show promise in improving state estimation retrospectively and in real time, especially when there are smaller areas and highly spatially synchronized outbreaks, and they offer new epidemiological interpretations.
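The clone-state device for minimum durations can be sketched with the generic expanded-state-space construction. A state with minimum duration $d$ is replaced by $d$ chained copies. The chain must march through the copies, and it may stay or leave only from the last one. This sketch keeps just two states (endemic and outbreak) with constant, hypothetical probabilities. It omits the absence state, the covariate-dependent non-homogeneous transitions and the coupling across areas.

```python
import itertools
import numpy as np

def clone_transition_matrix(stay, min_dur):
    """Expanded transition matrix that enforces minimum sojourn times through clone states.

    stay[k]    -- probability of remaining in state k once its minimum duration has elapsed
    min_dur[k] -- minimum number of consecutive periods spent in state k
    States are left cyclically (k -> k + 1 mod K); with K = 2 this is endemic <-> outbreak.
    """
    K = len(stay)
    offsets = np.concatenate([[0], np.cumsum(min_dur)])   # index of the first clone of each state
    n = offsets[-1]
    P = np.zeros((n, n))
    for k in range(K):
        first, last = offsets[k], offsets[k + 1] - 1
        for c in range(first, last):
            P[c, c + 1] = 1.0                              # forced march through the clones
        P[last, last] = stay[k]                            # may stay once the minimum is met
        P[last, offsets[(k + 1) % K]] = 1 - stay[k]        # otherwise enter the next state's first clone
    labels = np.repeat(np.arange(K), min_dur)              # map every clone back to its original state
    return P, labels

P, labels = clone_transition_matrix(stay=[0.9, 0.8], min_dur=[3, 2])   # hypothetical values
rng = np.random.default_rng(2)
state, path = 0, []
for _ in range(2000):
    path.append(int(labels[state]))
    state = rng.choice(len(P), p=P[state])
runs = [(k, len(list(g))) for k, g in itertools.groupby(path)]   # (state, run length) pairs
```

Every completed run in state 0 lasts at least 3 periods and every run in state 1 at least 2. Only the final run may be cut short by the end of the simulation.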
 [28] arXiv:2302.02497 [pdf, other]

Title: High-dimensional Location Estimation via Norm Concentration for Sub-gamma Vectors
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
In location estimation, we are given $n$ samples from a known distribution $f$ shifted by an unknown translation $\lambda$, and want to estimate $\lambda$ as precisely as possible. Asymptotically, the maximum likelihood estimate achieves the Cram\'er-Rao bound of error $\mathcal N(0, \frac{1}{n\mathcal I})$, where $\mathcal I$ is the Fisher information of $f$. However, the $n$ required for convergence depends on $f$, and may be arbitrarily large. We build on the theory using \emph{smoothed} estimators to bound the error for finite $n$ in terms of $\mathcal I_r$, the Fisher information of the $r$-smoothed distribution. As $n \to \infty$, $r \to 0$ at an explicit rate and this converges to the Cram\'er-Rao bound. We (1) improve the prior work for 1-dimensional $f$ to converge for constant failure probability in addition to high probability, and (2) extend the theory to high-dimensional distributions. In the process, we prove a new bound on the norm of a high-dimensional random variable whose 1-dimensional projections are sub-gamma, which may be of independent interest.
 [29] arXiv:2302.02544 [pdf, other]

Title: Sequential change detection via backward confidence sequences
Comments: 24 pages, 10 figures
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
We present a simple reduction from sequential estimation to sequential changepoint detection (SCD). In short, suppose we are interested in detecting changepoints in some parameter or functional $\theta$ of the underlying distribution. We demonstrate that if we can construct a confidence sequence (CS) for $\theta$, then we can also successfully perform SCD for $\theta$. This is accomplished by checking whether two CSs, one running forward in time and the other backward, ever fail to intersect. Since the literature on CSs has been evolving rapidly, the reduction provided in this paper immediately solves several old and new change detection problems. Further, our "backward CS", constructed by reversing time, is new and potentially of independent interest. We provide strong nonasymptotic guarantees on the frequency of false alarms and detection delay, and demonstrate numerical effectiveness on several problems.
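A rough sketch of the forward/backward intersection check for the mean of $[0, 1]$-valued data follows. As a stand-in for the paper's CSs, it uses a deliberately conservative Hoeffding confidence sequence built with a union bound ($\alpha_t = \alpha / (t(t+1))$). At each time $n$, it intersects the forward intervals with the backward intervals computed from the most recent $1, 2, \dots, n$ observations. An alarm is raised when the intersection is empty. The CS construction, the alarm rule details and the Bernoulli example are assumptions made for illustration.

```python
import numpy as np

def radius(t, alpha):
    # Hoeffding radius for [0, 1]-valued data, union bound with alpha_t = alpha / (t (t + 1))
    return np.sqrt(np.log(2 * t * (t + 1) / alpha) / (2 * t))

def detect_change(x, alpha=0.05):
    """First time n at which the forward and backward CSs fail to intersect, else None."""
    lo_f, hi_f = 0.0, 1.0                        # running intersection of forward intervals
    for n in range(1, len(x) + 1):
        r = radius(n, alpha)
        m = np.mean(x[:n])
        lo_f, hi_f = max(lo_f, m - r), min(hi_f, m + r)
        j = np.arange(1, n + 1)                  # backward sample sizes
        back_means = np.cumsum(x[:n][::-1]) / j  # means of the last j observations
        rj = radius(j, alpha)
        lo_b, hi_b = np.max(back_means - rj), np.min(back_means + rj)
        if max(lo_f, lo_b) > min(hi_f, hi_b):    # empty intersection: declare a change
            return n
    return None

rng = np.random.default_rng(3)
x = np.concatenate([rng.binomial(1, 0.2, 300), rng.binomial(1, 0.8, 300)]).astype(float)
alarm = detect_change(x)                          # change occurs after observation 300
```

This naive version recomputes the backward CS from scratch at each step, which costs quadratic time overall. It is meant only to make the "fail to intersect" idea concrete.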
 [30] arXiv:2302.02590 [pdf, ps, other]

Title: Consensus dynamics and coherence in hierarchical small-world networks
Subjects: Methodology (stat.ME); Discrete Mathematics (cs.DM)
The hierarchical small-world network is a real-world network. It models well the benefit-transmission web of pyramid selling in China and many other countries. In this paper, by applying spectral graph theory, we study three important aspects of the consensus problem in the hierarchical small-world network: convergence speed, communication time-delay robustness, and network coherence. Firstly, we explicitly determine the Laplacian eigenvalues of the hierarchical small-world network by making use of its tree-like structure. Secondly, we find that the consensus algorithm on the hierarchical small-world network converges faster than that on some well-studied sparse networks, but is less robust to time delay. Closed-form expressions for the first-order and second-order network coherence are also derived. Our result shows that the hierarchical small-world network has an optimal structure for noisy consensus dynamics. We therefore provide a positive answer to two open questions of Yi \emph{et al}. Finally, we argue that some network structure characteristics, such as large maximum degree, small average path length, and large vertex and edge connectivity, are responsible for the strong robustness with respect to external perturbations.
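The three quantities studied above can all be read off the Laplacian spectrum $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_N$ using standard formulas:

- the algebraic connectivity $\lambda_2$ governs convergence speed;
- the delay threshold $\pi / (2\lambda_N)$ governs robustness to a uniform communication delay;
- the coherences are $H_1 = \frac{1}{2N}\sum_{i \ge 2} 1/\lambda_i$ and $H_2 = \frac{1}{2N}\sum_{i \ge 2} 1/\lambda_i^2$.

The sketch applies these formulas to a 5-vertex star graph, chosen only because its spectrum ($0, 1, 1, 1, 5$) is easy to check by hand. It does not construct the hierarchical small-world network itself.

```python
import numpy as np

def consensus_metrics(A):
    """Spectral consensus metrics for a connected undirected graph with adjacency matrix A."""
    L = np.diag(A.sum(axis=1)) - A              # graph Laplacian
    lam = np.sort(np.linalg.eigvalsh(L))        # 0 = lam_1 <= lam_2 <= ... <= lam_N
    N, nz = len(lam), lam[1:]                   # drop the single zero eigenvalue
    return {
        "lambda_2": nz[0],                      # algebraic connectivity: convergence speed
        "max_delay": np.pi / (2 * nz[-1]),      # largest tolerable uniform delay, pi / (2 lambda_N)
        "H1": np.sum(1 / nz) / (2 * N),         # first-order network coherence
        "H2": np.sum(1 / nz**2) / (2 * N),      # second-order network coherence
    }

# Star graph on 5 vertices (hypothetical example, not the hierarchical small-world network)
A = np.zeros((5, 5))
A[0, 1:] = A[1:, 0] = 1
m = consensus_metrics(A)
```

For the star, $H_1 = (3 + 1/5)/10 = 0.32$, $H_2 = (3 + 1/25)/10 = 0.304$ and the delay threshold is $\pi/10$.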
 [31] arXiv:2302.02613 [pdf, ps, other]

Title: An asymptotic behavior of a finite-section of the optimal causal filter
Authors: Junho Yang
Subjects: Statistics Theory (math.ST)
We derive an $L_1$-bound between the coefficients of the optimal causal filter applied to the data-generating process and its approximation based on finite sample observations. Here, we assume that the data-generating process is second-order stationary with either short- or long-memory autocovariances. To obtain the $L_1$-bound, we first provide an exact expression of the causal filter coefficients and their approximation in terms of the absolutely convergent series of the multistep-ahead infinite and finite predictor coefficients, respectively. Then, we prove a so-called uniform-type Baxter's inequality to obtain a bound for the difference between the two multistep-ahead predictor coefficients (under both short- and long-memory time series). The $L_1$-approximation error bound of the causal filter coefficients can be used to evaluate the quality of the predictions of time series through the mean squared error criterion.
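Baxter-type comparisons involve two sets of predictor coefficients. The finite-past coefficients come from solving the $p \times p$ Toeplitz system $\Gamma_p b = \gamma$. The infinite-past coefficients are known in closed form for simple models. The sketch takes an invertible MA(1), $X_t = e_t + \theta e_{t-1}$, as a hypothetical example. For this process the infinite-past one-step coefficients are $a_k = -(-\theta)^k$. The $L_1$ gap over the first $p$ coefficients shrinks as $p$ grows. The paper's multistep-ahead and long-memory cases are not covered here.

```python
import numpy as np

def finite_predictor(acov, p, h=1):
    """Best linear h-step predictor coefficients of X_{t+h} from X_t, ..., X_{t-p+1}."""
    idx = np.arange(p)
    Gamma = acov(np.abs(idx[:, None] - idx[None, :]))   # p x p Toeplitz autocovariance matrix
    rhs = acov(idx + h)                                  # Cov(X_{t+h}, X_{t-k}) for k = 0..p-1
    return np.linalg.solve(Gamma, rhs)

theta = 0.5   # hypothetical MA(1) parameter (|theta| < 1, invertible)

def ma1_acov(k):
    k = np.asarray(k)
    return np.where(k == 0, 1 + theta**2, np.where(k == 1, theta, 0.0))

def infinite_predictor(p):
    # First p one-step coefficients of the infinite-past predictor: a_k = -(-theta)^k
    j = np.arange(1, p + 1)
    return -(-theta) ** j

l1_gaps = [np.abs(finite_predictor(ma1_acov, p) - infinite_predictor(p)).sum()
           for p in (2, 5, 10, 20)]
```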
 [32] arXiv:2302.02670 [pdf, other]

Title: Random Forests for time-fixed and time-dependent predictors: The DynForest R package
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The R package DynForest implements random forests for predicting a categorical or a (multiple causes) time-to-event outcome based on time-fixed and time-dependent predictors. Through the random forests, the time-dependent predictors can be measured with error at subject-specific times, and they can be endogenous (i.e., impacted by the outcome process). They are modeled internally using flexible linear mixed models (via the lcmm package) with time-associations prespecified by the user. DynForest computes dynamic predictions that take into account all the information from time-fixed and time-dependent predictors. DynForest also provides information about the most predictive variables using variable importance and minimal depth. Variable importance can also be computed on groups of variables. To display the results, several functions are available, such as summary and plot functions. This paper aims to guide the user with a step-by-step example of the different functions for fitting random forests within DynForest.
 [33] arXiv:2302.02672 [pdf, other]

Title: Identifiability of latent-variable and structural-equation models: from linear to nonlinear
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
An old problem in multivariate statistics is that linear Gaussian models are often unidentifiable, i.e. some parameters cannot be uniquely estimated. In factor analysis, an orthogonal rotation of the factors is unidentifiable, while in linear regression, the direction of effect cannot be identified. For such linear models, non-Gaussianity of the (latent) variables has been shown to provide identifiability. In the case of factor analysis, this leads to independent component analysis, while in the case of the direction of effect, non-Gaussian versions of structural equation modelling solve the problem. More recently, we have shown how even general nonparametric nonlinear versions of such models can be estimated. Non-Gaussianity is not enough in this case, but assuming we have time series, or that the distributions are suitably modulated by some observed auxiliary variables, the models are identifiable. This paper reviews the identifiability theory for the linear and nonlinear cases, considering both factor-analytic models and structural equation models.
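The rotation problem can be seen numerically. Rotating standard Gaussian factors leaves every second-order statistic unchanged, and it leaves the (zero) excess kurtosis unchanged as well, so nothing in the data reveals the rotation. With non-Gaussian sources the picture changes. The sketch uses unit-variance uniform sources (an illustrative assumption), whose excess kurtosis is $-1.2$. After a $30^\circ$ rotation, each mixture has excess kurtosis $-1.2(\cos^4 30^\circ + \sin^4 30^\circ) = -0.75$, so the rotation becomes detectable. This detectability is what ICA exploits. The sketch is not itself an ICA algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
n, angle = 200_000, np.pi / 6
R = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])

gauss = rng.standard_normal((n, 2))                    # Gaussian factors
unif = rng.uniform(-np.sqrt(3), np.sqrt(3), (n, 2))    # unit-variance non-Gaussian sources

def excess_kurtosis(z):
    z = (z - z.mean(axis=0)) / z.std(axis=0)
    return (z**4).mean(axis=0) - 3

# Rotation leaves the Gaussian covariance (and kurtosis) unchanged ...
cov_gap = np.abs(np.cov((gauss @ R.T).T) - np.cov(gauss.T)).max()
kurt_gauss_before, kurt_gauss_after = excess_kurtosis(gauss), excess_kurtosis(gauss @ R.T)
# ... but visibly changes higher-order statistics of non-Gaussian sources
kurt_unif_before, kurt_unif_after = excess_kurtosis(unif), excess_kurtosis(unif @ R.T)
```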
 [34] arXiv:2302.02718 [pdf, other]

Title: A Log-Linear Non-Parametric Online Changepoint Detection Algorithm based on Functional Pruning
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)
Online changepoint detection aims to detect anomalies and changes in real time in high-frequency data streams, sometimes with limited available computational resources. This is an important task that is rooted in many real-world applications, including but not limited to cybersecurity, medicine and astrophysics. While fast and efficient online algorithms have been recently introduced, these rely on parametric assumptions which are often violated in practical applications. Motivated by data streams from the telecommunications sector, we build a flexible nonparametric approach to detect a change in the distribution of a sequence. Our procedure, NP-FOCuS, builds a sequential likelihood ratio test for a change in a set of points of the empirical cumulative distribution function of our data. This is achieved by keeping track of the number of observations above or below those points. Thanks to functional pruning ideas, NP-FOCuS has a computational cost that is log-linear in the number of observations and is suitable for high-frequency data streams. In terms of detection power, NP-FOCuS is seen to outperform current nonparametric online changepoint techniques in a variety of settings. We demonstrate the utility of the procedure on both simulated and real data.
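The underlying statistic can be sketched naively. For each point $q$, the data are reduced to indicators $\mathbf 1\{x_t \le q\}$. A Bernoulli likelihood ratio then compares "one proportion" against "a change after $\tau$", maximised over $\tau$. The sketch makes several assumptions: the LLRs are summed over the points, and the points, threshold and data are hypothetical. It scans every $\tau$ at every step, which costs quadratic time, and uses no functional pruning, so it is not NP-FOCuS's log-linear algorithm. The example changes only the spread (from $N(0,1)$ to $N(0,9)$), a change that a mean-based detector would miss.

```python
import numpy as np

def bern_loglik(k, m):
    """Maximised Bernoulli log-likelihood of k successes in m trials (0 log 0 := 0)."""
    k, m = np.asarray(k, float), np.asarray(m, float)
    with np.errstate(divide="ignore", invalid="ignore"):
        p = k / m
        return (np.where(k > 0, k * np.log(p), 0.0)
                + np.where(m - k > 0, (m - k) * np.log1p(-p), 0.0))

def max_llr(x, points):
    """Max over changepoints tau of the summed Bernoulli LLRs for the indicators 1{x_t <= q}."""
    n = len(x)
    taus = np.arange(1, n)                       # change right after observation tau
    total = np.zeros(n - 1)
    for q in points:
        c = np.cumsum(x <= q)                    # running count of observations at or below q
        k1, kn = c[taus - 1], c[-1]
        total += bern_loglik(k1, taus) + bern_loglik(kn - k1, n - taus) - bern_loglik(kn, n)
    return total.max()

def detect(x, points, threshold):
    for n in range(2, len(x) + 1):               # naive: rescan every tau at every step
        if max_llr(x[:n], points) > threshold:
            return n
    return None

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(0, 3, 200)])   # variance change after t = 300
points = (-1.0, 0.0, 1.0)                                            # hypothetical ECDF points
tau = detect(x, points, threshold=20.0)
```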
 [35] arXiv:2302.02766 [pdf, other]

Title: Generalization Bounds with Data-dependent Fractal Dimensions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Providing generalization guarantees for modern neural networks has been a crucial task in statistical learning. Recently, several studies have attempted to analyze the generalization error in such settings by using tools from fractal geometry. While these works have successfully introduced new mathematical tools to apprehend generalization, they heavily rely on a Lipschitz continuity assumption, which in general does not hold for neural networks and might make the bounds vacuous. In this work, we address this issue and prove fractal-geometry-based generalization bounds without requiring any Lipschitz assumption. To achieve this goal, we build upon a classical covering argument in learning theory and introduce a data-dependent fractal dimension. Despite introducing a significant amount of technical complications, this new notion lets us control the generalization error (over either fixed or random hypothesis spaces) along with certain mutual information (MI) terms. To provide a clearer interpretation of the newly introduced MI terms, as a next step, we introduce a notion of "geometric stability" and link our bounds to the prior art. Finally, we make a rigorous connection between the proposed data-dependent dimension and topological data analysis tools, which then enables us to compute the dimension in a numerically efficient way. We support our theory with experiments conducted on various settings.
 [36] arXiv:2302.02768 [pdf, other]

Title: Network Autoregression for Incomplete Matrix-Valued Time Series
Subjects: Methodology (stat.ME)
We study the dynamics of matrix-valued time series with observed network structures by proposing a matrix network autoregression model with row and column networks of the subjects. We incorporate covariate information and a low-rank intercept matrix. We allow incomplete observations in the matrices, and the missing mechanism can be covariate dependent. To estimate the model, a two-step estimation procedure is proposed. The first step aims to estimate the network autoregression coefficients, and the second step aims to estimate the regression parameters, which are matrices themselves. Theoretically, we first separately establish the asymptotic properties of the autoregression coefficients and the error bounds of the regression parameters. Subsequently, a bias reduction procedure is proposed to reduce the asymptotic bias, and the theoretical property of the debiased estimator is studied. Lastly, we illustrate the usefulness of the proposed method through a number of numerical studies and an analysis of a Yelp data set.
 [37] arXiv:2302.02774 [pdf, other]

Title: The SSL Interplay: Augmentations, Inductive Bias, and Generalization
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST)
Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision. Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architecture, and training algorithm. We study such an interplay with a precise analysis of generalization performance on both pretraining and downstream tasks in a theory-friendly setup, and highlight several insights for SSL practitioners that arise from our theory.
 [38] arXiv:2302.02859 [pdf, other]

Title: A Fast Bootstrap Algorithm for Causal Inference with Large Data
Comments: 46 pages
Subjects: Methodology (stat.ME); Applications (stat.AP); Machine Learning (stat.ML)
Estimating causal effects from large experimental and observational data has become increasingly prevalent in both industry and research. The bootstrap is an intuitive and powerful technique used to construct standard errors and confidence intervals of estimators. Its application, however, can be prohibitively demanding in settings involving large data. In addition, modern causal inference estimators based on machine learning and optimization techniques exacerbate the computational burden of the bootstrap. The bag of little bootstraps has been proposed in non-causal settings for large data but has not yet been applied to evaluate the properties of estimators of causal effects. In this paper, we introduce a new bootstrap algorithm called causal bag of little bootstraps for causal inference with large data. The new algorithm significantly improves the computational efficiency of the traditional bootstrap while providing consistent estimates and desirable confidence interval coverage. We describe its properties, provide practical considerations, and evaluate the performance of the proposed algorithm in terms of bias, coverage of the true 95% confidence intervals, and computational time in a simulation study. We apply it in the evaluation of the effect of hormone therapy on the average time to coronary heart disease using a large observational data set from the Women's Health Initiative.
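The general bag-of-little-bootstraps recipe can be sketched for the simplest causal estimator. The sketch uses the difference in means of a randomized experiment, which is an assumption: the paper's causal variant may target other estimators. The recipe draws $s$ small subsets of size $b = n^{\gamma}$. Within each subset, it simulates full-size resamples as multinomial weight vectors summing to $n$. It then averages the per-subset standard errors. The values of `s`, `r` and `gamma`, and the simulated data, are hypothetical choices.

```python
import numpy as np

def diff_in_means(y, t, w):
    """Weighted difference in means between treated (t == 1) and control (t == 0) units."""
    return np.average(y[t == 1], weights=w[t == 1]) - np.average(y[t == 0], weights=w[t == 0])

def blb_se(y, t, s=10, r=100, gamma=0.7, rng=None):
    """Bag-of-little-bootstraps standard error of the difference-in-means estimator."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    b = int(n ** gamma)                              # little subset size b = n^gamma
    ses = []
    for _ in range(s):
        idx = rng.choice(n, size=b, replace=False)   # one little subset
        ys, ts = y[idx], t[idx]
        # Each resample: multinomial weights over the b points that sum to the full size n
        stats = [diff_in_means(ys, ts, rng.multinomial(n, np.full(b, 1 / b)).astype(float))
                 for _ in range(r)]
        ses.append(np.std(stats, ddof=1))
    return np.mean(ses)                              # average the per-subset SEs

rng = np.random.default_rng(7)
n = 100_000
t = rng.binomial(1, 0.5, n)                          # hypothetical randomized treatment
y = 1.0 + 2.0 * t + rng.standard_normal(n)           # hypothetical outcomes
se = blb_se(y, t, rng=rng)
```

Only $b \approx 3{,}162$ distinct points enter each weighted computation, rather than all $100{,}000$. This is where the savings come from when the estimator itself is expensive.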
 [39] arXiv:2302.02923 [pdf, other]

Title: In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Econometrics (econ.EM)
Personalized treatment effect estimates are often of interest in high-stakes applications; thus, before deploying a model estimating such effects in practice, one needs to be sure that the best candidate from the ever-growing machine learning toolbox for this task was chosen. Unfortunately, due to the absence of counterfactual information in practice, it is usually not possible to rely on standard validation metrics for doing so, leading to a well-known model selection dilemma in the treatment effect estimation literature. While some solutions have recently been investigated, systematic understanding of the strengths and weaknesses of different model selection criteria is still lacking. In this paper, instead of attempting to declare a global `winner', we therefore empirically investigate success and failure modes of different selection criteria. We highlight that there is a complex interplay between selection strategies, candidate estimators and the data-generating process (DGP) used for testing, and provide interesting insights into the relative (dis)advantages of different criteria alongside desiderata for the design of further illuminating empirical studies in this context.
 [40] arXiv:2302.02942 [pdf, other]

Title: Empirical quantification of predictive uncertainty due to model discrepancy by training with an ensemble of experimental designs: an application to ion channel kinetics
Authors: Joseph G. Shuttleworth, Chon Lok Lei, Dominic G. Whittaker, Monique J. Windley, Adam P. Hill, Simon P. Preston, Gary R. Mirams
Comments: 23 pages, 9 figures
Subjects: Computation (stat.CO); Dynamical Systems (math.DS); Optimization and Control (math.OC); Quantitative Methods (q-bio.QM)
When mathematical biology models are used to make quantitative predictions for clinical or industrial use, it is important that these predictions come with a reliable estimate of their accuracy (uncertainty quantification). Because models of complex biological systems are always large simplifications, model discrepancy arises, where a mathematical model fails to recapitulate the true data-generating process. This presents a particular challenge for making accurate predictions, and especially for making accurate estimates of uncertainty in these predictions. Experimentalists and modellers must choose which experimental procedures (protocols) are used to produce data to train their models. We propose to characterise uncertainty owing to model discrepancy with an ensemble of parameter sets, each of which results from training to data from a different protocol. The variability in predictions from this ensemble provides an empirical estimate of predictive uncertainty owing to model discrepancy, even for unseen protocols. We use the example of electrophysiology experiments, which are used to investigate the kinetics of the hERG potassium ion channel. Here, `information-rich' protocols allow mathematical models to be trained using numerous short experiments performed on the same cell. Typically, assuming independent observational errors and training a model to an individual experiment results in parameter estimates with very little dependence on observational noise. Moreover, parameter sets arising from the same model applied to different experiments often conflict, which is indicative of model discrepancy. Our methods will help select more suitable mathematical models of hERG for future studies, and will be widely applicable to a range of biological modelling problems.
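A toy curve-fitting stand-in shows the ensemble idea. The sketch makes several illustrative assumptions: the truth is $\sin x$, each "protocol" samples a different unit-length range of $x$, and the data are noise-free. It fits the same model to each protocol separately and reports the spread of the fitted models' predictions at a point none was trained on. A misspecified straight-line model gives conflicting parameter sets and a visible spread. A correctly specified model agrees across protocols. Nothing in the sketch models hERG kinetics.

```python
import numpy as np

def fit_predict(features, x_train, y_train, x_new):
    """Least-squares fit of y ~ features(x) @ beta, then predict at x_new."""
    beta, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)
    return features(np.atleast_1d(x_new)) @ beta

truth = np.sin                                                    # hypothetical data-generating process
discrepant = lambda x: np.column_stack([np.ones_like(x), x])      # misspecified straight-line model
correct = lambda x: np.column_stack([np.sin(x)])                  # correctly specified model

protocols = [np.linspace(lo, lo + 1, 50) for lo in (0.0, 1.0, 2.0)]   # hypothetical designs
x_unseen = 1.5

def ensemble_spread(features):
    # One parameter set per protocol; spread of their predictions at x_unseen
    preds = [fit_predict(features, xp, truth(xp), x_unseen)[0] for xp in protocols]
    return max(preds) - min(preds)

spread_discrepant = ensemble_spread(discrepant)   # clearly positive, even without noise
spread_correct = ensemble_spread(correct)         # essentially zero
```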
 [41] arXiv:2302.02950 [pdf, other]

Title: A multidimensional objective prior distribution from a scoring rule
Comments: 20 pages, 5 figures, 10 tables
Subjects: Methodology (stat.ME)
The construction of objective priors is, at best, challenging for multidimensional parameter spaces. A common practice is to assume independence and set up the joint prior as the product of marginal distributions obtained via "standard" objective methods, such as Jeffreys or reference priors. However, the assumption of independence a priori is not always reasonable, and whether it can be viewed as strictly objective is still open to discussion. In this paper, by extending a previously proposed objective approach based on scoring rules for the one-dimensional case, we propose a novel objective prior for multidimensional parameter spaces which yields a dependence structure. The proposed prior has the appealing property of being proper, and it depends not on the chosen model but only on the parameter space considered.
 [42] arXiv:2302.02954 [pdf, other]

Title: Maximum likelihood estimator for skew Brownian motion: the convergence rate
Subjects: Statistics Theory (math.ST); Probability (math.PR)
We give a thorough description of the asymptotic properties of the maximum likelihood estimator (MLE) of the skewness parameter of a Skew Brownian Motion (SBM). Thanks to recent results on the Central Limit Theorem of the rate of convergence of estimators for the SBM, we prove a previously open conjecture that the MLE has asymptotically a mixed normal distribution involving the local time with a rate of convergence of order $1/4$. We also give a series expansion of the MLE and study the asymptotic behavior of the score and its derivatives, as well as their variation with the skewness parameter. In particular, we exhibit a specific behavior when the SBM is actually a Brownian motion, and quantify the explosion of the coefficients of the expansion when the skewness parameter is close to $-1$ or $1$.
 [43] arXiv:2302.03026 [pdf, other]

Title: Sampling-Based Accuracy Testing of Posterior Estimators for General Inference
Comments: 15 pages
Subjects: Machine Learning (stat.ML); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Methodology (stat.ME)
Parameter inference, i.e. inferring the posterior distribution of the parameters of a statistical model given some data, is a central problem in many scientific disciplines. Posterior inference with generative models is an alternative to methods such as Markov Chain Monte Carlo, both for likelihood-based and simulation-based inference. However, assessing the accuracy of posteriors encoded in generative models is not straightforward. In this paper, we introduce `distance to random point' (DRP) coverage testing as a method to estimate coverage probabilities of generative posterior estimators.
Our method differs from previously existing coverage-based methods, which require posterior evaluations. We prove that our approach is necessary and sufficient to show that a posterior estimator is optimal. We demonstrate the method on a variety of synthetic examples, and show that DRP can be used to test the results of posterior inference analyses in high-dimensional spaces. We also show that our method can detect non-optimal inferences in cases where existing methods fail.
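The DRP check can be sketched on a conjugate Gaussian toy problem where the exact posterior is known: $\theta \sim N(0, I)$, $x \mid \theta \sim N(\theta, I)$, giving posterior $N(x/2, I/2)$. Each simulation proceeds as follows:

- draw a random reference point;
- compute the fraction $f$ of posterior samples that lie closer to the reference point than the true $\theta$ does;
- read the estimated coverage at level $1-\alpha$ off as the share of simulations with $f < 1-\alpha$.

Only posterior samples are needed, never posterior densities. The uniform reference box, the sample sizes and the overconfident estimator are assumptions for illustration.

```python
import numpy as np

def drp_coverage(post_sampler, alphas=(0.1, 0.3, 0.5), n_sims=2000, n_post=200, d=3, rng=None):
    """Expected coverage of DRP regions at credibility levels 1 - alpha."""
    rng = np.random.default_rng() if rng is None else rng
    f = np.empty(n_sims)
    for i in range(n_sims):
        theta = rng.standard_normal(d)              # theta ~ prior N(0, I)
        x = theta + rng.standard_normal(d)          # x | theta ~ N(theta, I)
        samples = post_sampler(x, n_post, rng)      # (n_post, d) draws from the estimator
        ref = rng.uniform(-3, 3, d)                 # random reference point (assumed uniform box)
        d_true = np.linalg.norm(theta - ref)
        f[i] = np.mean(np.linalg.norm(samples - ref, axis=1) < d_true)
    levels = 1 - np.asarray(alphas)
    return levels, np.array([np.mean(f < lvl) for lvl in levels])

exact = lambda x, m, rng: x / 2 + np.sqrt(0.5) * rng.standard_normal((m, len(x)))
overconfident = lambda x, m, rng: x / 2 + 0.2 * rng.standard_normal((m, len(x)))

rng = np.random.default_rng(4)
levels, cov_exact = drp_coverage(exact, rng=rng)       # close to the nominal levels
_, cov_over = drp_coverage(overconfident, rng=rng)     # visibly miscalibrated
```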
Cross-lists for Tue, 7 Feb 23
 [44] arXiv:2302.02009 (cross-list from cs.LG) [pdf, other]

Title: Domain Adaptation via Rebalanced Subdomain Alignment
Authors: Yiling Liu, Juncheng Dong, Ziyang Jiang, Ahmed Aloui, Keyu Li, Hunter Klein, Vahid Tarokh, David Carlson
Comments: 20 pages, 6 figures, 4 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Unsupervised domain adaptation (UDA) is a technique used to transfer knowledge from a labeled source domain to a different but related unlabeled target domain. While many UDA methods have shown success in the past, they often assume that the source and target domains have identical class label distributions, which can limit their effectiveness in real-world scenarios. To address this limitation, we propose a novel generalization bound that reweights source classification error by aligning source and target subdomains. We prove that our proposed generalization bound is at least as strong as existing bounds under realistic assumptions, and we empirically show that it is much stronger on real-world data. We then propose an algorithm to minimize this novel generalization bound. We demonstrate by numerical experiments that this approach improves performance in shifted class distribution scenarios compared to state-of-the-art methods.
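Why reweighting matters under class shift can be shown with a small example. Averaged over the source labels, a classifier's error can look small while the error that matters for a target with different class proportions is much larger. The sketch reweights per-class source errors by target class proportions. It assumes those proportions are known, whereas in UDA they must be estimated. It does not implement the paper's subdomain alignment or its bound, and the labels and proportions are hypothetical.

```python
import numpy as np

def reweighted_source_error(y_true, y_pred, target_class_props):
    """Source error with each class weighted by its (assumed known) target-domain proportion."""
    classes = np.arange(len(target_class_props))
    per_class_err = np.array([np.mean(y_pred[y_true == c] != c) for c in classes])
    return float(np.dot(target_class_props, per_class_err)), per_class_err

# Hypothetical source labels (80% class 0) and predictions (half of class 1 misclassified)
y_true = np.array([0] * 80 + [1] * 20)
y_pred = y_true.copy()
y_pred[80:90] = 0

plain = np.mean(y_pred != y_true)                                                  # 0.1
weighted, per_class = reweighted_source_error(y_true, y_pred, np.array([0.2, 0.8]))  # 0.4
```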
 [45] arXiv:2302.02056 (crosslist from cs.DS) [pdf, other]

Title: Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting
Comments: 28 pages, 5 figures
Subjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR); Computation (stat.CO)
Data sketching is a critical tool for distinct counting, enabling multisets to be represented by compact summaries that admit fast cardinality estimates. Because sketches may be merged to summarize multiset unions, they are a basic building block in data warehouses. Although many practical sketches for cardinality estimation exist, none provide privacy when merging. We propose the first practical cardinality sketches that are simultaneously mergeable, differentially private (DP), and have low empirical errors. These introduce a novel randomized algorithm for performing logical operations on noisy bits, a tight privacy analysis, and provably optimal estimation. Our sketches dramatically outperform existing theoretical solutions in simulations and on real-world data.
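"Noisy bits" in DP sketches typically come from randomized response. This generic building block is shown here; it is not the paper's sketch or its merge operation. Each bit is flipped independently with probability $p = 1/(1+e^{\varepsilon})$, which makes the likelihood ratio of any single reported bit at most $e^{\varepsilon}$. The true count of ones, $k$, is then estimated without bias from the noisy count $O$ via $\hat k = (O - mp)/(1-2p)$. The bit-array size, $k$ and $\varepsilon$ are hypothetical.

```python
import numpy as np

def flip_bits(bits, eps, rng):
    """Randomized response: flip each bit independently with probability p = 1 / (1 + e^eps)."""
    p = 1 / (1 + np.exp(eps))
    return bits ^ (rng.random(bits.shape) < p), p

def estimate_ones(noisy, p):
    # E[observed ones] = k (1 - p) + (m - k) p, solved for the true count k
    m = noisy.size
    return (noisy.sum() - m * p) / (1 - 2 * p)

rng = np.random.default_rng(5)
bits = np.zeros(100_000, dtype=bool)
bits[:30_000] = True                           # hypothetical sketch with 30,000 set bits
noisy, p = flip_bits(bits, eps=2.0, rng=rng)
est = estimate_ones(noisy, p)                  # unbiased; standard deviation ~ 135 here
```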
 [46] arXiv:2302.02061 (crosslist from cs.LG) [pdf, other]

Title: Reinforcement Learning with History-Dependent Dynamic Contexts
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Machine Learning (stat.ML)
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.
 [47] arXiv:2302.02092 (crosslist from cs.LG) [pdf, other]

Title: Interpolation for Robust Learning: Data Augmentation on Geodesics
Comments: 33 pages, 3 figures, 18 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We propose to study and promote the robustness of a model as per its performance through the interpolation of training data distributions. Specifically, (1) we augment the data by finding the worst-case Wasserstein barycenter on the geodesic connecting subpopulation distributions of different categories. (2) We regularize the model for smoother performance on the continuous geodesic path connecting subpopulation distributions. (3) Additionally, we provide a theoretical guarantee of robustness improvement and investigate how the geodesic location and the sample size contribute, respectively. Experimental validations of the proposed strategy on four datasets, including CIFAR-100 and ImageNet, establish the efficacy of our method; e.g., our method improves the baselines' certifiable robustness on CIFAR-10 by up to $7.7\%$, and their empirical robustness on CIFAR-100 by $16.8\%$. Our work provides a new perspective on model robustness through the lens of Wasserstein geodesic-based interpolation, with a practical off-the-shelf strategy that can be combined with existing robust training methods.
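In one dimension the Wasserstein-2 geodesic between two equal-size empirical distributions is explicit. Sorting matches the samples optimally, and the point at time $t$ linearly interpolates the matched pairs. That point is also the $(1-t, t)$-weighted Wasserstein barycenter of the two endpoints. The sketch checks the defining geodesic property, $W_2(a, g_t) = t\,W_2(a, b)$. It is a 1-D illustration only, with hypothetical sub-populations. The paper works with high-dimensional image data and searches for the worst-case point along the geodesic, and neither is shown here.

```python
import numpy as np

def w2_1d(a, b):
    """2-Wasserstein distance between two equal-size 1-D empirical distributions."""
    return np.sqrt(np.mean((np.sort(a) - np.sort(b)) ** 2))

def geodesic_point(a, b, t):
    """Point at time t on the W2 geodesic: interpolate the optimally matched (sorted) samples."""
    return (1 - t) * np.sort(a) + t * np.sort(b)

rng = np.random.default_rng(5)
a = rng.normal(0.0, 1.0, 1000)      # hypothetical sub-population of one class
b = rng.exponential(2.0, 1000)      # hypothetical sub-population of another class
mid = geodesic_point(a, b, 0.3)     # augmented samples 30% of the way from a to b
```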
 [48] arXiv:2302.02139 (crosslist from cs.LG) [pdf, other]

Title: Structural Explanations for Graph Neural Networks using HSIC
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Graph neural networks (GNNs) are a type of neural model that tackle graphical tasks in an end-to-end manner. Recently, GNNs have been receiving increased attention in machine learning and data mining communities because of the higher performance they achieve in various tasks, including graph classification, link prediction, and recommendation. However, the complicated dynamics of GNNs make it difficult to understand which parts of the graph features contribute more strongly to the predictions. To handle the interpretability issues, various GNN explanation methods have recently been proposed. In this study, a flexible model-agnostic explanation method is proposed to detect significant structures in graphs using the Hilbert-Schmidt independence criterion (HSIC), which captures the nonlinear dependency between two variables through kernels. More specifically, we extend the GraphLIME method for node explanation with a group lasso and a fused lasso-based node explanation method. The group and fused regularization with GraphLIME enables the interpretation of GNNs in substructure units. Then, we show that the proposed approach can be used for the explanation of sequential graph classification tasks. Through experiments, it is demonstrated that our method can identify crucial structures in a target graph in various settings.
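For reference, here is the standard biased empirical HSIC, $\operatorname{tr}(KHLH)/(n-1)^2$, with Gaussian kernels $K$, $L$ and centering matrix $H$. The sketch shows how it picks up a dependence that correlation misses: $y = x^2$ is uncorrelated with $x$ yet strongly dependent on it. The unit bandwidths and the data are assumptions. GraphLIME's group and fused lasso extension is not reproduced.

```python
import numpy as np

def gaussian_gram(z, sigma):
    z = z.reshape(len(z), -1)
    sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)   # pairwise squared distances
    return np.exp(-sq / (2 * sigma**2))

def hsic(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2 with Gaussian kernels."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n                         # centering matrix
    K, L = gaussian_gram(x, sigma_x), gaussian_gram(y, sigma_y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(6)
x = rng.standard_normal(300)
dependent = hsic(x, x**2 + 0.1 * rng.standard_normal(300))     # nonlinear dependence
independent = hsic(x, rng.standard_normal(300))                # no dependence
```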
 [49] arXiv:2302.02155 (crosslist from cs.LG) [pdf, other]

Title: Guaranteed Tensor Recovery Fused Low-rankness and Smoothness
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
The tensor data recovery task has attracted much research attention in recent years. Solving such an ill-posed problem generally requires exploring intrinsic prior structures underlying tensor data, and formulating them as certain forms of regularization terms for guiding a sound estimate of the restored tensor. Recent research has made significant progress by adopting two insightful tensor priors, i.e., global low-rankness (L) and local smoothness (S) across different tensor modes, which are always encoded as a sum of two separate regularization terms into the recovery models. However, unlike the primary theoretical developments on low-rank tensor recovery, these joint L+S models have no theoretical exact-recovery guarantees yet, making the methods lack reliability in real practice. To address this crucial issue, in this work we build a unique regularization term, which essentially encodes both L and S priors of a tensor simultaneously. In particular, by equipping the recovery models with this single regularizer, we can rigorously prove exact recovery guarantees for two typical tensor recovery tasks, i.e., tensor completion (TC) and tensor robust principal component analysis (TRPCA). To the best of our knowledge, this should be the first exact-recovery result among all related L+S methods for tensor recovery. Significant recovery accuracy improvements over many other SOTA methods in several TC and TRPCA tasks with various kinds of visual tensor data are observed in extensive experiments. Typically, our method achieves a workable performance when the missing rate is extremely large, e.g., 99.5%, for the color image inpainting task, while all its peers totally fail in such challenging cases.
 [50] arXiv:2302.02200 (cross-list from math.CO) [pdf, other]

Title: Rank-based linkage I: triplet comparisons and oriented simplicial complexes
Comments: 37 pages, 12 figures
Subjects: Combinatorics (math.CO); Statistics Theory (math.ST)
Rank-based linkage is a new tool for summarizing a collection $S$ of objects according to their relationships. These objects are not mapped to vectors, and "similarity" between objects need be neither numerical nor symmetrical. All an object needs to do is rank nearby objects by similarity to itself, using a Comparator which is transitive, but need not be consistent with any metric on the whole set. Call this a ranking system on $S$. Rank-based linkage is applied to the $K$-nearest neighbor digraph derived from a ranking system. Computations occur on a 2-dimensional abstract oriented simplicial complex whose faces are among the points, edges, and triangles of the line graph of the undirected $K$-nearest neighbor graph on $S$. In $|S| K^2$ steps it builds an edge-weighted linkage graph $(S, \mathcal{L}, \sigma)$ where $\sigma(\{x, y\})$ is called the in-sway between objects $x$ and $y$. Take $\mathcal{L}_t$ to be the links whose in-sway is at least $t$, and partition $S$ into components of the graph $(S, \mathcal{L}_t)$, for varying $t$. Rank-based linkage is a functor from a category of out-ordered digraphs to a category of partitioned sets, with the practical consequence that augmenting the set of objects in a rank-respectful way gives a fresh clustering which does not "rip apart" the previous one. The same holds for single linkage clustering in the metric space context, but not for typical optimization-based methods. Open combinatorial problems are presented in the last section.
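The thresholding step described above (keep the links whose in-sway is at least $t$, then take connected components) is ordinary graph bookkeeping once the in-sway values $\sigma(\{x, y\})$ are known. A minimal Python sketch using union-find, assuming the in-sway values are already computed; the function name and the toy values are hypothetical, and computing in-sway from the ranking system is not shown.

```python
def partition_at_threshold(objects, insway, t):
    """Connected components of (S, L_t), where L_t keeps links with in-sway >= t.

    objects: iterable of hashable items (the set S)
    insway:  dict mapping frozenset({x, y}) -> in-sway value (assumed precomputed)
    """
    objects = list(objects)
    parent = {x: x for x in objects}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for link, weight in insway.items():
        if weight >= t:
            x, y = tuple(link)
            parent[find(x)] = find(y)

    groups = {}
    for x in objects:
        groups.setdefault(find(x), []).append(x)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical in-sway values, for illustration only.
S = ["a", "b", "c", "d", "e"]
sigma = {frozenset({"a", "b"}): 3, frozenset({"b", "c"}): 1, frozenset({"d", "e"}): 2}

for t in (4, 2, 1):
    print(t, partition_at_threshold(S, sigma, t))
```

Lowering $t$ only adds links, so the resulting partitions become coarser and are nested, much like the levels of a single linkage dendrogram.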
 [51] arXiv:2302.02224 (cross-list from cs.LG) [pdf, other]

Title: TAP: The Attention Patch for Cross-Modal Knowledge Transfer from Unlabeled Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This work investigates the intersection of cross-modal learning and semi-supervised learning, where we aim to improve the supervised learning performance of the primary modality by borrowing missing information from an unlabeled modality. We investigate this problem from a Nadaraya-Watson (NW) kernel regression perspective and show that this formulation implicitly leads to a kernelized cross-attention module. To this end, we propose The Attention Patch (TAP), a simple neural network plug-in that allows data-level knowledge transfer from the unlabeled modality. We provide numerical simulations on three real-world datasets to examine each aspect of TAP and show that integrating TAP into a neural network can improve generalization performance using the unlabeled modality.
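The link between Nadaraya-Watson regression and cross-attention can be checked numerically: with a Gaussian kernel, the NW weights are exactly a softmax over negative, scaled squared distances between queries and keys. The sketch below illustrates only that general identity, assuming a Gaussian kernel with a shared bandwidth; it is not the TAP module, and the function names are hypothetical.

```python
import numpy as np


def nadaraya_watson(queries, keys, values, bandwidth=1.0):
    """NW kernel regression: sum_j K(q, k_j) v_j / sum_j K(q, k_j), Gaussian kernel."""
    sq_dists = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    weights = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    return (weights @ values) / weights.sum(axis=1, keepdims=True)


def kernel_cross_attention(queries, keys, values, bandwidth=1.0):
    """Cross-attention with scores -||q - k||^2 / (2 h^2), normalized by a softmax."""
    sq_dists = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    scores = -sq_dists / (2.0 * bandwidth ** 2)
    scores -= scores.max(axis=1, keepdims=True)  # shift for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # each row sums to 1
    return attn @ values


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 3))   # 4 queries of dimension 3
k = rng.normal(size=(10, 3))  # 10 keys
v = rng.normal(size=(10, 2))  # one 2-D value per key

print(np.allclose(nadaraya_watson(q, k, v), kernel_cross_attention(q, k, v)))
```

The max-shift inside the softmax cancels in the normalization, which is why both functions agree up to floating-point error.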
 [52] arXiv:2302.02252 (cross-list from cs.LG) [pdf, other]

Title: Reinforcement Learning in Low-Rank MDPs with Density Features
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
MDPs with low-rank transitions (that is, the transition matrix can be factored into the product of two matrices, left and right) are a highly representative structure that enables tractable learning. The left matrix enables expressive function approximation for value-based learning and has been studied extensively. In this work, we instead investigate sample-efficient learning with density features, i.e., the right matrix, which induce powerful models for state-occupancy distributions. This setting not only sheds light on leveraging unsupervised learning in RL, but also enables plug-in solutions for convex RL. In the offline setting, we propose an algorithm for off-policy estimation of occupancies that can handle non-exploratory data. Using this as a subroutine, we further devise an online algorithm that constructs exploratory data distributions in a level-by-level manner. As a central technical challenge, the additive error of occupancy estimation is incompatible with the multiplicative definition of data coverage. In the absence of strong assumptions like reachability, this incompatibility easily leads to exponential error blow-up, which we overcome via novel technical tools. Our results also readily extend to the representation learning setting, when the density features are unknown and must be learned from an exponentially large candidate set.
 [53] arXiv:2302.02277 (cross-list from cs.LG) [pdf, other]

Title: SE(3) diffusion model with application to protein backbone generation
Authors: Jason Yim, Brian L. Trippe, Valentin De Bortoli, Emile Mathieu, Arnaud Doucet, Regina Barzilay, Tommi Jaakkola
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
The design of novel protein structures remains a challenge in protein engineering for applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there exists no principled methodological framework for diffusion on SE(3), the space of orientation-preserving rigid motions in $\mathbb{R}^3$, that operates on frames and confers the group invariance. We address these shortcomings by developing theoretical foundations of SE(3)-invariant diffusion models on multiple frames, followed by a novel framework, FrameDiff, for learning the SE(3)-equivariant score over multiple frames. We apply FrameDiff to monomer backbone generation and find that it can generate designable monomers of up to 500 amino acids without relying on a pretrained protein structure prediction network, which has been integral to previous methods. We find that our samples are capable of generalizing beyond any known protein structure.
 [54] arXiv:2302.02323 (cross-list from cs.LG) [pdf, other]

Title: Improving Fair Training under Correlation Shifts
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Model fairness is an essential element of Trustworthy AI. While many techniques for model fairness have been proposed, most of them assume that the training and deployment data distributions are identical, which is often not true in practice. In particular, when the bias between labels and sensitive groups changes, the fairness of the trained model is directly influenced and can worsen. We make two contributions toward solving this problem. First, we analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness. We introduce the notion of correlation shifts, which explicitly captures the change in the above bias. Second, we propose a novel pre-processing step that samples the input data to reduce correlation shifts and thus enables the in-processing approaches to overcome their limitations. We formulate an optimization problem for adjusting the data ratio among labels and sensitive groups to reflect the shifted correlation. A key benefit of our approach lies in decoupling the roles of pre- and in-processing approaches: correlation adjustment via pre-processing and unfairness mitigation on the processed data via in-processing. Experiments show that our framework effectively improves existing in-processing fair algorithms w.r.t. accuracy and fairness, on both synthetic and real datasets.
 [55] arXiv:2302.02392 (cross-list from cs.LG) [pdf, ps, other]

Title: Refined Value-Based Offline RL under Realizability and Partial Coverage
Comments: Under review
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In offline reinforcement learning (RL) we have no opportunity to explore, so we must make assumptions that the data is sufficient to guide picking a good policy, taking the form of assuming some coverage, realizability, Bellman completeness, and/or hard margin (gap). In this work we propose value-based algorithms for offline RL with PAC guarantees under just partial coverage, specifically, coverage of just a single comparator policy, and realizability of the soft (entropy-regularized) Q-function of the single policy and of a related function defined as a saddle point of a certain minimax optimization problem. This offers refined and generally more lax conditions for offline RL. We further show an analogous result for vanilla Q-functions under a soft margin condition. To attain these guarantees, we leverage novel minimax learning algorithms to accurately estimate soft or vanilla Q-functions with $L^2$-convergence guarantees. Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
 [56] arXiv:2302.02420 (cross-list from cs.LG) [pdf, other]

Title: Direct Uncertainty Quantification
Comments: 21 pages, 16 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Traditional neural networks are simple to train but they produce overconfident predictions, while Bayesian neural networks provide good uncertainty quantification but optimizing them is time-consuming. This paper introduces a new approach, direct uncertainty quantification (DirectUQ), that combines their advantages: the neural network directly models uncertainty in output space and captures both aleatoric and epistemic uncertainty. DirectUQ can be derived as an alternative variational lower bound, and hence benefits from collapsed variational inference that provides improved regularizers. On the other hand, like non-probabilistic models, DirectUQ enjoys simple training, and one can use Rademacher complexity to provide risk bounds for the model. Experiments show that DirectUQ and ensembles of DirectUQ provide a good tradeoff in terms of run time and uncertainty quantification, especially for out-of-distribution data.
 [57] arXiv:2302.02460 (cross-list from cs.LG) [pdf, other]

Title: Nonparametric Density Estimation under Distribution Drift
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study nonparametric density estimation in nonstationary drift settings. Given a sequence of independent samples taken from a distribution that gradually changes in time, the goal is to compute the best estimate for the current distribution. We prove tight minimax risk bounds for both discrete and continuous smooth densities, where the minimum is over all possible estimates and the maximum is over all possible distributions that satisfy the drift constraints. Our technique handles a broad class of drift models, and generalizes previous results on agnostic learning under drift.
 [58] arXiv:2302.02526 (cross-list from cs.LG) [pdf, other]

Title: On Private and Robust Bandits
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
We study private and robust multi-armed bandits (MABs), where the agent receives Huber's contaminated heavy-tailed rewards and meanwhile needs to ensure differential privacy. We first present its minimax lower bound, characterizing the information-theoretic limit of regret with respect to the privacy budget, contamination level and heavy-tailedness. Then, we propose a meta-algorithm that builds on a private and robust mean estimation subroutine \texttt{PRM} that essentially relies only on reward truncation and the Laplace mechanism. For two different heavy-tailed settings, we give specific schemes of \texttt{PRM}, which enable us to achieve nearly-optimal regret. As byproducts of our main results, we also give the first minimax lower bound for private heavy-tailed MABs (i.e., without contamination). Moreover, our two proposed truncation-based \texttt{PRM} schemes achieve the optimal tradeoff between estimation accuracy, privacy and robustness. Finally, we support our theoretical results with experimental studies.
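The two ingredients named for \texttt{PRM}, reward truncation and the Laplace mechanism, combine into a simple private mean estimator. A minimal sketch, assuming rewards are truncated to a fixed symmetric interval $[-B, B]$ and that neighboring datasets differ in one reward; the function name, the truncation level and the toy contamination are hypothetical, and the paper's actual truncation schemes and parameter choices are not reproduced here.

```python
import numpy as np


def private_truncated_mean(rewards, bound, epsilon, rng):
    """epsilon-DP mean estimate via truncation followed by the Laplace mechanism.

    Clipping each reward to [-bound, bound] means that replacing one of the n
    rewards changes the clipped mean by at most 2 * bound / n, so Laplace noise
    with scale 2 * bound / (n * epsilon) gives epsilon-differential privacy.
    """
    rewards = np.asarray(rewards, dtype=float)
    n = rewards.size
    clipped = np.clip(rewards, -bound, bound)
    noise_scale = 2.0 * bound / (n * epsilon)
    return float(clipped.mean() + rng.laplace(loc=0.0, scale=noise_scale))


rng = np.random.default_rng(1)
rewards = rng.standard_t(df=2.5, size=1000)  # heavy-tailed rewards with true mean 0
rewards[:20] = 1e4                           # 2% contaminated samples (hypothetical)

print(f"raw mean:          {rewards.mean():.3f}")
print(f"private truncated: {private_truncated_mean(rewards, bound=5.0, epsilon=1.0, rng=rng):.3f}")
```

Truncation trades a small bias (here from clipping the contaminated samples at $B$) for bounded sensitivity, which keeps both the privacy noise and the influence of contamination small.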
 [59] arXiv:2302.02552 (cross-list from cs.LG) [pdf, other]

Title: Adapting to Continuous Covariate Shift via Online Density Ratio Estimation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Dealing with distribution shifts is one of the central challenges for modern machine learning. One fundamental situation is \emph{covariate shift}, where the input distribution of the data changes from training to testing stages while the input-conditional output distribution remains unchanged. In this paper, we initiate the study of a more challenging scenario, \emph{continuous} covariate shift, in which the test data appear sequentially and their distributions can shift continuously. Our goal is to adaptively train the predictor such that its prediction risk accumulated over time is minimized. Starting with importance-weighted learning, we show the method works effectively if the time-varying density ratios of test and train inputs can be accurately estimated. However, existing density ratio estimation methods would fail due to data scarcity at each time step. To this end, we propose an online method that can appropriately reuse historical information. Our density ratio estimation method is proven to perform well by enjoying a dynamic regret bound, which in turn leads to an excess risk guarantee for the predictor. Empirical results also validate its effectiveness.
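Importance-weighted learning reweights training losses by the density ratio $w(x) = p_{\text{test}}(x)/p_{\text{train}}(x)$, so that their average estimates the risk under the test distribution. A minimal sketch, assuming (unrealistically, for illustration) that both input densities are known Gaussians with equal variance so the ratio has a closed form; in the continuous-shift setting above the ratios are unknown and time-varying and must be estimated online, which this sketch does not attempt. Function names are hypothetical.

```python
import numpy as np


def gaussian_density_ratio(x, mu_test, mu_train, sigma=1.0):
    """w(x) = p_test(x) / p_train(x) for two Gaussians sharing the variance sigma^2."""
    log_w = ((x - mu_train) ** 2 - (x - mu_test) ** 2) / (2.0 * sigma ** 2)
    return np.exp(log_w)


def importance_weighted_risk(losses, ratios):
    """Estimate the test-distribution risk from losses measured on training inputs."""
    return float(np.mean(ratios * losses))


rng = np.random.default_rng(0)
x_train = rng.normal(loc=0.0, scale=1.0, size=200_000)  # training inputs ~ N(0, 1)
losses = x_train ** 2  # squared loss of the predictor f(x) = 0 when the target is y = x

w = gaussian_density_ratio(x_train, mu_test=1.0, mu_train=0.0)  # test inputs ~ N(1, 1)
print(f"unweighted training risk: {losses.mean():.3f}")  # estimates E_train[x^2] = 1
print(f"importance-weighted risk: {importance_weighted_risk(losses, w):.3f}")  # estimates E_test[x^2] = 2
```

The unweighted average reflects the training distribution, while the weighted one targets the shifted test distribution; the price is higher variance when the ratios are large.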
 [60] arXiv:2302.02560 (cross-list from cs.LG) [pdf, other]

Title: Causal Shift-Response Functions with Neural Networks: The Health Benefits of Lowering Air Quality Standards in the US
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Policymakers are required to evaluate the health benefits of reducing the National Ambient Air Quality Standards (NAAQS; i.e., the safety standards) for fine particulate matter (PM$_{2.5}$) before implementing new policies. We formulate this objective as a shift-response function (SRF) and develop methods to analyze the problem using tools from causal inference, specifically under the stochastic interventions framework. SRFs model the average change in an outcome of interest resulting from a hypothetical shift in the observed exposure distribution. We propose a new, broadly applicable doubly-robust method to learn SRFs using targeted regularization with neural networks. We evaluate our proposed method under various benchmarks specific to marginal estimates as a function of continuous exposure. Finally, we implement our estimator in the motivating application that considers the potential reduction in deaths from lowering the NAAQS from the current level of 12 $\mu g/m^3$ to levels recently proposed by the Environmental Protection Agency in the US (10, 9, and 8 $\mu g/m^3$).
 [61] arXiv:2302.02570 (cross-list from cs.AI) [pdf, other]

Title: Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
We consider the task of evaluating policies of algorithmic resource allocation through randomized controlled trials (RCTs). Such policies are tasked with optimizing the utilization of limited intervention resources, with the goal of maximizing the benefits derived. Evaluation of such allocation policies through RCTs proves difficult, notwithstanding the scale of the trial, because the individuals' outcomes are inextricably interlinked through resource constraints controlling the policy decisions. Our key contribution is a new estimator built on a novel concept: retrospective reshuffling of participants across experimental arms at the end of an RCT. We identify conditions under which such reassignments are permissible and can be leveraged to construct counterfactual trials, whose outcomes can be accurately ascertained for free. We prove theoretically that such an estimator is more accurate than common estimators based on sample means: we show that it returns an unbiased estimate and simultaneously reduces variance. We demonstrate the value of our approach through empirical experiments on synthetic, semi-synthetic, and real case study data and show improved estimation accuracy across the board.
 [62] arXiv:2302.02571 (cross-list from cs.LG) [pdf, other]

Title: Offline Learning in Markov Games with General Function Approximation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium, such as a Nash equilibrium or a (Coarse) Correlated Equilibrium, from an offline dataset pre-collected from the game. Existing works consider relatively restricted tabular or linear models and handle each equilibrium separately. In this work, we provide the first framework for sample-efficient offline learning in Markov games under general function approximation, handling all 3 equilibria in a unified manner. By using Bellman-consistent pessimism, we obtain interval estimates for policies' returns, and use both the upper and the lower bounds to obtain a relaxation on the gap of a candidate policy, which becomes our optimization objective. Our results generalize prior works and provide several additional insights. Importantly, we require a data coverage condition that improves over the recently proposed "unilateral concentrability". Our condition allows selective coverage of deviation policies that optimally trade off between their greediness (as approximate best responses) and coverage, and we show scenarios where this leads to significantly better guarantees. As a new connection, we also show how our algorithmic framework can subsume seemingly different solution concepts designed for the special case of two-player zero-sum games.
 [63] arXiv:2302.02589 (cross-list from cs.LG) [pdf, other]

Title: $z$-SignFedAvg: A Unified Stochastic Sign-based Compression for Federated Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Federated Learning (FL) is a promising privacy-preserving distributed learning paradigm but suffers from high communication cost when training large-scale machine learning models. Sign-based methods, such as SignSGD \cite{bernstein2018signsgd}, have been proposed as a biased gradient compression technique for reducing the communication cost. However, sign-based algorithms could diverge under heterogeneous data, which has motivated the development of advanced techniques, such as the error-feedback method and stochastic sign-based compression, to fix this issue. Nevertheless, these methods still suffer from slower convergence rates. Besides, none of them allows multiple local SGD updates like FedAvg \cite{mcmahan2017communication}. In this paper, we propose a novel noisy perturbation scheme with a general symmetric noise distribution for sign-based compression, which not only allows one to flexibly control the tradeoff between gradient bias and convergence performance, but also provides a unified viewpoint on existing stochastic sign-based methods. More importantly, the unified noisy perturbation scheme enables the development of the very first sign-based FedAvg algorithm ($z$-SignFedAvg) to accelerate convergence. Theoretically, we show that $z$-SignFedAvg achieves a faster convergence rate than existing sign-based methods and, under uniformly distributed noise, can enjoy the same convergence rate as its uncompressed counterpart. Extensive experiments demonstrate that $z$-SignFedAvg achieves competitive empirical performance on real datasets and outperforms existing schemes.
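To see why a noisy perturbation can remove the bias of plain sign compression, take uniform noise $\xi_i \sim \mathrm{Uniform}[-B, B]$ with $B \ge |g_i|$. Then $P(g_i + \xi_i > 0) = (B + g_i)/(2B)$, so $\mathbb{E}[\operatorname{sign}(g_i + \xi_i)] = g_i / B$ and the rescaled one-bit message $B\,\operatorname{sign}(g_i + \xi_i)$ is unbiased. A minimal sketch of this uniform-noise special case only; the general symmetric noise family, the exact scaling used by $z$-SignFedAvg, and the federated averaging loop are not reproduced, and `stochastic_sign` is a hypothetical name.

```python
import numpy as np


def stochastic_sign(grad, noise_scale, rng):
    """Compress each coordinate to one bit: sign(g + xi), with xi ~ Uniform[-B, B]."""
    xi = rng.uniform(-noise_scale, noise_scale, size=grad.shape)
    return np.sign(grad + xi)


rng = np.random.default_rng(0)
g = np.array([0.3, -0.7, 0.05, 0.0])
B = 1.0  # the unbiasedness argument needs B >= max |g_i|

# Many independent compressions of the same gradient, one per row.
signs = stochastic_sign(np.tile(g, (20_000, 1)), B, rng)
estimate = B * signs.mean(axis=0)  # rescaled average of the transmitted signs

print(np.round(estimate, 2))  # should be close to g
```

The per-coordinate variance of $B\,\operatorname{sign}(g_i + \xi_i)$ is $B^2 - g_i^2$, so a larger noise scale keeps the message unbiased at the price of noisier updates.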
 [64] arXiv:2302.02605 (cross-list from cs.LG) [pdf, other]

Title: Toward Large Kernel Models
Comments: Code is available at github.com/EigenPro/EigenPro3
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Recent studies indicate that kernel machines can often perform similarly to or better than deep neural networks (DNNs) on small datasets. The interest in kernel machines has been additionally bolstered by the discovery of their equivalence to wide neural networks in certain regimes. However, a key feature of DNNs is their ability to scale the model size and training data size independently, whereas in traditional kernel machines model size is tied to data size. Because of this coupling, scaling kernel machines to large data has been computationally challenging. In this paper, we provide a way forward for constructing large-scale general kernel models, which are a generalization of kernel machines that decouples the model and data, allowing training on large datasets. Specifically, we introduce EigenPro 3.0, an algorithm based on projected dual preconditioned SGD, and show scaling to model and data sizes which have not been possible with existing kernel methods.
 [65] arXiv:2302.02622 (cross-list from cs.CV) [pdf, other]

Title: Uncertainty Calibration and its Application to Object Detection
Authors: Fabian Küppers
Comments: PhD thesis at University of Wuppertal, cite by: 'Fabian Küppers. "Uncertainty Calibration and its Application to Object Detection." PhD Thesis, University of Wuppertal, January 2023'
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Image-based environment perception is an important component, especially for driver assistance systems or autonomous driving. In this scope, modern neural networks are used to identify multiple objects as well as the corresponding position and size information within a single frame. The performance of such an object detection model is important for the overall performance of the whole system. However, a detection model might also predict these objects under a certain degree of uncertainty. [...]
In this work, we examine the semantic uncertainty (which object type?) as well as the spatial uncertainty (where is the object and how large is it?). We evaluate whether the predicted uncertainties of an object detection model match the observed error that is achieved on real-world data. In the first part of this work, we introduce the definition of confidence calibration of the semantic uncertainty in the context of object detection, instance segmentation, and semantic segmentation. We integrate additional position information in our examinations to evaluate the effect of the object's position on the semantic calibration properties. Besides measuring calibration, it is also possible to perform a post-hoc recalibration of semantic uncertainty that might have turned out to be miscalibrated. [...]
The second part of this work deals with the spatial uncertainty obtained by a probabilistic detection model. [...] We review and extend common calibration methods so that it is possible to obtain parametric uncertainty distributions for the position information in a more flexible way.
In the last part, we demonstrate a possible use case for our derived calibration methods in the context of object tracking. [...] We integrate our previously proposed calibration techniques and demonstrate the usefulness of semantic and spatial uncertainty calibration in a subsequent process. [...]
 [66] arXiv:2302.02648 (cross-list from cs.HC) [pdf]

Title: First steps towards quantum machine learning applied to the classification of event-related potentials
Comments: in French language
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
Low information transfer rate is a major bottleneck for brain-computer interfaces based on non-invasive electroencephalography (EEG) for clinical applications. This has led to the development of more robust and accurate classifiers. In this study, we investigate the performance of a quantum-enhanced support vector classifier (QSVC). The training (prediction) balanced accuracy of QSVC was 83.17% (50.25%). This result shows that the classifier was able to learn from EEG data, but that more research is required to obtain higher prediction accuracy. This could be achieved by a better configuration of the classifier, such as increasing the number of shots.
 [67] arXiv:2302.02747 (cross-list from econ.EM) [pdf, other]

Title: Testing Quantile Forecast Optimality
Subjects: Econometrics (econ.EM); Methodology (stat.ME)
Quantile forecasts made across multiple horizons have become an important output of many financial institutions, central banks and international organisations. This paper proposes misspecification tests for such quantile forecasts that assess optimality over a set of multiple forecast horizons and/or quantiles. The tests build on multiple Mincer-Zarnowitz quantile regressions cast in a moment equality framework. Our main test is for the null hypothesis of autocalibration, a concept which assesses optimality with respect to the information contained in the forecasts themselves. We provide an extension that allows testing for optimality with respect to larger information sets, as well as a multivariate extension. Importantly, our tests do not just inform about general violations of optimality, but may also provide useful insights into specific forms of suboptimality. A simulation study investigates the finite-sample performance of our tests, and two empirical applications to financial returns and U.S. macroeconomic series illustrate that our tests can yield interesting insights into quantile forecast suboptimality and its causes.
 [68] arXiv:2302.02865 (cross-list from cs.LG) [pdf, other]

Title: Probabilistic Contrastive Learning Recovers the Correct Aleatoric Uncertainty of Ambiguous Inputs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Contrastively trained encoders have recently been proven to invert the data-generating process: they encode each input, e.g., an image, into the true latent vector that generated the image (Zimmermann et al., 2021). However, real-world observations often have inherent ambiguities. For instance, images may be blurred or only show a 2D view of a 3D object, so multiple latents could have generated them. This makes the true posterior for the latent vector probabilistic with heteroscedastic uncertainty. In this setup, we extend the common InfoNCE objective and encoders to predict latent distributions instead of points. We prove that these distributions recover the correct posteriors of the data-generating process, including its level of aleatoric uncertainty, up to a rotation of the latent space. In addition to providing calibrated uncertainty estimates, these posteriors allow the computation of credible intervals in image retrieval. They comprise images with the same latent as a given query, subject to its uncertainty.
 [69] arXiv:2302.02876 (cross-list from cs.LG) [pdf, other]

Title: Variational Information Pursuit for Interpretable Predictions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
There is a growing interest in the machine learning community in developing predictive algorithms that are "interpretable by design". Towards this end, recent work proposes to make interpretable decisions by sequentially asking interpretable queries about data until a prediction can be made with high confidence based on the answers obtained (the history). To promote short query-answer chains, a greedy procedure called Information Pursuit (IP) is used, which adaptively chooses queries in order of information gain. Generative models are employed to learn the distribution of query-answers and labels, which is in turn used to estimate the most informative query. However, learning and inference with a full generative model of the data is often intractable for complex tasks. In this work, we propose Variational Information Pursuit (V-IP), a variational characterization of IP which bypasses the need for learning generative models. V-IP is based on finding a query selection strategy and a classifier that minimize the expected cross-entropy between true and predicted labels. We then demonstrate that the IP strategy is the optimal solution to this problem. Therefore, instead of learning generative models, we can use our optimal strategy to directly pick the most informative query given any history. We then develop a practical algorithm by defining a finite-dimensional parameterization of our strategy and classifier using deep networks and train them end-to-end using our objective. Empirically, V-IP is 10-100x faster than IP on different Vision and NLP tasks with competitive performance. Moreover, V-IP finds much shorter query chains when compared to reinforcement learning, which is typically used in sequential decision-making problems. Finally, we demonstrate the utility of V-IP on challenging tasks like medical diagnosis, where the performance is far superior to the generative modelling approach.
 [70] arXiv:2302.02895 (cross-list from cs.CG) [pdf, other]

Title: Flexible and Probabilistic Topology Tracking with Partial Optimal Transport
Subjects: Computational Geometry (cs.CG); Applications (stat.AP)
In this paper, we present a flexible and probabilistic framework for tracking topological features in time-varying scalar fields using merge trees and partial optimal transport. Merge trees are topological descriptors that record the evolution of connected components in the sublevel sets of scalar fields. We present a new technique for modeling and comparing merge trees using tools from partial optimal transport. In particular, we model a merge tree as a measure network, that is, a network equipped with a probability distribution, and define a notion of distance on the space of merge trees inspired by partial optimal transport. Such a distance offers a new and flexible perspective for encoding intrinsic and extrinsic information in the comparative measures of merge trees. More importantly, it gives rise to a partial matching between topological features in time-varying data, thus enabling flexible topology tracking for scientific simulations. Furthermore, such partial matching may be interpreted as a probabilistic coupling between features at adjacent time steps, which gives rise to probabilistic tracking graphs. We derive a stability result for our distance and provide numerous experiments indicating the efficacy of our distance in extracting meaningful feature tracks.
 [71] arXiv:2302.02941 (cross-list from cs.LG) [pdf, other]

Title: On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology
Authors: Francesco Di Giovanni, Lorenzo Giusti, Federico Barbero, Giulia Luise, Pietro Liò, Michael Bronstein
Comments: 24 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)
Message Passing Neural Networks (MPNNs) are instances of Graph Neural Networks that leverage the graph to send messages over the edges. This inductive bias leads to a phenomenon known as over-squashing, where a node feature is insensitive to information contained at distant nodes. Despite recent methods introduced to mitigate this issue, an understanding of the causes of over-squashing and of possible solutions is lacking. In this theoretical work, we prove that: (i) neural network width can mitigate over-squashing, but at the cost of making the whole network more sensitive; (ii) conversely, depth cannot help mitigate over-squashing: increasing the number of layers leads to over-squashing being dominated by vanishing gradients; (iii) the graph topology plays the greatest role, since over-squashing occurs between nodes at high commute (access) time. Our analysis provides a unified framework to study different recent methods introduced to cope with over-squashing and serves as a justification for a class of methods that fall under 'graph rewiring'.
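Commute time, the quantity linked above to over-squashing, has a closed form via effective resistance: $C(u, v) = 2|E|\,R(u, v)$ with $R(u, v) = (e_u - e_v)^\top L^{+} (e_u - e_v)$, where $L^{+}$ is the pseudoinverse of the graph Laplacian. A minimal sketch, assuming a connected, undirected, unweighted graph given as a dense adjacency matrix; this is the standard identity, not code from the paper, and the function name is hypothetical.

```python
import numpy as np


def commute_time(adjacency, u, v):
    """Commute time between u and v in a connected, undirected, unweighted graph.

    Uses the identity C(u, v) = 2|E| * R(u, v), where R is the effective
    resistance computed from the pseudoinverse of the graph Laplacian.
    """
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    L_pinv = np.linalg.pinv(laplacian)
    e = np.zeros(len(A))
    e[u], e[v] = 1.0, -1.0
    resistance = e @ L_pinv @ e
    num_edges = A.sum() / 2.0
    return float(2.0 * num_edges * resistance)


path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]         # 0 - 1 - 2
cycle = [[0, 1, 0, 1],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [1, 0, 1, 0]]     # 0 - 1 - 2 - 3 - 0

print(round(commute_time(path, 0, 2), 6))   # 8.0
print(round(commute_time(cycle, 0, 1), 6))  # 6.0
```

Adding edges that open alternative routes between two nodes lowers their effective resistance, which gives one intuition for why graph rewiring can reduce commute times.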
 [72] arXiv:2302.02951 (cross-list from cond-mat.stat-mech) [pdf, other]

Title: Noise-cleaning the precision matrix of fMRI time series
Authors: Miguel Ibáñez-Berganza, Carlo Lucibello, Francesca Santucci, Tommaso Gili, Andrea Gabrielli
Comments: 15 pages, 12 figures (of which 12 pages, 3 figures in the main text)
Subjects: Statistical Mechanics (cond-mat.stat-mech); Machine Learning (stat.ML)
We present a comparison between various algorithms for inferring covariance and precision matrices in small datasets of real vectors, of the typical length and dimension of human brain activity time series retrieved by functional Magnetic Resonance Imaging (fMRI). Assuming a Gaussian model underlying the neural activity, the problem consists in denoising the empirically observed matrices in order to obtain a better estimator of the true precision and covariance matrices. We consider several standard noise-cleaning algorithms and compare them on two types of datasets. The first type are time series of fMRI brain activity of human subjects at rest. The second type are synthetic time series sampled from a generative Gaussian model of which we can vary the fraction of dimensions per sample q = N/T and the strength of off-diagonal correlations. The reliability of each algorithm is assessed in terms of test-set likelihood and, in the case of synthetic data, of the distance from the true precision matrix. We observe that the so-called Optimal Rotationally Invariant Estimator, based on Random Matrix Theory, leads to a significantly lower distance from the true precision matrix in synthetic data, and higher test likelihood in natural fMRI data. We propose a variant of the Optimal Rotationally Invariant Estimator in which one of its parameters is optimised by cross-validation. In the severe undersampling regime (large q) typical of fMRI series, it outperforms all the other estimators. We furthermore propose a simple algorithm based on an iterative likelihood gradient ascent, providing an accurate estimation for weakly correlated datasets.
 [73] arXiv:2302.02971 (cross-list from cs.LG) [pdf, other]

Title: U-Clip: On-Average Unbiased Stochastic Gradient Clipping
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
U-Clip is a simple amendment to gradient clipping that can be applied to any iterative gradient optimization algorithm. Like regular clipping, U-Clip involves using gradients that are clipped to a prescribed size (e.g. with component-wise or norm-based clipping), but instead of discarding the clipped portion of the gradient, U-Clip maintains a buffer of these values that is added to the gradients on the next iteration (before clipping). We show that the cumulative bias of the U-Clip updates is bounded by a constant. This implies that the clipped updates are unbiased on average. Convergence follows via a lemma that guarantees convergence with updates $u_i$ as long as $\sum_{i=1}^t (u_i - g_i) = o(t)$, where $g_i$ are the gradients. Extensive experimental exploration is performed on CIFAR-10, with further validation on ImageNet.
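The buffer mechanism described in this abstract fits in a few lines. The sketch below assumes norm-based clipping and NumPy; the class name `UClip`, the `max_norm` parameter, and the toy quadratic objective are illustrative assumptions, not the authors' implementation. Because each update is $u_t = (g_t + b_{t-1}) - b_t$, where $b_t$ is the buffer after step $t$ and $b_0 = 0$, the sum $\sum_{i=1}^t (u_i - g_i)$ telescopes to $-b_t$, which is exactly the quantity the convergence condition asks to be $o(t)$.

```python
import numpy as np


def clip_by_norm(v: np.ndarray, max_norm: float) -> np.ndarray:
    """Rescale v so that its Euclidean norm is at most max_norm."""
    norm = np.linalg.norm(v)
    return v if norm <= max_norm else v * (max_norm / norm)


class UClip:
    """Gradient clipping that carries the clipped-off residual forward."""

    def __init__(self, dim: int, max_norm: float):
        self.max_norm = max_norm
        self.buffer = np.zeros(dim)  # residual removed by clipping so far

    def update(self, grad: np.ndarray) -> np.ndarray:
        total = grad + self.buffer              # add carry-over *before* clipping
        u = clip_by_norm(total, self.max_norm)  # clipped update actually applied
        self.buffer = total - u                 # keep what clipping discarded
        return u


if __name__ == "__main__":
    # Toy usage: SGD with noisy gradients on f(x) = 0.5 * ||x - c||^2.
    rng = np.random.default_rng(0)
    c = np.array([3.0, -2.0])
    x = np.zeros(2)
    clipper = UClip(dim=2, max_norm=0.5)
    for _ in range(500):
        g = (x - c) + rng.normal(scale=2.0, size=2)
        x -= 0.1 * clipper.update(g)
    print(x, np.linalg.norm(clipper.buffer))
```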
 [74] arXiv:2302.02988 (cross-list from cs.LG) [pdf, other]

Title: Asymptotically Minimax Optimal Fixed-Budget Best Arm Identification for Expected Simple Regret Minimization
Subjects: Machine Learning (cs.LG); Econometrics (econ.EM); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
We investigate fixed-budget best arm identification (BAI) for expected simple regret minimization. In each round of an adaptive experiment, a decision maker draws one of multiple treatment arms based on past observations and subsequently observes the outcome of the chosen arm. After the experiment, the decision maker recommends the treatment arm with the highest projected outcome. We evaluate this decision in terms of the expected simple regret, the difference between the expected outcomes of the best and recommended treatment arms. Due to the inherent uncertainty, we evaluate the regret using the minimax criterion. For distributions with fixed variances (location-shift models), such as Gaussian distributions, we derive asymptotic lower bounds for the worst-case expected simple regret. Then, we show that the Random Sampling (RS)-Augmented Inverse Probability Weighting (AIPW) strategy proposed by Kato et al. (2022) is asymptotically minimax optimal in the sense that the leading factor of its worst-case expected simple regret asymptotically matches our derived worst-case lower bound. Our result indicates that, for location-shift models, the optimal RS-AIPW strategy draws treatment arms with varying probabilities based on their variances. This result contrasts with the results of Bubeck et al. (2011), which show that drawing each treatment arm with an equal ratio is minimax optimal in a bounded outcome setting.
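To make the AIPW part of the strategy concrete, here is a minimal sketch of an augmented inverse probability weighting estimate of each arm's mean when arms are drawn at random with known probabilities, followed by recommending the arm with the largest estimate. Several choices are assumptions rather than details from the abstract: the sampling probabilities are held fixed over rounds, the regression term for round t is the running sample mean of each arm from earlier rounds only, and the two-arm example sets probabilities proportional to the arms' standard deviations purely as one illustration of variance-dependent sampling. The function name `aipw_arm_means` is hypothetical.

```python
import numpy as np


def aipw_arm_means(arms, outcomes, probs, n_arms):
    """AIPW estimates of each arm's mean outcome.

    arms[t]     : index of the arm drawn at round t
    outcomes[t] : observed outcome at round t
    probs[a]    : fixed, known probability of drawing arm a in any round
    """
    probs = np.asarray(probs, dtype=float)
    sums = np.zeros(n_arms)
    counts = np.zeros(n_arms)
    total = np.zeros(n_arms)
    for a_t, y_t in zip(arms, outcomes):
        # Plug-in mean estimate built from rounds < t only (0 for unseen arms).
        f_hat = np.divide(sums, counts, out=np.zeros(n_arms), where=counts > 0)
        indicator = np.zeros(n_arms)
        indicator[a_t] = 1.0
        # AIPW score: importance-weighted residual plus the regression term.
        total += indicator * (y_t - f_hat) / probs + f_hat
        sums[a_t] += y_t
        counts[a_t] += 1
    return total / len(arms)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([0.0, 0.3])     # true means (unknown to the experimenter)
    sigma = np.array([1.0, 2.0])  # known standard deviations
    w = sigma / sigma.sum()       # illustrative variance-dependent sampling
    T = 5000
    arms = rng.choice(2, size=T, p=w)
    y = rng.normal(mu[arms], sigma[arms])
    est = aipw_arm_means(arms, y, w, n_arms=2)
    print("estimates:", est, "recommended arm:", int(np.argmax(est)))
```

Because the regression term at round t depends only on earlier rounds, each score is conditionally unbiased for the arm's mean regardless of how good that term is; a better plug-in only lowers the variance.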
 [75] arXiv:2302.02991 (cross-list from eess.IV) [pdf, other]

Title: Optimal Transport Guided Unsupervised Learning for Enhancing Low-quality Retinal Images
Comments: Accepted as a conference paper to the 20th IEEE International Symposium on Biomedical Imaging (ISBI 2023)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Real-world non-mydriatic retinal fundus photography is prone to artifacts, imperfections, and low quality when certain ocular or systemic comorbidities exist. Artifacts may result in inaccuracy or ambiguity in clinical diagnoses. In this paper, we proposed a simple but effective end-to-end framework for enhancing poor-quality retinal fundus images. Leveraging optimal transport theory, we proposed an unpaired image-to-image translation scheme for transporting low-quality images to their high-quality counterparts. We theoretically proved that a Generative Adversarial Network (GAN) model with a generator and discriminator is sufficient for this task. Furthermore, to mitigate the inconsistency of information between the low-quality images and their enhancements, an information consistency mechanism was proposed to maximally maintain structural consistency (optic discs, blood vessels, lesions) between the source and enhanced domains. Extensive experiments were conducted on the EyeQ dataset to demonstrate the superiority of our proposed method perceptually and quantitatively.
 [76] arXiv:2302.03003 (cross-list from eess.IV) [pdf, other]

Title: OTRE: Where Optimal Transport Guided Unpaired Image-to-Image Translation Meets Regularization by Enhancing
Authors: Wenhui Zhu, Peijie Qiu, Oana M. Dumitrascu, Jacob Jacob, Mohammad Farazi, Zhangsihao Yang, Keshav Nandakumar, Yalin Wang
Comments: Accepted as a conference paper to the 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Non-mydriatic retinal color fundus photography (CFP) is widely available due to the advantage of not requiring pupillary dilation; however, it is prone to poor quality due to operators, systemic imperfections, or patient-related causes. Optimal retinal image quality is mandated for accurate medical diagnoses and automated analyses. Herein, we leveraged \emph{Optimal Transport (OT)} theory to propose an unpaired image-to-image translation scheme for mapping low-quality retinal CFPs to high-quality counterparts. Furthermore, to improve the flexibility, robustness, and applicability of our image enhancement pipeline in clinical practice, we generalized a state-of-the-art model-based image reconstruction method, regularization by denoising, by plugging in priors learned by our OT-guided image-to-image translation network. We named it \emph{regularization by enhancing (RE)}. We validated the integrated framework, OTRE, on three publicly available retinal image datasets by assessing the quality after enhancement and the performance on various downstream tasks, including diabetic retinopathy grading, vessel segmentation, and diabetic lesion segmentation. The experimental results demonstrated the superiority of our proposed framework over some state-of-the-art unsupervised competitors and a state-of-the-art supervised method.
 [77] arXiv:2302.03020 (cross-list from cs.LG) [pdf, other]

Title: RLSbench: Domain Adaptation Under Relaxed Label Shift
Authors: Saurabh Garg, Nick Erickson, James Sharpnack, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Despite the emergence of principled methods for domain adaptation under label shift, the sensitivity of these methods to minor shifts in the class conditional distributions remains precariously under-explored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with shifts in label proportions. While several papers attempt to adapt these heuristics to accommodate shifts in label proportions, inconsistencies in evaluation criteria, datasets, and baselines make it hard to assess the state of the art. In this paper, we introduce RLSbench, a large-scale relaxed label shift benchmark, consisting of >500 distribution shift pairs that draw on 14 datasets across vision, tabular, and language modalities and compose them with varying label proportions. First, we evaluate 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than were previously known. Next, we develop an effective two-step meta-algorithm that is compatible with most deep domain adaptation heuristics: (i) pseudo-balance the data at each epoch; and (ii) adjust the final classifier with (an estimate of) the target label distribution. The meta-algorithm improves existing domain adaptation heuristics, often by 2-10\% accuracy points under extreme label proportion shifts, and has little (i.e., <0.5\%) effect when label proportions do not shift. We hope that these findings and the availability of RLSbench will encourage researchers to rigorously evaluate proposed methods in relaxed label shift settings. Code is publicly available at https://github.com/acmi-lab/RLSbench.
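Step (ii) of the meta-algorithm, re-weighting a source-trained classifier by an estimate of the target label distribution, can be sketched with the standard Bayes prior correction $p_t(y \mid x) \propto p_s(y \mid x)\, q(y)/p(y)$. The abstract does not say which estimator of the target label distribution is used, so the sketch swaps in the classic expectation-maximization re-estimation of class priors purely as an illustration; it also reads "pseudo-balance" as class-balanced resampling by pseudo-label, which is likewise an assumption. All function names are hypothetical.

```python
import numpy as np


def adjust_for_label_shift(probs_source, source_prior, target_prior):
    """Bayes prior correction: p_t(y|x) proportional to p_s(y|x) * q(y) / p(y)."""
    w = np.asarray(target_prior, dtype=float) / np.asarray(source_prior, dtype=float)
    adjusted = np.asarray(probs_source, dtype=float) * w
    return adjusted / adjusted.sum(axis=1, keepdims=True)


def em_target_prior(probs_source, source_prior, n_iter=500):
    """Estimate the target label distribution from unlabeled target predictions
    by alternating prior correction (E-step) and averaging (M-step)."""
    q = np.asarray(source_prior, dtype=float).copy()
    for _ in range(n_iter):
        q = adjust_for_label_shift(probs_source, source_prior, q).mean(axis=0)
    return q


def pseudo_balanced_indices(pseudo_labels, n_classes, per_class, rng):
    """Hypothetical reading of step (i): resample target examples so that each
    pseudo-label class contributes per_class examples to the epoch."""
    chosen = []
    for c in range(n_classes):
        members = np.flatnonzero(pseudo_labels == c)
        if members.size > 0:  # skip classes that are never predicted
            chosen.append(rng.choice(members, size=per_class, replace=True))
    return np.concatenate(chosen)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10000
    y = (rng.random(n) < 0.2).astype(int)          # target prior is [0.8, 0.2]
    x = rng.normal(np.where(y == 1, 1.0, -1.0), 1.0)
    p1 = 1.0 / (1.0 + np.exp(-2.0 * x))            # exact posterior under a [0.5, 0.5] source prior
    probs = np.column_stack([1.0 - p1, p1])
    q_hat = em_target_prior(probs, [0.5, 0.5])
    adjusted = adjust_for_label_shift(probs, [0.5, 0.5], q_hat)
    print("estimated target prior:", q_hat)
    print("accuracy before:", np.mean(probs.argmax(1) == y),
          "after:", np.mean(adjusted.argmax(1) == y))
```

When the label proportions do not shift, the estimated prior stays close to the source prior and the correction is nearly the identity, which is consistent with the abstract's observation that the meta-algorithm has little effect in that case.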
Replacements for Tue, 7 Feb 23
 [78] arXiv:1809.02727 (replaced) [pdf, ps, other]

Title: Decentralized Differentially Private Without-Replacement Stochastic Gradient Descent
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [79] arXiv:1908.07521 (replaced) [pdf, other]

Title: Distributed Hypothesis Testing over a Noisy Channel: Error-exponents Trade-off
Subjects: Other Statistics (stat.OT); Information Theory (cs.IT)
 [80] arXiv:1909.06094 (replaced) [pdf, other]

Title: Estimating Fisher Information Matrix in Latent Variable Models based on the Score Function
Subjects: Methodology (stat.ME)
 [81] arXiv:1911.00648 (replaced) [pdf, other]

Title: salmon: A Symbolic Linear Regression Package for Python
Comments: Accepted in the Journal of Statistical Software
Subjects: Computation (stat.CO)
 [82] arXiv:2002.01444 (replaced) [pdf, other]

Title: Proper Learning of Linear Dynamical Systems as a Non-Commutative Polynomial Optimisation Problem
Comments: 27 pages, 6 figures, with additional experiments exploiting sparsity
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
 [83] arXiv:2004.04464 (replaced) [pdf, other]

Title: A Characteristic Function for Shapley-Value-Based Attribution of Anomaly Scores
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [84] arXiv:2005.01026 (replaced) [pdf, other]

Title: Multi-Center Federated Learning: Clients Clustering for Better Personalization
Comments: This paper has two duplicated versions: 2005.01026 and 2108.08647. The first one, 2005.01026, is the correct one, and the second one, 2108.08647, should be deleted because it keeps causing confusion.
Journal-ref: World Wide Web, 26, (2003), 481-500
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
 [85] arXiv:2009.07703 (replaced) [pdf, other]

Title: Efficient Variational Bayes Learning of Graphical Models with Smooth Structural Changes
Journal-ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (2023)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
 [86] arXiv:2009.12682 (replaced) [pdf, other]

Title: Decision-Aware Conditional GANs for Time Series Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [87] arXiv:2101.02094 (replaced) [pdf, ps, other]

Title: Bernstein-Type Bounds for Beta Distribution
Authors: Maciej Skorski
Comments: major revision - fixed a mistake in the proof
Subjects: Probability (math.PR); Statistics Theory (math.ST); Applications (stat.AP)
 [88] arXiv:2102.11050 (replaced) [pdf, other]

Title: Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization
Authors: Rad Niazadeh (1), Negin Golrezaei (2), Joshua Wang (3), Fransisca Susan (4), Ashwinkumar Badanidiyuru (3) ((1) Chicago Booth School of Business, Operations Management, (2) MIT Sloan School of Management, Operations Management, (3) Google Research Mountain View, (4) MIT Operations Research Center)
Comments: 87 pages, 2 figures. Management Science (2022)
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [89] arXiv:2105.03529 (replaced) [pdf, other]

Title: Precise Unbiased Estimation in Randomized Experiments using Auxiliary Observational Data
Authors: Johann A. Gagnon-Bartsch, Adam C. Sales, Edward Wu, Anthony F. Botelho, John A. Erickson, Luke W. Miratrix, Neil T. Heffernan
Subjects: Applications (stat.AP)
 [90] arXiv:2105.10590 (replaced) [pdf, other]

Title: Parallelizing Contextual Bandits
Authors: Jeffrey Chan, Aldo Pacchiano, Nilesh Tripuraneni, Yun S. Song, Peter Bartlett, Michael I. Jordan
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM)
 [91] arXiv:2106.01128 (replaced) [pdf, other]

Title: Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [92] arXiv:2106.14045 (replaced) [pdf, other]

Title: The mbsts package: Multivariate Bayesian Structural Time Series Models in R
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Mathematical Software (cs.MS); Applications (stat.AP); Computation (stat.CO)
 [93] arXiv:2107.00371 (replaced) [pdf, other]

Title: Sparse GCA and Thresholded Gradient Descent
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [94] arXiv:2108.12600 (replaced) [pdf, other]

Title: A robust fusion-extraction procedure with summary statistics in the presence of biased sources
Subjects: Methodology (stat.ME)
 [95] arXiv:2109.02959 (replaced) [pdf, other]

Title: Fast approximations of pseudo-observations in the context of right-censoring and interval-censoring
Authors: Olivier Bouaziz (MAP5 - UMR 8145)
Subjects: Statistics Theory (math.ST)
 [96] arXiv:2109.09367 (replaced) [pdf, other]

Title: Extending Bootstrap AMG for Clustering of Attributed Graphs
Comments: 32 pages, 12 figures, preprint
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Statistics Theory (math.ST)
 [97] arXiv:2111.03289 (replaced) [pdf, ps, other]

Title: Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs
Comments: accepted to NeurIPS'22
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [98] arXiv:2111.04964 (replaced) [pdf, other]

Title: On Representation Knowledge Distillation for Graph Neural Networks
Comments: IEEE Transactions on Neural Networks and Learning Systems (TNNLS), Special Issue on Deep Neural Networks for Graphs: Theory, Models, Algorithms and Applications
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [99] arXiv:2112.09036 (replaced) [pdf, other]

Title: The Dual PC Algorithm and the Role of Gaussianity for Structure Learning of Bayesian Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
 [100] arXiv:2112.10753 (replaced) [pdf, other]

Title: Strong Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Markov Jump Linear Systems
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)
 [101] arXiv:2201.12064 (replaced) [pdf, other]

Title: Multiscale Graph Comparison via the Embedded Laplacian Discrepancy
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [102] arXiv:2202.01666 (replaced) [pdf, other]

Title: Proportional Fairness in Federated Learning
Comments: Accepted at TMLR 2023, typos fixed
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [103] arXiv:2202.03835 (replaced) [pdf, other]

Title: A covariant, discrete time-frequency representation tailored for zero-based signal detection
Comments: Accepted for publication in IEEE Transactions on Signal Processing on May 26, 2022
Subjects: Signal Processing (eess.SP); Statistics Theory (math.ST); Methodology (stat.ME)
 [104] arXiv:2202.04912 (replaced) [pdf, other]

Title: Random Forest Weighted Local Fréchet Regression
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [105] arXiv:2202.07098 (replaced) [pdf, ps, other]

Title: Statistical Inference After Adaptive Sampling for Longitudinal Data
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
 [106] arXiv:2203.00614 (replaced) [pdf, other]

Title: Side Effects of Learning from Low-dimensional Data Embedded in a Euclidean Space
Comments: 53 pages (11 pages for Appendix), 24 figures
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
 [107] arXiv:2203.10563 (replaced) [pdf, other]

Title: Application of de-shape synchrosqueezing to estimate gait cadence from a single-sensor accelerometer placed in different body locations
Subjects: Applications (stat.AP)
 [108] arXiv:2204.00180 (replaced) [pdf, other]

Title: Measuring Diagnostic Test Performance Using Imperfect Reference Tests: A Partial Identification Approach
Authors: Filip Obradović
Comments: For associated code, see: this https URL
Subjects: Applications (stat.AP); Econometrics (econ.EM)
 [109] arXiv:2204.07879 (replaced) [pdf, ps, other]

Title: Polynomial-time sparse measure recovery
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [110] arXiv:2204.08964 (replaced) [pdf, other]

Title: Adaptive measurement filter: efficient strategy for optimal estimation of quantum Markov chains
Comments: 25 pages, 7 figures
Subjects: Quantum Physics (quant-ph); Mathematical Physics (math-ph); Statistics Theory (math.ST)
 [111] arXiv:2205.04107 (replaced) [pdf, other]

Title: Inference of multivariate exponential Hawkes processes with inhibition and application to neuronal activity
Authors: Anna Bonnet (LPSM (UMR_8001)), Miguel Martinez Herrera (LPSM (UMR_8001)), Maxime Sangnier (LPSM (UMR_8001))
Subjects: Methodology (stat.ME)
 [112] arXiv:2205.13496 (replaced) [pdf, other]

Title: Censored Quantile Regression Neural Networks for Distribution-Free Survival Analysis
Comments: Published in NeurIPS 2022
Journal-ref: NeurIPS 2022
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [113] arXiv:2205.14504 (replaced) [pdf, other]

Title: Bayesian prediction via nonparametric transformation models
Comments: The corresponding R package BuLTM is available on GitHub: this https URL
Subjects: Methodology (stat.ME)
 [114] arXiv:2206.02617 (replaced) [pdf, other]

Title: Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
 [115] arXiv:2206.02659 (replaced) [pdf, other]

Title: Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees
Comments: 36 pages, 5 figures, 8 tables (Fixed typos). ICML 2022
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [116] arXiv:2206.12680 (replaced) [pdf, other]

Title: Topology-aware Generalization of Decentralized SGD
Comments: Accepted for publication in the 39th International Conference on Machine Learning (ICML 2022)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [117] arXiv:2206.14275 (replaced) [pdf, other]

Title: Dynamic CoVaR Modeling
Subjects: Econometrics (econ.EM); Statistics Theory (math.ST); Risk Management (q-fin.RM); Methodology (stat.ME)
 [118] arXiv:2207.00357 (replaced) [pdf, other]

Title: Efficient parameter estimation for parabolic SPDEs based on a log-linear model for realized volatilities
Subjects: Statistics Theory (math.ST)
 [119] arXiv:2207.02287 (replaced) [pdf, other]

Title: Branching Processes in Random Environments with Thresholds
Comments: 47 pages, 3 figures, 5 tables
Subjects: Probability (math.PR); Statistics Theory (math.ST)
 [120] arXiv:2207.08038 (replaced) [pdf, other]

Title: A Singular Woodbury and Pseudo-Determinant Matrix Identities and Application to Gaussian Process Regression
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Numerical Analysis (math.NA); Computation (stat.CO)
 [121] arXiv:2207.14088 (replaced) [pdf, other]

Title: On the Sequential Probability Ratio Test in Hidden Markov Models
Comments: 28 pages, 10 figures, submitted to CONCUR 2022
Subjects: Probability (math.PR); Logic in Computer Science (cs.LO); Statistics Theory (math.ST)
 [122] arXiv:2208.00959 (replaced) [pdf, other]

Title: HUG model: an interaction point process for Bayesian detection of multiple sources in groundwaters from hydrochemical data
Authors: Christophe Reype (IECL, PASTA), Radu S. Stoica (IECL, PASTA), Antonin Richard, Madalina Deaconu (IECL, PASTA)
Subjects: Applications (stat.AP); Statistics Theory (math.ST); Methodology (stat.ME)
 [123] arXiv:2208.07132 (replaced) [pdf, other]

Title: Intuitive Joint Priors for Bayesian Linear Multilevel Models: The R2D2M2 prior
Comments: 61 pages, 21 figures, 9 tables
Subjects: Methodology (stat.ME); Computation (stat.CO)
 [124] arXiv:2209.05153 (replaced) [pdf, other]

Title: The test of exponentiality based on the mean residual life function revisited
Authors: Bruno Ebner
Comments: 16 pages, 1 figure, 5 tables
Subjects: Statistics Theory (math.ST)
 [125] arXiv:2209.07791 (replaced) [pdf, ps, other]

Title: Maximum likelihood estimation and prediction error for a Matérn model on the circle
Authors: Sébastien Petit (L2S, LNE)
Subjects: Statistics Theory (math.ST)
 [126] arXiv:2209.12651 (replaced) [pdf, other]

Title: Learning Variational Models with Unrolling and Bilevel Optimization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [127] arXiv:2209.13570 (replaced) [pdf, other]

Title: Hierarchical Sliced Wasserstein Distance
Comments: Accepted to ICLR 2023, 29 pages, 8 figures, 3 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [128] arXiv:2209.15414 (replaced) [pdf, other]

Title: Predicting the power grid frequency of European islands
Authors: Thorbjørn Lund Onsaker, Heidi S. Nygård, Damià Gomila, Pere Colet, Ralf Mikut, Richard Jumar, Heiko Maass, Uwe Kühnapfel, Veit Hagenmeyer, Benjamin Schäfer
Comments: 17 pages
Subjects: Applications (stat.AP); Machine Learning (cs.LG); Systems and Control (eess.SY); Data Analysis, Statistics and Probability (physics.data-an)
 [129] arXiv:2210.00635 (replaced) [pdf, other]

Title: Robust Empirical Risk Minimization with Tolerance
Comments: 22 pages, 1 figure, to appear at ALT'23
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [130] arXiv:2210.00895 (replaced) [pdf, other]

Title: On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits
Authors: Antoine Barrier (UMPA-ENSL, LMO, CELESTE), Aurélien Garivier (UMPA-ENSL, LIP), Gilles Stoltz (LMO, CELESTE)
Journal-ref: ALT 2023 - The 34th International Conference on Algorithmic Learning Theory, Feb 2023, Singapore
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [131] arXiv:2210.06819 (replaced) [pdf, other]

Title: Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence
Comments: 14 pages in main text; 51 pages including bibliography and appendix. Published in Transactions on Machine Learning Research (TMLR), 2023. this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [132] arXiv:2210.14860 (replaced) [pdf, other]

Title: Dependence matters: Statistical models to identify the drivers of tie formation in economic networks
Subjects: Methodology (stat.ME); Applications (stat.AP)
 [133] arXiv:2211.08572 (replaced) [pdf, other]

Title: Bayesian Fixed-Budget Best-Arm Identification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [134] arXiv:2211.09259 (replaced) [pdf, other]

Title: The Missing Indicator Method: From Low to High Dimensions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [135] arXiv:2211.09403 (replaced) [pdf, other]

Title: Learning Mixtures of Markov Chains and MDPs
Comments: 51 pages (13 page paper, 38 page appendix). Paper restructured and refined, corrections made to proofs, experiments added
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [136] arXiv:2211.10747 (replaced) [pdf, other]

Title: Exploring validation metrics for offline model-based optimisation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [137] arXiv:2211.13793 (replaced) [pdf, other]

Title: Tensor Decomposition of Large-scale Clinical EEGs Reveals Interpretable Patterns of Brain Physiology
Authors: Teja Gupta, Neeraj Wagh, Samarth Rawal, Brent Berry, Gregory Worrell, Yogatheesan Varatharajah
Comments: 4 pages, 3 figures, 2 tables; Accepted at IEEE NER 2023
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Applications (stat.AP)
 [138] arXiv:2211.14296 (replaced) [pdf, other]

Title: A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation
Comments: Accepted at ICLR 2023 (notable-top-25%), Website: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
 [139] arXiv:2211.14908 (replaced) [pdf, other]

Title: A Permutation-free Kernel Two-Sample Test
Comments: Published at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS), with an oral presentation
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [140] arXiv:2212.09178 (replaced) [pdf, ps, other]

Title: Support Vector Regression: Risk Quadrangle Framework
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [141] arXiv:2212.09844 (replaced) [pdf, other]

Title: Counterfactual Risk Assessments under Unmeasured Confounding
Subjects: Econometrics (econ.EM); Computers and Society (cs.CY); Machine Learning (cs.LG); Methodology (stat.ME)
 [142] arXiv:2212.12749 (replaced) [pdf, other]

Title: Deep Latent State Space Models for Time-Series Generation
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [143] arXiv:2301.07067 (replaced) [pdf, other]

Title: Transformers as Algorithms: Generalization and Stability in In-context Learning
Comments: Revised version significantly improves the stability guarantees and provides new experiments
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
 [144] arXiv:2301.09479 (replaced) [pdf, other]

Title: Modality-Agnostic Variational Compression of Implicit Neural Representations
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [145] arXiv:2301.09861 (replaced) [pdf]

Title: A convolutional neural network of low complexity for tumor anomaly detection
Comments: This article has been accepted for publication in the 8th International Congress on Information and Communication Technology (ICICT 2023, Springer)
Subjects: Image and Video Processing (eess.IV); Applications (stat.AP)
 [146] arXiv:2301.12003 (replaced) [pdf, other]

Title: Minimizing Trajectory Curvature of ODE-based Generative Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [147] arXiv:2301.13112 (replaced) [pdf, other]

Title: Benchmarking optimality of time series classification methods in distinguishing diffusions
Comments: 21 pages, 8 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [148] arXiv:2301.13857 (replaced) [pdf, other]

Title: Learning in POMDPs is Sample-Efficient with Hindsight Observability
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [149] arXiv:2302.00704 (replaced) [pdf, other]

Title: Pathologies of Predictive Diversity in Deep Ensembles
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [150] arXiv:2302.00814 (replaced) [pdf, other]

Title: Stochastic Contextual Bandits with Long Horizon Rewards
Comments: 47 pages, to appear at AAAI 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [151] arXiv:2302.01425 (replaced) [pdf, other]

Title: Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
Comments: 23 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)