Machine Learning
New submissions
New submissions for Tue, 7 Feb 23
 [1] arXiv:2302.01952 [pdf, other]

Title: On a continuous time model of gradient descent dynamics and instability in deep learning
Comments: Transactions of Machine Learning Research, 2023
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent, however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian the PF sheds light on the recently observed edge of stability phenomena in deep learning. Using our new understanding of instability we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.
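The oscillatory and divergent behaviors mentioned in this abstract can be seen in the simplest possible setting. The sketch below is not the principal flow from the paper; it only runs plain gradient descent on a one-dimensional quadratic f(x) = 0.5 * c * x^2, where each step multiplies x by (1 - h * c), so the iterates shrink when the learning rate h is below 2 / c and grow when it is above. The function name and the chosen constants are illustrative assumptions.

```python
def gradient_descent_quadratic(curvature, learning_rate, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2 and return all iterates."""
    xs = [x0]
    for _ in range(steps):
        grad = curvature * xs[-1]          # f'(x) = curvature * x
        xs.append(xs[-1] - learning_rate * grad)
    return xs


curvature = 10.0                           # stability threshold is 2 / curvature = 0.2
stable = gradient_descent_quadratic(curvature, learning_rate=0.15)
unstable = gradient_descent_quadratic(curvature, learning_rate=0.25)

print("below threshold, final |x|:", abs(stable[-1]))
print("above threshold, final |x|:", abs(unstable[-1]))
```

In both runs the iterates change sign at every step, because 1 - h * c is negative; only the magnitude of that factor decides between decay and divergence.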
 [2] arXiv:2302.02033 [pdf, ps, other]

Title: An Asymptotically Optimal Algorithm for the One-Dimensional Convex Hull Feasibility Problem
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
This work studies the pure-exploration setting for the convex hull feasibility (CHF) problem where one aims to efficiently and accurately determine if a given point lies in the convex hull of means of a finite set of distributions. We give a complete characterization of the sample complexity of the CHF problem in the one-dimensional setting. We present the first asymptotically optimal algorithm called Thompson-CHF, whose modular design consists of a stopping rule and a sampling rule. In addition, we provide an extension of the algorithm that generalizes several important problems in the multi-armed bandit literature. Finally, we further investigate the Gaussian bandit case with unknown variances and address how the Thompson-CHF algorithm can be adjusted to be asymptotically optimal in this setting.
 [3] arXiv:2302.02228 [pdf, other]

Title: Counterfactual Identifiability of Bijective Causal Models
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We study counterfactual identifiability in causal models with bijective generation mechanisms (BGM), a class that generalizes several widely-used causal models in the literature. We establish their counterfactual identifiability for three common causal structures with unobserved confounding, and propose a practical learning method that casts learning a BGM as structured generative modeling. Learned BGMs enable efficient counterfactual estimation and can be obtained using a variety of deep conditional generative models. We evaluate our techniques in a visual task and demonstrate its application in a real-world video streaming simulation task.
 [4] arXiv:2302.02406 [pdf, other]

Title: Prescreening breast cancer with machine learning and deep learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We suggest that deep learning can be used for prescreening cancer by analyzing demographic and anthropometric information of patients, as well as biological markers obtained from routine blood samples and relative risks obtained from meta-analysis and international databases. We applied feature selection algorithms to a database of 116 women, including 52 healthy women and 64 women diagnosed with breast cancer, to identify the best prescreening predictors of cancer. We utilized the best predictors to perform k-fold Monte Carlo cross-validation experiments that compare deep learning against traditional machine learning algorithms. Our results indicate that a deep learning model with an input-layer architecture that is fine-tuned using feature selection can effectively distinguish between patients with and without cancer. Additionally, compared to machine learning, deep learning has the lowest uncertainty in its predictions. These findings suggest that deep learning algorithms applied to cancer prescreening offer a radiation-free, non-invasive, and affordable complement to screening methods based on imagery. The implementation of deep learning algorithms in cancer prescreening offers opportunities to identify individuals who may require imaging-based screening, can encourage self-examination, and can decrease the psychological externalities associated with false positives in cancer screening. The integration of deep learning algorithms for both screening and prescreening will ultimately lead to earlier detection of malignancy, reducing the healthcare and societal burden associated with cancer treatment.
 [5] arXiv:2302.02432 [pdf, other]

Title: Tighter Information-Theoretic Generalization Bounds from Supersamples
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
We present a variety of novel information-theoretic generalization bounds for learning algorithms, from the supersample setting of Steinke & Zakynthinou (2020), the setting of the "conditional mutual information" framework. Our development exploits projecting the loss pair (obtained from a training instance and a testing instance) down to a single number and correlating loss values with a Rademacher sequence (and its shifted variants). The presented bounds include square-root bounds, fast-rate bounds, including those based on variance and sharpness, and bounds for interpolating algorithms etc. We show theoretically or empirically that these bounds are tighter than all information-theoretic bounds known to date on the same supersample setting.
 [6] arXiv:2302.02455 [pdf, other]

Title: ODEWS: The Overdraft Early Warning System
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG)
When a customer overdraws their account and their balance is negative they are assessed an overdraft fee. Americans pay approximately \$15 billion in unnecessary overdraft fees a year, often in \$35 increments; users of the Mint personal finance app pay approximately \$250 million in fees a year in particular. These overdraft fees are an excessive financial burden and lead to cascading overdraft fees trapping customers in financial hardship. To address this problem, we have created an ML-driven overdraft early warning system (ODEWS) that assesses a customer's risk of overdrafting within the next week using their banking and transaction data in the Mint app. At-risk customers are sent an alert so they can take steps to avoid the fee, ultimately changing their behavior and financial habits. The system deployed resulted in a \$3 million savings in overdraft fees for Mint customers compared to a control group. Moreover, the methodology outlined here can be generalized to provide ML-driven personalized financial advice for many different personal finance goals: increase credit score, build emergency savings fund, pay down debt, allocate capital for investment.
 [7] arXiv:2302.02670 [pdf, other]

Title: Random Forests for time-fixed and time-dependent predictors: The DynForest R package
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The R package DynForest implements random forests for predicting a categorical or a (multiple causes) time-to-event outcome based on time-fixed and time-dependent predictors. Through the random forests, the time-dependent predictors can be measured with error at subject-specific times, and they can be endogenous (i.e., impacted by the outcome process). They are modeled internally using flexible linear mixed models (thanks to the lcmm package) with time-associations pre-specified by the user. DynForest computes dynamic predictions that take into account all the information from time-fixed and time-dependent predictors. DynForest also provides information about the most predictive variables using variable importance and minimal depth. Variable importance can also be computed on groups of variables. To display the results, several functions are available such as summary and plot functions. This paper aims to guide the user with a step-by-step example of the different functions for fitting random forests within DynForest.
 [8] arXiv:2302.02672 [pdf, other]

Title: Identifiability of latent-variable and structural-equation models: from linear to nonlinear
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
An old problem in multivariate statistics is that linear Gaussian models are often unidentifiable, i.e. some parameters cannot be uniquely estimated. In factor analysis, an orthogonal rotation of the factors is unidentifiable, while in linear regression, the direction of effect cannot be identified. For such linear models, non-Gaussianity of the (latent) variables has been shown to provide identifiability. In the case of factor analysis, this leads to independent component analysis, while in the case of the direction of effect, non-Gaussian versions of structural equation modelling solve the problem. More recently, we have shown how even general nonparametric nonlinear versions of such models can be estimated. Non-Gaussianity is not enough in this case, but assuming we have time series, or that the distributions are suitably modulated by some observed auxiliary variables, the models are identifiable. This paper reviews the identifiability theory for the linear and nonlinear cases, considering both factor analytic models and structural equation models.
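The rotation problem in Gaussian factor analysis described in this abstract can be checked numerically. A minimal sketch, assuming the standard linear factor model with loading matrix W and diagonal noise variances psi, so that the observed covariance is W W^T + diag(psi): replacing W by W R for any orthogonal R leaves that covariance unchanged, and a zero-mean Gaussian is fully determined by its covariance. The matrices and helper names below are illustrative assumptions, not taken from the paper.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def implied_covariance(loadings, noise_variances):
    """Covariance W W^T + diag(psi) implied by a linear Gaussian factor model."""
    cov = matmul(loadings, transpose(loadings))
    for i, psi in enumerate(noise_variances):
        cov[i][i] += psi
    return cov


theta = 0.7                                   # arbitrary rotation angle
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
loadings = [[1.0, 0.5], [0.2, -1.0], [0.7, 0.3]]   # 3 observed variables, 2 factors
noise = [0.1, 0.2, 0.3]

rotated_loadings = matmul(loadings, rotation)
cov_original = implied_covariance(loadings, noise)
cov_rotated = implied_covariance(rotated_loadings, noise)

max_gap = max(abs(a - b)
              for row_a, row_b in zip(cov_original, cov_rotated)
              for a, b in zip(row_a, row_b))
print("largest covariance difference:", max_gap)
```

The two loading matrices differ, yet the data distribution they imply is the same up to floating-point error, which is why no amount of Gaussian data can tell them apart.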
 [9] arXiv:2302.02766 [pdf, other]

Title: Generalization Bounds with Data-dependent Fractal Dimensions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Providing generalization guarantees for modern neural networks has been a crucial task in statistical learning. Recently, several studies have attempted to analyze the generalization error in such settings by using tools from fractal geometry. While these works have successfully introduced new mathematical tools to apprehend generalization, they heavily rely on a Lipschitz continuity assumption, which in general does not hold for neural networks and might make the bounds vacuous. In this work, we address this issue and prove fractal geometry-based generalization bounds without requiring any Lipschitz assumption. To achieve this goal, we build on a classical covering argument in learning theory and introduce a data-dependent fractal dimension. Despite introducing a significant amount of technical complications, this new notion lets us control the generalization error (over either fixed or random hypothesis spaces) along with certain mutual information (MI) terms. To provide a clearer interpretation of the newly introduced MI terms, as a next step, we introduce a notion of "geometric stability" and link our bounds to the prior art. Finally, we make a rigorous connection between the proposed data-dependent dimension and topological data analysis tools, which then enables us to compute the dimension in a numerically efficient way. We support our theory with experiments conducted on various settings.
 [10] arXiv:2302.02774 [pdf, other]

Title: The SSL Interplay: Augmentations, Inductive Bias, and Generalization
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST)
Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision. Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architecture, and training algorithm. We study such an interplay with a precise analysis of generalization performance on both pretraining and downstream tasks in a theory-friendly setup, and highlight several insights for SSL practitioners that arise from our theory.
 [11] arXiv:2302.02923 [pdf, other]

Title: In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Econometrics (econ.EM)
Personalized treatment effect estimates are often of interest in high-stakes applications; thus, before deploying a model estimating such effects in practice, one needs to be sure that the best candidate from the ever-growing machine learning toolbox for this task was chosen. Unfortunately, due to the absence of counterfactual information in practice, it is usually not possible to rely on standard validation metrics for doing so, leading to a well-known model selection dilemma in the treatment effect estimation literature. While some solutions have recently been investigated, systematic understanding of the strengths and weaknesses of different model selection criteria is still lacking. In this paper, instead of attempting to declare a global 'winner', we therefore empirically investigate success and failure modes of different selection criteria. We highlight that there is a complex interplay between selection strategies, candidate estimators and the DGP used for testing, and provide interesting insights into the relative (dis)advantages of different criteria alongside desiderata for the design of further illuminating empirical studies in this context.
 [12] arXiv:2302.03026 [pdf, other]

Title: Sampling-Based Accuracy Testing of Posterior Estimators for General Inference
Comments: 15 pages
Subjects: Machine Learning (stat.ML); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Methodology (stat.ME)
Parameter inference, i.e. inferring the posterior distribution of the parameters of a statistical model given some data, is a central problem to many scientific disciplines. Posterior inference with generative models is an alternative to methods such as Markov Chain Monte Carlo, both for likelihood-based and simulation-based inference. However, assessing the accuracy of posteriors encoded in generative models is not straightforward. In this paper, we introduce 'distance to random point' (DRP) coverage testing as a method to estimate coverage probabilities of generative posterior estimators.
Our method differs from previously-existing coverage-based methods, which require posterior evaluations. We prove that our approach is necessary and sufficient to show that a posterior estimator is optimal. We demonstrate the method on a variety of synthetic examples, and show that DRP can be used to test the results of posterior inference analyses in high-dimensional spaces. We also show that our method can detect non-optimal inferences in cases where existing methods fail.
Cross-lists for Tue, 7 Feb 23
 [13] arXiv:2302.02009 (cross-list from cs.LG) [pdf, other]

Title: Domain Adaptation via Rebalanced Subdomain Alignment
Authors: Yiling Liu, Juncheng Dong, Ziyang Jiang, Ahmed Aloui, Keyu Li, Hunter Klein, Vahid Tarokh, David Carlson
Comments: 20 pages, 6 figures, 4 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Unsupervised domain adaptation (UDA) is a technique used to transfer knowledge from a labeled source domain to a different but related unlabeled target domain. While many UDA methods have shown success in the past, they often assume that the source and target domains must have identical class label distributions, which can limit their effectiveness in real-world scenarios. To address this limitation, we propose a novel generalization bound that reweights source classification error by aligning source and target subdomains. We prove that our proposed generalization bound is at least as strong as existing bounds under realistic assumptions, and we empirically show that it is much stronger on real-world data. We then propose an algorithm to minimize this novel generalization bound. We demonstrate by numerical experiments that this approach improves performance in shifted class distribution scenarios compared to state-of-the-art methods.
 [14] arXiv:2302.02061 (cross-list from cs.LG) [pdf, other]

Title: Reinforcement Learning with History-Dependent Dynamic Contexts
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Machine Learning (stat.ML)
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.
 [15] arXiv:2302.02092 (cross-list from cs.LG) [pdf, other]

Title: Interpolation for Robust Learning: Data Augmentation on Geodesics
Comments: 33 pages, 3 figures, 18 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We propose to study and promote the robustness of a model as per its performance through the interpolation of training data distributions. Specifically, (1) we augment the data by finding the worst-case Wasserstein barycenter on the geodesic connecting subpopulation distributions of different categories. (2) We regularize the model for smoother performance on the continuous geodesic path connecting subpopulation distributions. (3) Additionally, we provide a theoretical guarantee of robustness improvement and investigate how the geodesic location and the sample size contribute, respectively. Experimental validations of the proposed strategy on four datasets, including CIFAR-100 and ImageNet, establish the efficacy of our method, e.g., our method improves the baselines' certifiable robustness on CIFAR-10 up to $7.7\%$, with $16.8\%$ on empirical robustness on CIFAR-100. Our work provides a new perspective of model robustness through the lens of Wasserstein geodesic-based interpolation with a practical off-the-shelf strategy that can be combined with existing robust training methods.
 [16] arXiv:2302.02139 (cross-list from cs.LG) [pdf, other]

Title: Structural Explanations for Graph Neural Networks using HSIC
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Graph neural networks (GNNs) are a type of neural model that tackle graphical tasks in an end-to-end manner. Recently, GNNs have been receiving increased attention in machine learning and data mining communities because of the higher performance they achieve in various tasks, including graph classification, link prediction, and recommendation. However, the complicated dynamics of GNNs make it difficult to understand which parts of the graph features contribute more strongly to the predictions. To handle the interpretability issues, recently, various GNN explanation methods have been proposed. In this study, a flexible model-agnostic explanation method is proposed to detect significant structures in graphs using the Hilbert-Schmidt independence criterion (HSIC), which captures the nonlinear dependency between two variables through kernels. More specifically, we extend the GraphLIME method for node explanation with a group lasso and a fused lasso-based node explanation method. The group and fused regularization with GraphLIME enables the interpretation of GNNs in substructure units. Then, we show that the proposed approach can be used for the explanation of sequential graph classification tasks. Through experiments, it is demonstrated that our method can identify crucial structures in a target graph in various settings.
 [17] arXiv:2302.02155 (cross-list from cs.LG) [pdf, other]

Title: Guaranteed Tensor Recovery Fused Low-rankness and Smoothness
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
The tensor data recovery task has attracted much research attention in recent years. Solving such an ill-posed problem generally requires exploring intrinsic prior structures underlying tensor data, and formulating them as certain forms of regularization terms for guiding a sound estimate of the restored tensor. Recent research has made significant progress by adopting two insightful tensor priors, i.e., global low-rankness (L) and local smoothness (S) across different tensor modes, which are always encoded as a sum of two separate regularization terms in the recovery models. However, unlike the primary theoretical developments on low-rank tensor recovery, these joint L+S models have no theoretical exact-recovery guarantees yet, making the methods lack reliability in real practice. To address this crucial issue, in this work, we build a unique regularization term, which essentially encodes both L and S priors of a tensor simultaneously. In particular, by equipping the recovery models with this single regularizer, we can rigorously prove the exact recovery guarantees for two typical tensor recovery tasks, i.e., tensor completion (TC) and tensor robust principal component analysis (TRPCA). To the best of our knowledge, these should be the first exact-recovery results among all related L+S methods for tensor recovery. Significant recovery accuracy improvements over many other SOTA methods in several TC and TRPCA tasks with various kinds of visual tensor data are observed in extensive experiments. Typically, our method achieves a workable performance when the missing rate is extremely large, e.g., 99.5%, for the color image inpainting task, while all its peers totally fail in such a challenging case.
 [18] arXiv:2302.02224 (cross-list from cs.LG) [pdf, other]

Title: TAP: The Attention Patch for Cross-Modal Knowledge Transfer from Unlabeled Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This work investigates the intersection of cross-modal learning and semi-supervised learning, where we aim to improve the supervised learning performance of the primary modality by borrowing missing information from an unlabeled modality. We investigate this problem from a Nadaraya-Watson (NW) kernel regression perspective and show that this formulation implicitly leads to a kernelized cross-attention module. To this end, we propose The Attention Patch (TAP), a simple neural network plug-in that allows data-level knowledge transfer from the unlabeled modality. We provide numerical simulations on three real-world datasets to examine each aspect of TAP and show that a TAP integration in a neural network can improve generalization performance using the unlabeled modality.
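The link between Nadaraya-Watson regression and attention that this abstract relies on can be illustrated in a few lines. This is a generic sketch, not the TAP module: it assumes scalar inputs and a Gaussian kernel with a hypothetical `bandwidth` parameter, in which case the normalized kernel weights are exactly a softmax over the scores -(q - k)^2 / (2 * bandwidth^2), so the estimate is an attention-weighted average of the values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def nadaraya_watson(query, keys, values, bandwidth=1.0):
    """Gaussian-kernel NW estimate at `query`, written as attention over (keys, values)."""
    scores = [-(query - k) ** 2 / (2 * bandwidth ** 2) for k in keys]
    weights = softmax(scores)               # attention weights, sum to 1
    return sum(w * v for w, v in zip(weights, values))


keys = [0.0, 1.0, 2.0]                      # training inputs
values = [0.0, 1.0, 4.0]                    # training targets
estimate = nadaraya_watson(1.0, keys, values)

# The same quantity computed from the textbook kernel-weighted average.
kernel = [math.exp(-(1.0 - k) ** 2 / 2) for k in keys]
direct = sum(w * v for w, v in zip(kernel, values)) / sum(kernel)
print(estimate, direct)
```

Shrinking the bandwidth concentrates the weights on the nearest key, which mirrors how a sharper softmax temperature behaves in attention.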
 [19] arXiv:2302.02252 (cross-list from cs.LG) [pdf, other]

Title: Reinforcement Learning in Low-Rank MDPs with Density Features
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
MDPs with low-rank transitions (that is, the transition matrix can be factored into the product of two matrices, left and right) is a highly representative structure that enables tractable learning. The left matrix enables expressive function approximation for value-based learning and has been studied extensively. In this work, we instead investigate sample-efficient learning with density features, i.e., the right matrix, which induce powerful models for state-occupancy distributions. This setting not only sheds light on leveraging unsupervised learning in RL, but also enables plug-in solutions for convex RL. In the offline setting, we propose an algorithm for off-policy estimation of occupancies that can handle non-exploratory data. Using this as a subroutine, we further devise an online algorithm that constructs exploratory data distributions in a level-by-level manner. As a central technical challenge, the additive error of occupancy estimation is incompatible with the multiplicative definition of data coverage. In the absence of strong assumptions like reachability, this incompatibility easily leads to exponential error blow-up, which we overcome via novel technical tools. Our results also readily extend to the representation learning setting, when the density features are unknown and must be learned from an exponentially large candidate set.
 [20] arXiv:2302.02277 (cross-list from cs.LG) [pdf, other]

Title: SE(3) diffusion model with application to protein backbone generation
Authors: Jason Yim, Brian L. Trippe, Valentin De Bortoli, Emile Mathieu, Arnaud Doucet, Regina Barzilay, Tommi Jaakkola
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
The design of novel protein structures remains a challenge in protein engineering for applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there exists no principled methodological framework for diffusion on SE(3), the space of orientation preserving rigid motions in $\mathbb{R}^3$, that operates on frames and confers the group invariance. We address these shortcomings by developing theoretical foundations of SE(3) invariant diffusion models on multiple frames followed by a novel framework, FrameDiff, for learning the SE(3) equivariant score over multiple frames. We apply FrameDiff on monomer backbone generation and find it can generate designable monomers up to 500 amino acids without relying on a pretrained protein structure prediction network that has been integral to previous methods. We find our samples are capable of generalizing beyond any known protein structure.
 [21] arXiv:2302.02323 (cross-list from cs.LG) [pdf, other]

Title: Improving Fair Training under Correlation Shifts
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Model fairness is an essential element for Trustworthy AI. While many techniques for model fairness have been proposed, most of them assume that the training and deployment data distributions are identical, which is often not true in practice. In particular, when the bias between labels and sensitive groups changes, the fairness of the trained model is directly influenced and can worsen. We make two contributions for solving this problem. First, we analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness. We introduce the notion of correlation shifts, which can explicitly capture the change of the above bias. Second, we propose a novel pre-processing step that samples the input data to reduce correlation shifts and thus enables the in-processing approaches to overcome their limitations. We formulate an optimization problem for adjusting the data ratio among labels and sensitive groups to reflect the shifted correlation. A key benefit of our approach lies in decoupling the roles of pre- and in-processing approaches: correlation adjustment via pre-processing and unfairness mitigation on the processed data via in-processing. Experiments show that our framework effectively improves existing in-processing fair algorithms w.r.t. accuracy and fairness, both on synthetic and real datasets.
 [22] arXiv:2302.02392 (cross-list from cs.LG) [pdf, ps, other]

Title: Refined Value-Based Offline RL under Realizability and Partial Coverage
Comments: Under review
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In offline reinforcement learning (RL) we have no opportunity to explore, so we must make assumptions that the data is sufficient to guide picking a good policy, taking the form of assuming some coverage, realizability, Bellman completeness, and/or hard margin (gap). In this work we propose value-based algorithms for offline RL with PAC guarantees under just partial coverage, specifically, coverage of just a single comparator policy, and realizability of the soft (entropy-regularized) Q-function of the single policy and a related function defined as a saddle point of a certain minimax optimization problem. This offers refined and generally more lax conditions for offline RL. We further show an analogous result for vanilla Q-functions under a soft margin condition. To attain these guarantees, we leverage novel minimax learning algorithms to accurately estimate soft or vanilla Q-functions with $L^2$-convergence guarantees. Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
 [23] arXiv:2302.02420 (cross-list from cs.LG) [pdf, other]

Title: Direct Uncertainty Quantification
Comments: 21 pages, 16 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Traditional neural networks are simple to train but they produce overconfident predictions, while Bayesian neural networks provide good uncertainty quantification but optimizing them is time consuming. This paper introduces a new approach, direct uncertainty quantification (DirectUQ), that combines their advantages where the neural network directly models uncertainty in output space, and captures both aleatoric and epistemic uncertainty. DirectUQ can be derived as an alternative variational lower bound, and hence benefits from collapsed variational inference that provides improved regularizers. On the other hand, like non-probabilistic models, DirectUQ enjoys simple training and one can use Rademacher complexity to provide risk bounds for the model. Experiments show that DirectUQ and ensembles of DirectUQ provide a good trade-off in terms of run time and uncertainty quantification, especially for out of distribution data.
 [24] arXiv:2302.02460 (cross-list from cs.LG) [pdf, other]

Title: Nonparametric Density Estimation under Distribution Drift
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study nonparametric density estimation in nonstationary drift settings. Given a sequence of independent samples taken from a distribution that gradually changes in time, the goal is to compute the best estimate for the current distribution. We prove tight minimax risk bounds for both discrete and continuous smooth densities, where the minimum is over all possible estimates and the maximum is over all possible distributions that satisfy the drift constraints. Our technique handles a broad class of drift models, and generalizes previous results on agnostic learning under drift.
 [25] arXiv:2302.02497 (cross-list from math.ST) [pdf, other]

Title: High-dimensional Location Estimation via Norm Concentration for Subgamma Vectors
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
In location estimation, we are given $n$ samples from a known distribution $f$ shifted by an unknown translation $\lambda$, and want to estimate $\lambda$ as precisely as possible. Asymptotically, the maximum likelihood estimate achieves the Cram\'er-Rao bound of error $\mathcal N(0, \frac{1}{n\mathcal I})$, where $\mathcal I$ is the Fisher information of $f$. However, the $n$ required for convergence depends on $f$, and may be arbitrarily large. We build on the theory using \emph{smoothed} estimators to bound the error for finite $n$ in terms of $\mathcal I_r$, the Fisher information of the $r$-smoothed distribution. As $n \to \infty$, $r \to 0$ at an explicit rate and this converges to the Cram\'er-Rao bound. We (1) improve the prior work for 1-dimensional $f$ to converge for constant failure probability in addition to high probability, and (2) extend the theory to high-dimensional distributions. In the process, we prove a new bound on the norm of a high-dimensional random variable whose 1-dimensional projections are subgamma, which may be of independent interest.
 [26] arXiv:2302.02526 (cross-list from cs.LG) [pdf, other]

Title: On Private and Robust Bandits
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
We study private and robust multi-armed bandits (MABs), where the agent receives Huber's contaminated heavy-tailed rewards and meanwhile needs to ensure differential privacy. We first present its minimax lower bound, characterizing the information-theoretic limit of regret with respect to privacy budget, contamination level and heavy-tailedness. Then, we propose a meta-algorithm that builds on a private and robust mean estimation subroutine \texttt{PRM} that essentially relies on reward truncation and the Laplace mechanism only. For two different heavy-tailed settings, we give specific schemes of \texttt{PRM}, which enable us to achieve nearly-optimal regret. As by-products of our main results, we also give the first minimax lower bound for private heavy-tailed MABs (i.e., without contamination). Moreover, our two proposed truncation-based \texttt{PRM} achieve the optimal trade-off between estimation accuracy, privacy and robustness. Finally, we support our theoretical results with experimental studies.
 [27] arXiv:2302.02544 (cross-list from math.ST) [pdf, other]

Title: Sequential change detection via backward confidence sequences
Comments: 24 pages, 10 figures
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
We present a simple reduction from sequential estimation to sequential changepoint detection (SCD). In short, suppose we are interested in detecting changepoints in some parameter or functional $\theta$ of the underlying distribution. We demonstrate that if we can construct a confidence sequence (CS) for $\theta$, then we can also successfully perform SCD for $\theta$. This is accomplished by checking if two CSs (one forwards and the other backwards) ever fail to intersect. Since the literature on CSs has been rapidly evolving recently, the reduction provided in this paper immediately solves several old and new change detection problems. Further, our "backward CS", constructed by reversing time, is new and potentially of independent interest. We provide strong nonasymptotic guarantees on the frequency of false alarms and detection delay, and demonstrate numerical effectiveness on several problems.
 [28] arXiv:2302.02552 (cross-list from cs.LG) [pdf, other]

Title: Adapting to Continuous Covariate Shift via Online Density Ratio Estimation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Dealing with distribution shifts is one of the central challenges for modern machine learning. One fundamental situation is the \emph{covariate shift}, where the input distributions of data change from training to testing stages while the input-conditional output distribution remains unchanged. In this paper, we initiate the study of a more challenging scenario, \emph{continuous} covariate shift, in which the test data appear sequentially, and their distributions can shift continuously. Our goal is to adaptively train the predictor such that its prediction risk accumulated over time can be minimized. Starting with the importance-weighted learning, we show the method works effectively if the time-varying density ratios of test and train inputs can be accurately estimated. However, existing density ratio estimation methods would fail due to data scarcity at each time step. To this end, we propose an online method that can appropriately reuse historical information. Our density ratio estimation method is proven to perform well by enjoying a dynamic regret bound, which finally leads to an excess risk guarantee for the predictor. Empirical results also validate the effectiveness.
 [29] arXiv:2302.02560 (cross-list from cs.LG) [pdf, other]

Title: Causal Shift-Response Functions with Neural Networks: The Health Benefits of Lowering Air Quality Standards in the US
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Policymakers are required to evaluate the health benefits of reducing the National Ambient Air Quality Standards (NAAQS; i.e., the safety standards) for fine particulate matter PM 2.5 before implementing new policies. We formulate this objective as a shift-response function (SRF) and develop methods to analyze the problem using tools from causal inference, specifically under the stochastic interventions framework. SRFs model the average change in an outcome of interest resulting from a hypothetical shift in the observed exposure distribution. We propose a new broadly applicable doubly-robust method to learn SRFs using targeted regularization with neural networks. We evaluate our proposed method under various benchmarks specific for marginal estimates as a function of continuous exposure. Finally, we implement our estimator in the motivating application that considers the potential reduction in deaths from lowering the NAAQS from the current level of 12 $\mu g/m^3$ to levels that are recently proposed by the Environmental Protection Agency in the US (10, 9, and 8 $\mu g/m^3$).
 [30] arXiv:2302.02570 (cross-list from cs.AI) [pdf, other]

Title: Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
We consider the task of evaluating policies of algorithmic resource allocation through randomized controlled trials (RCTs). Such policies are tasked with optimizing the utilization of limited intervention resources, with the goal of maximizing the benefits derived. Evaluation of such allocation policies through RCTs proves difficult, notwithstanding the scale of the trial, because the individuals' outcomes are inextricably interlinked through resource constraints controlling the policy decisions. Our key contribution is to present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT. We identify conditions under which such reassignments are permissible and can be leveraged to construct counterfactual trials, whose outcomes can be accurately ascertained, for free. We prove theoretically that such an estimator is more accurate than common estimators based on sample means: we show that it returns an unbiased estimate and simultaneously reduces variance. We demonstrate the value of our approach through empirical experiments on synthetic, semi-synthetic as well as real case study data and show improved estimation accuracy across the board.
 [31] arXiv:2302.02571 (cross-list from cs.LG) [pdf, other]

Title: Offline Learning in Markov Games with General Function Approximation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium (such as Nash equilibrium and (Coarse) Correlated Equilibrium) from an offline dataset pre-collected from the game. Existing works consider relatively restricted tabular or linear models and handle each equilibrium separately. In this work, we provide the first framework for sample-efficient offline learning in Markov games under general function approximation, handling all 3 equilibria in a unified manner. By using Bellman-consistent pessimism, we obtain interval estimation for policies' returns, and use both the upper and the lower bounds to obtain a relaxation on the gap of a candidate policy, which becomes our optimization objective. Our results generalize prior works and provide several additional insights. Importantly, we require a data coverage condition that improves over the recently proposed "unilateral concentrability". Our condition allows selective coverage of deviation policies that optimally trade off between their greediness (as approximate best responses) and coverage, and we show scenarios where this leads to significantly better guarantees. As a new connection, we also show how our algorithmic framework can subsume seemingly different solution concepts designed for the special case of two-player zero-sum games.
 [32] arXiv:2302.02589 (cross-list from cs.LG) [pdf, other]

Title: $z$-SignFedAvg: A Unified Stochastic Sign-based Compression for Federated Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Federated Learning (FL) is a promising privacy-preserving distributed learning paradigm but suffers from high communication cost when training large-scale machine learning models. Sign-based methods, such as SignSGD \cite{bernstein2018signsgd}, have been proposed as a biased gradient compression technique for reducing the communication cost. However, sign-based algorithms could diverge under heterogeneous data, which thus motivated the development of advanced techniques, such as the error-feedback method and stochastic sign-based compression, to fix this issue. Nevertheless, these methods still suffer from slower convergence rates. Besides, none of them allows multiple local SGD updates like FedAvg \cite{mcmahan2017communication}. In this paper, we propose a novel noisy perturbation scheme with a general symmetric noise distribution for sign-based compression, which not only allows one to flexibly control the trade-off between gradient bias and convergence performance, but also provides a unified viewpoint to existing stochastic sign-based methods. More importantly, the unified noisy perturbation scheme enables the development of the very first sign-based FedAvg algorithm ($z$-SignFedAvg) to accelerate the convergence. Theoretically, we show that $z$-SignFedAvg achieves a faster convergence rate than existing sign-based methods and, under the uniformly distributed noise, can enjoy the same convergence rate as its uncompressed counterpart. Extensive experiments are conducted to demonstrate that the $z$-SignFedAvg can achieve competitive empirical performance on real datasets and outperforms existing schemes.
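To illustrate why noise perturbation can reduce the bias of sign compression, here is a minimal sketch. It is not the algorithm from the paper: the function names `stochastic_sign` and `decode_average`, the noise scale `scale`, and the averaging decoder are all assumptions made for illustration. It relies only on the standard fact that, for noise uniform on [-scale, scale] and a coordinate with |g| <= scale, the expected sign of the perturbed coordinate is g / scale.

```python
import random


def stochastic_sign(gradient, scale, rng):
    """Perturb each coordinate with uniform noise on [-scale, scale], keep only the sign.

    Hypothetical helper for illustration; each coordinate becomes +1.0 or -1.0.
    """
    return [1.0 if g + rng.uniform(-scale, scale) >= 0 else -1.0 for g in gradient]


def decode_average(sign_vectors, scale):
    """Average many sign vectors and rescale by the noise scale.

    With uniform noise and |g| <= scale, E[sign(g + noise)] = g / scale,
    so this average is an unbiased estimate of the original gradient.
    """
    count = len(sign_vectors)
    dim = len(sign_vectors[0])
    return [scale * sum(v[j] for v in sign_vectors) / count for j in range(dim)]
```

Averaging the compressed vectors from many draws (or many clients) recovers the gradient up to sampling noise, whereas a plain deterministic sign would discard all magnitude information.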
 [33] arXiv:2302.02605 (cross-list from cs.LG) [pdf, other]

Title: Toward Large Kernel Models
Comments: Code is available at github.com/EigenPro/EigenPro3
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Recent studies indicate that kernel machines can often perform similarly to or better than deep neural networks (DNNs) on small datasets. The interest in kernel machines has been additionally bolstered by the discovery of their equivalence to wide neural networks in certain regimes. However, a key feature of DNNs is their ability to scale the model size and training data size independently, whereas in traditional kernel machines model size is tied to data size. Because of this coupling, scaling kernel machines to large data has been computationally challenging. In this paper, we provide a way forward for constructing large-scale general kernel models, which are a generalization of kernel machines that decouples the model and data, allowing training on large datasets. Specifically, we introduce EigenPro 3.0, an algorithm based on projected dual preconditioned SGD and show scaling to model and data sizes which have not been possible with existing kernel methods.
 [34] arXiv:2302.02622 (cross-list from cs.CV) [pdf, other]

Title: Uncertainty Calibration and its Application to Object Detection
Authors: Fabian Küppers
Comments: PhD thesis at University of Wuppertal, cite by: 'Fabian K\"uppers. "Uncertainty Calibration and its Application to Object Detection." PhD Thesis, University of Wuppertal, January 2023'
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Image-based environment perception is an important component especially for driver assistance systems or autonomous driving. In this scope, modern neural networks are used to identify multiple objects as well as the corresponding position and size information within a single frame. The performance of such an object detection model is important for the overall performance of the whole system. However, a detection model might also predict these objects under a certain degree of uncertainty. [...]
In this work, we examine the semantic uncertainty (which object type?) as well as the spatial uncertainty (where is the object and how large is it?). We evaluate if the predicted uncertainties of an object detection model match with the observed error that is achieved on real-world data. In the first part of this work, we introduce the definition for confidence calibration of the semantic uncertainty in the context of object detection, instance segmentation, and semantic segmentation. We integrate additional position information in our examinations to evaluate the effect of the object's position on the semantic calibration properties. Besides measuring calibration, it is also possible to perform a post-hoc recalibration of semantic uncertainty that might have turned out to be miscalibrated. [...]
The second part of this work deals with the spatial uncertainty obtained by a probabilistic detection model. [...] We review and extend common calibration methods so that it is possible to obtain parametric uncertainty distributions for the position information in a more flexible way.
In the last part, we demonstrate a possible use case for our derived calibration methods in the context of object tracking. [...] We integrate our previously proposed calibration techniques and demonstrate the usefulness of semantic and spatial uncertainty calibration in a subsequent process. [...]
 [35] arXiv:2302.02648 (cross-list from cs.HC) [pdf]

Title: First steps towards quantum machine learning applied to the classification of event-related potentials
Comments: in French language
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
Low information transfer rate is a major bottleneck for brain-computer interfaces based on non-invasive electroencephalography (EEG) for clinical applications. This led to the development of more robust and accurate classifiers. In this study, we investigate the performance of a quantum-enhanced support vector classifier (QSVC). The training (prediction) balanced accuracy of QSVC was 83.17% (50.25%). This result shows that the classifier was able to learn from EEG data, but that more research is required to obtain higher prediction accuracy. This could be achieved by a better configuration of the classifier, such as increasing the number of shots.
 [36] arXiv:2302.02718 (cross-list from stat.ME) [pdf, other]

Title: A Log-Linear Non-Parametric Online Changepoint Detection Algorithm based on Functional Pruning
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)
Online changepoint detection aims to detect anomalies and changes in real-time in high-frequency data streams, sometimes with limited available computational resources. This is an important task that is rooted in many real-world applications, including but not limited to cybersecurity, medicine and astrophysics. While fast and efficient online algorithms have been recently introduced, these rely on parametric assumptions which are often violated in practical applications. Motivated by data streams from the telecommunications sector, we build a flexible non-parametric approach to detect a change in the distribution of a sequence. Our procedure, NP-FOCuS, builds a sequential likelihood ratio test for a change in a set of points of the empirical cumulative density function of our data. This is achieved by keeping track of the number of observations above or below those points. Thanks to functional pruning ideas, NP-FOCuS has a computational cost that is log-linear in the number of observations and is suitable for high-frequency data streams. In terms of detection power, NP-FOCuS is seen to outperform current non-parametric online changepoint techniques in a variety of settings. We demonstrate the utility of the procedure on both simulated and real data.
 [37] arXiv:2302.02859 (cross-list from stat.ME) [pdf, other]

Title: A Fast Bootstrap Algorithm for Causal Inference with Large Data
Comments: 46 pages
Subjects: Methodology (stat.ME); Applications (stat.AP); Machine Learning (stat.ML)
Estimating causal effects from large experimental and observational data has become increasingly prevalent in both industry and research. The bootstrap is an intuitive and powerful technique used to construct standard errors and confidence intervals of estimators. Its application however can be prohibitively demanding in settings involving large data. In addition, modern causal inference estimators based on machine learning and optimization techniques exacerbate the computational burden of the bootstrap. The bag of little bootstraps has been proposed in non-causal settings for large data but has not yet been applied to evaluate the properties of estimators of causal effects. In this paper, we introduce a new bootstrap algorithm called causal bag of little bootstraps for causal inference with large data. The new algorithm significantly improves the computational efficiency of the traditional bootstrap while providing consistent estimates and desirable confidence interval coverage. We describe its properties, provide practical considerations, and evaluate the performance of the proposed algorithm in terms of bias, coverage of the true 95% confidence intervals, and computational time in a simulation study. We apply it in the evaluation of the effect of hormone therapy on the average time to coronary heart disease using a large observational data set from the Women's Health Initiative.
 [38] arXiv:2302.02865 (cross-list from cs.LG) [pdf, other]

Title: Probabilistic Contrastive Learning Recovers the Correct Aleatoric Uncertainty of Ambiguous Inputs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Contrastively trained encoders have recently been proven to invert the data-generating process: they encode each input, e.g., an image, into the true latent vector that generated the image (Zimmermann et al., 2021). However, real-world observations often have inherent ambiguities. For instance, images may be blurred or only show a 2D view of a 3D object, so multiple latents could have generated them. This makes the true posterior for the latent vector probabilistic with heteroscedastic uncertainty. In this setup, we extend the common InfoNCE objective and encoders to predict latent distributions instead of points. We prove that these distributions recover the correct posteriors of the data-generating process, including its level of aleatoric uncertainty, up to a rotation of the latent space. In addition to providing calibrated uncertainty estimates, these posteriors allow the computation of credible intervals in image retrieval. They comprise images with the same latent as a given query, subject to its uncertainty.
 [39] arXiv:2302.02876 (cross-list from cs.LG) [pdf, other]

Title: Variational Information Pursuit for Interpretable Predictions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
There is a growing interest in the machine learning community in developing predictive algorithms that are "interpretable by design". Towards this end, recent work proposes to make interpretable decisions by sequentially asking interpretable queries about data until a prediction can be made with high confidence based on the answers obtained (the history). To promote short query-answer chains, a greedy procedure called Information Pursuit (IP) is used, which adaptively chooses queries in order of information gain. Generative models are employed to learn the distribution of query-answers and labels, which is in turn used to estimate the most informative query. However, learning and inference with a full generative model of the data is often intractable for complex tasks. In this work, we propose Variational Information Pursuit (V-IP), a variational characterization of IP which bypasses the need for learning generative models. V-IP is based on finding a query selection strategy and a classifier that minimizes the expected cross-entropy between true and predicted labels. We then demonstrate that the IP strategy is the optimal solution to this problem. Therefore, instead of learning generative models, we can use our optimal strategy to directly pick the most informative query given any history. We then develop a practical algorithm by defining a finite-dimensional parameterization of our strategy and classifier using deep networks and train them end-to-end using our objective. Empirically, V-IP is 10-100x faster than IP on different Vision and NLP tasks with competitive performance. Moreover, V-IP finds much shorter query chains when compared to reinforcement learning which is typically used in sequential-decision-making problems. Finally, we demonstrate the utility of V-IP on challenging tasks like medical diagnosis where the performance is far superior to the generative modelling approach.
 [40] arXiv:2302.02941 (cross-list from cs.LG) [pdf, other]

Title: On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology
Authors: Francesco Di Giovanni, Lorenzo Giusti, Federico Barbero, Giulia Luise, Pietro Lio', Michael Bronstein
Comments: 24 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)
Message Passing Neural Networks (MPNNs) are instances of Graph Neural Networks that leverage the graph to send messages over the edges. This inductive bias leads to a phenomenon known as over-squashing, where a node feature is insensitive to information contained at distant nodes. Despite recent methods introduced to mitigate this issue, an understanding of the causes for over-squashing and of possible solutions is lacking. In this theoretical work, we prove that: (i) Neural network width can mitigate over-squashing, but at the cost of making the whole network more sensitive; (ii) Conversely, depth cannot help mitigate over-squashing: increasing the number of layers leads to over-squashing being dominated by vanishing gradients; (iii) The graph topology plays the greatest role, since over-squashing occurs between nodes at high commute (access) time. Our analysis provides a unified framework to study different recent methods introduced to cope with over-squashing and serves as a justification for a class of methods that fall under `graph rewiring'.
 [41] arXiv:2302.02951 (cross-list from cond-mat.stat-mech) [pdf, other]

Title: Noise-cleaning the precision matrix of fMRI time series
Authors: Miguel Ibáñez-Berganza, Carlo Lucibello, Francesca Santucci, Tommaso Gili, Andrea Gabrielli
Comments: 15 pages, 12 figures (of which 12 pages, 3 figures in the main text)
Subjects: Statistical Mechanics (cond-mat.stat-mech); Machine Learning (stat.ML)
We present a comparison between various algorithms of inference of covariance and precision matrices in small datasets of real vectors, of the typical length and dimension of human brain activity time series retrieved by functional Magnetic Resonance Imaging (fMRI). Assuming a Gaussian model underlying the neural activity, the problem consists in denoising the empirically observed matrices in order to obtain a better estimator of the true precision and covariance matrices. We consider several standard noise-cleaning algorithms and compare them on two types of datasets. The first type are time series of fMRI brain activity of human subjects at rest. The second type are synthetic time series sampled from a generative Gaussian model of which we can vary the fraction of dimensions per sample q = N/T and the strength of off-diagonal correlations. The reliability of each algorithm is assessed in terms of test-set likelihood and, in the case of synthetic data, of the distance from the true precision matrix. We observe that the so-called Optimal Rotationally Invariant Estimator, based on Random Matrix Theory, leads to a significantly lower distance from the true precision matrix in synthetic data, and higher test likelihood in natural fMRI data. We propose a variant of the Optimal Rotationally Invariant Estimator in which one of its parameters is optimised by cross-validation. In the severe undersampling regime (large q) typical of fMRI series, it outperforms all the other estimators. We furthermore propose a simple algorithm based on an iterative likelihood gradient ascent, providing an accurate estimation for weakly correlated datasets.
 [42] arXiv:2302.02971 (cross-list from cs.LG) [pdf, other]

Title: U-Clip: On-Average Unbiased Stochastic Gradient Clipping
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
U-Clip is a simple amendment to gradient clipping that can be applied to any iterative gradient optimization algorithm. Like regular clipping, U-Clip involves using gradients that are clipped to a prescribed size (e.g. with component wise or norm based clipping) but instead of discarding the clipped portion of the gradient, U-Clip maintains a buffer of these values that is added to the gradients on the next iteration (before clipping). We show that the cumulative bias of the U-Clip updates is bounded by a constant. This implies that the clipped updates are unbiased on average. Convergence follows via a lemma that guarantees convergence with updates $u_i$ as long as $\sum_{i=1}^t (u_i - g_i) = o(t)$ where $g_i$ are the gradients. Extensive experimental exploration is performed on CIFAR10 with further validation given on ImageNet.
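The buffered clipping described in this abstract can be sketched in a few lines. This is a minimal reading of the abstract under stated assumptions, not the authors' implementation: it uses component-wise clipping, and the names `clip`, `buffered_clip_step`, and `limit` are hypothetical.

```python
def clip(values, limit):
    """Component-wise clipping of each entry to the interval [-limit, limit]."""
    return [max(-limit, min(limit, v)) for v in values]


def buffered_clip_step(gradient, buffer, limit):
    """One clipping step that carries the clipped-off remainder forward.

    The buffer from the previous step is added to the fresh gradient before
    clipping; whatever clipping removes becomes the new buffer.
    Returns (update, new_buffer).
    """
    carried = [g + b for g, b in zip(gradient, buffer)]
    update = clip(carried, limit)
    new_buffer = [c - u for c, u in zip(carried, update)]
    return update, new_buffer
```

By construction, the sum of updates plus the current buffer equals the sum of gradients, so the cumulative difference between updates and gradients is exactly the negated buffer. Whether that buffer stays bounded is the substance of the paper's analysis and is not shown by this sketch.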
 [43] arXiv:2302.02988 (cross-list from cs.LG) [pdf, other]

Title: Asymptotically Minimax Optimal Fixed-Budget Best Arm Identification for Expected Simple Regret Minimization
Subjects: Machine Learning (cs.LG); Econometrics (econ.EM); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
We investigate fixed-budget best arm identification (BAI) for expected simple regret minimization. In each round of an adaptive experiment, a decision maker draws one of multiple treatment arms based on past observations and subsequently observes the outcomes of the chosen arm. After the experiment, the decision maker recommends a treatment arm with the highest projected outcome. We evaluate this decision in terms of the expected simple regret, the difference between the expected outcomes of the best and recommended treatment arms. Due to the inherent uncertainty, we evaluate the regret using the minimax criterion. For distributions with fixed variances (location-shift models), such as Gaussian distributions, we derive asymptotic lower bounds for the worst-case expected simple regret. Then, we show that the Random Sampling (RS)-Augmented Inverse Probability Weighting (AIPW) strategy proposed by Kato et al. (2022) is asymptotically minimax optimal in the sense that the leading factor of its worst-case expected simple regret asymptotically matches our derived worst-case lower bound. Our result indicates that, for location-shift models, the optimal RS-AIPW strategy draws treatment arms with varying probabilities based on their variances. This result contrasts with the results of Bubeck et al. (2011), which show that drawing each treatment arm with an equal ratio is minimax optimal in a bounded outcome setting.
 [44] arXiv:2302.02991 (cross-list from eess.IV) [pdf, other]

Title: Optimal Transport Guided Unsupervised Learning for Enhancing low-quality Retinal Images
Comments: Accepted as a conference paper to 20th IEEE International Symposium on Biomedical Imaging (ISBI 2023)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Real-world non-mydriatic retinal fundus photography is prone to artifacts, imperfections and low-quality when certain ocular or systemic co-morbidities exist. Artifacts may result in inaccuracy or ambiguity in clinical diagnoses. In this paper, we proposed a simple but effective end-to-end framework for enhancing poor-quality retinal fundus images. Leveraging the optimal transport theory, we proposed an unpaired image-to-image translation scheme for transporting low-quality images to their high-quality counterparts. We theoretically proved that a Generative Adversarial Networks (GAN) model with a generator and discriminator is sufficient for this task. Furthermore, to mitigate the inconsistency of information between the low-quality images and their enhancements, an information consistency mechanism was proposed to maximally maintain structural consistency (optical discs, blood vessels, lesions) between the source and enhanced domains. Extensive experiments were conducted on the EyeQ dataset to demonstrate the superiority of our proposed method perceptually and quantitatively.
 [45] arXiv:2302.03003 (cross-list from eess.IV) [pdf, other]

Title: OTRE: Where Optimal Transport Guided Unpaired Image-to-Image Translation Meets Regularization by Enhancing
Authors: Wenhui Zhu, Peijie Qiu, Oana M. Dumitrascu, Jacob Jacob, Mohammad Farazi, Zhangsihao Yang, Keshav Nandakumar, Yalin Wang
Comments: Accepted as a conference paper to The 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Non-mydriatic retinal color fundus photography (CFP) is widely available due to the advantage of not requiring pupillary dilation, but is prone to poor quality due to operators, systemic imperfections, or patient-related causes. Optimal retinal image quality is mandated for accurate medical diagnoses and automated analyses. Herein, we leveraged the \emph{Optimal Transport (OT)} theory to propose an unpaired image-to-image translation scheme for mapping low-quality retinal CFPs to high-quality counterparts. Furthermore, to improve the flexibility, robustness, and applicability of our image enhancement pipeline in the clinical practice, we generalized a state-of-the-art model-based image reconstruction method, regularization by denoising, by plugging in priors learned by our OT-guided image-to-image translation network. We named it \emph{regularization by enhancing (RE)}. We validated the integrated framework, OTRE, on three publicly available retinal image datasets by assessing the quality after enhancement and their performance on various downstream tasks, including diabetic retinopathy grading, vessel segmentation, and diabetic lesion segmentation. The experimental results demonstrated the superiority of our proposed framework over some state-of-the-art unsupervised competitors and a state-of-the-art supervised method.
 [46] arXiv:2302.03020 (cross-list from cs.LG) [pdf, other]

Title: RLSbench: Domain Adaptation Under Relaxed Label Shift
Authors: Saurabh Garg, Nick Erickson, James Sharpnack, Alex Smola, Sivaraman Balakrishnan, Zachary C. Lipton
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Despite the emergence of principled methods for domain adaptation under label shift, the sensitivity of these methods to minor shifts in the class conditional distributions remains precariously underexplored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with shifts in label proportions. While several papers attempt to adapt these heuristics to accommodate shifts in label proportions, inconsistencies in evaluation criteria, datasets, and baselines make it hard to assess the state of the art. In this paper, we introduce RLSbench, a large-scale relaxed label shift benchmark, consisting of >500 distribution shift pairs that draw on 14 datasets across vision, tabular, and language modalities and compose them with varying label proportions. First, we evaluate 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than were previously known. Next, we develop an effective two-step meta-algorithm that is compatible with most deep domain adaptation heuristics: (i) pseudo-balance the data at each epoch; and (ii) adjust the final classifier with (an estimate of) target label distribution. The meta-algorithm improves existing domain adaptation heuristics often by 2-10\% accuracy points under extreme label proportion shifts and has little (i.e., <0.5\%) effect when label proportions do not shift. We hope that these findings and the availability of RLSbench will encourage researchers to rigorously evaluate proposed methods in relaxed label shift settings. Code is publicly available at https://github.com/acmi-lab/RLSbench.
Replacements for Tue, 7 Feb 23
 [47] arXiv:2009.07703 (replaced) [pdf, other]

Title: Efficient Variational Bayes Learning of Graphical Models with Smooth Structural Changes
Journal-ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (2023)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
 [48] arXiv:2105.10590 (replaced) [pdf, other]

Title: Parallelizing Contextual Bandits
Authors: Jeffrey Chan, Aldo Pacchiano, Nilesh Tripuraneni, Yun S. Song, Peter Bartlett, Michael I. Jordan
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM)
 [49] arXiv:2107.00371 (replaced) [pdf, other]

Title: Sparse GCA and Thresholded Gradient Descent
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [50] arXiv:2111.03289 (replaced) [pdf, ps, other]

Title: Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs
Comments: accepted to neurips'22
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [51] arXiv:2112.09036 (replaced) [pdf, other]

Title: The Dual PC Algorithm and the Role of Gaussianity for Structure Learning of Bayesian Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
 [52] arXiv:2201.12064 (replaced) [pdf, other]

Title: Multiscale Graph Comparison via the Embedded Laplacian Discrepancy
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [53] arXiv:2202.04912 (replaced) [pdf, other]

Title: Random Forest Weighted Local Fréchet Regression
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [54] arXiv:2205.13496 (replaced) [pdf, other]

Title: Censored Quantile Regression Neural Networks for Distribution-Free Survival Analysis
Comments: Published in NeurIPS 2022
Journal-ref: NeurIPS 2022
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [55] arXiv:2209.12651 (replaced) [pdf, other]

Title: Learning Variational Models with Unrolling and Bilevel Optimization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [56] arXiv:2209.13570 (replaced) [pdf, other]

Title: Hierarchical Sliced Wasserstein Distance
Comments: Accepted to ICLR 2023, 29 pages, 8 figures, 3 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [57] arXiv:2211.09403 (replaced) [pdf, other]

Title: Learning Mixtures of Markov Chains and MDPs
Comments: 51 pages (13 page paper, 38 page appendix). Paper restructured and refined, corrections made to proofs, experiments added
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [58] arXiv:2211.10747 (replaced) [pdf, other]

Title: Exploring validation metrics for offline model-based optimisation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [59] arXiv:2212.09178 (replaced) [pdf, ps, other]

Title: Support Vector Regression: Risk Quadrangle Framework
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
 [60] arXiv:2212.12749 (replaced) [pdf, other]

Title: Deep Latent State Space Models for Time-Series Generation
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [61] arXiv:2301.09479 (replaced) [pdf, other]

Title: Modality-Agnostic Variational Compression of Implicit Neural Representations
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [62] arXiv:2301.13112 (replaced) [pdf, other]

Title: Benchmarking optimality of time series classification methods in distinguishing diffusions
Comments: 21 pages, 8 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [63] arXiv:1809.02727 (replaced) [pdf, ps, other]

Title: Decentralized Differentially Private Without-Replacement Stochastic Gradient Descent
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [64] arXiv:2002.01444 (replaced) [pdf, other]

Title: Proper Learning of Linear Dynamical Systems as a Non-Commutative Polynomial Optimisation Problem
Comments: 27 pages, 6 figures, with additional experiments exploiting sparsity
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
 [65] arXiv:2004.04464 (replaced) [pdf, other]

Title: A Characteristic Function for Shapley-Value-Based Attribution of Anomaly Scores
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [66] arXiv:2005.01026 (replaced) [pdf, other]

Title: Multi-Center Federated Learning: Clients Clustering for Better Personalization
Comments: This paper has two duplicated versions: 2005.01026 and 2108.08647. The first one 2005.01026 is the right one, and the second one 2108.08647 should be deleted because it always causes misoperating
Journal-ref: World Wide Web, 26, (2003), 481-500
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
 [67] arXiv:2009.12682 (replaced) [pdf, other]

Title: Decision-Aware Conditional GANs for Time Series Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [68] arXiv:2102.11050 (replaced) [pdf, other]

Title: Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization
Authors: Rad Niazadeh (1), Negin Golrezaei (2), Joshua Wang (3), Fransisca Susan (4), Ashwinkumar Badanidiyuru (3) ((1) Chicago Booth School of Business, Operations Management, (2) MIT Sloan School of Management, Operations Management, (3) Google Research Mountain View, (4) MIT Operations Research Center)
Comments: 87 pages, 2 figures. Management Science (2022)
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [69] arXiv:2106.01128 (replaced) [pdf, other]

Title: Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [70] arXiv:2111.04964 (replaced) [pdf, other]

Title: On Representation Knowledge Distillation for Graph Neural Networks
Comments: IEEE Transactions on Neural Networks and Learning Representation (TNNLS), Special Issue on Deep Neural Networks for Graphs: Theory, Models, Algorithms and Applications
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [71] arXiv:2112.10753 (replaced) [pdf, other]

Title: Strong Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Markov Jump Linear Systems
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)
 [72] arXiv:2202.01666 (replaced) [pdf, other]

Title: Proportional Fairness in Federated Learning
Comments: Accepted at TMLR 2023, typos fixed
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [73] arXiv:2203.00614 (replaced) [pdf, other]

Title: Side Effects of Learning from Low-dimensional Data Embedded in a Euclidean Space
Comments: 53 pages (11 pages for Appendix), 24 figures
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
 [74] arXiv:2204.07879 (replaced) [pdf, ps, other]

Title: Polynomial-time sparse measure recovery
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [75] arXiv:2206.02617 (replaced) [pdf, other]

Title: Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
 [76] arXiv:2206.02659 (replaced) [pdf, other]

Title: Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees
Comments: 36 pages, 5 figures, 8 tables (Fixed typos). ICML 2022
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [77] arXiv:2206.12680 (replaced) [pdf, other]

Title: Topology-aware Generalization of Decentralized SGD
Comments: Accepted for publication in the 39th International Conference on Machine Learning (ICML 2022)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [78] arXiv:2210.00635 (replaced) [pdf, other]

Title: Robust Empirical Risk Minimization with Tolerance
Comments: 22 pages, 1 figure, To appear at ALT'23
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [79] arXiv:2210.00895 (replaced) [pdf, other]

Title: On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits
Authors: Antoine Barrier (UMPA-ENSL, LMO, CELESTE), Aurélien Garivier (UMPA-ENSL, LIP), Gilles Stoltz (LMO, CELESTE)
Journal-ref: ALT 2023 - The 34th International Conference on Algorithmic Learning Theory, Feb 2023, Singapour, Singapore
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [80] arXiv:2210.06819 (replaced) [pdf, other]

Title: Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence
Comments: 14 pages in main text; 51 pages including bibliography and appendix. Published in Transactions on Machine Learning Research (TMLR), 2023. this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [81] arXiv:2211.08572 (replaced) [pdf, other]

Title: Bayesian Fixed-Budget Best-Arm Identification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [82] arXiv:2211.09259 (replaced) [pdf, other]

Title: The Missing Indicator Method: From Low to High Dimensions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [83] arXiv:2211.14296 (replaced) [pdf, other]

Title: A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation
Comments: Accepted at ICLR2023 (notable-top-25%), Website: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
 [84] arXiv:2211.14908 (replaced) [pdf, other]

Title: A Permutation-free Kernel Two-Sample Test
Comments: Published at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS), with an oral presentation
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [85] arXiv:2301.07067 (replaced) [pdf, other]

Title: Transformers as Algorithms: Generalization and Stability in In-context Learning
Comments: Revised version significantly improves the stability guarantees and provides new experiments
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
 [86] arXiv:2301.12003 (replaced) [pdf, other]

Title: Minimizing Trajectory Curvature of ODE-based Generative Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [87] arXiv:2301.13857 (replaced) [pdf, other]

Title: Learning in POMDPs is Sample-Efficient with Hindsight Observability
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [88] arXiv:2302.00704 (replaced) [pdf, other]

Title: Pathologies of Predictive Diversity in Deep Ensembles
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [89] arXiv:2302.00814 (replaced) [pdf, other]

Title: Stochastic Contextual Bandits with Long Horizon Rewards
Comments: 47 pages, to appear at AAAI 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [90] arXiv:2302.01425 (replaced) [pdf, other]

Title: Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
Comments: 23 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)