Machine Learning

New submissions

New submissions for Tue, 7 Feb 23

[1]  arXiv:2302.01952 [pdf, other]
Title: On a continuous time model of gradient descent dynamics and instability in deep learning
Comments: Transactions of Machine Learning Research, 2023
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent, however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent, we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian, the PF sheds light on the recently observed edge of stability phenomena in deep learning. Using our new understanding of instability, we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.
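
As a small illustration of the kind of instability the abstract refers to (this is not the principal flow itself, and the function name and constants below are hypothetical), plain gradient descent on the one-dimensional quadratic f(x) = 0.5 * curvature * x^2 multiplies the iterate by (1 - learning_rate * curvature) at each step. The iterates therefore shrink when learning_rate * curvature < 2 and oscillate with growing amplitude when it exceeds 2. A minimal sketch:

```python
def gradient_descent_quadratic(curvature, learning_rate, x0=1.0, steps=20):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2; return all iterates."""
    xs = [x0]
    for _ in range(steps):
        grad = curvature * xs[-1]                 # f'(x) = curvature * x
        xs.append(xs[-1] - learning_rate * grad)  # x <- (1 - lr * curvature) * x
    return xs


# Illustrative values only: lr * curvature = 1.5 (stable) vs 2.5 (unstable).
stable = gradient_descent_quadratic(curvature=1.0, learning_rate=1.5)
unstable = gradient_descent_quadratic(curvature=1.0, learning_rate=2.5)
print(abs(stable[-1]) < abs(stable[0]))      # iterates contract toward 0
print(abs(unstable[-1]) > abs(unstable[0]))  # iterates oscillate and blow up
```

In the unstable run the sign of the iterate flips at every step, which is the oscillatory divergence that a standard gradient flow cannot reproduce.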

[2]  arXiv:2302.02033 [pdf, ps, other]
Title: An Asymptotically Optimal Algorithm for the One-Dimensional Convex Hull Feasibility Problem
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

This work studies the pure-exploration setting for the convex hull feasibility (CHF) problem where one aims to efficiently and accurately determine if a given point lies in the convex hull of means of a finite set of distributions. We give a complete characterization of the sample complexity of the CHF problem in the one-dimensional setting. We present the first asymptotically optimal algorithm called Thompson-CHF, whose modular design consists of a stopping rule and a sampling rule. In addition, we provide an extension of the algorithm that generalizes several important problems in the multi-armed bandit literature. Finally, we further investigate the Gaussian bandit case with unknown variances and address how the Thompson-CHF algorithm can be adjusted to be asymptotically optimal in this setting.

[3]  arXiv:2302.02228 [pdf, other]
Title: Counterfactual Identifiability of Bijective Causal Models
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We study counterfactual identifiability in causal models with bijective generation mechanisms (BGM), a class that generalizes several widely-used causal models in the literature. We establish their counterfactual identifiability for three common causal structures with unobserved confounding, and propose a practical learning method that casts learning a BGM as structured generative modeling. Learned BGMs enable efficient counterfactual estimation and can be obtained using a variety of deep conditional generative models. We evaluate our techniques in a visual task and demonstrate their application in a real-world video streaming simulation task.

[4]  arXiv:2302.02406 [pdf, other]
Title: Pre-screening breast cancer with machine learning and deep learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We suggest that deep learning can be used for pre-screening cancer by analyzing demographic and anthropometric information of patients, as well as biological markers obtained from routine blood samples and relative risks obtained from meta-analysis and international databases. We applied feature selection algorithms to a database of 116 women, including 52 healthy women and 64 women diagnosed with breast cancer, to identify the best pre-screening predictors of cancer. We utilized the best predictors to perform k-fold Monte Carlo cross-validation experiments that compare deep learning against traditional machine learning algorithms. Our results indicate that a deep learning model with an input-layer architecture that is fine-tuned using feature selection can effectively distinguish between patients with and without cancer. Additionally, compared to machine learning, deep learning has the lowest uncertainty in its predictions. These findings suggest that deep learning algorithms applied to cancer pre-screening offer a radiation-free, non-invasive, and affordable complement to screening methods based on imagery. The implementation of deep learning algorithms in cancer pre-screening offers opportunities to identify individuals who may require imaging-based screening, can encourage self-examination, and can decrease the psychological externalities associated with false positives in cancer screening. The integration of deep learning algorithms for both screening and pre-screening will ultimately lead to earlier detection of malignancy, reducing the healthcare and societal burden associated with cancer treatment.

[5]  arXiv:2302.02432 [pdf, other]
Title: Tighter Information-Theoretic Generalization Bounds from Supersamples
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)

We present a variety of novel information-theoretic generalization bounds for learning algorithms, from the supersample setting of Steinke & Zakynthinou (2020), the setting of the "conditional mutual information" framework. Our development exploits projecting the loss pair (obtained from a training instance and a testing instance) down to a single number and correlating loss values with a Rademacher sequence (and its shifted variants). The presented bounds include square-root bounds, fast-rate bounds (including those based on variance and sharpness), and bounds for interpolating algorithms, among others. We show theoretically or empirically that these bounds are tighter than all information-theoretic bounds known to date on the same supersample setting.

[6]  arXiv:2302.02455 [pdf, other]
Title: ODEWS: The Overdraft Early Warning System
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG)

When a customer overdraws their account and their balance is negative, they are assessed an overdraft fee. Americans pay approximately $15 billion in unnecessary overdraft fees a year, often in $35 increments; users of the Mint personal finance app in particular pay approximately $250 million in fees a year. These overdraft fees are an excessive financial burden and lead to cascading overdraft fees, trapping customers in financial hardship. To address this problem, we have created an ML-driven overdraft early warning system (ODEWS) that assesses a customer's risk of overdrafting within the next week using their banking and transaction data in the Mint app. At-risk customers are sent an alert so they can take steps to avoid the fee, ultimately changing their behavior and financial habits. The deployed system resulted in $3 million in savings on overdraft fees for Mint customers compared to a control group. Moreover, the methodology outlined here can be generalized to provide ML-driven personalized financial advice for many different personal finance goals -- increasing a credit score, building an emergency savings fund, paying down debt, or allocating capital for investment.

[7]  arXiv:2302.02670 [pdf, other]
Title: Random Forests for time-fixed and time-dependent predictors: The DynForest R package
Authors: Anthony Devaux (BPH), Cécile Proust-Lima (BPH), Robin Genuer (BPH)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The R package DynForest implements random forests for predicting a categorical or a (multiple causes) time-to-event outcome based on time-fixed and time-dependent predictors. Through the random forests, the time-dependent predictors can be measured with error at subject-specific times, and they can be endogenous (i.e., impacted by the outcome process). They are modeled internally using flexible linear mixed models (thanks to the lcmm package) with time-associations pre-specified by the user. DynForest computes dynamic predictions that take into account all the information from time-fixed and time-dependent predictors. DynForest also provides information about the most predictive variables using variable importance and minimal depth. Variable importance can also be computed on groups of variables. To display the results, several functions are available, such as summary and plot functions. This paper aims to guide the user with a step-by-step example of the different functions for fitting random forests within DynForest.

[8]  arXiv:2302.02672 [pdf, other]
Title: Identifiability of latent-variable and structural-equation models: from linear to nonlinear
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

An old problem in multivariate statistics is that linear Gaussian models are often unidentifiable, i.e. some parameters cannot be uniquely estimated. In factor analysis, an orthogonal rotation of the factors is unidentifiable, while in linear regression, the direction of effect cannot be identified. For such linear models, non-Gaussianity of the (latent) variables has been shown to provide identifiability. In the case of factor analysis, this leads to independent component analysis, while in the case of the direction of effect, non-Gaussian versions of structural equation modelling solve the problem. More recently, we have shown how even general nonparametric nonlinear versions of such models can be estimated. Non-Gaussianity is not enough in this case, but assuming we have time series, or that the distributions are suitably modulated by some observed auxiliary variables, the models are identifiable. This paper reviews the identifiability theory for the linear and nonlinear cases, considering both factor analytic models and structural equation models.

[9]  arXiv:2302.02766 [pdf, other]
Title: Generalization Bounds with Data-dependent Fractal Dimensions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Providing generalization guarantees for modern neural networks has been a crucial task in statistical learning. Recently, several studies have attempted to analyze the generalization error in such settings by using tools from fractal geometry. While these works have successfully introduced new mathematical tools to apprehend generalization, they heavily rely on a Lipschitz continuity assumption, which in general does not hold for neural networks and might make the bounds vacuous. In this work, we address this issue and prove fractal geometry-based generalization bounds without requiring any Lipschitz assumption. To achieve this goal, we build upon a classical covering argument in learning theory and introduce a data-dependent fractal dimension. Despite introducing a significant amount of technical complications, this new notion lets us control the generalization error (over either fixed or random hypothesis spaces) along with certain mutual information (MI) terms. To provide a clearer interpretation of the newly introduced MI terms, as a next step, we introduce a notion of "geometric stability" and link our bounds to the prior art. Finally, we make a rigorous connection between the proposed data-dependent dimension and topological data analysis tools, which then enables us to compute the dimension in a numerically efficient way. We support our theory with experiments conducted on various settings.

[10]  arXiv:2302.02774 [pdf, other]
Title: The SSL Interplay: Augmentations, Inductive Bias, and Generalization
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST)

Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision. Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architecture, and training algorithm. We study such an interplay with a precise analysis of generalization performance on both pretraining and downstream tasks in a theory-friendly setup, and highlight several insights for SSL practitioners that arise from our theory.

[11]  arXiv:2302.02923 [pdf, other]
Title: In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Econometrics (econ.EM)

Personalized treatment effect estimates are often of interest in high-stakes applications -- thus, before deploying a model estimating such effects in practice, one needs to be sure that the best candidate from the ever-growing machine learning toolbox for this task was chosen. Unfortunately, due to the absence of counterfactual information in practice, it is usually not possible to rely on standard validation metrics for doing so, leading to a well-known model selection dilemma in the treatment effect estimation literature. While some solutions have recently been investigated, systematic understanding of the strengths and weaknesses of different model selection criteria is still lacking. In this paper, instead of attempting to declare a global 'winner', we therefore empirically investigate success and failure modes of different selection criteria. We highlight that there is a complex interplay between selection strategies, candidate estimators and the DGP used for testing, and provide interesting insights into the relative (dis)advantages of different criteria alongside desiderata for the design of further illuminating empirical studies in this context.

[12]  arXiv:2302.03026 [pdf, other]
Title: Sampling-Based Accuracy Testing of Posterior Estimators for General Inference
Comments: 15 pages
Subjects: Machine Learning (stat.ML); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Methodology (stat.ME)

Parameter inference, i.e. inferring the posterior distribution of the parameters of a statistical model given some data, is a central problem to many scientific disciplines. Posterior inference with generative models is an alternative to methods such as Markov Chain Monte Carlo, both for likelihood-based and simulation-based inference. However, assessing the accuracy of posteriors encoded in generative models is not straightforward. In this paper, we introduce 'distance to random point' (DRP) coverage testing as a method to estimate coverage probabilities of generative posterior estimators. Our method differs from previously-existing coverage-based methods, which require posterior evaluations. We prove that our approach is necessary and sufficient to show that a posterior estimator is optimal. We demonstrate the method on a variety of synthetic examples, and show that DRP can be used to test the results of posterior inference analyses in high-dimensional spaces. We also show that our method can detect non-optimal inferences in cases where existing methods fail.

Cross-lists for Tue, 7 Feb 23

[13]  arXiv:2302.02009 (cross-list from cs.LG) [pdf, other]
Title: Domain Adaptation via Rebalanced Sub-domain Alignment
Comments: 20 pages, 6 figures, 4 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Unsupervised domain adaptation (UDA) is a technique used to transfer knowledge from a labeled source domain to a different but related unlabeled target domain. While many UDA methods have shown success in the past, they often assume that the source and target domains must have identical class label distributions, which can limit their effectiveness in real-world scenarios. To address this limitation, we propose a novel generalization bound that reweights source classification error by aligning source and target sub-domains. We prove that our proposed generalization bound is at least as strong as existing bounds under realistic assumptions, and we empirically show that it is much stronger on real-world data. We then propose an algorithm to minimize this novel generalization bound. We demonstrate by numerical experiments that this approach improves performance in shifted class distribution scenarios compared to state-of-the-art methods.

[14]  arXiv:2302.02061 (cross-list from cs.LG) [pdf, other]
Title: Reinforcement Learning with History-Dependent Dynamic Contexts
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Machine Learning (stat.ML)

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.

[15]  arXiv:2302.02092 (cross-list from cs.LG) [pdf, other]
Title: Interpolation for Robust Learning: Data Augmentation on Geodesics
Comments: 33 pages, 3 figures, 18 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose to study and promote the robustness of a model as per its performance through the interpolation of training data distributions. Specifically, (1) we augment the data by finding the worst-case Wasserstein barycenter on the geodesic connecting subpopulation distributions of different categories. (2) We regularize the model for smoother performance on the continuous geodesic path connecting subpopulation distributions. (3) Additionally, we provide a theoretical guarantee of robustness improvement and investigate how the geodesic location and the sample size contribute, respectively. Experimental validations of the proposed strategy on four datasets, including CIFAR-100 and ImageNet, establish the efficacy of our method, e.g., our method improves the baselines' certifiable robustness on CIFAR-10 by up to 7.7%, with 16.8% on empirical robustness on CIFAR-100. Our work provides a new perspective of model robustness through the lens of Wasserstein geodesic-based interpolation with a practical off-the-shelf strategy that can be combined with existing robust training methods.

[16]  arXiv:2302.02139 (cross-list from cs.LG) [pdf, other]
Title: Structural Explanations for Graph Neural Networks using HSIC
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Graph neural networks (GNNs) are a type of neural model that tackle graphical tasks in an end-to-end manner. Recently, GNNs have been receiving increased attention in machine learning and data mining communities because of the higher performance they achieve in various tasks, including graph classification, link prediction, and recommendation. However, the complicated dynamics of GNNs make it difficult to understand which parts of the graph features contribute more strongly to the predictions. To handle the interpretability issues, recently, various GNN explanation methods have been proposed. In this study, a flexible model agnostic explanation method is proposed to detect significant structures in graphs using the Hilbert-Schmidt independence criterion (HSIC), which captures the nonlinear dependency between two variables through kernels. More specifically, we extend the GraphLIME method for node explanation with a group lasso and a fused lasso-based node explanation method. The group and fused regularization with GraphLIME enables the interpretation of GNNs in substructure units. Then, we show that the proposed approach can be used for the explanation of sequential graph classification tasks. Through experiments, it is demonstrated that our method can identify crucial structures in a target graph in various settings.

[17]  arXiv:2302.02155 (cross-list from cs.LG) [pdf, other]
Title: Guaranteed Tensor Recovery Fused Low-rankness and Smoothness
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

The tensor data recovery task has attracted much research attention in recent years. Solving such an ill-posed problem generally requires exploring intrinsic prior structures underlying tensor data, and formulating them as certain forms of regularization terms for guiding a sound estimate of the restored tensor. Recent research has made significant progress by adopting two insightful tensor priors, i.e., global low-rankness (L) and local smoothness (S) across different tensor modes, which are always encoded as a sum of two separate regularization terms in the recovery models. However, unlike the primary theoretical developments on low-rank tensor recovery, these joint L+S models have no theoretical exact-recovery guarantees yet, making the methods lack reliability in real practice. To address this crucial issue, in this work, we build a unique regularization term, which essentially encodes both L and S priors of a tensor simultaneously. In particular, by equipping the recovery models with this single regularizer, we can rigorously prove the exact recovery guarantees for two typical tensor recovery tasks, i.e., tensor completion (TC) and tensor robust principal component analysis (TRPCA). To the best of our knowledge, these should be the first exact-recovery results among all related L+S methods for tensor recovery. Significant recovery accuracy improvements over many other SOTA methods in several TC and TRPCA tasks with various kinds of visual tensor data are observed in extensive experiments. Typically, our method achieves a workable performance when the missing rate is extremely large, e.g., 99.5%, for the color image inpainting task, while all its peers totally fail in such a challenging case.

[18]  arXiv:2302.02224 (cross-list from cs.LG) [pdf, other]
Title: TAP: The Attention Patch for Cross-Modal Knowledge Transfer from Unlabeled Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This work investigates the intersection of cross-modal learning and semi-supervised learning, where we aim to improve the supervised learning performance of the primary modality by borrowing missing information from an unlabeled modality. We investigate this problem from a Nadaraya-Watson (NW) kernel regression perspective and show that this formulation implicitly leads to a kernelized cross-attention module. To this end, we propose The Attention Patch (TAP), a simple neural network plugin that allows data-level knowledge transfer from the unlabeled modality. We provide numerical simulations on three real-world datasets to examine each aspect of TAP and show that a TAP integration in a neural network can improve generalization performance using the unlabeled modality.
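
The link between Nadaraya-Watson regression and attention mentioned in the abstract can be checked numerically in its simplest, well-known form: an NW estimator with the exponential dot-product kernel K(q, k) = exp(q.k / scale) gives the same output as softmax cross-attention. The sketch below is only that textbook identity, not the TAP module. The function names, shapes, and random inputs are assumptions made for illustration.

```python
import numpy as np


def nadaraya_watson(queries, keys, values, scale):
    """NW estimate sum_i K(q, k_i) v_i / sum_i K(q, k_i) with K(q, k) = exp(q.k / scale)."""
    weights = np.exp(queries @ keys.T / scale)  # (n_queries, n_keys) kernel matrix
    return (weights @ values) / weights.sum(axis=1, keepdims=True)


def softmax_cross_attention(queries, keys, values, scale):
    """Standard softmax attention, with the usual max-subtraction for stability."""
    logits = queries @ keys.T / scale
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)
    return attn @ values


# Hypothetical toy data: 3 queries, 5 key/value pairs, feature dimension 4.
rng = np.random.default_rng(0)
dim = 4
queries = rng.normal(size=(3, dim))
keys = rng.normal(size=(5, dim))
values = rng.normal(size=(5, 2))

nw_out = nadaraya_watson(queries, keys, values, scale=np.sqrt(dim))
attn_out = softmax_cross_attention(queries, keys, values, scale=np.sqrt(dim))
print(np.allclose(nw_out, attn_out))
```

The two functions differ only in the max-subtraction step, which cancels in the normalization. That is why the kernel-regression view and the attention view coincide.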

[19]  arXiv:2302.02252 (cross-list from cs.LG) [pdf, other]
Title: Reinforcement Learning in Low-Rank MDPs with Density Features
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

MDPs with low-rank transitions -- that is, MDPs whose transition matrix can be factored into the product of two matrices, left and right -- form a highly representative structure that enables tractable learning. The left matrix enables expressive function approximation for value-based learning and has been studied extensively. In this work, we instead investigate sample-efficient learning with density features, i.e., the right matrix, which induce powerful models for state-occupancy distributions. This setting not only sheds light on leveraging unsupervised learning in RL, but also enables plug-in solutions for convex RL. In the offline setting, we propose an algorithm for off-policy estimation of occupancies that can handle non-exploratory data. Using this as a subroutine, we further devise an online algorithm that constructs exploratory data distributions in a level-by-level manner. As a central technical challenge, the additive error of occupancy estimation is incompatible with the multiplicative definition of data coverage. In the absence of strong assumptions like reachability, this incompatibility easily leads to exponential error blow-up, which we overcome via novel technical tools. Our results also readily extend to the representation learning setting, when the density features are unknown and must be learned from an exponentially large candidate set.

[20]  arXiv:2302.02277 (cross-list from cs.LG) [pdf, other]
Title: SE(3) diffusion model with application to protein backbone generation
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)

The design of novel protein structures remains a challenge in protein engineering for applications across biomedicine and chemistry. In this line of work, a diffusion model over rigid bodies in 3D (referred to as frames) has shown success in generating novel, functional protein backbones that have not been observed in nature. However, there exists no principled methodological framework for diffusion on SE(3), the space of orientation preserving rigid motions in $\mathbb{R}^3$, that operates on frames and confers the group invariance. We address these shortcomings by developing theoretical foundations of SE(3) invariant diffusion models on multiple frames followed by a novel framework, FrameDiff, for learning the SE(3) equivariant score over multiple frames. We apply FrameDiff to monomer backbone generation and find it can generate designable monomers up to 500 amino acids without relying on a pretrained protein structure prediction network that has been integral to previous methods. We find our samples are capable of generalizing beyond any known protein structure.

[21]  arXiv:2302.02323 (cross-list from cs.LG) [pdf, other]
Title: Improving Fair Training under Correlation Shifts
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Model fairness is an essential element for Trustworthy AI. While many techniques for model fairness have been proposed, most of them assume that the training and deployment data distributions are identical, which is often not true in practice. In particular, when the bias between labels and sensitive groups changes, the fairness of the trained model is directly influenced and can worsen. We make two contributions for solving this problem. First, we analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness. We introduce the notion of correlation shifts, which can explicitly capture the change of the above bias. Second, we propose a novel pre-processing step that samples the input data to reduce correlation shifts and thus enables the in-processing approaches to overcome their limitations. We formulate an optimization problem for adjusting the data ratio among labels and sensitive groups to reflect the shifted correlation. A key benefit of our approach lies in decoupling the roles of pre- and in-processing approaches: correlation adjustment via pre-processing and unfairness mitigation on the processed data via in-processing. Experiments show that our framework effectively improves existing in-processing fair algorithms w.r.t. accuracy and fairness, both on synthetic and real datasets.

[22]  arXiv:2302.02392 (cross-list from cs.LG) [pdf, ps, other]
Title: Refined Value-Based Offline RL under Realizability and Partial Coverage
Comments: Under review
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In offline reinforcement learning (RL) we have no opportunity to explore so we must make assumptions that the data is sufficient to guide picking a good policy, taking the form of assuming some coverage, realizability, Bellman completeness, and/or hard margin (gap). In this work we propose value-based algorithms for offline RL with PAC guarantees under just partial coverage, specifically, coverage of just a single comparator policy, and realizability of soft (entropy-regularized) Q-function of the single policy and a related function defined as a saddle point of certain minimax optimization problem. This offers refined and generally more lax conditions for offline RL. We further show an analogous result for vanilla Q-functions under a soft margin condition. To attain these guarantees, we leverage novel minimax learning algorithms to accurately estimate soft or vanilla Q-functions with $L^2$-convergence guarantees. Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.

[23]  arXiv:2302.02420 (cross-list from cs.LG) [pdf, other]
Title: Direct Uncertainty Quantification
Comments: 21 pages, 16 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Traditional neural networks are simple to train but they produce overconfident predictions, while Bayesian neural networks provide good uncertainty quantification but optimizing them is time-consuming. This paper introduces a new approach, direct uncertainty quantification (DirectUQ), that combines their advantages: the neural network directly models uncertainty in output space and captures both aleatoric and epistemic uncertainty. DirectUQ can be derived as an alternative variational lower bound, and hence benefits from collapsed variational inference that provides improved regularizers. On the other hand, like non-probabilistic models, DirectUQ enjoys simple training and one can use Rademacher complexity to provide risk bounds for the model. Experiments show that DirectUQ and ensembles of DirectUQ provide a good tradeoff in terms of run time and uncertainty quantification, especially for out-of-distribution data.

[24]  arXiv:2302.02460 (cross-list from cs.LG) [pdf, other]
Title: Nonparametric Density Estimation under Distribution Drift
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study nonparametric density estimation in non-stationary drift settings. Given a sequence of independent samples taken from a distribution that gradually changes in time, the goal is to compute the best estimate for the current distribution. We prove tight minimax risk bounds for both discrete and continuous smooth densities, where the minimum is over all possible estimates and the maximum is over all possible distributions that satisfy the drift constraints. Our technique handles a broad class of drift models, and generalizes previous results on agnostic learning under drift.

[25]  arXiv:2302.02497 (cross-list from math.ST) [pdf, other]
Title: High-dimensional Location Estimation via Norm Concentration for Subgamma Vectors
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)

In location estimation, we are given $n$ samples from a known distribution $f$ shifted by an unknown translation $\lambda$, and want to estimate $\lambda$ as precisely as possible. Asymptotically, the maximum likelihood estimate achieves the Cramér-Rao bound of error $\mathcal N(0, \frac{1}{n\mathcal I})$, where $\mathcal I$ is the Fisher information of $f$. However, the $n$ required for convergence depends on $f$, and may be arbitrarily large. We build on the theory using smoothed estimators to bound the error for finite $n$ in terms of $\mathcal I_r$, the Fisher information of the $r$-smoothed distribution. As $n \to \infty$, $r \to 0$ at an explicit rate and this converges to the Cramér-Rao bound. We (1) improve the prior work for 1-dimensional $f$ to converge for constant failure probability in addition to high probability, and (2) extend the theory to high-dimensional distributions. In the process, we prove a new bound on the norm of a high-dimensional random variable whose 1-dimensional projections are subgamma, which may be of independent interest.

[26]  arXiv:2302.02526 (cross-list from cs.LG) [pdf, other]
Title: On Private and Robust Bandits
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)

We study private and robust multi-armed bandits (MABs), where the agent receives heavy-tailed rewards under Huber's contamination model while also needing to ensure differential privacy. We first present its minimax lower bound, characterizing the information-theoretic limit of regret with respect to privacy budget, contamination level and heavy-tailedness. Then, we propose a meta-algorithm that builds on a private and robust mean estimation sub-routine \texttt{PRM} that essentially relies on reward truncation and the Laplace mechanism only. For two different heavy-tailed settings, we give specific schemes of \texttt{PRM}, which enable us to achieve nearly-optimal regret. As by-products of our main results, we also give the first minimax lower bound for private heavy-tailed MABs (i.e., without contamination). Moreover, our two proposed truncation-based \texttt{PRM} schemes achieve the optimal trade-off between estimation accuracy, privacy and robustness. Finally, we support our theoretical results with experimental studies.

[27]  arXiv:2302.02544 (cross-list from math.ST) [pdf, other]
Title: Sequential change detection via backward confidence sequences
Comments: 24 pages, 10 figures
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

We present a simple reduction from sequential estimation to sequential changepoint detection (SCD). In short, suppose we are interested in detecting changepoints in some parameter or functional $\theta$ of the underlying distribution. We demonstrate that if we can construct a confidence sequence (CS) for $\theta$, then we can also successfully perform SCD for $\theta$. This is accomplished by checking if two CSs -- one forwards and the other backwards -- ever fail to intersect. Since the literature on CSs has been rapidly evolving recently, the reduction provided in this paper immediately solves several old and new change detection problems. Further, our "backward CS", constructed by reversing time, is new and potentially of independent interest. We provide strong nonasymptotic guarantees on the frequency of false alarms and detection delay, and demonstrate numerical effectiveness on several problems.

[28]  arXiv:2302.02552 (cross-list from cs.LG) [pdf, other]
Title: Adapting to Continuous Covariate Shift via Online Density Ratio Estimation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Dealing with distribution shifts is one of the central challenges for modern machine learning. One fundamental situation is the \emph{covariate shift}, where the input distributions of data change from training to testing stages while the input-conditional output distribution remains unchanged. In this paper, we initiate the study of a more challenging scenario -- \emph{continuous} covariate shift -- in which the test data appear sequentially, and their distributions can shift continuously. Our goal is to adaptively train the predictor such that its prediction risk accumulated over time can be minimized. Starting with the importance-weighted learning, we show the method works effectively if the time-varying density ratios of test and train inputs can be accurately estimated. However, existing density ratio estimation methods would fail due to data scarcity at each time step. To this end, we propose an online method that can appropriately reuse historical information. Our density ratio estimation method is proven to perform well by enjoying a dynamic regret bound, which finally leads to an excess risk guarantee for the predictor. Empirical results also validate the effectiveness.
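
The importance-weighting idea mentioned in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the Gaussian input distributions, the `shift` value, the fixed predictor, and the helper names `density_ratio` and `loss`. The density ratio is known in closed form in this toy setting, whereas the paper is concerned with estimating it online.

```python
import numpy as np

rng = np.random.default_rng(0)
shift = 0.5      # hypothetical mean shift between train and test inputs
n = 100_000

def density_ratio(x):
    # p_test(x) / p_train(x) for test inputs N(shift, 1) and train inputs N(0, 1)
    return np.exp(shift * x - 0.5 * shift**2)

def loss(x, y):
    # squared error of a fixed, deliberately misspecified predictor f(x) = x
    return (x - y) ** 2

# Training data only; the conditional y | x is the same at train and test time
x_train = rng.normal(0.0, 1.0, size=n)
y_train = 2.0 * x_train + rng.normal(0.0, 0.1, size=n)

# Unweighted training risk versus importance-weighted estimate of the test risk
naive_risk = loss(x_train, y_train).mean()
weighted_risk = (density_ratio(x_train) * loss(x_train, y_train)).mean()

# Closed-form test risk in this toy model: E[x^2] + noise variance under N(shift, 1)
true_test_risk = 1.0 + shift**2 + 0.1**2
```

In this toy setting the weighted average should track `true_test_risk`, while the unweighted average reflects the training distribution instead.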

[29]  arXiv:2302.02560 (cross-list from cs.LG) [pdf, other]
Title: Causal Shift-Response Functions with Neural Networks: The Health Benefits of Lowering Air Quality Standards in the US
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

Policymakers are required to evaluate the health benefits of reducing the National Ambient Air Quality Standards (NAAQS; i.e., the safety standards) for fine particulate matter PM 2.5 before implementing new policies. We formulate this objective as a shift-response function (SRF) and develop methods to analyze the problem using causal inference, specifically under the stochastic interventions framework. SRFs model the average change in an outcome of interest resulting from a hypothetical shift in the observed exposure distribution. We propose a new broadly applicable doubly-robust method to learn SRFs using targeted regularization with neural networks. We evaluate our proposed method under various benchmarks specific to marginal estimates as a function of continuous exposure. Finally, we implement our estimator in the motivating application that considers the potential reduction in deaths from lowering the NAAQS from the current level of 12 $\mu g/m^3$ to levels that are recently proposed by the Environmental Protection Agency in the US (10, 9, and 8 $\mu g/m^3$).

[30]  arXiv:2302.02570 (cross-list from cs.AI) [pdf, other]
Title: Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)

We consider the task of evaluating policies of algorithmic resource allocation through randomized controlled trials (RCTs). Such policies are tasked with optimizing the utilization of limited intervention resources, with the goal of maximizing the benefits derived. Evaluation of such allocation policies through RCTs proves difficult, notwithstanding the scale of the trial, because the individuals' outcomes are inextricably interlinked through resource constraints controlling the policy decisions. Our key contribution is a new estimator based on a novel concept that involves retrospective reshuffling of participants across experimental arms at the end of an RCT. We identify conditions under which such reassignments are permissible and can be leveraged to construct counterfactual trials, whose outcomes can be accurately ascertained, for free. We prove theoretically that such an estimator is more accurate than common estimators based on sample means: we show that it returns an unbiased estimate and simultaneously reduces variance. We demonstrate the value of our approach through empirical experiments on synthetic, semi-synthetic as well as real case study data and show improved estimation accuracy across the board.

[31]  arXiv:2302.02571 (cross-list from cs.LG) [pdf, other]
Title: Offline Learning in Markov Games with General Function Approximation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)

We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium -- such as Nash equilibrium and (Coarse) Correlated Equilibrium -- from an offline dataset pre-collected from the game. Existing works consider relatively restricted tabular or linear models and handle each equilibrium separately. In this work, we provide the first framework for sample-efficient offline learning in Markov games under general function approximation, handling all 3 equilibria in a unified manner. By using Bellman-consistent pessimism, we obtain interval estimation for policies' returns, and use both the upper and the lower bounds to obtain a relaxation on the gap of a candidate policy, which becomes our optimization objective. Our results generalize prior works and provide several additional insights. Importantly, we require a data coverage condition that improves over the recently proposed "unilateral concentrability". Our condition allows selective coverage of deviation policies that optimally trade off between their greediness (as approximate best responses) and coverage, and we show scenarios where this leads to significantly better guarantees. As a new connection, we also show how our algorithmic framework can subsume seemingly different solution concepts designed for the special case of two-player zero-sum games.

[32]  arXiv:2302.02589 (cross-list from cs.LG) [pdf, other]
Title: $z$-SignFedAvg: A Unified Stochastic Sign-based Compression for Federated Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Federated Learning (FL) is a promising privacy-preserving distributed learning paradigm but suffers from high communication cost when training large-scale machine learning models. Sign-based methods, such as SignSGD (Bernstein et al., 2018), have been proposed as a biased gradient compression technique for reducing the communication cost. However, sign-based algorithms could diverge under heterogeneous data, which thus motivated the development of advanced techniques, such as the error-feedback method and stochastic sign-based compression, to fix this issue. Nevertheless, these methods still suffer from slower convergence rates. Besides, none of them allows multiple local SGD updates like FedAvg (McMahan et al., 2017). In this paper, we propose a novel noisy perturbation scheme with a general symmetric noise distribution for sign-based compression, which not only allows one to flexibly control the tradeoff between gradient bias and convergence performance, but also provides a unified viewpoint to existing stochastic sign-based methods. More importantly, the unified noisy perturbation scheme enables the development of the very first sign-based FedAvg algorithm ($z$-SignFedAvg) to accelerate the convergence. Theoretically, we show that $z$-SignFedAvg achieves a faster convergence rate than existing sign-based methods and, under the uniformly distributed noise, can enjoy the same convergence rate as its uncompressed counterpart. Extensive experiments are conducted to demonstrate that the $z$-SignFedAvg can achieve competitive empirical performance on real datasets and outperforms existing schemes.
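
To make the idea of noise-perturbed sign compression concrete, the sketch below shows one simple instance: adding uniform noise before taking the sign. It is an illustration under stated assumptions, not the paper's algorithm. The function name `stochastic_sign`, the `noise_scale` parameter, and the example gradient are hypothetical. The sketch relies on the standard fact that for noise $u$ uniform on $[-1, 1]$ and $|g| \le \sigma$, the expectation of $\mathrm{sign}(g + \sigma u)$ equals $g / \sigma$, so the rescaled average of the one-bit messages recovers the gradient.

```python
import numpy as np

def stochastic_sign(grad, noise_scale, rng):
    """Compress a gradient to +/-1 entries after adding uniform noise (illustrative)."""
    noise = rng.uniform(-1.0, 1.0, size=grad.shape)
    return np.sign(grad + noise_scale * noise)

rng = np.random.default_rng(0)
grad = np.array([0.5, -0.2, 0.0, 0.9])   # hypothetical gradient, entries within the noise scale
noise_scale = 1.0

# Average many independent one-bit compressions and rescale by the noise scale
samples = np.stack([stochastic_sign(grad, noise_scale, rng) for _ in range(20000)])
estimate = noise_scale * samples.mean(axis=0)
```

Entries whose magnitude exceeds `noise_scale` would always compress to the same sign, so the unbiasedness shown here holds only when the noise range covers the gradient range.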

[33]  arXiv:2302.02605 (cross-list from cs.LG) [pdf, other]
Title: Toward Large Kernel Models
Comments: Code is available at github.com/EigenPro/EigenPro3
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Recent studies indicate that kernel machines can often perform similarly to or better than deep neural networks (DNNs) on small datasets. The interest in kernel machines has been additionally bolstered by the discovery of their equivalence to wide neural networks in certain regimes. However, a key feature of DNNs is their ability to scale the model size and training data size independently, whereas in traditional kernel machines model size is tied to data size. Because of this coupling, scaling kernel machines to large data has been computationally challenging. In this paper, we provide a way forward for constructing large-scale general kernel models, which are a generalization of kernel machines that decouples the model and data, allowing training on large datasets. Specifically, we introduce EigenPro 3.0, an algorithm based on projected dual preconditioned SGD, and show scaling to model and data sizes which have not been possible with existing kernel methods.

[34]  arXiv:2302.02622 (cross-list from cs.CV) [pdf, other]
Title: Uncertainty Calibration and its Application to Object Detection
Authors: Fabian Küppers
Comments: PhD thesis at University of Wuppertal, cite by: 'Fabian K\"uppers. "Uncertainty Calibration and its Application to Object Detection." PhD Thesis, University of Wuppertal, January 2023'
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)

Image-based environment perception is an important component, especially for driver assistance systems or autonomous driving. In this scope, modern neural networks are used to identify multiple objects as well as the corresponding position and size information within a single frame. The performance of such an object detection model is important for the overall performance of the whole system. However, a detection model might also predict these objects under a certain degree of uncertainty. [...]
In this work, we examine the semantic uncertainty (which object type?) as well as the spatial uncertainty (where is the object and how large is it?). We evaluate if the predicted uncertainties of an object detection model match with the observed error that is achieved on real-world data. In the first part of this work, we introduce the definition for confidence calibration of the semantic uncertainty in the context of object detection, instance segmentation, and semantic segmentation. We integrate additional position information in our examinations to evaluate the effect of the object's position on the semantic calibration properties. Besides measuring calibration, it is also possible to perform a post-hoc recalibration of semantic uncertainty that might have turned out to be miscalibrated. [...]
The second part of this work deals with the spatial uncertainty obtained by a probabilistic detection model. [...] We review and extend common calibration methods so that it is possible to obtain parametric uncertainty distributions for the position information in a more flexible way.
In the last part, we demonstrate a possible use-case for our derived calibration methods in the context of object tracking. [...] We integrate our previously proposed calibration techniques and demonstrate the usefulness of semantic and spatial uncertainty calibration in a subsequent process. [...]

[35]  arXiv:2302.02648 (cross-list from cs.HC) [pdf]
Title: First steps towards quantum machine learning applied to the classification of event-related potentials
Authors: Grégoire Cattan, Alexandre Quemy (PUT), Anton Andreev (GIPSA-Services)
Comments: in French language
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)

Low information transfer rate is a major bottleneck for brain-computer interfaces based on non-invasive electroencephalography (EEG) for clinical applications. This led to the development of more robust and accurate classifiers. In this study, we investigate the performance of a quantum-enhanced support vector classifier (QSVC). The training (prediction) balanced accuracy of the QSVC was 83.17% (50.25%). This result shows that the classifier was able to learn from EEG data, but that more research is required to obtain higher prediction accuracy. This could be achieved by a better configuration of the classifier, such as increasing the number of shots.

[36]  arXiv:2302.02718 (cross-list from stat.ME) [pdf, other]
Title: A Log-Linear Non-Parametric Online Changepoint Detection Algorithm based on Functional Pruning
Subjects: Methodology (stat.ME); Computation (stat.CO); Machine Learning (stat.ML)

Online changepoint detection aims to detect anomalies and changes in real-time in high-frequency data streams, sometimes with limited available computational resources. This is an important task that is rooted in many real-world applications, including but not limited to cybersecurity, medicine and astrophysics. While fast and efficient online algorithms have been recently introduced, these rely on parametric assumptions which are often violated in practical applications. Motivated by data streams from the telecommunications sector, we build a flexible nonparametric approach to detect a change in the distribution of a sequence. Our procedure, NP-FOCuS, builds a sequential likelihood ratio test for a change in a set of points of the empirical cumulative distribution function of our data. This is achieved by keeping track of the number of observations above or below those points. Thanks to functional pruning ideas, NP-FOCuS has a computational cost that is log-linear in the number of observations and is suitable for high-frequency data streams. In terms of detection power, NP-FOCuS is seen to outperform current nonparametric online changepoint techniques in a variety of settings. We demonstrate the utility of the procedure on both simulated and real data.

[37]  arXiv:2302.02859 (cross-list from stat.ME) [pdf, other]
Title: A Fast Bootstrap Algorithm for Causal Inference with Large Data
Comments: 46 pages
Subjects: Methodology (stat.ME); Applications (stat.AP); Machine Learning (stat.ML)

Estimating causal effects from large experimental and observational data has become increasingly prevalent in both industry and research. The bootstrap is an intuitive and powerful technique used to construct standard errors and confidence intervals of estimators. Its application however can be prohibitively demanding in settings involving large data. In addition, modern causal inference estimators based on machine learning and optimization techniques exacerbate the computational burden of the bootstrap. The bag of little bootstraps has been proposed in non-causal settings for large data but has not yet been applied to evaluate the properties of estimators of causal effects. In this paper, we introduce a new bootstrap algorithm called causal bag of little bootstraps for causal inference with large data. The new algorithm significantly improves the computational efficiency of the traditional bootstrap while providing consistent estimates and desirable confidence interval coverage. We describe its properties, provide practical considerations, and evaluate the performance of the proposed algorithm in terms of bias, coverage of the true 95% confidence intervals, and computational time in a simulation study. We apply it in the evaluation of the effect of hormone therapy on the average time to coronary heart disease using a large observational data set from the Women's Health Initiative.
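
For orientation, the following is a minimal sketch of the plain (non-causal) bag of little bootstraps applied to a difference-in-means estimator on simulated randomized data. It is not the causal variant introduced in the paper, whose details the abstract does not give. The function name `blb_standard_error`, the simulated data, and the tuning values (subset exponent 0.6, 10 subsets, 50 resamples) are assumptions chosen for illustration.

```python
import numpy as np

def blb_standard_error(treated, outcome, n_subsets, subset_size, n_resamples, rng):
    """Bag of little bootstraps estimate of the standard error of a difference in means."""
    n = len(outcome)
    subset_ses = []
    for _ in range(n_subsets):
        # Work on a small subset drawn without replacement
        idx = rng.choice(n, size=subset_size, replace=False)
        t, y = treated[idx], outcome[idx]
        estimates = []
        for _ in range(n_resamples):
            # Emulate a size-n resample via multinomial counts over the subset
            w = rng.multinomial(n, np.full(subset_size, 1.0 / subset_size))
            diff = (np.average(y[t == 1], weights=w[t == 1])
                    - np.average(y[t == 0], weights=w[t == 0]))
            estimates.append(diff)
        subset_ses.append(np.std(estimates, ddof=1))
    # Average the per-subset standard errors
    return float(np.mean(subset_ses))

rng = np.random.default_rng(0)
n = 20_000
treated = rng.integers(0, 2, size=n)                  # simulated randomized assignment
outcome = 1.0 * treated + rng.normal(0.0, 1.0, size=n)

blb_se = blb_standard_error(treated, outcome, n_subsets=10,
                            subset_size=int(n ** 0.6), n_resamples=50, rng=rng)

# Textbook standard error of the difference in means, for comparison
n1, n0 = (treated == 1).sum(), (treated == 0).sum()
analytic_se = np.sqrt(outcome[treated == 1].var(ddof=1) / n1
                      + outcome[treated == 0].var(ddof=1) / n0)
```

Each resample only touches `subset_size` distinct points, which is where the computational saving over a full bootstrap comes from.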

[38]  arXiv:2302.02865 (cross-list from cs.LG) [pdf, other]
Title: Probabilistic Contrastive Learning Recovers the Correct Aleatoric Uncertainty of Ambiguous Inputs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Contrastively trained encoders have recently been proven to invert the data-generating process: they encode each input, e.g., an image, into the true latent vector that generated the image (Zimmermann et al., 2021). However, real-world observations often have inherent ambiguities. For instance, images may be blurred or only show a 2D view of a 3D object, so multiple latents could have generated them. This makes the true posterior for the latent vector probabilistic with heteroscedastic uncertainty. In this setup, we extend the common InfoNCE objective and encoders to predict latent distributions instead of points. We prove that these distributions recover the correct posteriors of the data-generating process, including its level of aleatoric uncertainty, up to a rotation of the latent space. In addition to providing calibrated uncertainty estimates, these posteriors allow the computation of credible intervals in image retrieval. They comprise images with the same latent as a given query, subject to its uncertainty.

[39]  arXiv:2302.02876 (cross-list from cs.LG) [pdf, other]
Title: Variational Information Pursuit for Interpretable Predictions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

There is a growing interest in the machine learning community in developing predictive algorithms that are "interpretable by design". Towards this end, recent work proposes to make interpretable decisions by sequentially asking interpretable queries about data until a prediction can be made with high confidence based on the answers obtained (the history). To promote short query-answer chains, a greedy procedure called Information Pursuit (IP) is used, which adaptively chooses queries in order of information gain. Generative models are employed to learn the distribution of query-answers and labels, which is in turn used to estimate the most informative query. However, learning and inference with a full generative model of the data is often intractable for complex tasks. In this work, we propose Variational Information Pursuit (V-IP), a variational characterization of IP which bypasses the need for learning generative models. V-IP is based on finding a query selection strategy and a classifier that minimizes the expected cross-entropy between true and predicted labels. We then demonstrate that the IP strategy is the optimal solution to this problem. Therefore, instead of learning generative models, we can use our optimal strategy to directly pick the most informative query given any history. We then develop a practical algorithm by defining a finite-dimensional parameterization of our strategy and classifier using deep networks and train them end-to-end using our objective. Empirically, V-IP is 10-100x faster than IP on different Vision and NLP tasks with competitive performance. Moreover, V-IP finds much shorter query chains when compared to reinforcement learning which is typically used in sequential-decision-making problems. Finally, we demonstrate the utility of V-IP on challenging tasks like medical diagnosis where the performance is far superior to the generative modelling approach.

[40]  arXiv:2302.02941 (cross-list from cs.LG) [pdf, other]
Title: On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology
Comments: 24 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Machine Learning (stat.ML)

Message Passing Neural Networks (MPNNs) are instances of Graph Neural Networks that leverage the graph to send messages over the edges. This inductive bias leads to a phenomenon known as over-squashing, where a node feature is insensitive to information contained at distant nodes. Despite recent methods introduced to mitigate this issue, an understanding of the causes of over-squashing and of possible solutions is lacking. In this theoretical work, we prove that: (i) Neural network width can mitigate over-squashing, but at the cost of making the whole network more sensitive; (ii) Conversely, depth cannot help mitigate over-squashing: increasing the number of layers leads to over-squashing being dominated by vanishing gradients; (iii) The graph topology plays the greatest role, since over-squashing occurs between nodes at high commute (access) time. Our analysis provides a unified framework to study different recent methods introduced to cope with over-squashing and serves as a justification for a class of methods that fall under `graph rewiring'.

[41]  arXiv:2302.02951 (cross-list from cond-mat.stat-mech) [pdf, other]
Title: Noise-cleaning the precision matrix of fMRI time series
Comments: 15 pages, 12 figures (of which 12 pages, 3 figures in the main text)
Subjects: Statistical Mechanics (cond-mat.stat-mech); Machine Learning (stat.ML)

We present a comparison between various algorithms of inference of covariance and precision matrices in small datasets of real vectors, of the typical length and dimension of human brain activity time series retrieved by functional Magnetic Resonance Imaging (fMRI). Assuming a Gaussian model underlying the neural activity, the problem consists in denoising the empirically observed matrices in order to obtain a better estimator of the true precision and covariance matrices. We consider several standard noise-cleaning algorithms and compare them on two types of datasets. The first type are time series of fMRI brain activity of human subjects at rest. The second type are synthetic time series sampled from a generative Gaussian model of which we can vary the fraction of dimensions per sample q = N/T and the strength of off-diagonal correlations. The reliability of each algorithm is assessed in terms of test-set likelihood and, in the case of synthetic data, of the distance from the true precision matrix. We observe that the so called Optimal Rotationally Invariant Estimator, based on Random Matrix Theory, leads to a significantly lower distance from the true precision matrix in synthetic data, and higher test likelihood in natural fMRI data. We propose a variant of the Optimal Rotationally Invariant Estimator in which one of its parameters is optimised by cross-validation. In the severe undersampling regime (large q) typical of fMRI series, it outperforms all the other estimators. We furthermore propose a simple algorithm based on an iterative likelihood gradient ascent, providing an accurate estimation for weakly correlated datasets.

[42]  arXiv:2302.02971 (cross-list from cs.LG) [pdf, other]
Title: U-Clip: On-Average Unbiased Stochastic Gradient Clipping
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

U-Clip is a simple amendment to gradient clipping that can be applied to any iterative gradient optimization algorithm. Like regular clipping, U-Clip involves using gradients that are clipped to a prescribed size (e.g. with component-wise or norm-based clipping), but instead of discarding the clipped portion of the gradient, U-Clip maintains a buffer of these values that is added to the gradients on the next iteration (before clipping). We show that the cumulative bias of the U-Clip updates is bounded by a constant. This implies that the clipped updates are unbiased on average. Convergence follows via a lemma that guarantees convergence with updates $u_i$ as long as $\sum_{i=1}^t (u_i - g_i) = o(t)$ where $g_i$ are the gradients. Extensive experimental exploration is performed on CIFAR10 with further validation given on ImageNet.
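
The buffer mechanism described in this abstract can be sketched in a few lines. This is a minimal illustration assuming component-wise clipping; the function name `u_clip_step`, the clip value, and the synthetic Gaussian gradients are hypothetical and not taken from the paper. The sketch also makes one bookkeeping fact visible: the difference between the summed gradients and the summed updates is exactly the current buffer, so the cumulative bias is bounded whenever the buffer is.

```python
import numpy as np

def u_clip_step(grad, buffer, clip_value):
    """One U-Clip-style step with component-wise clipping (illustrative sketch)."""
    carried = grad + buffer                               # add the carried-over portion before clipping
    update = np.clip(carried, -clip_value, clip_value)    # clipped update actually applied
    new_buffer = carried - update                         # clipped-off portion, kept for the next step
    return update, new_buffer

rng = np.random.default_rng(0)
buffer = np.zeros(3)
sum_updates = np.zeros(3)
sum_grads = np.zeros(3)

for _ in range(1000):
    grad = rng.normal(loc=0.0, scale=2.0, size=3)         # synthetic stand-in for a stochastic gradient
    update, buffer = u_clip_step(grad, buffer, clip_value=1.0)
    sum_updates += update
    sum_grads += grad

# The sum of (g_i - u_i) telescopes to the final buffer
cumulative_bias = sum_grads - sum_updates
```

Norm-based clipping would follow the same pattern, with `np.clip` replaced by a rescaling of `carried` to the prescribed norm.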

[43]  arXiv:2302.02988 (cross-list from cs.LG) [pdf, other]
Title: Asymptotically Minimax Optimal Fixed-Budget Best Arm Identification for Expected Simple Regret Minimization
Subjects: Machine Learning (cs.LG); Econometrics (econ.EM); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)

We investigate fixed-budget best arm identification (BAI) for expected simple regret minimization. In each round of an adaptive experiment, a decision maker draws one of multiple treatment arms based on past observations and subsequently observes the outcomes of the chosen arm. After the experiment, the decision maker recommends a treatment arm with the highest projected outcome. We evaluate this decision in terms of the expected simple regret, a difference between the expected outcomes of the best and recommended treatment arms. Due to the inherent uncertainty, we evaluate the regret using the minimax criterion. For distributions with fixed variances (location-shift models), such as Gaussian distributions, we derive asymptotic lower bounds for the worst-case expected simple regret. Then, we show that the Random Sampling (RS)-Augmented Inverse Probability Weighting (AIPW) strategy proposed by Kato et al. (2022) is asymptotically minimax optimal in the sense that the leading factor of its worst-case expected simple regret asymptotically matches our derived worst-case lower bound. Our result indicates that, for location-shift models, the optimal RS-AIPW strategy draws treatment arms with varying probabilities based on their variances. This result contrasts with the results of Bubeck et al. (2011), which show that drawing each treatment arm with an equal ratio is minimax optimal in a bounded outcome setting.

[44]  arXiv:2302.02991 (cross-list from eess.IV) [pdf, other]
Title: Optimal Transport Guided Unsupervised Learning for Enhancing low-quality Retinal Images
Comments: Accepted as a conference paper to 20th IEEE International Symposium on Biomedical Imaging (ISBI 2023)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Real-world non-mydriatic retinal fundus photography is prone to artifacts, imperfections and low quality when certain ocular or systemic co-morbidities exist. Artifacts may result in inaccuracy or ambiguity in clinical diagnoses. In this paper, we propose a simple but effective end-to-end framework for enhancing poor-quality retinal fundus images. Leveraging the optimal transport theory, we propose an unpaired image-to-image translation scheme for transporting low-quality images to their high-quality counterparts. We theoretically prove that a Generative Adversarial Network (GAN) model with a generator and discriminator is sufficient for this task. Furthermore, to mitigate the inconsistency of information between the low-quality images and their enhancements, an information consistency mechanism is proposed to maximally maintain structural consistency (optical discs, blood vessels, lesions) between the source and enhanced domains. Extensive experiments were conducted on the EyeQ dataset to demonstrate the superiority of our proposed method perceptually and quantitatively.

[45]  arXiv:2302.03003 (cross-list from eess.IV) [pdf, other]
Title: OTRE: Where Optimal Transport Guided Unpaired Image-to-Image Translation Meets Regularization by Enhancing
Comments: Accepted as a conference paper to The 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Non-mydriatic retinal color fundus photography (CFP) is widely available due to the advantage of not requiring pupillary dilation, but is prone to poor quality due to operators, systemic imperfections, or patient-related causes. Optimal retinal image quality is mandated for accurate medical diagnoses and automated analyses. Herein, we leveraged the \emph{Optimal Transport (OT)} theory to propose an unpaired image-to-image translation scheme for mapping low-quality retinal CFPs to high-quality counterparts. Furthermore, to improve the flexibility, robustness, and applicability of our image enhancement pipeline in clinical practice, we generalized a state-of-the-art model-based image reconstruction method, regularization by denoising, by plugging in priors learned by our OT-guided image-to-image translation network. We named it \emph{regularization by enhancing (RE)}. We validated the integrated framework, OTRE, on three publicly available retinal image datasets by assessing the quality after enhancement and their performance on various downstream tasks, including diabetic retinopathy grading, vessel segmentation, and diabetic lesion segmentation. The experimental results demonstrated the superiority of our proposed framework over some state-of-the-art unsupervised competitors and a state-of-the-art supervised method.

[46]  arXiv:2302.03020 (cross-list from cs.LG) [pdf, other]
Title: RLSbench: Domain Adaptation Under Relaxed Label Shift
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Despite the emergence of principled methods for domain adaptation under label shift, the sensitivity of these methods to minor shifts in the class conditional distributions remains precariously underexplored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with shifts in label proportions. While several papers attempt to adapt these heuristics to accommodate shifts in label proportions, inconsistencies in evaluation criteria, datasets, and baselines make it hard to assess the state of the art. In this paper, we introduce RLSbench, a large-scale relaxed label shift benchmark, consisting of >500 distribution shift pairs that draw on 14 datasets across vision, tabular, and language modalities and compose them with varying label proportions. First, we evaluate 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than were previously known. Next, we develop an effective two-step meta-algorithm that is compatible with most deep domain adaptation heuristics: (i) pseudo-balance the data at each epoch; and (ii) adjust the final classifier with (an estimate of) target label distribution. The meta-algorithm improves existing domain adaptation heuristics often by 2--10\% accuracy points under extreme label proportion shifts and has little (i.e., <0.5\%) effect when label proportions do not shift. We hope that these findings and the availability of RLSbench will encourage researchers to rigorously evaluate proposed methods in relaxed label shift settings. Code is publicly available at https://github.com/acmi-lab/RLSbench.

Replacements for Tue, 7 Feb 23

[47]  arXiv:2009.07703 (replaced) [pdf, other]
Title: Efficient Variational Bayes Learning of Graphical Models with Smooth Structural Changes
Journal-ref: IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (2023)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[48]  arXiv:2105.10590 (replaced) [pdf, other]
Title: Parallelizing Contextual Bandits
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM)
[49]  arXiv:2107.00371 (replaced) [pdf, other]
Title: Sparse GCA and Thresholded Gradient Descent
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[50]  arXiv:2111.03289 (replaced) [pdf, ps, other]
Title: Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs
Comments: Accepted to NeurIPS 2022
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[51]  arXiv:2112.09036 (replaced) [pdf, other]
Title: The Dual PC Algorithm and the Role of Gaussianity for Structure Learning of Bayesian Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
[52]  arXiv:2201.12064 (replaced) [pdf, other]
Title: Multiscale Graph Comparison via the Embedded Laplacian Discrepancy
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[53]  arXiv:2202.04912 (replaced) [pdf, other]
Title: Random Forest Weighted Local Fréchet Regression
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[54]  arXiv:2205.13496 (replaced) [pdf, other]
Title: Censored Quantile Regression Neural Networks for Distribution-Free Survival Analysis
Comments: Published in NeurIPS 2022
Journal-ref: NeurIPS 2022
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[55]  arXiv:2209.12651 (replaced) [pdf, other]
Title: Learning Variational Models with Unrolling and Bilevel Optimization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[56]  arXiv:2209.13570 (replaced) [pdf, other]
Title: Hierarchical Sliced Wasserstein Distance
Comments: Accepted to ICLR 2023, 29 pages, 8 figures, 3 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[57]  arXiv:2211.09403 (replaced) [pdf, other]
Title: Learning Mixtures of Markov Chains and MDPs
Comments: 51 pages (13 page paper, 38 page appendix). Paper restructured and refined, corrections made to proofs, experiments added
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[58]  arXiv:2211.10747 (replaced) [pdf, other]
Title: Exploring validation metrics for offline model-based optimisation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[59]  arXiv:2212.09178 (replaced) [pdf, ps, other]
Title: Support Vector Regression: Risk Quadrangle Framework
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[60]  arXiv:2212.12749 (replaced) [pdf, other]
Title: Deep Latent State Space Models for Time-Series Generation
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[61]  arXiv:2301.09479 (replaced) [pdf, other]
Title: Modality-Agnostic Variational Compression of Implicit Neural Representations
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[62]  arXiv:2301.13112 (replaced) [pdf, other]
Title: Benchmarking optimality of time series classification methods in distinguishing diffusions
Comments: 21 pages, 8 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[63]  arXiv:1809.02727 (replaced) [pdf, ps, other]
Title: Decentralized Differentially Private Without-Replacement Stochastic Gradient Descent
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[64]  arXiv:2002.01444 (replaced) [pdf, other]
Title: Proper Learning of Linear Dynamical Systems as a Non-Commutative Polynomial Optimisation Problem
Comments: 27 pages, 6 figures, with additional experiments exploiting sparsity
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
[65]  arXiv:2004.04464 (replaced) [pdf, other]
Title: A Characteristic Function for Shapley-Value-Based Attribution of Anomaly Scores
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[66]  arXiv:2005.01026 (replaced) [pdf, other]
Title: Multi-Center Federated Learning: Clients Clustering for Better Personalization
Comments: This paper has two duplicated versions: 2005.01026 and 2108.08647. The first one 2005.01026 is the right one, and the second one 2108.08647 should be deleted because it always causes misoperating
Journal-ref: World Wide Web 26 (2023), 481-500
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
[67]  arXiv:2009.12682 (replaced) [pdf, other]
Title: Decision-Aware Conditional GANs for Time Series Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[68]  arXiv:2102.11050 (replaced) [pdf, other]
Title: Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization
Authors: Rad Niazadeh (1), Negin Golrezaei (2), Joshua Wang (3), Fransisca Susan (4), Ashwinkumar Badanidiyuru (3) ((1) Chicago Booth School of Business, Operations Management, (2) MIT Sloan School of Management, Operations Management, (3) Google Research Mountain View, (4) MIT Operations Research Center)
Comments: 87 pages, 2 figures. Management Science (2022)
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC); Machine Learning (stat.ML)
[69]  arXiv:2106.01128 (replaced) [pdf, other]
Title: Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[70]  arXiv:2111.04964 (replaced) [pdf, other]
Title: On Representation Knowledge Distillation for Graph Neural Networks
Comments: IEEE Transactions on Neural Networks and Learning Systems (TNNLS), Special Issue on Deep Neural Networks for Graphs: Theory, Models, Algorithms and Applications
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[71]  arXiv:2112.10753 (replaced) [pdf, other]
Title: Strong Consistency and Rate of Convergence of Switched Least Squares System Identification for Autonomous Markov Jump Linear Systems
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)
[72]  arXiv:2202.01666 (replaced) [pdf, other]
Title: Proportional Fairness in Federated Learning
Comments: Accepted at TMLR 2023, typos fixed
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
[73]  arXiv:2203.00614 (replaced) [pdf, other]
Title: Side Effects of Learning from Low-dimensional Data Embedded in a Euclidean Space
Comments: 53 pages (11 pages for Appendix), 24 figures
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
[74]  arXiv:2204.07879 (replaced) [pdf, ps, other]
Title: Polynomial-time sparse measure recovery
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[75]  arXiv:2206.02617 (replaced) [pdf, other]
Title: Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
[76]  arXiv:2206.02659 (replaced) [pdf, other]
Title: Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees
Comments: 36 pages, 5 figures, 8 tables (Fixed typos). ICML 2022
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Statistics Theory (math.ST); Machine Learning (stat.ML)
[77]  arXiv:2206.12680 (replaced) [pdf, other]
Title: Topology-aware Generalization of Decentralized SGD
Comments: Accepted for publication in the 39th International Conference on Machine Learning (ICML 2022)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[78]  arXiv:2210.00635 (replaced) [pdf, other]
Title: Robust Empirical Risk Minimization with Tolerance
Comments: 22 pages, 1 figure, To appear at ALT'23
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[79]  arXiv:2210.00895 (replaced) [pdf, other]
Title: On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits
Authors: Antoine Barrier (UMPA-ENSL, LMO, CELESTE), Aurélien Garivier (UMPA-ENSL, LIP), Gilles Stoltz (LMO, CELESTE)
Journal-ref: ALT 2023 - The 34th International Conference on Algorithmic Learning Theory, Feb 2023, Singapore, Singapore
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
[80]  arXiv:2210.06819 (replaced) [pdf, other]
Title: Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence
Comments: 14 pages in main text; 51 pages including bibliography and appendix. Published in Transactions on Machine Learning Research (TMLR), 2023. this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[81]  arXiv:2211.08572 (replaced) [pdf, other]
Title: Bayesian Fixed-Budget Best-Arm Identification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[82]  arXiv:2211.09259 (replaced) [pdf, other]
Title: The Missing Indicator Method: From Low to High Dimensions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[83]  arXiv:2211.14296 (replaced) [pdf, other]
Title: A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation
Comments: Accepted at ICLR 2023 (notable-top-25%), Website: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
[84]  arXiv:2211.14908 (replaced) [pdf, other]
Title: A Permutation-free Kernel Two-Sample Test
Comments: Published at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS), with an oral presentation
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[85]  arXiv:2301.07067 (replaced) [pdf, other]
Title: Transformers as Algorithms: Generalization and Stability in In-context Learning
Comments: Revised version significantly improves the stability guarantees and provides new experiments
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
[86]  arXiv:2301.12003 (replaced) [pdf, other]
Title: Minimizing Trajectory Curvature of ODE-based Generative Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[87]  arXiv:2301.13857 (replaced) [pdf, other]
Title: Learning in POMDPs is Sample-Efficient with Hindsight Observability
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[88]  arXiv:2302.00704 (replaced) [pdf, other]
Title: Pathologies of Predictive Diversity in Deep Ensembles
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[89]  arXiv:2302.00814 (replaced) [pdf, other]
Title: Stochastic Contextual Bandits with Long Horizon Rewards
Comments: 47 pages, to appear at AAAI 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[90]  arXiv:2302.01425 (replaced) [pdf, other]
Title: Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective
Comments: 23 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)