Machine Learning
New submissions
New submissions for Fri, 3 Feb 23
 [1] arXiv:2302.00695 [pdf, other]

Title: Versatile Energy-Based Models for High Energy Physics
Comments: 16 pages, 8 figures
Subjects: Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); High Energy Physics - Phenomenology (hep-ph); Machine Learning (stat.ML)
Energy-based models have the natural advantage of flexibility in the form of the energy function. Recently, energy-based models have achieved great success in modeling high-dimensional data in computer vision and natural language processing. In accordance with these signs of progress, we build a versatile energy-based model for High Energy Physics events at the Large Hadron Collider. This framework builds on a powerful generative model and describes higher-order inter-particle interactions. It suits different encoding architectures and builds on implicit generation. As for applicational aspects, it can serve as a powerful parameterized event generator, a generic anomalous signal detector, and an augmented event classifier.
 [2] arXiv:2302.00704 [pdf, other]

Title: Pathologies of Predictive Diversity in Deep Ensembles
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Classical results establish that ensembles of small models benefit when predictive diversity is encouraged, through bagging, boosting, and similar techniques. Here we demonstrate that this intuition does not carry over to ensembles of deep neural networks used for classification, and in fact the opposite can be true. Unlike regression models or small (unconfident) classifiers, predictions from large (confident) neural networks concentrate in vertices of the probability simplex. Thus, decorrelating these points necessarily moves the ensemble prediction away from vertices, harming confidence and moving points across decision boundaries. Through large-scale experiments, we demonstrate that diversity-encouraging regularizers hurt the performance of high-capacity deep ensembles used for classification. Even more surprisingly, discouraging predictive diversity can be beneficial. Together this work strongly suggests that the best strategy for deep ensembles is utilizing more accurate, but likely less diverse, component models.
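The geometric point in this abstract (confident predictions sit near simplex vertices, so disagreeing members pull the average inward) can be illustrated with a toy calculation. This is only a sketch with made-up probability vectors, not the paper's experimental setup; `ensemble_average` is a hypothetical helper name.

```python
def ensemble_average(member_probs):
    """Average class-probability vectors from ensemble members."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]

# Two confident members that agree: the average stays near a simplex vertex.
agree = ensemble_average([[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]])

# Two confident members that disagree (more "diverse"): the average moves
# toward the interior of the simplex and the top-class confidence drops.
disagree = ensemble_average([[0.98, 0.01, 0.01], [0.01, 0.98, 0.01]])

print(max(agree), max(disagree))
```

The averaged vector is still a valid probability distribution in both cases; only its distance from the nearest vertex changes.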
 [3] arXiv:2302.00709 [pdf, other]

Title: Riemannian Stochastic Approximation for Minimizing Tame Nonsmooth Objective Functions
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
In many learning applications, the parameters in a model are structurally constrained in a way that can be modeled as them lying on a Riemannian manifold. Riemannian optimization, in which procedures constrain an iterative minimizing sequence to the manifold, is used to train such models. At the same time, tame geometry has become a significant topological description of nonsmooth functions that appear in the landscapes of training neural networks and other important models with structural compositions of continuous nonlinear functions with nonsmooth maps. In this paper, we study the properties of such stratifiable functions on a manifold and the behavior of retracted stochastic gradient descent, with diminishing step sizes, for minimizing such functions.
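As a rough illustration of what "retracted stochastic gradient descent with diminishing step sizes" can look like, the sketch below runs it on the unit sphere. The manifold, the toy objective f(x) = sum_i |x_i|, the noise model, and the 0.5/k step schedule are all assumptions chosen for brevity; none of them is taken from the paper.

```python
import math
import random

def retract_to_sphere(x):
    """Retraction onto the unit sphere: rescale the point to unit norm."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def riemannian_sgd_step(x, euclid_grad, step):
    """One retracted SGD step on the unit sphere."""
    # Project the Euclidean (sub)gradient onto the tangent space at x.
    inner = sum(g * v for g, v in zip(euclid_grad, x))
    tangent = [g - inner * v for g, v in zip(euclid_grad, x)]
    # Move along the negative tangent direction, then retract to the manifold.
    return retract_to_sphere([v - step * t for v, t in zip(x, tangent)])

def noisy_subgradient(x, rng, noise=0.1):
    """Noisy subgradient of the nonsmooth toy objective f(x) = sum_i |x_i|."""
    sign = lambda v: (v > 0) - (v < 0)
    return [sign(v) + rng.gauss(0.0, noise) for v in x]

rng = random.Random(0)
x = retract_to_sphere([1.0, 0.5, 0.25])
for k in range(1, 501):
    # Diminishing step size 0.5 / k (an assumed schedule).
    x = riemannian_sgd_step(x, noisy_subgradient(x, rng), step=0.5 / k)

final_norm = math.sqrt(sum(v * v for v in x))
```

Every iterate stays on the sphere because the retraction renormalizes after each tangent step.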
 [4] arXiv:2302.00713 [pdf, other]

Title: The Weisfeiler-Lehman Distance: Reinterpretation and Connection with GNNs
Subjects: Machine Learning (cs.LG)
In this paper, we present a novel interpretation of the so-called Weisfeiler-Lehman (WL) distance, introduced by Chen et al. (2022), using concepts from stochastic processes. The WL distance aims at comparing graphs with node features, has the same discriminative power as the classic Weisfeiler-Lehman graph isomorphism test and has deep connections to the Gromov-Wasserstein distance. This new interpretation connects the WL distance to the literature on distances for stochastic processes, which also makes the interpretation of the distance more accessible and intuitive. We further explore the connections between the WL distance and certain Message Passing Neural Networks, and discuss the implications of the WL distance for understanding the Lipschitz property and the universal approximation results for these networks.
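For readers unfamiliar with the classic Weisfeiler-Lehman test mentioned above, here is a minimal sketch of 1-WL color refinement. It implements the standard isomorphism heuristic only, not the WL distance of Chen et al.; the function name and the dictionary-based graph encoding are assumptions made for this example.

```python
from collections import Counter

def wl_color_histogram(adjacency, labels, rounds=3):
    """Run 1-WL color refinement and return the final color histogram.

    adjacency: dict mapping node -> list of neighbor nodes
    labels: dict mapping node -> initial node label (node feature)
    """
    colors = dict(labels)
    for _ in range(rounds):
        # New color = own color plus the sorted multiset of neighbor colors.
        colors = {
            node: (colors[node], tuple(sorted(colors[nb] for nb in adjacency[node])))
            for node in adjacency
        }
    return Counter(colors.values())

def uniform_labels(adjacency):
    """Give every node the same initial label."""
    return {node: 0 for node in adjacency}

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

# Different histograms prove non-isomorphism.
distinguishes = wl_color_histogram(triangle, uniform_labels(triangle)) != \
    wl_color_histogram(path, uniform_labels(path))
# A known blind spot: a 6-cycle and two disjoint triangles look identical to 1-WL.
confused = wl_color_histogram(hexagon, uniform_labels(hexagon)) == \
    wl_color_histogram(two_triangles, uniform_labels(two_triangles))
```

Equal histograms do not prove isomorphism, as the second comparison shows; the test can only rule isomorphism out.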
 [5] arXiv:2302.00722 [pdf, other]

Title: A Survey of Deep Learning: From Activations to Transformers
Journal-ref: Neural Processing Letters (2022)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Deep learning has made tremendous progress in the last decade. A key success factor is the large number of architectures, layers, objectives, and optimization techniques that have emerged in recent years. They include a myriad of variants related to attention, normalization, skip connection, transformer and self-supervised learning schemes, to name a few. We provide a comprehensive overview of the most important, recent works in these areas to those who already have a basic understanding of deep learning. We hope that a holistic and unified treatment of influential, recent works helps researchers to form new connections between diverse areas of deep learning.
 [6] arXiv:2302.00727 [pdf, ps, other]

Title: Sample Complexity of Kernel-Based Q-Learning
Authors: Sing-Yuan Yeh, Fu-Chieh Chang, Chang-Wei Yueh, Pei-Yuan Wu, Alberto Bernacchia, Sattar Vakili
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Modern reinforcement learning (RL) often faces an enormous state-action space. Existing analytical results are typically for settings with a small number of state-actions, or simple models such as linearly modeled Q-functions. To derive statistically efficient RL policies handling large state-action spaces, with more general Q-functions, some recent works have considered nonlinear function approximation using kernel ridge regression. In this work, we derive sample complexities for kernel-based Q-learning when a generative model exists. We propose a nonparametric Q-learning algorithm which finds an $\epsilon$-optimal policy in an arbitrarily large-scale discounted MDP. The sample complexity of the proposed algorithm is order optimal with respect to $\epsilon$ and the complexity of the kernel (in terms of its information gain). To the best of our knowledge, this is the first result showing a finite sample complexity under such a general model.
 [7] arXiv:2302.00736 [pdf, ps, other]

Title: Approximating the Shapley Value without Marginal Contributions
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
The Shapley value is arguably the most popular approach for assigning a meaningful contribution value to players in a cooperative game, and it has recently been used intensively in various areas of machine learning, most notably in explainable artificial intelligence. The meaningfulness is due to axiomatic properties that only the Shapley value satisfies. This, however, comes at the expense of an exact computation that grows exponentially with the number of agents. Accordingly, a number of works are devoted to the efficient approximation of the Shapley values, all of which revolve around the notion of an agent's marginal contribution. In this paper, we propose two parameter-free and domain-independent approximation algorithms, SVARM and Stratified SVARM, based on a representation of the Shapley value detached from the notion of marginal contributions. We prove unmatched theoretical guarantees regarding their approximation quality and provide satisfying empirical results.
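To make the exponential cost concrete, the sketch below computes exact Shapley values the textbook way, by averaging each player's marginal contribution over every ordering. This is the baseline definition the abstract contrasts against, not SVARM or Stratified SVARM; the three-player game and the helper names are invented for illustration.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))  # n! orderings: infeasible for large n
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical 3-player game: a coalition earns 1 only if it contains both "a" and "b".
def toy_value(coalition):
    return 1.0 if {"a", "b"} <= coalition else 0.0

phi = shapley_values(["a", "b", "c"], toy_value)
print(phi)
```

Players "a" and "b" are symmetric and split the payoff, while "c" never changes any coalition's value and receives nothing.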
 [8] arXiv:2302.00747 [pdf, other]

Title: Universal Soldier: Using Universal Adversarial Perturbations for Detecting Backdoor Attacks
Subjects: Machine Learning (cs.LG)
Deep learning models achieve excellent performance in numerous machine learning tasks. Yet, they suffer from security-related issues such as adversarial examples and poisoning (backdoor) attacks. A deep learning model may be poisoned by training with backdoored data or by modifying inner network parameters. Then, a backdoored model performs as expected when receiving a clean input, but it misclassifies when receiving a backdoored input stamped with a pre-designed pattern called "trigger". Unfortunately, it is difficult to distinguish between clean and backdoored models without prior knowledge of the trigger. This paper proposes a backdoor detection method by utilizing a special type of adversarial attack, universal adversarial perturbation (UAP), and its similarities with a backdoor trigger. We observe an intuitive phenomenon: UAPs generated from backdoored models need fewer perturbations to mislead the model than UAPs from clean models. UAPs of backdoored models tend to exploit the shortcut from all classes to the target class, built by the backdoor trigger. We propose a novel method called Universal Soldier for Backdoor detection (USB) and reverse engineering potential backdoor triggers via UAPs. Experiments on 345 models trained on several datasets show that USB effectively detects the injected backdoor and provides comparable or better results than state-of-the-art methods.
 [9] arXiv:2302.00763 [pdf, other]

Title: Collaborating with language models for embodied reasoning
Authors: Ishita Dasgupta, Christine Kaeser-Chen, Kenneth Marino, Arun Ahuja, Sheila Babayan, Felix Hill, Rob Fergus
Comments: Presented at NeurIPS 2022 Language and Reinforcement Learning Workshop (best paper) and NeurIPS 2022 Foundation Models for Decision Making Workshop. 4 pages main; 14 pages total (including references and appendix); 3 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Reasoning in a complex and ambiguous environment is a key goal for Reinforcement Learning (RL) agents. While some sophisticated RL agents can successfully solve difficult tasks, they require a large amount of training data and often struggle to generalize to new unseen environments and new tasks. On the other hand, Large Scale Language Models (LSLMs) have exhibited strong reasoning ability and the ability to adapt to new tasks through in-context learning. However, LSLMs do not inherently have the ability to interrogate or intervene on the environment. In this work, we investigate how to combine these complementary abilities in a single system consisting of three parts: a Planner, an Actor, and a Reporter. The Planner is a pre-trained language model that can issue commands to a simple embodied agent (the Actor), while the Reporter communicates with the Planner to inform its next command. We present a set of tasks that require reasoning, test this system's ability to generalize zero-shot and investigate failure cases, and demonstrate how components of this system can be trained with reinforcement learning to improve performance.
 [10] arXiv:2302.00766 [pdf, other]

Title: Privacy Risk for anisotropic Langevin dynamics using relative entropy bounds
Subjects: Machine Learning (cs.LG)
The privacy-preserving properties of Langevin dynamics with additive isotropic noise have been extensively studied. However, the isotropic noise assumption is very restrictive: (a) when adding noise to existing learning algorithms to preserve privacy and maintain the best possible accuracy one should take into account the relative magnitude of the outputs and their correlations; (b) popular algorithms such as stochastic gradient descent (and their continuous time limits) appear to possess anisotropic covariance properties. To study the privacy risks for the anisotropic noise case, one requires general results on the relative entropy between the laws of two Stochastic Differential Equations with different drifts and diffusion coefficients. Our main contribution is to establish such a bound using stability estimates for solutions to the Fokker-Planck equations via functional inequalities. With additional assumptions, the relative entropy bound implies an $(\epsilon,\delta)$-differential privacy bound. We discuss the practical implications of our bound related to privacy risk in different contexts. Finally, the benefits of anisotropic noise are illustrated using numerical results on optimising a quadratic loss or calibrating a neural network.
 [11] arXiv:2302.00775 [pdf, other]

Title: Model Monitoring and Robustness of In-Use Machine Learning Models: Quantifying Data Distribution Shifts Using Population Stability Index
Subjects: Machine Learning (cs.LG)
Safety goes first. Meeting and maintaining industry safety standards for robustness of artificial intelligence (AI) and machine learning (ML) models requires continuous monitoring for faults and performance drops. Deep learning models are widely used in industrial applications, e.g., computer vision, but the susceptibility of their performance to environment changes (e.g., noise) after deployment on the product is now well known. A major challenge is detecting data distribution shifts that happen between (i) the development stage of AI and ML models, i.e., train/validation/test, and (ii) the deployment stage on the product (i.e., even after "testing") in the environment. We focus on a computer vision example related to autonomous driving and aim at detecting shifts that occur as a result of adding noise to images. We use the population stability index (PSI) as a measure of presence and intensity of shift and present results of our empirical experiments showing a promising potential for the PSI. We further discuss multiple aspects of model monitoring and robustness that need to be analyzed simultaneously to achieve robustness for industry safety standards. We propose the need for and the research direction toward categorizations of problem classes and examples where monitoring for robustness is required and present challenges and pointers for future work from a practical perspective.
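The population stability index is commonly computed as PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the fractions of the reference and monitored samples that fall in bin i. The sketch below applies that standard formula to scalar values. The equal-width binning, the `eps` floor for empty bins, and the synthetic data are assumptions for this example; the paper's image-based pipeline is not reproduced here.

```python
import math

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample and a monitored sample of scalar values."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all reference values are equal

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp out-of-range values into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

reference = [i / 100 for i in range(100)]   # stand-in for development-stage data
shifted = [v + 0.3 for v in reference]      # stand-in for shifted deployment data

psi_same = population_stability_index(reference, reference)
psi_shift = population_stability_index(reference, shifted)
print(psi_same, psi_shift)
```

Each term of the sum is non-negative, so PSI is zero only when the binned distributions coincide and grows as they diverge.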
 [12] arXiv:2302.00787 [pdf, other]

Title: FAVOR#: Sharp Attention Kernel Approximations via New Classes of Positive Random Features
Authors: Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The problem of efficient approximation of a linear operator induced by the Gaussian or softmax kernel is often addressed using random features (RFs) which yield an unbiased approximation of the operator's result. Such operators emerge in important applications ranging from kernel methods to efficient Transformers. We propose parameterized, positive, non-trigonometric RFs which approximate Gaussian and softmax kernels. In contrast to traditional RF approximations, parameters of these new methods can be optimized to reduce the variance of the approximation, and the optimum can be expressed in closed form. We show that our methods lead to variance reduction in practice ($e^{10}$-times smaller variance and beyond) and outperform previous methods in a kernel regression task. Using our proposed mechanism, we also present FAVOR#, a method for self-attention approximation in Transformers. We show that FAVOR# outperforms other random feature methods in speech modelling and natural language processing.
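As background for "positive random features", the sketch below uses the well-known identity exp(x . y) = E_w[exp(w . x - |x|^2 / 2) * exp(w . y - |y|^2 / 2)] with w drawn from a standard Gaussian. This is the earlier, unparameterized positive-feature construction, shown only to fix ideas; it is not the new parameterized feature family this paper proposes, and the vectors and feature count are arbitrary choices.

```python
import math
import random

def positive_features(x, projections):
    """Positive random features phi(x) with E[phi(x) . phi(y)] = exp(x . y)."""
    sq_norm = sum(v * v for v in x)
    m = len(projections)
    return [
        math.exp(sum(w_i * x_i for w_i, x_i in zip(w, x)) - sq_norm / 2) / math.sqrt(m)
        for w in projections
    ]

rng = random.Random(0)
dim, num_features = 2, 20000
# Shared Gaussian projection vectors w ~ N(0, I), reused for every input.
projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]

x, y = [0.3, 0.1], [0.2, -0.1]
phi_x = positive_features(x, projections)
phi_y = positive_features(y, projections)

estimate = sum(a * b for a, b in zip(phi_x, phi_y))   # Monte Carlo estimate
exact = math.exp(sum(a * b for a, b in zip(x, y)))    # softmax kernel value
print(estimate, exact)
```

Because every feature is an exponential, the estimate can never be negative, which is the property that makes such features attractive for approximating softmax attention.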
 [13] arXiv:2302.00789 [pdf, other]

Title: Variational Autoencoder Learns Better Feature Representations for EEG-based Obesity Classification
Comments: 8 pages, 6 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Obesity is a common issue in modern societies today that can lead to various diseases and significantly reduced quality of life. Currently, research has been conducted to investigate resting state EEG (electroencephalogram) signals with an aim to identify possible neurological characteristics associated with obesity. In this study, we propose a deep learning-based framework to extract the resting state EEG features for obese and lean subject classification. Specifically, a novel variational autoencoder framework is employed to extract subject-invariant features from the raw EEG signals, which are then classified by a 1-D convolutional neural network. Compared with conventional machine learning and deep learning methods, we demonstrate the superiority of using VAE for feature extraction, as reflected by the significantly improved classification accuracies, better visualizations and reduced impurity measures in the feature representations. Future work can be directed to gaining an in-depth understanding regarding the spatial patterns that have been learned by the proposed model from a neurological view, as well as improving the interpretability of the proposed model by allowing it to uncover any temporal-related information.
 [14] arXiv:2302.00794 [pdf]

Title: Using Machine Learning to Develop Smart Reflex Testing Protocols
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Objective: Reflex testing protocols allow clinical laboratories to perform second-line diagnostic tests on existing specimens based on the results of initially ordered tests. Reflex testing can support optimal clinical laboratory test ordering and diagnosis. In current clinical practice, reflex testing typically relies on simple "if-then" rules; however, this limits their scope since most test ordering decisions involve more complexity than a simple rule will allow. Here, using the analyte ferritin as an example, we propose an alternative machine learning-based approach to "smart" reflex testing with a wider scope and greater impact than traditional rule-based approaches. Methods: Using patient data, we developed a machine learning model to predict whether a patient getting CBC testing will also have ferritin testing ordered, consider applications of this model to "smart" reflex testing, and evaluate the model by comparing its performance to possible rule-based approaches. Results: Our underlying machine learning models performed moderately well in predicting ferritin test ordering and demonstrated greater suitability to reflex testing than rule-based approaches. Using chart review, we demonstrate that our model may improve ferritin test ordering. Finally, as a secondary goal, we demonstrate that ferritin test results are missing not at random (MNAR), a finding with implications for unbiased imputation of missing test results. Conclusions: Machine learning may provide a foundation for new types of reflex testing with enhanced benefits for clinical diagnosis and laboratory utilization management.
 [15] arXiv:2302.00806 [pdf, other]

Title: Oracle-Preserving Latent Flows
Comments: 9 pages, 8 figures
Subjects: Machine Learning (cs.LG); High Energy Physics - Phenomenology (hep-ph); Group Theory (math.GR); Machine Learning (stat.ML)
We develop a deep learning methodology for the simultaneous discovery of multiple nontrivial continuous symmetries across an entire labelled dataset. The symmetry transformations and the corresponding generators are modeled with fully connected neural networks trained with a specially constructed loss function ensuring the desired symmetry properties. The two new elements in this work are the use of a reduced-dimensionality latent space and the generalization to transformations invariant with respect to high-dimensional oracles. The method is demonstrated with several examples on the MNIST digit dataset.
 [16] arXiv:2302.00808 [pdf, other]

Title: Average-Constrained Policy Optimization
Comments: 18 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Reinforcement Learning (RL) with constraints is becoming an increasingly important problem for various applications. Often, the average criterion is more suitable. Yet, RL for average criterion-constrained MDPs remains a challenging problem. Algorithms designed for discounted constrained RL problems often do not perform well for the average CMDP setting. In this paper, we introduce a new (possibly the first) policy optimization algorithm for constrained MDPs with the average criterion. The Average-Constrained Policy Optimization (ACPO) algorithm is inspired by the famed PPO-type algorithms based on trust region methods. We develop basic sensitivity theory for average MDPs, and then use the corresponding bounds in the design of the algorithm. We provide theoretical guarantees on its performance, and through extensive experimental work in various challenging MuJoCo environments, show the superior performance of the algorithm when compared to other state-of-the-art algorithms adapted for the average CMDP setting.
 [17] arXiv:2302.00814 [pdf, other]

Title: Stochastic Contextual Bandits with Long Horizon Rewards
Comments: 47 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
The growing interest in complex decision-making and language modeling problems highlights the importance of sample-efficient learning over very long horizons. This work takes a step in this direction by investigating contextual linear bandits where the current reward depends on at most $s$ prior actions and contexts (not necessarily consecutive), up to a time horizon of $h$. In order to avoid polynomial dependence on $h$, we propose new algorithms that leverage sparsity to discover the dependence pattern and arm parameters jointly. We consider both the data-poor ($T<h$) and data-rich ($T\ge h$) regimes, and derive respective regret upper bounds $\tilde O(d\sqrt{sT} +\min\{ q, T\})$ and $\tilde O(\sqrt{sdT})$, with sparsity $s$, feature dimension $d$, total time horizon $T$, and $q$ that is adaptive to the reward dependence pattern. Complementing upper bounds, we also show that learning over a single trajectory brings inherent challenges: While the dependence pattern and arm parameters form a rank-1 matrix, circulant matrices are not isometric over rank-1 manifolds and sample complexity indeed benefits from the sparse reward dependence structure. Our results necessitate a new analysis to address long-range temporal dependencies across data and avoid polynomial dependence on the reward horizon $h$. Specifically, we utilize connections to the restricted isometry property of circulant matrices formed by dependent sub-Gaussian vectors and establish new guarantees that are also of independent interest.
 [18] arXiv:2302.00817 [pdf, other]

Title: Recurrent Graph Convolutional Networks for Spatiotemporal Prediction of Snow Accumulation Using Airborne Radar
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
The accurate prediction and estimation of annual snow accumulation has grown in importance as we deal with the effects of climate change and the increase of global atmospheric temperatures. Airborne radar sensors, such as the Snow Radar, are able to measure accumulation rate patterns at a large scale and monitor the effects of ongoing climate change on Greenland's precipitation and run-off. The Snow Radar's use of an ultra-wide bandwidth enables a fine vertical resolution that helps in capturing internal ice layers. Given the amount of snow accumulation in previous years using the radar data, in this paper, we propose a machine learning model based on recurrent graph convolutional networks to predict the snow accumulation in recent consecutive years at a certain location. We found that the model performs better and with more consistency than equivalent non-geometric and non-temporal models.
 [19] arXiv:2302.00834 [pdf, ps, other]

Title: Sharp Lower Bounds on Interpolation by Deep ReLU Neural Networks at Irregularly Spaced Data
Authors: Jonathan W. Siegel
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
We study the interpolation, or memorization, power of deep ReLU neural networks. Specifically, we consider the question of how efficiently, in terms of the number of parameters, deep ReLU networks can interpolate values at $N$ datapoints in the unit ball which are separated by a distance $\delta$. We show that $\Omega(N)$ parameters are required in the regime where $\delta$ is exponentially small in $N$, which gives the sharp result in this regime since $O(N)$ parameters are always sufficient. This also shows that the bit-extraction technique used to prove lower bounds on the VC dimension cannot be applied to irregularly spaced datapoints.
 [20] arXiv:2302.00839 [pdf, other]

Title: Fast Online Value-Maximizing Prediction Sets with Conformal Cost Control
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Many real-world multi-label prediction problems involve set-valued predictions that must satisfy specific requirements dictated by downstream usage. We focus on a typical scenario where such requirements, separately encoding value and cost, compete with each other. For instance, a hospital might expect a smart diagnosis system to capture as many severe, often co-morbid, diseases as possible (the value), while maintaining strict control over incorrect predictions (the cost). We present a general pipeline, dubbed FavMac, to maximize the value while controlling the cost in such scenarios. FavMac can be combined with almost any multi-label classifier, affording distribution-free theoretical guarantees on cost control. Moreover, unlike prior works, FavMac can handle real-world large-scale applications via a carefully designed online update mechanism, which is of independent interest. Our methodological and theoretical contributions are supported by experiments on several healthcare tasks and synthetic datasets: FavMac furnishes higher value compared with several variants and baselines while maintaining strict cost control.
 [21] arXiv:2302.00845 [pdf, other]

Title: Scale up with Order: Finding Good Data Permutations for Distributed Training
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Gradient Balancing (GraB) is a recently proposed technique that finds provably better data permutations when training models with multiple epochs over a finite dataset. It converges at a faster rate than the widely adopted Random Reshuffling, by minimizing the discrepancy of the gradients on adjacently selected examples. However, GraB only operates under critical assumptions such as small batch sizes and centralized data, leaving open the question of how to order examples at large scale, i.e., distributed learning with decentralized data. To alleviate the limitation, in this paper we propose D-GraB that involves two novel designs: (1) $\textsf{PairBalance}$ that eliminates the requirement to use stale gradient mean in GraB which critically relies on small learning rates; (2) an ordering protocol that runs $\textsf{PairBalance}$ in a distributed environment with negligible overhead, which benefits from both data ordering and parallelism. We prove D-GraB enjoys linear speed up at rate $\tilde{O}((mnT)^{-2/3})$ on smooth non-convex objectives and $\tilde{O}((mnT)^{-2})$ under PL condition, where $n$ denotes the number of parallel workers, $m$ denotes the number of examples per worker and $T$ denotes the number of epochs. Empirically, we show on various applications including GLUE, CIFAR-10 and WikiText-2 that D-GraB outperforms naive parallel GraB and Distributed Random Reshuffling in terms of both training and validation performance.
 [22] arXiv:2302.00848 [pdf, other]

Title: Causal Effect Estimation: Recent Advances, Challenges, and Opportunities
Subjects: Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Causal inference has numerous real-world applications in many domains, such as health care, marketing, political science, and online advertising. Treatment effect estimation, a fundamental problem in causal inference, has been extensively studied in statistics for decades. However, traditional treatment effect estimation methods may not handle large-scale and high-dimensional heterogeneous data well. In recent years, an emerging research direction has attracted increasing attention in the broad artificial intelligence field, which combines the advantages of traditional treatment effect estimation approaches (e.g., propensity score, matching, and reweighting) and advanced machine learning approaches (e.g., representation learning, adversarial learning, and graph neural networks). Although the advanced machine learning approaches have shown extraordinary performance in treatment effect estimation, they also bring many new topics and new research questions. In view of the latest research efforts in the causal inference field, we provide a comprehensive discussion of challenges and opportunities for the three core components of the treatment effect estimation task, i.e., treatment, covariates, and outcome. In addition, we showcase the promising research directions of this topic from multiple perspectives.
 [23] arXiv:2302.00849 [pdf, other]

Title: Implicit regularization in Heavy-ball momentum accelerated stochastic gradient descent
Journal-ref: International Conference on Learning Representations (ICLR-2023)
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
It is well known that the finite step-size ($h$) in Gradient Descent (GD) implicitly regularizes solutions to flatter minima. A natural question to ask is "Does the momentum parameter $\beta$ play a role in implicit regularization in Heavy-ball (H.B) momentum accelerated gradient descent (GD+M)?". To answer this question, first, we show that the discrete H.B momentum update (GD+M) follows a continuous trajectory induced by a modified loss, which consists of an original loss and an implicit regularizer. Then, we show that this implicit regularizer for (GD+M) is stronger than that of (GD) by factor of $(\frac{1+\beta}{1-\beta})$, thus explaining why (GD+M) shows better generalization performance and higher test accuracy than (GD). Furthermore, we extend our analysis to the stochastic version of gradient descent with momentum (SGD+M) and characterize the continuous trajectory of the update of (SGD+M) in a pointwise sense. We explore the implicit regularization in (SGD+M) and (GD+M) through a series of experiments validating our theory.
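To give a feel for the quantities in this abstract, the sketch below writes out the standard heavy-ball update and evaluates the amplification factor (1 + beta) / (1 - beta) for a typical momentum value. The scalar quadratic loss, the step size h = 0.1, and the function names are assumptions for illustration; the paper's modified-loss derivation is not reproduced.

```python
def heavy_ball_step(theta, velocity, grad, h, beta):
    """One heavy-ball (GD+M) update for a scalar parameter.

    Equivalent to theta_next = theta - h * grad + beta * (theta - theta_prev).
    """
    velocity = beta * velocity - h * grad
    return theta + velocity, velocity

def regularizer_amplification(beta):
    """Factor (1 + beta) / (1 - beta) by which GD+M strengthens the implicit regularizer."""
    return (1 + beta) / (1 - beta)

# Toy run on L(theta) = theta^2 / 2, whose gradient is simply theta.
theta, velocity = 1.0, 0.0
for _ in range(200):
    theta, velocity = heavy_ball_step(theta, velocity, grad=theta, h=0.1, beta=0.9)

print(theta, regularizer_amplification(0.9))
```

With beta = 0 the factor is 1 and the update reduces to plain gradient descent; with beta = 0.9 the factor is roughly 19.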
 [24] arXiv:2302.00854 [pdf, other]

Title: Learning PDE Solution Operator for Continuous Modeling of Time-Series
Subjects: Machine Learning (cs.LG)
Learning underlying dynamics from data is important and challenging in many real-world scenarios. Incorporating differential equations (DEs) to design continuous networks has drawn much attention recently; however, most prior works make specific assumptions on the type of DEs, making the model specialized for particular problems. This work presents a partial differential equation (PDE) based framework which improves the dynamics modeling capability. Building upon the recent Fourier neural operator, we propose a neural operator that can handle time continuously without requiring iterative operations or specific grids of temporal discretization. A theoretical result demonstrating its universality is provided. We also uncover an intrinsic property of neural operators that improves data efficiency and model generalization by ensuring stability. Our model achieves superior accuracy in dealing with time-dependent PDEs compared to existing models. Furthermore, several numerical pieces of evidence validate that our method better represents a wide range of dynamics and outperforms state-of-the-art DE-based models in real time-series applications. Our framework opens up a new way for a continuous representation of neural networks that can be readily adopted for real-world applications.
 [25] arXiv:2302.00857 [pdf, other]

Title: Algorithm Design for Online Meta-Learning with Task Boundary Detection
Comments: Submitted for publication
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Online meta-learning has recently emerged as a marriage between batch meta-learning and online learning, for achieving the capability of quick adaptation on new tasks in a lifelong manner. However, most existing approaches focus on the restrictive setting where the distribution of the online tasks remains fixed with known task boundaries. In this work, we relax these assumptions and propose a novel algorithm for task-agnostic online meta-learning in non-stationary environments. More specifically, we first propose two simple but effective detection mechanisms of task switches and distribution shift based on empirical observations, which serve as a key building block for more elegant online model updates in our algorithm: the task switch detection mechanism allows reusing of the best model available for the current task at hand, and the distribution shift detection mechanism differentiates the meta model update in order to preserve the knowledge for in-distribution tasks and quickly learn the new knowledge for out-of-distribution tasks. In particular, our online meta model updates are based only on the current data, which eliminates the need of storing previous data as required in most existing methods. We further show that a sublinear task-averaged regret can be achieved for our algorithm under mild conditions. Empirical studies on three different benchmarks clearly demonstrate the significant advantage of our algorithm over related baseline approaches.
 [26] arXiv:2302.00861 [pdf, other]

Title: SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling
Subjects: Machine Learning (cs.LG)
Time series analysis is widely used across many areas. Recently, to reduce labeling expenses and benefit various tasks, self-supervised pre-training has attracted immense interest. One mainstream paradigm is masked modeling, which successfully pre-trains deep models by learning to reconstruct the masked content based on the unmasked part. However, since the semantic information of time series is mainly contained in temporal variations, the standard way of randomly masking a portion of time points will seriously ruin vital temporal variations of time series, making the reconstruction task too difficult to guide representation learning. We thus present SimMTM, a Simple pre-training framework for Masked Time-series Modeling. By relating masked modeling to manifold learning, SimMTM proposes to recover masked time points by the weighted aggregation of multiple neighbors outside the manifold, which eases the reconstruction task by assembling ruined but complementary temporal variations from multiple masked series. SimMTM further learns to uncover the local structure of the manifold helpful for masked modeling. Experimentally, SimMTM achieves state-of-the-art fine-tuning performance in two canonical time series analysis tasks: forecasting and classification, covering both in- and cross-domain settings.
 [27] arXiv:2302.00864 [pdf, other]

Title: CLIPood: Generalizing CLIP to Out-of-Distributions
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Out-of-distribution (OOD) generalization, where the model needs to handle distribution shifts from training, is a major challenge of machine learning. Recently, contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, revealing a promising path toward OOD generalization. However, to improve on zero-shot performance, further adaptation of CLIP on downstream tasks is indispensable but undesirably degrades OOD generalization ability. In this paper, we aim at generalizing CLIP to out-of-distribution test data on downstream tasks. Beyond the two canonical OOD situations, domain shift and open class, we tackle a more general but difficult in-the-wild setting where both OOD situations may occur on the unseen test data. We propose CLIPood, a simple fine-tuning method that can adapt CLIP models to all OOD situations. To exploit semantic relations between classes from the text modality, CLIPood introduces a new training objective, margin metric softmax (MMS), with class adaptive margins for fine-tuning. Moreover, to incorporate both the pre-trained zero-shot model and the fine-tuned task-adaptive model, CLIPood proposes a new Beta moving average (BMA) to maintain a temporal ensemble according to a Beta distribution. Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
 [28] arXiv:2302.00869 [pdf, other]

Title: Disentanglement of Latent Representations via Sparse Causal Interventions
Comments: 16 pages, 10 pages for the main paper and 6 pages for the supplement, 14 figures, submitted to IJCAI 2023
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Discrete Mathematics (cs.DM); Methodology (stat.ME)
The process of generating data such as images is controlled by independent and unknown factors of variation. The retrieval of these variables has been studied extensively in the disentanglement, causal representation learning, and independent component analysis fields. Recently, approaches merging these domains together have shown great success. Instead of directly representing the factors of variation, the problem of disentanglement can be seen as finding the interventions on one image that yield a change to a single factor. Following this assumption, we introduce a new method for disentanglement inspired by causal dynamics that combines causality theory with vector-quantized variational autoencoders. Our model considers the quantized vectors as causal variables and links them in a causal graph. It performs causal interventions on the graph and generates atomic transitions affecting a unique factor of variation in the image. We also introduce a new task of action retrieval that consists of finding the action responsible for the transition between two images. We test our method on standard synthetic and real-world disentanglement datasets. We show that it can effectively disentangle the factors of variation and perform precise interventions on high-level semantic attributes of an image without affecting its quality, even with imbalanced data distributions.
 [29] arXiv:2302.00872 [pdf, other]

Title: Reliable Prediction Intervals with Directly Optimized Inductive Conformal Regression for Deep Learning
Subjects: Machine Learning (cs.LG)
By generating prediction intervals (PIs) to quantify the uncertainty of each prediction in deep learning regression, the risk of wrong predictions can be effectively controlled. High-quality PIs need to be as narrow as possible, whilst covering a preset proportion of real labels. At present, many approaches to improve the quality of PIs can effectively reduce the width of PIs, but they do not ensure that enough real labels are captured. Inductive Conformal Predictor (ICP) is an algorithm that can generate effective PIs which are theoretically guaranteed to cover a preset proportion of data. However, ICP is typically not directly optimized to yield minimal PI width. In this study, we use Directly Optimized Inductive Conformal Regression (DOICR), which takes only the average width of PIs as the loss function and increases the quality of PIs through an optimized scheme under the validity condition that sufficient real labels are captured in the PIs. Benchmark experiments show that DOICR outperforms current state-of-the-art algorithms for regression problems using underlying Deep Neural Network structures for both tabular and image data.
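For readers unfamiliar with the ICP baseline mentioned in this abstract, the following is a minimal sketch of standard split (inductive) conformal regression with absolute-residual scores. It is not DOICR itself; the function name `icp_intervals` and its arguments are illustrative assumptions rather than anything defined by the paper.

```python
import math
import numpy as np

def icp_intervals(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal prediction intervals (generic sketch, not DOICR).

    cal_pred / cal_true: point predictions and labels on a held-out
    calibration set; test_pred: point predictions to wrap in intervals;
    alpha: target miscoverage rate (intervals aim for 1 - alpha coverage).
    """
    # Nonconformity score: absolute residual on the calibration set.
    scores = np.sort(np.abs(np.asarray(cal_true, dtype=float)
                            - np.asarray(cal_pred, dtype=float)))
    n = scores.size
    # Rank of the finite-sample conformal quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    # With too few calibration points the interval must be unbounded.
    q = scores[k - 1] if k <= n else np.inf
    test_pred = np.asarray(test_pred, dtype=float)
    return test_pred - q, test_pred + q

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_cal = rng.normal(size=200)
    lower, upper = icp_intervals(np.zeros(200), y_cal, [0.0], alpha=0.1)
    print(lower, upper)
```

Every interval here has the same width, 2q, which illustrates why a method that optimizes interval width directly might be attractive.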
 [30] arXiv:2302.00873 [pdf, other]

Title: Predicting the Silent Majority on Graphs: Knowledge Transferable Graph Neural Network
Comments: accepted by WWW2023
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Graphs consisting of vocal nodes ("the vocal minority") and silent nodes ("the silent majority"), namely VSGraph, are ubiquitous in the real world. The vocal nodes tend to have abundant features and labels. In contrast, silent nodes only have incomplete features and rare labels; e.g., on Twitter's social network, the descriptions and political tendencies of politicians (vocal) are abundant, while those of ordinary people (silent) are not. Predicting the silent majority remains a crucial yet challenging problem. However, most existing message-passing based GNNs assume that all nodes belong to the same domain, without considering the missing features and distribution shift between domains, leading to poor ability to deal with VSGraph. To combat the above challenges, we propose Knowledge Transferable Graph Neural Network (KTGNN), which models distribution shifts during message passing and representation learning by transferring knowledge from vocal nodes to silent nodes. Specifically, we design the domain-adapted "feature completion and message passing mechanism" for node representation learning while preserving domain difference. A knowledge transferable classifier based on KL-divergence follows. Comprehensive experiments on real-world scenarios (i.e., company financial risk assessment and political elections) demonstrate the superior performance of our method. Our source code has been open sourced.
 [31] arXiv:2302.00880 [pdf, other]

Title: Empirical Analysis of the AdaBoost's Error Bound
Comments: 4 pages, 4 figures
Subjects: Machine Learning (cs.LG)
Understanding the accuracy limits of machine learning algorithms is essential for data scientists to properly measure performance so they can continually improve their models' predictive capabilities. This study empirically verified the error bound of the AdaBoost algorithm for both synthetic and real-world data. The results show that the error bound holds up in practice, demonstrating its efficiency and importance to a variety of applications. The corresponding source code is available at https://github.com/armanbolatov/adaboost_error_bound.
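The abstract does not say which error bound is being checked. As an illustration only, the sketch below assumes the classical AdaBoost training-error bound, under which the training error is at most the product of the per-round weight normalizers Z_t, and checks it on synthetic data with decision stumps. The helper names `best_stump` and `adaboost_bound_check` are hypothetical and are not taken from the linked repository.

```python
import numpy as np

def best_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) of the best stump."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost_bound_check(X, y, rounds=10):
    """Run AdaBoost with stumps; return (training error, product of Z_t)."""
    n = len(y)
    w = np.full(n, 1.0 / n)      # uniform initial example weights
    score = np.zeros(n)          # running weighted vote of the ensemble
    bound = 1.0
    for _ in range(rounds):
        err, j, thr, sign = best_stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)   # keep the log finite
        alpha = 0.5 * np.log((1 - err) / err)
        pred = sign * np.where(X[:, j] <= thr, 1, -1)
        score += alpha * pred
        w = w * np.exp(-alpha * y * pred)
        z = w.sum()              # normalizer Z_t for this round
        w /= z
        bound *= z
    train_err = np.mean(np.sign(score) != y)
    return train_err, bound

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    print(adaboost_bound_check(X, y, rounds=10))
```

When alpha is chosen as above and the weighted error is not clipped, each Z_t equals 2*sqrt(err_t * (1 - err_t)), which is the form in which the bound is usually quoted.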
 [32] arXiv:2302.00890 [pdf, other]

Title: Neural Common Neighbor with Completion for Link Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Despite its outstanding performance in various graph tasks, vanilla Message Passing Neural Network (MPNN) usually fails in link prediction tasks, as it only uses representations of two individual target nodes and ignores the pairwise relation between them. To capture the pairwise relations, some models add manual features to the input graph and use the output of MPNN to produce pairwise representations. In contrast, others directly use manual features as pairwise representations. Though this simplification avoids applying a GNN to each link individually and thus improves scalability, these models still have much room for performance improvement due to the hand-crafted and unlearnable pairwise features. To upgrade performance while maintaining scalability, we propose Neural Common Neighbor (NCN), which uses learnable pairwise representations. To further boost NCN, we study the unobserved link problem. The incompleteness of the graph is ubiquitous and leads to distribution shifts between the training and test set, loss of common neighbor information, and performance degradation of models. Therefore, we propose two intervention methods: common neighbor completion and target link removal. Combining the two methods with NCN, we propose Neural Common Neighbor with Completion (NCNC). NCN and NCNC outperform recent strong baselines by large margins. NCNC achieves state-of-the-art performance in link prediction tasks.
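To make the contrast between hand-crafted and learnable pairwise features concrete, here is a small sketch. It computes the classic common-neighbor count for a candidate link and, as one plausible learnable alternative, sums node embeddings over the common neighbors. This is an illustration under stated assumptions, not the NCN architecture; the function names and the random stand-in for MPNN embeddings are hypothetical.

```python
import numpy as np

def common_neighbors(adj, i, j):
    """Indices of nodes adjacent to both i and j (adj is a boolean matrix)."""
    return np.flatnonzero(adj[i] & adj[j])

def pooled_cn_embedding(adj, h, i, j):
    """Sum node embeddings h over the common neighbors of (i, j).

    If h comes from a trainable encoder, this pairwise feature is learnable,
    unlike the raw common-neighbor count.
    """
    return h[common_neighbors(adj, i, j)].sum(axis=0)

if __name__ == "__main__":
    # Toy undirected graph with edges 0-2, 1-2, 0-3, 1-3, 2-3.
    adj = np.zeros((4, 4), dtype=bool)
    for u, v in [(0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]:
        adj[u, v] = adj[v, u] = True
    h = np.random.default_rng(0).normal(size=(4, 8))  # stand-in embeddings
    print(len(common_neighbors(adj, 0, 1)))           # hand-crafted feature
    print(pooled_cn_embedding(adj, h, 0, 1).shape)    # learnable feature
```

Removing the edge 0-2 from `adj` drops node 2 from the common neighbors of (0, 1), which is the kind of information loss the abstract attributes to unobserved links.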
 [33] arXiv:2302.00892 [pdf, other]

Title: Quantum Graph Learning: Frontiers and Outlook
Subjects: Machine Learning (cs.LG)
Quantum theory has shown its superiority in enhancing machine learning. However, applying quantum theory to enhance graph learning is in its infancy. This survey investigates the current advances in quantum graph learning (QGL) from three perspectives, i.e., underlying theories, methods, and prospects. We first look at QGL and discuss the mutualism of quantum theory and graph learning, the specificity of graph-structured data, and the bottleneck of graph learning, respectively. A new taxonomy of QGL is presented, i.e., quantum computing on graphs, quantum graph representation, and quantum circuits for graph neural networks. Pitfall traps are then highlighted and explained. This survey aims to provide a brief but insightful introduction to this emerging field, along with a detailed discussion of frontiers and outlook yet to be investigated.
 [34] arXiv:2302.00902 [pdf, other]

Title: Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment
Comments: arXiv admin note: text overlap with arXiv:2106.13884 by other authors
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Recent progress in scaling up large language models has shown impressive capabilities in performing few-shot learning across a wide range of text-based tasks. However, a key limitation is that these language models fundamentally lack visual perception, a crucial attribute needed to extend these models to be able to interact with the real world and solve vision tasks, such as in visual-question answering and robotics. Prior works have largely connected image to text through pre-training and/or fine-tuning on curated image-text datasets, which can be a costly process. In order to resolve this limitation, we propose a simple yet effective approach called Language-Quantized AutoEncoder (LQAE), a modification of VQ-VAE that learns to align text-image data in an unsupervised manner by leveraging pretrained language models (e.g., BERT, RoBERTa). Our main idea is to encode an image as sequences of text tokens by directly quantizing image embeddings using a pretrained language codebook. We then apply random masking followed by a BERT model, and have the decoder reconstruct the original image from BERT-predicted text token embeddings. By doing so, LQAE learns to represent similar images with similar clusters of text tokens, thereby aligning these two modalities without the use of aligned text-image pairs. This enables few-shot image classification with large language models (e.g., GPT-3) as well as linear classification of images based on BERT text features. To the best of our knowledge, our work is the first that uses unaligned images for multimodal tasks by leveraging the power of pretrained language models.
 [35] arXiv:2302.00910 [pdf, other]

Title: Energy Efficient Training of SNN using Local Zeroth Order Method
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Spiking neural networks (SNNs) are becoming increasingly popular for their low energy requirement in real-world tasks, with accuracy comparable to traditional ANNs. SNN training algorithms face the loss of gradient information and non-differentiability due to the Heaviside function in minimizing the model loss over model parameters. To circumvent the problem, the surrogate method uses a differentiable approximation of the Heaviside in the backward pass, while the forward pass uses the Heaviside as the spiking function. We propose to use the zeroth-order technique at the neuron level to resolve this dichotomy and use it within the automatic differentiation tool. As a result, we establish a theoretical connection between the proposed local zeroth-order technique and the existing surrogate methods and vice versa. The proposed method naturally lends itself to energy-efficient training of SNNs on GPUs. Experimental results with neuromorphic datasets show that such an implementation requires less than 1 percent of neurons to be active in the backward pass, resulting in a 100x speedup in the backward computation time. Our method offers better generalization compared to the state-of-the-art energy-efficient technique while maintaining similar efficiency.
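The forward/backward split described in this abstract can be illustrated with a toy sketch. The rectangular surrogate and the two-point finite-difference estimator below are generic textbook choices assumed for illustration; the abstract does not specify the paper's estimator, and all function names and the `width`/`delta` parameters are hypothetical.

```python
import numpy as np

def heaviside(v, thr=1.0):
    """Forward pass: a neuron spikes when its membrane potential reaches thr."""
    return (np.asarray(v, dtype=float) >= thr).astype(float)

def surrogate_grad(v, thr=1.0, width=0.5):
    """Backward pass with a rectangular surrogate derivative.

    The derivative is 1 / (2 * width) inside a window around thr and 0 outside.
    """
    v = np.asarray(v, dtype=float)
    return (np.abs(v - thr) < width).astype(float) / (2.0 * width)

def two_point_zo_grad(v, z, thr=1.0, delta=0.5):
    """Two-point zeroth-order estimate of d heaviside / d v along direction z."""
    v = np.asarray(v, dtype=float)
    z = np.asarray(z, dtype=float)
    diff = heaviside(v + delta * z, thr) - heaviside(v - delta * z, thr)
    return diff / (2.0 * delta) * z

if __name__ == "__main__":
    v = np.array([0.0, 0.9, 1.2, 3.0])
    print(heaviside(v))
    print(surrogate_grad(v))
    print(two_point_zo_grad(v, np.ones_like(v)))
```

In this toy setting the zeroth-order estimate is nonzero only for neurons whose potential lies within `delta` of the threshold, which hints at why few neurons need to take part in the backward pass.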
 [36] arXiv:2302.00924 [pdf, other]

Title: LMC: Fast Training of GNNs via Subgraph Sampling with Provable Convergence
Subjects: Machine Learning (cs.LG)
The message passing-based graph neural networks (GNNs) have achieved great success in many real-world applications. However, training GNNs on large-scale graphs suffers from the well-known neighbor explosion problem, i.e., the exponentially increasing dependencies of nodes with the number of message passing layers. Subgraph-wise sampling methods, a promising class of mini-batch training techniques, discard messages outside the mini-batches in backward passes to avoid the neighbor explosion problem at the expense of gradient estimation accuracy. This poses significant challenges to their convergence analysis and convergence speeds, which seriously limits their reliable real-world applications. To address this challenge, we propose a novel subgraph-wise sampling method with a convergence guarantee, namely Local Message Compensation (LMC). To the best of our knowledge, LMC is the first subgraph-wise sampling method with provable convergence. The key idea of LMC is to retrieve the discarded messages in backward passes based on a message passing formulation of backward passes. By efficient and effective compensations for the discarded messages in both forward and backward passes, LMC computes accurate mini-batch gradients and thus accelerates convergence. We further show that LMC converges to first-order stationary points of GNNs. Experiments on large-scale benchmark tasks demonstrate that LMC significantly outperforms state-of-the-art subgraph-wise sampling methods in terms of efficiency.
 [37] arXiv:2302.00928 [pdf, other]

Title: Rethinking Warm-Starts with Predictions: Learning Predictions Close to Sets of Optimal Solutions for Faster $\text{L}$-/$\text{L}^\natural$-Convex Function Minimization
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
An emerging line of work has shown that machine-learned predictions are useful to warm-start algorithms for discrete optimization problems, such as bipartite matching. Previous studies have shown time complexity bounds proportional to some distance between a prediction and an optimal solution, which we can approximately minimize by learning predictions from past optimal solutions. However, such guarantees may not be meaningful when multiple optimal solutions exist. Indeed, the dual problem of bipartite matching and, more generally, $\text{L}$-/$\text{L}^\natural$-convex function minimization have arbitrarily many optimal solutions, making such prediction-dependent bounds arbitrarily large. To resolve this theoretically critical issue, we present a new warm-start-with-prediction framework for $\text{L}$-/$\text{L}^\natural$-convex function minimization. Our framework offers time complexity bounds proportional to the distance between a prediction and the set of all optimal solutions. The main technical difficulty lies in learning predictions that are provably close to sets of all optimal solutions, for which we present an online-gradient-descent-based method. We thus give the first polynomial-time learnability of predictions that can provably warm-start algorithms regardless of multiple optimal solutions.
 [38] arXiv:2302.00932 [pdf, other]

Title: Dynamic Ensemble of Low-fidelity Experts: Mitigating NAS "Cold-Start"
Authors: Junbo Zhao, Xuefei Ning, Enshu Liu, Binxin Ru, Zixuan Zhou, Tianchen Zhao, Chen Chen, Jiajin Zhang, Qingmin Liao, Yu Wang
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Predictor-based Neural Architecture Search (NAS) employs an architecture performance predictor to improve the sample efficiency. However, predictor-based NAS suffers from the severe "cold-start" problem, since a large amount of architecture-performance data is required to get a working predictor. In this paper, we focus on exploiting information in cheaper-to-obtain performance estimations (i.e., low-fidelity information) to mitigate the large data requirements of predictor training. Despite the intuitiveness of this idea, we observe that using inappropriate low-fidelity information can even damage the prediction ability, and that different search spaces have different preferences for low-fidelity information types. To solve the problem and better fuse beneficial information provided by different types of low-fidelity information, we propose a novel dynamic ensemble predictor framework that comprises two steps. In the first step, we train different sub-predictors on different types of available low-fidelity information to extract beneficial knowledge as low-fidelity experts. In the second step, we learn a gating network to dynamically output a set of weighting coefficients conditioned on each input neural architecture, which will be used to combine the predictions of different low-fidelity experts in a weighted sum. The overall predictor is optimized on a small set of actual architecture-performance data to fuse the knowledge from different low-fidelity experts to make the final prediction. We conduct extensive experiments across five search spaces with different architecture encoders under various experimental settings. Our method can easily be incorporated into existing predictor-based NAS frameworks to discover better architectures.
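The second step described in this abstract, a gating network that outputs input-dependent weights for a weighted sum of expert predictions, can be sketched as follows. The linear gate with softmax normalization, the function name `gated_prediction`, and the parameters `gate_w` and `gate_b` are assumptions made for illustration; the abstract does not specify the gating architecture.

```python
import numpy as np

def gated_prediction(x, expert_preds, gate_w, gate_b):
    """Combine expert predictions with input-conditioned softmax weights.

    x:            (n, d) encoded architectures
    expert_preds: (n, k) predictions from k low-fidelity experts
    gate_w:       (d, k) gate weights (assumed linear gate)
    gate_b:       (k,)   gate bias
    Returns the (n,) fused prediction and the (n, k) weights.
    """
    logits = x @ gate_w + gate_b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * expert_preds).sum(axis=1), weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 4))
    expert_preds = rng.normal(size=(5, 3))
    fused, weights = gated_prediction(x, expert_preds,
                                      rng.normal(size=(4, 3)), np.zeros(3))
    print(fused.shape, weights.sum(axis=1))
```

Because the weights depend on `x`, each architecture can lean on a different expert, in contrast to a static ensemble with one fixed weight vector.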
 [39] arXiv:2302.00938 [pdf, other]

Title: An Enhanced V-cycle MgNet Model for Operator Learning in Numerical Partial Differential Equations
Comments: 21 pages, 6 figures, 6 tables
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
This study used a multigrid-based convolutional neural network architecture known as MgNet in operator learning to solve numerical partial differential equations (PDEs). Given the property of smoothing iterations in multigrid methods where low-frequency errors decay slowly, we introduced a low-frequency correction structure for residuals to enhance the standard V-cycle MgNet. The enhanced MgNet model can capture the low-frequency features of solutions considerably better than the standard V-cycle MgNet. The numerical results obtained using some standard operator learning tasks are better than those obtained using many state-of-the-art methods, demonstrating the efficiency of our model. Moreover, numerically, our new model is more robust in the case of low- and high-resolution data during training and testing, respectively.
 [40] arXiv:2302.00942 [pdf, other]

Title: Efficient Graph Field Integrators Meet Point Clouds
Authors: Krzysztof Choromanski, Arijit Sehanobish, Han Lin, Yunfan Zhao, Eli Berger, Tetiana Parshakova, Alvin Pan, David Watkins, Tianyi Zhang, Valerii Likhosherstov, Somnath Basu Roy Chowdhury, Avinava Dubey, Deepali Jain, Tamas Sarlos, Snigdha Chaturvedi, Adrian Weller
Subjects: Machine Learning (cs.LG)
We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization (SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion (RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds. Both can be viewed as providing the functionality of Fast Multipole Methods (FMMs), which have had a tremendous impact on efficient integration, but for non-Euclidean spaces. We focus on geometries induced by distributions of walk lengths between points (e.g., shortest-path distance). We provide an extensive theoretical analysis of our algorithms, obtaining new results in structural graph theory as a by-product. We also perform exhaustive empirical evaluation, including on-surface interpolation for rigid and deformable objects (particularly for mesh-dynamics modeling), Wasserstein distance computations for point clouds, and the Gromov-Wasserstein variant.
 [41] arXiv:2302.00956 [pdf, other]

Title: Resilient Binary Neural Network
Comments: AAAI 2023 Oral
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Binary neural networks (BNNs) have received ever-increasing popularity for their great capability of reducing storage burden as well as quickening inference time. However, there is a severe performance drop compared with real-valued networks, due to their intrinsic frequent weight oscillation during training. In this paper, we introduce a Resilient Binary Neural Network (ReBNN) to mitigate the frequent oscillation for better BNN training. We identify that the weight oscillation mainly stems from the non-parametric scaling factor. To address this issue, we propose to parameterize the scaling factor and introduce a weighted reconstruction loss to build an adaptive training objective. For the first time, we show that the weight oscillation is controlled by the balanced parameter attached to the reconstruction loss, which provides a theoretical foundation to parameterize it in back propagation. Based on this, we learn our ReBNN by calculating the balanced parameter based on its maximum magnitude, which can effectively mitigate the weight oscillation with a resilient training process. Extensive experiments are conducted upon various network models, such as ResNet and Faster-RCNN for computer vision, as well as BERT for natural language processing. The results demonstrate the overwhelming performance of our ReBNN over prior arts. For example, our ReBNN achieves 66.9% Top-1 accuracy with ResNet-18 backbone on the ImageNet dataset, surpassing existing state-of-the-arts by a significant margin. Our code is open-sourced at https://github.com/SteveTsui/ReBNN.
 [42] arXiv:2302.00967 [pdf, other]

Title: Energy Efficiency of Training Neural Network Architectures: An Empirical Study
Comments: Accepted in HICSS 2023. For its published version refer to the Proceedings of the 56th Hawaii International Conference on System Sciences
Journal-ref: Proceedings of the 56th Hawaii International Conference on System Sciences, pp. 781-790 (2023)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
The evaluation of Deep Learning models has traditionally focused on criteria such as accuracy, F1 score, and related measures. The increasing availability of high computational power environments allows the creation of deeper and more complex models. However, the computations needed to train such models entail a large carbon footprint. In this work, we study the relations between DL model architectures and their environmental impact in terms of energy consumed and CO$_2$ emissions produced during training by means of an empirical study using Deep Convolutional Neural Networks. Concretely, we study: (i) the impact of the architecture and the location where the computations are hosted on the energy consumption and emissions produced; (ii) the trade-off between accuracy and energy efficiency; and (iii) the difference in the method of measurement of the energy consumed using software-based and hardware-based tools.
 [43] arXiv:2302.00981 [pdf, other]

Title: Predicting Molecule-Target Interaction by Learning Biomedical Network and Molecule Representations
Comments: 9 pages, 6 figures. arXiv admin note: substantial text overlap with arXiv:2102.01649
Subjects: Machine Learning (cs.LG)
The study of molecule-target interaction is quite important for drug discovery in terms of target identification, pathway study, drug-drug interaction, etc. Most existing methodologies utilize either biomedical network information or molecule structural features to predict potential interaction links. However, the biomedical network information based methods usually suffer from the cold start problem, while structure based methods often give limited performance due to the structure/interaction assumption and data quality. To address these issues, we propose a pseudo-siamese Graph Neural Network method, namely MTINet+, which learns both biomedical network topological and molecule structural/chemical information as representations to predict the potential interaction of a given molecule and target pair. In MTINet+, 1-hop subgraphs of a given molecule and target pair are extracted from known interactions of the biomedical network as topological information, while the molecule structural and chemical attributes are processed as molecule information. MTINet+ learns these two types of information as embedding features for predicting the pair link. In the experiments on different molecule-target interaction tasks, MTINet+ significantly outperforms the state-of-the-art baselines. In addition, in our designed network sparsity experiments, MTINet+ shows strong robustness against different sparse biomedical networks.
 [44] arXiv:2302.00997 [pdf, ps, other]

Title: Constrained Online Two-stage Stochastic Optimization: New Algorithms via Adversarial Learning
Authors: Jiashuo Jiang
Subjects: Machine Learning (cs.LG)
We consider an online two-stage stochastic optimization with long-term constraints over a finite horizon of $T$ periods. At each period, we take the first-stage action, observe a model parameter realization and then take the second-stage action from a feasible set that depends both on the first-stage decision and the model parameter. We aim to minimize the cumulative objective value while guaranteeing that the long-term average second-stage decision belongs to a set. We propose a general algorithmic framework that derives online algorithms for the online two-stage problem from adversarial learning algorithms. Also, the regret bound of our algorithm can be reduced to the regret bound of embedded adversarial learning algorithms. Based on our framework, we obtain new results under various settings. When the model parameter at each period is drawn from identical distributions, we derive a state-of-the-art regret bound that improves previous bounds under special cases. Our algorithm is also robust to adversarial corruptions of model parameter realizations. When the model parameters are drawn from unknown non-stationary distributions and we are given prior estimates of the distributions, we develop a new algorithm from our framework with a regret $O(W_T+\sqrt{T})$, where $W_T$ measures the total inaccuracy of the prior estimates.
 [45] arXiv:2302.01018 [pdf, other]

Title: Graph Neural Networks for temporal graphs: State of the art, open challenges, and opportunities
Authors: Antonio Longa, Veronica Lachi, Gabriele Santin, Monica Bianchini, Bruno Lepri, Pietro Lio, Franco Scarselli, Andrea Passerini
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Graph Neural Networks (GNNs) have become the leading paradigm for learning on (static) graph-structured data. However, many real-world systems are dynamic in nature, since the graph and node/edge attributes change over time. In recent years, GNN-based models for temporal graphs have emerged as a promising area of research to extend the capabilities of GNNs. In this work, we provide the first comprehensive overview of the current state-of-the-art of temporal GNN, introducing a rigorous formalization of learning settings and tasks and a novel taxonomy categorizing existing approaches in terms of how the temporal aspect is represented and processed. We conclude the survey with a discussion of the most relevant open challenges for the field, from both research and application perspectives.
 [46] arXiv:2302.01020 [pdf, ps, other]

Title: Meta Learning in Decentralized Neural Networks: Towards More General AI
Authors: Yuwei Sun
Comments: Accepted for AAAI 2023 workshop
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Meta-learning usually refers to a learning algorithm that learns from other learning algorithms. The problem of uncertainty in the predictions of neural networks shows that the world is only partially predictable and a learned neural network cannot generalize to its ever-changing surrounding environments. Therefore, the question is how a predictive model can represent multiple predictions simultaneously. We aim to provide a fundamental understanding of learning to learn in the context of Decentralized Neural Networks (Decentralized NNs) and we believe this is one of the most important questions and prerequisites to building an autonomous intelligence machine. To this end, we shall demonstrate several pieces of evidence for tackling the problems above with Meta Learning in Decentralized NNs. In particular, we will present three different approaches to building such a decentralized learning system: (1) learning from many replica neural networks, (2) building the hierarchy of neural networks for different functions, and (3) leveraging different modality experts to learn cross-modal representations.
 [47] arXiv:2302.01029 [pdf, ps, other]

Title: On Suppressing Range of Adaptive Stepsizes of Adam to Improve Generalisation Performance
Authors: Guoqiang Zhang
Comments: 12 pages. arXiv admin note: substantial text overlap with arXiv:2203.13273
Subjects: Machine Learning (cs.LG)
A number of recent adaptive optimizers improve the generalisation performance of Adam by essentially reducing the variance of adaptive stepsizes to get closer to SGD with momentum. Following the above motivation, we suppress the range of the adaptive stepsizes of Adam by exploiting the layerwise gradient statistics. In particular, at each iteration, we propose to perform three consecutive operations on the second momentum v_t before using it to update a DNN model: (1) down-scaling, (2) epsilon-embedding, and (3) down-translating. The resulting algorithm is referred to as SET-Adam, where SET is a brief notation of the three operations. The down-scaling operation on v_t is performed layerwise by making use of the angles between the layerwise subvectors of v_t and the corresponding all-one subvectors. Extensive experimental results show that SET-Adam outperforms eight adaptive optimizers when training transformers and LSTMs for NLP, and VGG and ResNet for image classification over CIFAR10 and CIFAR100, while matching the best performance of the eight adaptive methods when training WGAN-GP models for image generation tasks. Furthermore, SET-Adam produces higher validation accuracies than Adam and AdaBelief for training ResNet18 over ImageNet.
 [48] arXiv:2302.01047 [pdf, other]

Title: Real-Time Evaluation in Online Continual Learning: A New Paradigm
Authors: Yasir Ghunaim, Adel Bibi, Kumail Alhamoud, Motasem Alfarra, Hasan Abed Al Kader Hammoud, Ameya Prabhu, Philip H. S. Torr, Bernard Ghanem
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Current evaluations of Continual Learning (CL) methods typically assume that there is no constraint on training time and computation. This is an unrealistic assumption for any real-world setting, which motivates us to propose a practical real-time evaluation of continual learning, in which the stream does not wait for the model to complete training before revealing the next data for predictions. To do this, we evaluate current CL methods with respect to their computational costs. We hypothesize that under this new evaluation paradigm, computationally demanding CL approaches may perform poorly on streams with a varying distribution. We conduct extensive experiments on CLOC, a large-scale dataset containing 39 million time-stamped images with geolocation labels. We show that a simple baseline outperforms state-of-the-art CL methods under this evaluation, questioning the applicability of existing methods in realistic settings. In addition, we explore various CL components commonly used in the literature, including memory sampling strategies and regularization approaches. We find that all considered methods fail to be competitive against our simple baseline. This surprisingly suggests that the majority of existing CL literature is tailored to a specific class of streams that is not practical. We hope that the evaluation we provide will be the first step towards a paradigm shift to consider the computational cost in the development of online continual learning methods.
 [49] arXiv:2302.01052 [pdf, other]

Title: Site-specific Deep Learning Path Loss Models based on the Method of Moments
Comments: EuCAP 2023
Subjects: Machine Learning (cs.LG)
This paper describes deep learning models based on convolutional neural networks applied to the problem of predicting EM wave propagation over rural terrain. A surface integral equation formulation, solved with the method of moments and accelerated using the Fast Far Field approximation, is used to generate synthetic training data which comprises path loss computed over randomly generated 1D terrain profiles. These are used to train two networks, one based on fractal profiles and one based on profiles generated using a Gaussian process. The models show excellent agreement when applied to test profiles generated using the same statistical process used to create the training data and very good accuracy when applied to real-life problems.
 [50] arXiv:2302.01068 [pdf, other]

Title: FedGLOSS-DP: Federated, Global Learning using Synthetic Sets with Record Level Differential Privacy
Subjects: Machine Learning (cs.LG)
This work proposes FedGLOSS-DP, a novel approach to privacy-preserving learning that uses synthetic data to train federated models. In our approach, the server recovers an approximation of the global loss landscape in a local neighborhood based on synthetic samples received from the clients. In contrast to previous, point-wise, gradient-based, linear approximation (such as FedAvg), our formulation enables a type of global optimization that is particularly beneficial in non-IID federated settings. We also present how it rigorously complements record-level differential privacy. Extensive results show that our novel formulation gives rise to considerable improvements in terms of convergence speed and communication costs. We argue that our new approach to federated learning can provide a potential path toward reconciling privacy and accountability by sending differentially private, synthetic data instead of gradient updates. The source code will be released upon publication.
 [51] arXiv:2302.01079 [pdf, other]

Title: Uncertainty in Fairness Assessment: Maintaining Stable Conclusions Despite Fluctuations
Comments: 25 pages (including references and appendix), 10 figures. Submitted to ICML 2023
Subjects: Machine Learning (cs.LG)
Several recent works encourage the use of a Bayesian framework when assessing performance and fairness metrics of a classification algorithm in a supervised setting. We propose the Uncertainty Matters (UM) framework that generalizes a Beta-Binomial approach to derive the posterior distribution of any criteria combination, allowing stable performance assessment in a bias-aware setting. We suggest modeling the confusion matrix of each demographic group using a Multinomial distribution updated through a Bayesian procedure. We extend UM to be applicable under the popular K-fold cross-validation procedure. Experiments highlight the benefits of UM over classical evaluation frameworks regarding informativeness and stability.
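The Bayesian update described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' UM implementation: it assumes a symmetric Dirichlet prior (the conjugate prior of the Multinomial), and the counts, the function names `posterior_samples` and `tpr`, and the choice of the true-positive-rate gap as the fairness criterion are all hypothetical.

```python
import random


def posterior_samples(counts, prior=1.0, n_samples=2000, seed=0):
    """Sample confusion-matrix rates (TP, FP, FN, TN) from a Dirichlet posterior.

    counts: observed confusion-matrix counts for one demographic group.
    prior:  symmetric Dirichlet concentration (an assumed, uninformative choice).
    """
    rng = random.Random(seed)
    alphas = [c + prior for c in counts]  # conjugate update: prior + counts
    samples = []
    for _ in range(n_samples):
        # A Dirichlet draw is a vector of independent Gamma draws, normalised.
        g = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(g)
        samples.append([x / total for x in g])
    return samples


def tpr(p):
    """True-positive rate computed from one sampled (TP, FP, FN, TN) vector."""
    tp, fp, fn, tn = p
    return tp / (tp + fn)


# Hypothetical counts (TP, FP, FN, TN) for two demographic groups.
group_a = posterior_samples([40, 10, 10, 40], seed=1)
group_b = posterior_samples([15, 5, 10, 20], seed=2)

# Posterior distribution of the TPR gap between the two groups.
gaps = sorted(tpr(a) - tpr(b) for a, b in zip(group_a, group_b))
low = gaps[int(0.025 * len(gaps))]
high = gaps[int(0.975 * len(gaps)) - 1]
print(f"95% credible interval for the TPR gap: [{low:.3f}, {high:.3f}]")
```

Reporting an interval rather than a single point estimate is what makes the conclusion stable under small fluctuations in the counts; any other combination of criteria can be computed from the same samples.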
 [52] arXiv:2302.01094 [pdf, other]

Title: Confidence and Dispersity Speak: Characterising Prediction Matrix for Unsupervised Accuracy Estimation
Comments: This version is not fully edited and will be updated soon
Subjects: Machine Learning (cs.LG)
This work aims to assess how well a model performs under distribution shifts without using labels. While recent methods study prediction confidence, this work reports that prediction dispersity is another informative cue. Confidence reflects whether the individual prediction is certain; dispersity indicates how the overall predictions are distributed across all categories. Our key insight is that a well-performing model should give predictions with high confidence and high dispersity. That is, we need to consider both properties so as to make more accurate estimates. To this end, we use the nuclear norm, which has been shown to be effective in characterizing both properties. Extensive experiments validate the effectiveness of the nuclear norm for various models (e.g., ViT and ConvNeXt), different datasets (e.g., ImageNet and CUB-200), and diverse types of distribution shifts (e.g., style shift and reproduction shift). We show that the nuclear norm is more accurate and robust in accuracy estimation than existing methods. Furthermore, we validate the feasibility of other measurements (e.g., mutual information maximization) for characterizing dispersity and confidence. Lastly, we investigate the limitation of the nuclear norm, study its improved variant under severe class imbalance, and discuss potential directions.
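To see why the nuclear norm responds to both confidence and dispersity, consider this small sketch. It assumes the prediction matrix holds one softmax vector per row; the normalisation by sqrt(min(N, K) * N) is an assumption made here for readability (it is a general upper bound on the nuclear norm of such a matrix), and the function name `nuclear_norm_score` is hypothetical.

```python
import numpy as np


def nuclear_norm_score(probs):
    """Normalised nuclear norm of an (N, K) matrix of softmax predictions."""
    probs = np.asarray(probs, dtype=float)
    n, k = probs.shape
    nuc = np.linalg.norm(probs, ord="nuc")  # sum of singular values
    # Each row has L2 norm <= 1, so ||P||_* <= sqrt(min(N, K) * N).
    return float(nuc / np.sqrt(min(n, k) * n))


# Three toy prediction matrices for N = 4 samples and K = 2 classes.
confident_and_dispersed = [[1, 0], [1, 0], [0, 1], [0, 1]]
confident_but_collapsed = [[1, 0], [1, 0], [1, 0], [1, 0]]
unconfident = [[0.5, 0.5]] * 4

for name, p in [
    ("confident and dispersed", confident_and_dispersed),
    ("confident but collapsed", confident_but_collapsed),
    ("unconfident", unconfident),
]:
    print(name, round(nuclear_norm_score(p), 3))
# confident and dispersed 1.0
# confident but collapsed 0.707
# unconfident 0.5
```

Only the matrix that is both confident and spread across classes reaches the upper bound; collapsing onto one class or hedging every prediction lowers the score.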
 [53] arXiv:2302.01098 [pdf, other]

Title: A general Markov decision process formalism for action-state entropy-regularized reward maximization
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Previous work has separately addressed different forms of action, state and action-state entropy regularization, pure exploration and space occupation. These problems have become extremely relevant for regularization, generalization, speeding up learning and providing robust solutions at unprecedented levels. However, solutions to these problems are heterogeneous, ranging from convex to non-convex optimization and from unconstrained to constrained optimization. Here we provide a general dual function formalism that transforms the constrained optimization problem into an unconstrained convex one for any mixture of action and state entropies. The cases with pure action entropy and pure state entropy are understood as limits of the mixture.
 [54] arXiv:2302.01107 [pdf, other]

Title: A Survey on Efficient Training of Transformers
Comments: A brief review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by the efficient use of computation and memory resources. This survey provides the first systematic overview of the efficient training of Transformers, covering the recent progress in acceleration arithmetic and hardware, with a focus on the former. We analyze and compare methods that save computation and memory costs for intermediate tensors during training, together with techniques on hardware/algorithm co-design. We finally discuss challenges and promising areas for future research.
 [55] arXiv:2302.01128 [pdf, other]

Title: Mnemosyne: Learning to Train Transformers with Transformers
Authors: Deepali Jain, Krzysztof Marcin Choromanski, Sumeet Singh, Vikas Sindhwani, Tingnan Zhang, Jie Tan, Avinava Dubey
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Training complex machine learning (ML) architectures requires a compute- and time-consuming process of selecting the right optimizer and tuning its hyperparameters. A new paradigm of learning optimizers from data has emerged as a better alternative to hand-designed ML optimizers. We propose the Mnemosyne optimizer, which uses Performers: implicit low-rank attention Transformers. It can learn to train entire neural network architectures, including other Transformers, without any task-specific optimizer tuning. We show that Mnemosyne: (a) generalizes better than the popular LSTM optimizer, (b) in particular can successfully train Vision Transformers (ViTs) while meta-trained on standard MLPs and (c) can initialize optimizers for faster convergence in Robotics applications. We believe that these results open the possibility of using Transformers to build foundational optimization models that can address the challenges of regular Transformer training. We complement our results with an extensive theoretical analysis of the compact associative memory used by Mnemosyne.
 [56] arXiv:2302.01129 [pdf, other]

Title: De Novo Molecular Generation via Connection-aware Motif Mining
Authors: Zijie Geng, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Jie Wang, Yongdong Zhang, Feng Wu, Tie-Yan Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
De novo molecular generation is an essential task for science discovery. Recently, fragment-based deep generative models have attracted much research attention due to their flexibility in generating novel molecules based on existing molecule fragments. However, the motif vocabulary, i.e., the collection of frequent fragments, is usually built upon heuristic rules, which brings difficulties to capturing common substructures from large amounts of molecules. In this work, we propose a new method, MiCaM, to generate molecules based on mined connection-aware motifs. Specifically, it leverages a data-driven algorithm to automatically discover motifs from a molecule library by iteratively merging subgraphs based on their frequency. The obtained motif vocabulary consists of not only molecular motifs (i.e., the frequent fragments), but also their connection information, indicating how the motifs are connected with each other. Based on the mined connection-aware motifs, MiCaM builds a connection-aware generator, which simultaneously picks up motifs and determines how they are connected. We test our method on distribution-learning benchmarks (i.e., generating novel molecules to resemble the distribution of a given training set) and goal-directed benchmarks (i.e., generating molecules with target properties), and achieve significant improvements over previous fragment-based baselines. Furthermore, we demonstrate that our method can effectively mine domain-specific motifs for different tasks.
 [57] arXiv:2302.01155 [pdf, other]

Title: Deep COVID-19 Forecasting for Multiple States with Data Augmentation
Subjects: Machine Learning (cs.LG)
In this work, we propose a deep learning approach to forecasting state-level COVID-19 trends of weekly cumulative death in the United States (US) and incident cases in Germany. This approach includes a transformer model, an ensemble method, and a data augmentation technique for time series. We arrange the inputs of the transformer in such a way that predictions for different states can attend to the trends of the others. To overcome the issue of scarcity of training data for this COVID-19 pandemic, we have developed a novel data augmentation technique to generate useful data for training. More importantly, the generated data can also be used for model validation. As such, it has a twofold advantage: 1) more actual observations can be used for training, and 2) the model can be validated on data which has distribution closer to the expected situation. Our model has achieved some of the best state-level results on the COVID-19 Forecast Hub for the US and for Germany.
 [58] arXiv:2302.01161 [pdf, other]

Title: Vectorized Scenario Description and Motion Prediction for Scenario-Based Testing
Comments: 6 pages, 7 figures, 3 tables, submitted to IEEE IV 2023
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
Automated vehicles (AVs) are tested in diverse scenarios, typically specified by parameters such as velocities, distances, or curve radii. To describe scenarios uniformly independent of such parameters, this paper proposes a vectorized scenario description defined by the road geometry and vehicles' trajectories. Data of this form are generated for three scenarios, merged, and used to train the motion prediction model VectorNet, making it possible to predict an AV's trajectory for unseen scenarios. Predicting scenario evaluation metrics, VectorNet partially achieves lower errors than regression models that separately process the three scenarios' data. However, for comprehensive generalization, sufficient variance in the training data must be ensured. Thus, contrary to existing methods, our proposed method can merge diverse scenarios' data and exploit spatial and temporal nuances in the vectorized scenario description. As a result, data from specified test scenarios and real-world scenarios can be compared and combined for (predictive) analyses and scenario selection.
 [59] arXiv:2302.01172 [pdf, other]

Title: STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition
Authors: Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Christopher De Sa, Amir Yazdanbakhsh
Subjects: Machine Learning (cs.LG)
Recent innovations in hardware (e.g. Nvidia A100) have motivated learning N:M structured sparsity masks from scratch for fast model inference. However, state-of-the-art learning recipes in this regime (e.g. SR-STE) are proposed for non-adaptive optimizers like momentum SGD, while incurring a non-trivial accuracy drop for Adam-trained models like attention-based LLMs. In this paper, we first demonstrate that this gap originates from a poorly estimated second moment (i.e. variance) in the Adam states given by the masked weights. We conjecture that learning N:M masks with Adam should take the critical regime of variance estimation into account. In light of this, we propose STEP, an Adam-aware recipe that learns N:M masks in two phases: first, STEP calculates a reliable variance estimate (precondition phase); subsequently, the variance remains fixed and is used as a precondition to learn N:M masks (mask-learning phase). STEP automatically identifies the switching point between the two phases by dynamically sampling variance changes over the training trajectory and testing the sample concentration. Empirically, we evaluate STEP and other baselines such as ASP and SR-STE on multiple tasks including CIFAR classification, machine translation and LLM fine-tuning (BERT-Base, GPT-2). We show that STEP mitigates the accuracy drop of baseline recipes and is robust to aggressive structured sparsity ratios.
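For readers unfamiliar with the N:M constraint itself, here is a minimal sketch of what such a mask looks like: within every group of M consecutive weights, at most N are kept. The magnitude-based selection below is a common baseline assumed for illustration; it is not STEP's two-phase recipe, and the function name `nm_mask` is hypothetical.

```python
def nm_mask(weights, n=2, m=4):
    """Return a 0/1 mask keeping the n largest-magnitude weights in each group of m."""
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest |w| inside this group.
        ranked = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(ranked[:n])
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask


# One row of a weight matrix, pruned to 2:4 sparsity.
row = [0.1, -0.9, 0.4, 0.05, 0.3, 0.2, -0.7, 0.6]
mask = nm_mask(row, n=2, m=4)
pruned = [w * k for w, k in zip(row, mask)]
print(mask)    # [0, 1, 1, 0, 0, 0, 1, 1]
print(pruned)
```

Because every group of four holds exactly two non-zeros, the layout is regular enough for hardware to exploit, which is the motivation given in the abstract.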
 [60] arXiv:2302.01178 [pdf, other]

Title: Convolutional Neural Operators
Subjects: Machine Learning (cs.LG)
Although very successfully used in machine learning, convolution-based neural network architectures, believed to be inconsistent in function space, have been largely ignored in the context of learning solution operators of PDEs. Here, we adapt convolutional neural networks to demonstrate that they are indeed able to process functions as inputs and outputs. The resulting architecture, termed convolutional neural operators (CNOs), is shown to significantly outperform competing models on benchmark experiments, paving the way for the design of an alternative robust and accurate framework for learning operators.
 [61] arXiv:2302.01186 [pdf, other]

Title: The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC); Machine Learning (stat.ML)
We propose $\textsf{ScaledGD($\lambda$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned. Using overparameterized factor representations, $\textsf{ScaledGD($\lambda$)}$ starts from a small random initialization, and proceeds by gradient descent with a specific form of damped preconditioning to combat bad curvatures induced by overparameterization and ill-conditioning. At the expense of light computational overhead incurred by preconditioners, $\textsf{ScaledGD($\lambda$)}$ is remarkably robust to ill-conditioning compared to vanilla gradient descent ($\textsf{GD}$) even with overparameterization. Specifically, we show that, under the Gaussian design, $\textsf{ScaledGD($\lambda$)}$ converges to the true low-rank matrix at a constant linear rate after a small number of iterations that scales only logarithmically with respect to the condition number and the problem dimension. This significantly improves over the convergence rate of vanilla $\textsf{GD}$, which suffers from a polynomial dependency on the condition number. Our work provides evidence on the power of preconditioning in accelerating the convergence without hurting generalization in overparameterized learning.
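As a rough illustration of what "damped preconditioning" can mean for a factored problem, one plausible form of the update is written out below. The notation is assumed here and is not taken from the abstract: $X_t \in \mathbb{R}^{n \times r}$ is the overparameterized factor, $f$ the sensing loss, $\eta$ the step size and $\lambda > 0$ the damping parameter. The exact update in the paper may differ.

```latex
% Vanilla gradient descent on the factor X_t:
X_{t+1} = X_t - \eta \, \nabla f(X_t)

% Damped preconditioned step (assumed form): the gradient is right-multiplied
% by the inverse of the r x r matrix X_t^\top X_t + \lambda I.
X_{t+1} = X_t - \eta \, \nabla f(X_t) \, \bigl( X_t^\top X_t + \lambda I \bigr)^{-1}
```

The damping term $\lambda I$ keeps the preconditioner invertible when $X_t^\top X_t$ is nearly singular, which is the typical situation under overparameterization with a small initialization; the preconditioner is only $r \times r$, consistent with the "light computational overhead" mentioned in the abstract.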
 [62] arXiv:2302.01188 [pdf, other]

Title: Best Possible Q-Learning
Comments: 14 pages
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Fully decentralized learning, where the global information, i.e., the actions of other agents, is inaccessible, is a fundamental challenge in cooperative multi-agent reinforcement learning. However, the convergence and optimality of most decentralized algorithms are not theoretically guaranteed, since the transition probabilities are non-stationary as all agents are updating policies simultaneously. To tackle this challenge, we propose the best possible operator, a novel decentralized operator, and prove that the policies of agents will converge to the optimal joint policy if each agent independently updates its individual state-action value by the operator. Further, to make the update more efficient and practical, we simplify the operator and prove that the convergence and optimality still hold with the simplified one. By instantiating the simplified operator, the derived fully decentralized algorithm, best possible Q-learning (BQL), does not suffer from non-stationarity. Empirically, we show that BQL achieves remarkable improvement over baselines in a variety of cooperative multi-agent tasks.
 [63] arXiv:2302.01189 [pdf, other]

Title: Reinforcement learning-based estimation for partial differential equations
Comments: 21 pages, 16 figures
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
In systems governed by nonlinear partial differential equations such as fluid flows, the design of state estimators such as Kalman filters relies on a reduced-order model (ROM) that projects the original high-dimensional dynamics onto a computationally tractable low-dimensional space. However, ROMs are prone to large errors, which negatively affects the performance of the estimator. Here, we introduce the reinforcement learning reduced-order estimator (RL-ROE), a ROM-based estimator in which the correction term that takes in the measurements is given by a nonlinear policy trained through reinforcement learning. The nonlinearity of the policy enables the RL-ROE to compensate efficiently for errors of the ROM, while still taking advantage of the imperfect knowledge of the dynamics. Using examples involving the Burgers and Navier-Stokes equations, we show that in the limit of very few sensors, the trained RL-ROE outperforms a Kalman filter designed using the same ROM. Moreover, it yields accurate high-dimensional state estimates for reference trajectories corresponding to various physical parameter values, without direct knowledge of the latter.
 [64] arXiv:2302.01193 [pdf, other]

Title: Imitating careful experts to avoid catastrophic events
Comments: 9 pages, 8 figures, accepted to NeurIPS 2022 Workshop on Robot Learning: Trustworthy Robotics
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
RL is increasingly being used to control robotic systems that interact closely with humans. This interaction raises the problem of safe RL: how to ensure that an RL-controlled robotic system never, for instance, injures a human. This problem is especially challenging in rich, realistic settings where it is not even possible to clearly write down a reward function which incorporates these outcomes. In these circumstances, perhaps the only viable approach is based on IRL, which infers rewards from human demonstrations. However, IRL is massively underdetermined as many different rewards can lead to the same optimal policies; we show that this makes it difficult to distinguish catastrophic outcomes (such as injuring a human) from merely undesirable outcomes. Our key insight is that humans do display different behaviour when catastrophic outcomes are possible: they become much more careful. We incorporate carefulness signals into IRL, and find that they do indeed allow IRL to disambiguate undesirable from catastrophic outcomes, which is critical to ensuring safety in future real-world human-robot interactions.
 [65] arXiv:2302.01198 [pdf, other]

Title: Causal Lifting and Link Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME); Machine Learning (stat.ML)
Current state-of-the-art causal models for link prediction assume an underlying set of inherent node factors (an innate characteristic defined at the node's birth) that governs the causal evolution of links in the graph. In some causal tasks, however, link formation is path-dependent, i.e., the outcome of link interventions depends on existing links. For instance, in the customer-product graph of an online retailer, the effect of an 85-inch TV ad (treatment) likely depends on whether the customer already has an 85-inch TV. Unfortunately, existing causal methods are impractical in these scenarios. The cascading functional dependencies between links (due to path dependence) are either unidentifiable or require an impractical number of control variables. In order to remedy this shortcoming, this work develops the first causal model capable of dealing with path dependencies in link prediction. It introduces the concept of causal lifting, an invariance in causal models that, when satisfied, allows the identification of causal link prediction queries using limited interventional data. On the estimation side, we show how structural pairwise embeddings (a type of symmetry-based joint representation of node pairs in a graph) exhibit lower bias and correctly represent the causal structure of the task, as opposed to existing node embedding methods, e.g., GNNs and matrix factorization. Finally, we validate our theoretical findings on four datasets under three different scenarios for causal link prediction tasks: knowledge base completion, covariance matrix estimation and consumer-product recommendations.
 [66] arXiv:2302.01204 [pdf, other]

Title: Laplacian Change Point Detection for Single and Multi-view Dynamic Graphs
Comments: 30 pages, 15 figures, extended version of previous paper "Laplacian Change Point Detection for Dynamic Graphs" with novel material. arXiv admin note: substantial text overlap with arXiv:2007.01229
Subjects: Machine Learning (cs.LG)
Dynamic graphs are rich data structures that are used to model complex relationships between entities over time. In particular, anomaly detection in temporal graphs is crucial for many real-world applications such as intrusion identification in network systems, detection of ecosystem disturbances and detection of epidemic outbreaks. In this paper, we focus on change point detection in dynamic graphs and address three main challenges associated with this problem: i) how to compare graph snapshots across time, ii) how to capture temporal dependencies, and iii) how to combine different views of a temporal graph. To solve the above challenges, we first propose Laplacian Anomaly Detection (LAD), which uses the spectrum of the graph Laplacian as the low-dimensional embedding of the graph structure at each snapshot. LAD explicitly models short-term and long-term dependencies by applying two sliding windows. Next, we propose MultiLAD, a simple and effective generalization of LAD to multi-view graphs. MultiLAD provides the first change point detection method for multi-view dynamic graphs. It aggregates the singular values of the normalized graph Laplacian from different views through the scalar power mean operation. Through extensive synthetic experiments, we show that i) LAD and MultiLAD are accurate and outperform state-of-the-art baselines and their multi-view extensions by a large margin, ii) MultiLAD's advantage over contenders significantly increases when additional views are available, and iii) MultiLAD is highly robust to noise from individual views. In five real-world dynamic graphs, we demonstrate that LAD and MultiLAD identify significant events as top anomalies, such as the implementation of government COVID-19 interventions which impacted the population mobility in multi-view traffic networks.
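The idea of using the Laplacian spectrum as a per-snapshot embedding can be sketched in a few lines. This is an illustration under simplifying assumptions and not the LAD algorithm: it compares one snapshot against a single reference instead of using the two sliding windows mentioned in the abstract, and the function names `laplacian_signature` and `change_score` are hypothetical.

```python
import numpy as np


def laplacian_signature(adj):
    """Singular values of the symmetric normalised Laplacian of one snapshot."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # Guard against isolated nodes (degree 0) when forming D^{-1/2}.
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.linalg.svd(lap, compute_uv=False)  # sorted in descending order


def change_score(sig_now, sig_ref):
    """One minus the cosine similarity between two signature vectors."""
    cos = np.dot(sig_now, sig_ref) / (np.linalg.norm(sig_now) * np.linalg.norm(sig_ref))
    return float(1.0 - cos)


# Two snapshots on the same three nodes: a path, then a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

sig_path = laplacian_signature(path)          # approximately [2.0, 1.0, 0.0]
sig_triangle = laplacian_signature(triangle)  # approximately [1.5, 1.5, 0.0]
print(change_score(sig_triangle, sig_path))
```

Because the signature depends only on the spectrum, it is invariant to node relabelling, which makes snapshots comparable across time without aligning node identities.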
 [67] arXiv:2302.01222 [pdf, other]

Title: Temporal fusion transformer using variational mode decomposition for wind power forecasting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The power output of a wind turbine depends on a variety of factors, including wind speed at different heights, wind direction, temperature and turbine properties. Wind speed and direction, in particular, have complex cycles and fluctuate dramatically, leading to large uncertainties in wind power output. This study uses variational mode decomposition (VMD) to decompose the wind power series and Temporal fusion transformer (TFT) to forecast wind power for the next 1h, 3h and 6h. The experimental results show that VMD outperforms other decomposition algorithms and the TFT model outperforms other decomposition models.
 [68] arXiv:2302.01223 [pdf, ps, other]

Title: Practical Bandits: An Industry Perspective
Comments: Tutorial held at The Web Conference 2023 (formerly known as WWW) in Austin, Texas (USA), on April 30 - May 4, 2023
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
The bandit paradigm provides a unified modeling framework for problems that require decision-making under uncertainty. Because many business metrics can be viewed as rewards (a.k.a. utilities) that result from actions, bandit algorithms have seen a large and growing interest from industrial applications, such as search, recommendation and advertising. Indeed, with the bandit lens comes the promise of direct optimisation for the metrics we care about.
Nevertheless, the road to successfully applying bandits in production is not an easy one. Even when the action space and rewards are well-defined, practitioners still need to make decisions regarding multi-arm or contextual approaches, on- or off-policy setups, delayed or immediate feedback, myopic or long-term optimisation, etc. To make matters worse, industrial platforms typically give rise to large action spaces in which existing approaches tend to break down. The research literature on these topics is broad and vast, but this can overwhelm practitioners, whose primary aim is to solve practical problems, and therefore need to decide on a specific instantiation or approach for each project. This tutorial will take a step towards filling that gap between the theory and practice of bandits. Our goal is to present a unified overview of the field and its existing terminology, concepts and algorithms, with a focus on problems relevant to industry. We hope our industrial perspective will help future practitioners who wish to leverage the bandit paradigm for their application.
 [69] arXiv:2302.01228 [pdf, other]

Title: Dual Propagation: Accelerating Contrastive Hebbian Learning with Dyadic Neurons
Subjects: Machine Learning (cs.LG)
Activity difference based learning algorithms, such as contrastive Hebbian learning and equilibrium propagation, have been proposed as biologically plausible alternatives to error back-propagation. However, on traditional digital chips these algorithms suffer from having to solve a costly inference problem twice, making these approaches more than two orders of magnitude slower than back-propagation. In the analog realm equilibrium propagation may be promising for fast and energy efficient learning, but states still need to be inferred and stored twice. Inspired by lifted neural networks and compartmental neuron models we propose a simple energy based compartmental neuron model, termed dual propagation, in which each neuron is a dyad with two intrinsic states. At inference time these intrinsic states encode the error/activity duality through their difference and their mean respectively. The advantage of this method is that only a single inference phase is needed and that inference can be solved in layerwise closed form. Experimentally we show on common computer vision datasets, including Imagenet32x32, that dual propagation performs equivalently to back-propagation both in terms of accuracy and runtime.
 [70] arXiv:2302.01242 [pdf, other]

Title: Neuro Symbolic Continual Learning: Knowledge, Reasoning Shortcuts and Concept Rehearsal
Authors: Emanuele Marconato, Gianpaolo Bontempo, Elisa Ficarra, Simone Calderara, Andrea Passerini, Stefano Teso
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We introduce Neuro-Symbolic Continual Learning, where a model has to solve a sequence of neuro-symbolic tasks, that is, it has to map sub-symbolic inputs to high-level concepts and compute predictions by reasoning consistently with prior knowledge. Our key observation is that neuro-symbolic tasks, although different, often share concepts whose semantics remains stable over time. Traditional approaches fall short: existing continual strategies ignore knowledge altogether, while stock neuro-symbolic architectures suffer from catastrophic forgetting. We show that leveraging prior knowledge by combining neuro-symbolic architectures with continual strategies does help avoid catastrophic forgetting, but also that doing so can yield models affected by reasoning shortcuts. These undermine the semantics of the acquired concepts, even when detailed prior knowledge is provided upfront and inference is exact, and in turn continual performance. To overcome these issues, we introduce COOL, a COncept-level cOntinual Learning strategy tailored for neuro-symbolic continual problems that acquires high-quality concepts and remembers them over time. Our experiments on three novel benchmarks highlight how COOL attains sustained high performance on neuro-symbolic continual learning tasks in which other strategies fail.
 [71] arXiv:2302.01244 [pdf, ps, other]

Title: Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function
Comments: ICLR 2023
Subjects: Machine Learning (cs.LG)
Probabilistic dynamics model ensemble is widely used in existing model-based reinforcement learning methods as it outperforms a single dynamics model in both asymptotic performance and sample efficiency. In this paper, we provide both practical and theoretical insights on the empirical success of the probabilistic dynamics model ensemble through the lens of Lipschitz continuity. We find that, for a value function, the stronger the Lipschitz condition is, the smaller the gap between the true dynamics-induced and learned dynamics-induced Bellman operators is, thus enabling the converged value function to be closer to the optimal value function. Hence, we hypothesize that the key functionality of the probabilistic dynamics model ensemble is to regularize the Lipschitz condition of the value function using generated samples. To test this hypothesis, we devise two practical robust training mechanisms, computing the adversarial noise and regularizing the value network's spectral norm, to directly regularize the Lipschitz condition of the value functions. Empirical results show that, combined with our mechanisms, model-based RL algorithms with a single dynamics model outperform those with an ensemble of probabilistic dynamics models. These findings not only support the theoretical insight, but also provide a practical solution for developing computationally efficient model-based RL algorithms.
 [72] arXiv:2302.01255 [pdf, other]

Title: adSformers: Personalization from Short-Term Sequences and Diversity of Representations in Etsy Ads
Authors: Alaa Awad, Denisa Roberts, Eden Dolev, Andrea Heyman, Zahra Ebrahimzadeh, Zoe Weil, Marcin Mejran, Vaibhav Malpani, Mahir Yavuz
Subjects: Machine Learning (cs.LG)
In this article, we present our approach to personalizing Etsy Ads through encoding and learning from short-term (one-hour) sequences of user actions and diverse representations. To this end we introduce a three-component adSformer diversifiable personalization module (ADPM) and illustrate how we use this module to derive a short-term dynamic user representation and personalize the Click-Through Rate (CTR) and Post-Click Conversion Rate (PCCVR) models used in sponsored search (ad) ranking. The first component of the ADPM is a custom transformer encoder that learns the inherent structure from the sequence of actions. ADPM's second component enriches the signal through visual, multimodal and textual pretrained representations. Lastly, the third ADPM component includes a "learned" on the fly average pooled representation. The ADPM-personalized CTR and PCCVR models, henceforth referred to as adSformer CTR and adSformer PCCVR, outperform the CTR and PCCVR production baselines by $+6.65\%$ and $+12.70\%$, respectively, in offline Precision-Recall Area Under the Curve (PR AUC). At the time of this writing, following the online gains in A/B tests, such as $+5.34\%$ in return on ad spend, a seller success metric, we are ramping up the adSformers to $100\%$ traffic in Etsy Ads.
 [73] arXiv:2302.01259 [pdf, other]

Title: Geometric Deep Learning for Autonomous Driving: Unlocking the Power of Graph Neural Networks With CommonRoad-Geometric
Subjects: Machine Learning (cs.LG)
Heterogeneous graphs offer powerful data representations for traffic, given their ability to model the complex interaction effects among a varying number of traffic participants and the underlying road infrastructure. With the recent advent of graph neural networks (GNNs) as the accompanying deep learning framework, the graph structure can be efficiently leveraged for various machine learning applications such as trajectory prediction. As a first of its kind, our proposed Python framework offers an easy-to-use and fully customizable data processing pipeline to extract standardized graph datasets from traffic scenarios. Providing a platform for GNN-based autonomous driving research, it improves comparability between approaches and allows researchers to focus on model implementation instead of dataset curation.
 [74] arXiv:2302.01275 [pdf, other]

Title: ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs
Authors: Ted Moskovitz, Brendan O'Donoghue, Vivek Veeriah, Sebastian Flennerhag, Satinder Singh, Tom Zahavy
Subjects: Machine Learning (cs.LG)
In recent years, Reinforcement Learning (RL) has been applied to real-world problems with increasing success. Such applications often require placing constraints on the agent's behavior. Existing algorithms for constrained RL (CRL) rely on gradient descent-ascent, but this approach comes with a caveat. While these algorithms are guaranteed to converge on average, they do not guarantee last-iterate convergence, i.e., the current policy of the agent may never converge to the optimal solution. In practice, it is often observed that the policy alternates between satisfying the constraints and maximizing the reward, rarely accomplishing both objectives simultaneously. Here, we address this problem by introducing Reinforcement Learning with Optimistic Ascent-Descent (ReLOAD), a principled CRL method with guaranteed last-iterate convergence. We demonstrate its empirical effectiveness on a wide variety of CRL problems including discrete MDPs and continuous control. In the process we establish a benchmark of challenging CRL problems.
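The gap between average and last-iterate convergence can be seen on a toy saddle-point problem. The sketch below is not ReLOAD itself; it assumes the bilinear objective $f(x, y) = xy$ (minimize over $x$, maximize over $y$) and compares plain gradient descent-ascent with the standard optimistic variant, which uses the extrapolated gradient $2 g_t - g_{t-1}$. The function names and step size are hypothetical choices for illustration.

```python
import math


def run_gda(x, y, lr, steps):
    """Plain simultaneous gradient descent-ascent on f(x, y) = x * y."""
    for _ in range(steps):
        gx, gy = y, x  # df/dx = y, df/dy = x
        x, y = x - lr * gx, y + lr * gy
    return x, y


def run_optimistic_gda(x, y, lr, steps):
    """Optimistic descent-ascent: step along 2 * g_t - g_{t-1}."""
    prev_gx, prev_gy = y, x  # first step reduces to a plain GDA step
    for _ in range(steps):
        gx, gy = y, x
        x, y = x - lr * (2 * gx - prev_gx), y + lr * (2 * gy - prev_gy)
        prev_gx, prev_gy = gx, gy
    return x, y


# Distance of the final iterate from the saddle point (0, 0).
gda_dist = math.hypot(*run_gda(1.0, 1.0, lr=0.1, steps=3000))
ogda_dist = math.hypot(*run_optimistic_gda(1.0, 1.0, lr=0.1, steps=3000))
print(f"GDA last iterate distance:        {gda_dist:.3e}")
print(f"Optimistic last iterate distance: {ogda_dist:.3e}")
```

On this problem the plain iterates spiral outward while the optimistic iterates contract toward the saddle point, which mirrors the oscillation between constraint satisfaction and reward maximization described above.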
 [75] arXiv:2302.01301 [pdf, ps, other]

Title: MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks
Comments: 10 pages, 5 figures, AAAI 2023 workshop "Reinforcement Learning Ready for Production", accepted at NOMS 2023 - IEEE/IFIP Network Operations and Management Symposium
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Fast and efficient transport protocols are the foundation of an increasingly distributed world. The burden of continuously delivering improved communication performance to support next-generation applications and services, combined with the increasing heterogeneity of systems and network technologies, has promoted the design of Congestion Control (CC) algorithms that perform well under specific environments. The challenge of designing a generic CC algorithm that can adapt to a broad range of scenarios is still an open research question. To tackle this challenge, we propose to apply a novel Reinforcement Learning (RL) approach. Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return and models the learning process as an infinite-horizon task. We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch that researchers have encountered when applying RL to CC. We evaluated our solution on the task of file transfer and compared it to TCP Cubic. While further research is required, results have shown that MARLIN can achieve comparable results to TCP with little hyperparameter tuning, in a task significantly different from its training setting. Therefore, we believe that our work represents a promising first step toward building CC algorithms based on the maximum entropy RL framework.
 [76] arXiv:2302.01312 [pdf, other]

Title: Normalizing Flow Ensembles for Rich Aleatoric and Epistemic Uncertainty Modeling
Subjects: Machine Learning (cs.LG)
In this work, we demonstrate how to reliably estimate epistemic uncertainty while maintaining the flexibility needed to capture complicated aleatoric distributions. To this end, we propose an ensemble of Normalizing Flows (NF), which are state-of-the-art in modeling aleatoric uncertainty. The ensembles are created via sets of fixed dropout masks, making them less expensive than creating separate NF models. We demonstrate how to leverage the unique structure of NFs, base distributions, to estimate aleatoric uncertainty without relying on samples, provide a comprehensive set of baselines, and derive unbiased estimates for differential entropy. The methods were applied to a variety of experiments, commonly used to benchmark aleatoric and epistemic uncertainty estimation: 1D sinusoidal data, 2D windy gridworld ($\it{Wet Chicken}$), $\it{Pendulum}$, and $\it{Hopper}$. In these experiments, we set up an active learning framework and evaluate each model's capability at measuring aleatoric and epistemic uncertainty. The results show the advantages of using NF ensembles in capturing complicated aleatoric uncertainty while maintaining accurate epistemic uncertainty estimates.
 [77] arXiv:2302.01313 [pdf, other]

Title: Double Permutation Equivariance for Knowledge Graph Completion
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
This work provides a formalization of Knowledge Graphs (KGs) as a new class of graphs that we denote doubly exchangeable attributed graphs, where node and pairwise (joint 2-node) representations must be equivariant to permutations of both node ids and edge (& node) attributes (relations & node features). Double-permutation equivariant KG representations open a new research direction in KGs. We show that this equivariance imposes a structural representation of relations that allows neural networks to perform complex logical reasoning tasks in KGs. Finally, we introduce a general blueprint for such equivariant representations and test a simple GNN-based double-permutation equivariant neural architecture that achieves 100% Hits@10 test accuracy in both the WN18RRv1 and NELL995v1 inductive KG completion tasks, and can accurately perform logical reasoning tasks that no existing methods can perform, to the best of our knowledge.
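For readers unfamiliar with the Hits@10 metric quoted above, the following is a minimal sketch of how Hits@k is commonly computed for KG completion: for each test query, candidates are ranked by model score, and the query counts as a hit if the true entity ranks within the top k. The function name `hits_at_k`, the optimistic tie-breaking rule, and the toy scores are assumptions for illustration, not the paper's evaluation code.

```python
def hits_at_k(scores_per_query, true_index_per_query, k=10):
    """Fraction of queries whose true candidate ranks within the top k.

    scores_per_query: list of score lists, one list per query (higher is better).
    true_index_per_query: index of the correct candidate for each query.
    Ties are broken optimistically (assumption): only strictly higher
    scores push the true candidate down the ranking.
    """
    hits = 0
    for scores, true_idx in zip(scores_per_query, true_index_per_query):
        true_score = scores[true_idx]
        rank = 1 + sum(1 for s in scores if s > true_score)
        if rank <= k:
            hits += 1
    return hits / len(scores_per_query)


# Hypothetical scores for two queries with three candidates each.
toy_scores = [[0.1, 0.9, 0.3], [0.8, 0.2, 0.5]]
toy_truth = [1, 1]
print(hits_at_k(toy_scores, toy_truth, k=1))
print(hits_at_k(toy_scores, toy_truth, k=3))
```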
 [78] arXiv:2302.01324 [pdf, other]

Title: Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
We investigate the problem of unconstrained combinatorial multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization. Previous works investigate the same problem assuming a submodular and monotone reward function. In this work, we study a more general problem, i.e., when the reward function is not necessarily monotone, and the submodularity is assumed only in expectation. We propose the Randomized Greedy Learning (RGL) algorithm and theoretically prove that it achieves a $\frac{1}{2}$-regret upper bound of $\tilde{\mathcal{O}}(n T^{\frac{2}{3}})$ for horizon $T$ and number of arms $n$. We also show in experiments that RGL empirically outperforms other full-bandit variants in submodular and non-submodular settings.
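The $\frac{1}{2}$ factor above matches the guarantee of the classical offline randomized double greedy algorithm for unconstrained non-monotone submodular maximization, which attains a $\frac{1}{2}$-approximation in expectation given exact function values. The sketch below shows that offline routine only; whether RGL follows exactly these steps, and how it estimates marginal gains from bandit feedback, is not stated in the abstract and is an assumption here. The function names and the graph-cut objective are hypothetical.

```python
import random


def randomized_double_greedy(ground_set, f, rng):
    """Offline randomized double greedy with an exact value oracle f(set)."""
    X, Y = set(), set(ground_set)
    for u in ground_set:
        gain_add = f(X | {u}) - f(X)      # marginal gain of adding u to X
        gain_remove = f(Y - {u}) - f(Y)   # marginal gain of removing u from Y
        a, b = max(gain_add, 0.0), max(gain_remove, 0.0)
        p_add = 1.0 if a + b == 0 else a / (a + b)
        if rng.random() < p_add:
            X.add(u)
        else:
            Y.discard(u)
    return X  # at this point X == Y


def make_cut_function(edges):
    """Graph cut value: a standard non-monotone submodular function."""
    def f(S):
        return float(sum(1 for i, j in edges if (i in S) != (j in S)))
    return f


# Hypothetical 4-cycle graph used only as a toy objective.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
cut = make_cut_function(cycle_edges)
chosen = randomized_double_greedy([0, 1, 2, 3], cut, random.Random(0))
print(sorted(chosen), cut(chosen))
```

In the bandit setting the exact marginal gains are unavailable, so a learner would have to replace them with empirical estimates from repeated noisy plays, which is where the exploration cost reflected in the $T^{\frac{2}{3}}$ rate would arise.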
 [79] arXiv:2302.01326 [pdf, other]

Title: Federated Analytics: A survey
Authors: Ahmed Roushdy Elkordy, Yahya H. Ezzeldin, Shanshan Han, Shantanu Sharma, Chaoyang He, Sharad Mehrotra, Salman Avestimehr
Comments: To appear in APSIPA Transactions on Signal and Information Processing, Volume 12, Issue 1
Journal-ref: APSIPA Transactions on Signal and Information Processing, Volume 12, Issue 1, 2023
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Federated analytics (FA) is a privacy-preserving framework for computing data analytics over multiple remote parties (e.g., mobile devices) or siloed institutional entities (e.g., hospitals, banks) without sharing the data among parties. Motivated by the practical use cases of federated analytics, we present a systematic discussion on federated analytics in this article. In particular, we discuss the unique characteristics of federated analytics and how it differs from federated learning. We also explore a wide range of FA queries and discuss various existing solutions and potential use case applications for different FA queries.
 [80] arXiv:2302.01332 [pdf, other]

Title: Bayesian Metric Learning for Uncertainty Quantification in Image Retrieval
Comments: Code: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
We propose the first Bayesian encoder for metric learning. Rather than relying on neural amortization as done in prior works, we learn a distribution over the network weights with the Laplace Approximation. We actualize this by first proving that the contrastive loss is a valid log-posterior. We then propose three methods that ensure a positive definite Hessian. Lastly, we present a novel decomposition of the Generalized Gauss-Newton approximation. Empirically, we show that our Laplacian Metric Learner (LAM) estimates well-calibrated uncertainties, reliably detects out-of-distribution examples, and yields state-of-the-art predictive performance.
 [81] arXiv:2302.01333 [pdf, other]

Title: Lower Bounds for Learning in Revealing POMDPs
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
This paper studies the fundamental limits of reinforcement learning (RL) in the challenging \emph{partially observable} setting. While it is well-established that learning in Partially Observable Markov Decision Processes (POMDPs) requires exponentially many samples in the worst case, a surge of recent work shows that polynomial sample complexities are achievable under the \emph{revealing condition}, a natural condition that requires the observables to reveal some information about the unobserved latent states. However, the fundamental limits for learning in revealing POMDPs are much less understood, with existing lower bounds being rather preliminary and having substantial gaps from the current best upper bounds.
We establish strong PAC and regret lower bounds for learning in revealing POMDPs. Our lower bounds scale polynomially in all relevant problem parameters in a multiplicative fashion, and achieve significantly smaller gaps against the current best upper bounds, providing a solid starting point for future studies. In particular, for \emph{multi-step} revealing POMDPs, we show that (1) the latent state-space dependence is at least $\Omega(S^{1.5})$ in the PAC sample complexity, which is notably harder than the $\widetilde{\Theta}(S)$ scaling for fully-observable MDPs; (2) any polynomial sublinear regret is at least $\Omega(T^{2/3})$, suggesting its fundamental difference from the \emph{single-step} case where $\widetilde{O}(\sqrt{T})$ regret is achievable. Technically, our hard instance construction adapts techniques in \emph{distribution testing}, which is new to the RL literature and may be of independent interest.
Cross-lists for Fri, 3 Feb 23
 [82] arXiv:1910.11621 (cross-list from cs.CL) [pdf, other]

Title: Meta-Learning with Dynamic-Memory-Based Prototypical Network for Few-Shot Event Detection
Comments: Accepted by WSDM 2020
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Event detection (ED), a sub-task of event extraction, involves identifying triggers and categorizing event mentions. Existing methods primarily rely upon supervised learning and require large-scale labeled event datasets which are unfortunately not readily available in many real-life applications. In this paper, we consider and reformulate the ED task with limited labeled data as a Few-Shot Learning problem. We propose a Dynamic-Memory-Based Prototypical Network (DMB-PN), which exploits Dynamic Memory Network (DMN) to not only learn better prototypes for event types, but also produce more robust sentence encodings for event mentions. Unlike vanilla prototypical networks, which compute event prototypes by simple averaging and therefore consume each event mention only once, our model is more robust and can distill contextual information from event mentions multiple times thanks to the multi-hop mechanism of DMNs. The experiments show that DMB-PN not only deals with sample scarcity better than a series of baseline models but also performs more robustly when the variety of event types is relatively large and the instance quantity is extremely small.
 [83] arXiv:2105.10922 (cross-list from cs.IR) [pdf, other]

Title: OntoED: Low-resource Event Detection with Ontology Embedding
Authors: Shumin Deng, Ningyu Zhang, Luoqiu Li, Hui Chen, Huaixiao Tou, Mosha Chen, Fei Huang, Huajun Chen
Comments: Accepted to appear at the ACL 2021 main conference. Add the description of evaluation metrics
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Event Detection (ED) aims to identify event trigger words from a given text and classify them into event types. Most current ED methods rely heavily on training instances and largely ignore the correlation of event types. Hence, they tend to suffer from data scarcity and fail to handle new unseen event types. To address these problems, we formulate ED as a process of event ontology population: linking event instances to pre-defined event types in event ontology, and propose a novel ED framework entitled OntoED with ontology embedding. We enrich event ontology with linkages among event types, and further induce more event-event correlations. Based on the event ontology, OntoED can leverage and propagate correlation knowledge, particularly from data-rich to data-poor event types. Furthermore, OntoED can be applied to new unseen event types, by establishing linkages to existing ones. Experiments indicate that OntoED is more effective and robust than previous approaches to ED, especially in data-scarce scenarios.
 [84] arXiv:2209.15214 (cross-list from cs.AI) [pdf, other]

Title: Construction and Applications of Billion-Scale Pre-trained Multimodal Business Knowledge Graph
Authors: Shumin Deng, Chengming Wang, Zhoubo Li, Ningyu Zhang, Zelin Dai, Hehong Chen, Feiyu Xiong, Ming Yan, Qiang Chen, Mosha Chen, Jiaoyan Chen, Jeff Z. Pan, Bryan Hooi, Huajun Chen
Comments: OpenBG. Work in Progress
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Business Knowledge Graphs (KGs) are important to many enterprises today, providing factual knowledge and structured data that steer many products and make them more intelligent. Despite their promising benefits, building a business KG necessitates solving prohibitive issues of deficient structure and multiple modalities. In this paper, we advance the understanding of the practical challenges related to building KGs in non-trivial real-world systems. We introduce the process of building an open business knowledge graph (OpenBG) derived from a well-known enterprise, Alibaba Group. Specifically, we define a core ontology to cover various abstract products and consumption demands, with fine-grained taxonomy and multimodal facts in deployed applications. OpenBG is an open business KG of unprecedented scale: 2.6 billion triples with more than 88 million entities covering over 1 million core classes/concepts and 2,681 types of relations. We release all the open resources (OpenBG benchmarks) derived from it for the community and report experimental results of KG-centric tasks. We also ran an online competition based on OpenBG benchmarks, which has attracted thousands of teams. We further pre-train OpenBG and apply it to many KG-enhanced downstream tasks in business scenarios, demonstrating the effectiveness of billion-scale multimodal knowledge for e-commerce. All the resources with codes have been released at \url{https://github.com/OpenBGBenchmark/OpenBG}.
 [85] arXiv:2302.00735 (cross-list from cs.RO) [pdf, other]

Title: MTP-GO: Graph-Based Probabilistic Multi-Agent Trajectory Prediction with Neural ODEs
Comments: Code: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Enabling resilient autonomous motion planning requires robust predictions of surrounding road users' future behavior. In response to this need and the associated challenges, we introduce our model, titled MTP-GO. The model encodes the scene using temporal graph neural networks to produce the inputs to an underlying motion model. The motion model is implemented using neural ordinary differential equations where the state-transition functions are learned with the rest of the model. Multimodal probabilistic predictions are provided by combining the concept of mixture density networks and Kalman filtering. The results illustrate the predictive capabilities of the proposed model across various data sets, outperforming several state-of-the-art methods on a number of metrics.
 [86] arXiv:2302.00753 (cross-list from physics.comp-ph) [pdf, other]

Title: High-precision regressors for particle physics
Comments: 15 pages, 7 figures and 2 tables
Subjects: Computational Physics (physics.comp-ph); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); High Energy Physics - Phenomenology (hep-ph); Machine Learning (stat.ML)
Monte Carlo simulations of physics processes at particle colliders like the Large Hadron Collider at CERN take up a major fraction of the computational budget. For some simulations, a single data point takes seconds, minutes, or even hours to compute from first principles. Since the necessary number of data points per simulation is on the order of $10^9$ - $10^{12}$, machine learning regressors can be used in place of physics simulators to significantly reduce this computational burden. However, this task requires high-precision regressors that can deliver data with relative errors of less than $1\%$ or even $0.1\%$ over the entire domain of the function. In this paper, we develop optimal training strategies and tune various machine learning regressors to satisfy the high-precision requirement. We leverage symmetry arguments from particle physics to optimize the performance of the regressors. Inspired by ResNets, we design a Deep Neural Network with skip connections that outperforms fully connected Deep Neural Networks. We find that at lower dimensions, boosted decision trees far outperform neural networks while at higher dimensions neural networks perform significantly better. We show that these regressors can speed up simulations by a factor of $10^3$ - $10^6$ over the first-principles computations currently used in Monte Carlo simulations. Additionally, using symmetry arguments derived from particle physics, we reduce the number of regressors necessary for each simulation by an order of magnitude. Our work can significantly reduce the training and storage burden of Monte Carlo simulations at current and future collider experiments.
 [87] arXiv:2302.00755 (cross-list from stat.ML) [pdf, other]

Title: Hierarchical shrinkage Gaussian processes: applications to computer code emulation and dynamical system recovery
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
In many areas of science and engineering, computer simulations are widely used as proxies for physical experiments, which can be infeasible or unethical. Such simulations can often be computationally expensive, and an emulator can be trained to efficiently predict the desired response surface. A widely used emulator is the Gaussian process (GP), which provides a flexible framework for efficient prediction and uncertainty quantification. Standard GPs, however, do not capture structured sparsity on the underlying response surface, which is present in many applications, particularly in the physical sciences. We thus propose a new hierarchical shrinkage GP (HierGP), which incorporates such structure via cumulative shrinkage priors within a GP framework. We show that the HierGP implicitly embeds the well-known principles of effect sparsity, heredity and hierarchy for analysis of experiments, which allows our model to identify structured sparse features from the response surface with limited data. We propose efficient posterior sampling algorithms for model training and prediction, and prove desirable consistency properties for the HierGP. Finally, we demonstrate the improved performance of HierGP over existing models, in a suite of numerical experiments and an application to dynamical system recovery.
 [88] arXiv:2302.00767 (cross-list from q-bio.PE) [pdf, other]

Title: ImageNomer: developing an fMRI and omics visualization tool to detect racial bias in functional connectivity
Comments: 11 pages
Subjects: Populations and Evolution (q-bio.PE); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
It can be difficult to identify trends and perform quality control in large, high-dimensional fMRI or omics datasets. To remedy this, we develop ImageNomer, a data visualization and analysis tool that allows inspection of both subject-level and cohort-level features. The tool allows visualization of phenotype correlation with functional connectivity (FC), partial connectivity (PC), dictionary components (PCA and our own method), and genomic data (single-nucleotide polymorphisms, SNPs). In addition, it allows visualization of weights from arbitrary ML models. ImageNomer is built with a Python backend and a Vue frontend. We validate ImageNomer using the Philadelphia Neurodevelopmental Cohort (PNC) dataset, which contains multi-task fMRI and SNP data of healthy adolescents. Using correlation, greedy selection, or model weights, we find that a set of 10 FC features can explain 15% of variation in age, compared to 35% for the full 34,716-feature model. The four most significant FCs are either between bilateral default mode network (DMN) regions or spatially proximal subcortical areas. Additionally, we show that whereas both FC (fMRI) and SNPs (genomic) features can account for 10-15% of intelligence variation, this predictive ability disappears when controlling for race. We find that FC features can be used to predict race with 85% accuracy, compared to 78% accuracy for sex prediction. Using ImageNomer, this work casts doubt on the possibility of finding unbiased intelligence-related features in fMRI and SNPs of healthy adolescents.
 [89] arXiv:2302.00773 (cross-list from cs.NE) [pdf, other]

Title: Neural Networks for Symbolic Regression
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
Many real-world systems can be described by mathematical formulas that are human-comprehensible, easy to analyze and can be helpful in explaining the system's behaviour. Symbolic regression is a method that generates nonlinear models from data in the form of analytic expressions. Historically, symbolic regression has been predominantly realized using genetic programming, a method that iteratively evolves a population of candidate solutions that are sampled by genetic operators crossover and mutation. This gradient-free evolutionary approach suffers from several deficiencies: it does not scale well with the number of variables and samples in the training data, models tend to grow in size and complexity without an adequate accuracy gain, and it is hard to fine-tune the inner model coefficients using just genetic operators. Recently, neural networks have been applied to learn the whole analytic formula, i.e., its structure as well as the coefficients, by means of gradient-based optimization algorithms. We propose a novel neural network-based symbolic regression method that constructs physically plausible models based on limited training data and prior knowledge about the system. The method employs an adaptive weighting scheme to effectively deal with multiple loss function terms and an epoch-wise learning process to reduce the chance of getting stuck in poor local optima. Furthermore, we propose a parameter-free method for choosing the model with the best interpolation and extrapolation performance out of all models generated through the whole learning process. We experimentally evaluate the approach on the TurtleBot 2 mobile robot, the magnetic manipulation system, the equivalent resistance of two resistors in parallel, and the anti-lock braking system. The results clearly show the potential of the method to find sparse and accurate models that comply with the prior knowledge provided.
 [90] arXiv:2302.00788 (cross-list from quant-ph) [pdf, other]

Title: Generative Modeling with Quantum Neurons
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The recently proposed Quantum Neuron Born Machine (QNBM) has demonstrated quality initial performance as the first quantum generative machine learning (ML) model proposed with non-linear activations. However, previous investigations have been limited in scope with regard to the model's learnability and simulatability. In this work, we make a considerable leap forward by providing an extensive deep dive into the QNBM's potential as a generative model. We first demonstrate that the QNBM's network representation makes it non-trivial to be classically efficiently simulated. Following this result, we showcase the model's ability to learn (express and train on) a wider set of probability distributions, and benchmark the performance against a classical Restricted Boltzmann Machine (RBM). The QNBM is able to outperform this classical model on all distributions, even for the most optimally trained RBM among our simulations. Specifically, the QNBM outperforms the RBM with an improvement factor of 75.3x, 6.4x, and 3.5x for the discrete Gaussian, cardinality-constrained, and Bars and Stripes distributions respectively. Lastly, we conduct an initial investigation into the model's generalization capabilities and use a KL test to show that the model is able to approximate the ground truth probability distribution more closely than the training distribution when given access to a limited amount of data. Overall, we put forth a stronger case in support of using the QNBM for larger-scale generative tasks.
 [91] arXiv:2302.00797 (cross-list from cs.AI) [pdf, other]

Title: Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning
Authors: Zun Li, Marc Lanctot, Kevin R. McKee, Luke Marris, Ian Gemp, Daniel Hennes, Paul Muller, Kate Larson, Yoram Bachrach, Michael P. Wellman
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Multi-agent reinforcement learning (MARL) has benefited significantly from population-based and game-theoretic training regimes. One approach, Policy-Space Response Oracles (PSRO), employs standard reinforcement learning to compute response policies via approximate best responses and combines them via meta-strategy selection. We augment PSRO by adding a novel search procedure with generative sampling of world states, and introduce two new meta-strategy solvers based on the Nash bargaining solution. We evaluate PSRO's ability to compute approximate Nash equilibrium, and its performance in two negotiation games: Colored Trails, and Deal or No Deal. We conduct behavioral studies where human participants negotiate with our agents ($N = 346$). We find that search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that achieve comparable social welfare negotiating with humans as humans trading among themselves.
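As background for the Nash bargaining solution mentioned above: it selects the outcome that maximizes the product of each player's utility gain over a disagreement point. The sketch below applies that criterion to a finite list of candidate outcomes. It is a toy illustration, not the paper's meta-strategy solvers, which the abstract does not specify in detail; the function name, the payoff tuples, and the restriction to a finite outcome set are all assumptions.

```python
def nash_bargaining_choice(outcomes, disagreement):
    """Pick the outcome maximizing the Nash product prod_i (u_i - d_i).

    outcomes: list of utility tuples, one entry per player.
    disagreement: utility tuple each player receives if no deal is reached.
    Only outcomes that leave no player worse off than disagreement are eligible.
    Returns None when no outcome is individually rational.
    """
    feasible = [
        o for o in outcomes
        if all(u >= d for u, d in zip(o, disagreement))
    ]
    if not feasible:
        return None

    def nash_product(outcome):
        product = 1.0
        for u, d in zip(outcome, disagreement):
            product *= (u - d)
        return product

    return max(feasible, key=nash_product)


# Hypothetical two-player payoffs for three candidate deals.
deals = [(10, 1), (6, 5), (2, 8)]
print(nash_bargaining_choice(deals, disagreement=(0, 0)))
print(nash_bargaining_choice(deals, disagreement=(0, 4)))
```

Changing the disagreement point changes which deal is selected, which is why the fallback payoffs matter as much as the deals themselves in negotiation games.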
 [92] arXiv:2302.00828 (cross-list from cs.AI) [pdf]

Title: Analysis of Biomass Sustainability Indicators from a Machine Learning Perspective
Subjects: Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Plant biomass estimation is critical due to the variability of different environmental factors and crop management practices associated with it. The assessment is largely impacted by the accurate prediction of different environmental sustainability indicators. A robust model to predict sustainability indicators is a must for the biomass community. This study proposes a robust model for biomass sustainability prediction by analyzing sustainability indicators using machine learning models. The prospect of ensemble learning was also investigated to analyze the regression problem. All experiments were carried out on crop residue data from the state of Ohio. Ten machine learning models, namely, linear regression, ridge regression, multilayer perceptron, k-nearest neighbors, support vector machine, decision tree, gradient boosting, random forest, stacking and voting, were analyzed to estimate three biomass sustainability indicators, namely soil erosion factor, soil conditioning index, and organic matter factor. The performance of the models was assessed using cross-correlation (R2), root mean squared error and mean absolute error metrics. The results showed that Random Forest was the best performing model to assess sustainability indicators. The analyzed model can now serve as a guide for assessing sustainability indicators in real time.
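The three evaluation metrics named above can be computed as in the following sketch. It assumes that "R2" refers to the usual coefficient of determination, which the abstract does not confirm; the function name `regression_metrics` and the toy values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired lists of targets and predictions.

    Assumes y_true is not constant (otherwise R^2 is undefined).
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical indicator values and model predictions.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```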
 [93] arXiv:2302.00833 (cross-list from cs.CV) [pdf, other]

Title: RobustNeRF: Ignoring Distractors with Robust Losses
Authors: Sara Sabour, Suhani Vora, Daniel Duckworth, Ivan Krasin, David J. Fleet, Andrea Tagliasacchi
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Neural radiance fields (NeRF) excel at synthesizing new views given multi-view, calibrated images of a static scene. When scenes include distractors, which are not persistent during image capture (moving objects, lighting variations, shadows), artifacts appear as view-dependent effects or 'floaters'. To cope with distractors, we advocate a form of robust estimation for NeRF training, modeling distractors in training data as outliers of an optimization problem. Our method successfully removes outliers from a scene and improves upon our baselines, on synthetic and real-world scenes. Our technique is simple to incorporate in modern NeRF frameworks, with few hyper-parameters. It does not assume a priori knowledge of the types of distractors, and is instead focused on the optimization problem rather than pre-processing or modeling transient objects. More results on our page https://robustnerf.github.io/public.
 [94] arXiv:2302.00855 (cross-list from q-bio.MN) [pdf, other]

Title: Molecular Geometry-aware Transformer for accurate 3D Atomic System modeling
Subjects: Molecular Networks (q-bio.MN); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Molecular dynamic simulations are important in computational physics, chemistry, materials science, and biology. Machine learning-based methods have shown strong abilities in predicting molecular energy and properties and are much faster than DFT calculations. Molecular energy is at least related to atoms, bonds, bond angles, torsion angles, and non-bonding atom pairs. Previous Transformer models only use atoms as inputs and thus lack explicit modeling of the aforementioned factors. To alleviate this limitation, we propose Moleformer, a novel Transformer architecture that takes nodes (atoms) and edges (bonds and non-bonding atom pairs) as inputs and models the interactions among them using rotationally and translationally invariant geometry-aware spatial encoding. The proposed spatial encoding calculates relative position information, including distances and angles, among nodes and edges. We benchmark Moleformer on the OC20 and QM9 datasets. Our model achieves state-of-the-art results on the initial state to relaxed energy prediction of OC20 and is very competitive on QM9 in predicting quantum chemical properties compared to other Transformer and Graph Neural Network (GNN) methods, which demonstrates the effectiveness of the proposed geometry-aware spatial encoding in Moleformer.
 [95] arXiv:2302.00860 (cross-list from stat.ML) [pdf, other]

Title: Interventional and Counterfactual Inference with Diffusion Models
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting where only observational data and the causal graph are available. Utilizing the recent developments in diffusion models, we introduce diffusion-based causal models (DCM) to learn causal mechanisms that generate unique latent encodings, allowing for direct sampling under interventions as well as abduction for counterfactuals. We utilize DCM to model structural equations, since diffusion models are a natural candidate here: they encode each node to a latent representation, a proxy for the exogenous noise, and offer flexible and accurate modeling to provide reliable causal statements and estimates. Our empirical evaluations demonstrate significant improvements over existing state-of-the-art methods for answering causal queries. Our theoretical results provide a methodology for analyzing the counterfactual error for general encoder/decoder models, which could be of independent interest.
 [96] arXiv:2302.00878 (cross-list from stat.ML) [pdf, other]

Title: The Contextual Lasso: Sparse Linear Models via Deep Neural Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
Sparse linear models are a gold standard tool for interpretable machine learning, a field of emerging importance as predictive models permeate decision-making in many domains. Unfortunately, sparse linear models are far less flexible as functions of their input features than black-box models like deep neural networks. With this capability gap in mind, we study a not-uncommon situation where the input features dichotomize into two groups: explanatory features, which we wish to use to explain the model's predictions, and contextual features, which we wish to have determine the model's explanations. This dichotomy leads us to propose the contextual lasso, a new statistical estimator that fits a sparse linear model whose sparsity pattern and coefficients can vary with the contextual features. The fitting process involves learning a nonparametric map, realized via a deep neural network, from contextual feature vector to sparse coefficient vector. To attain sparse coefficients, we train the network with a novel lasso regularizer in the form of a projection layer that maps the network's output onto the space of $\ell_1$-constrained linear models. Extensive experiments on real and synthetic data suggest that the learned models, which remain highly transparent, can be sparser than the regular lasso without sacrificing the predictive power of a standard deep neural network.
 [97] arXiv:2302.00883 (cross-list from cs.GR) [pdf, other]

Title: Synthesizing Physical Character-Scene Interactions
Subjects: Graphics (cs.GR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Movement is how people interact with and affect their environment. For realistic character animation, it is necessary to synthesize such interactions between virtual characters and their surroundings. Despite recent progress in character animation using machine learning, most systems focus on controlling an agent's movements in fairly simple and homogeneous environments, with limited interactions with other objects. Furthermore, many previous approaches that synthesize human-scene interactions require significant manual labeling of the training data. In contrast, we present a system that uses adversarial imitation learning and reinforcement learning to train physically simulated characters that perform scene interaction tasks in a natural and lifelike manner. Our method learns scene interaction behaviors from large unstructured motion datasets, without manual annotation of the motion data. These scene interactions are learned using an adversarial discriminator that evaluates the realism of a motion within the context of a scene. The key novelty involves conditioning both the discriminator and the policy networks on scene context. We demonstrate the effectiveness of our approach through three challenging scene interaction tasks: carrying, sitting, and lying down, which require coordination of a character's movements in relation to objects in the environment. Our policies learn to seamlessly transition between different behaviors like idling, walking, and sitting. By randomizing the properties of the objects and their placements during training, our method is able to generalize beyond the objects and scenarios depicted in the training dataset, producing natural character-scene interactions for a wide variety of object shapes and placements. The approach takes physics-based character motion generation a step closer to broad applicability.
 [98] arXiv:2302.00908 (cross-list from cs.CV) [pdf, other]

Title: GANalyzer: Analysis and Manipulation of GANs Latent Space for Controllable Face Synthesis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Generative Adversarial Networks (GANs) are capable of synthesizing high-quality facial images. Despite their success, GANs do not provide any information about the relationship between the input vectors and the generated images. Currently, facial GANs are trained on imbalanced datasets and therefore generate less diverse images. For example, more than 77% of 100K images that we randomly synthesized using StyleGAN3 are classified as Happy, and only around 3% are Angry. The problem becomes even worse when a mixture of facial attributes is desired: less than 1% of the generated samples are Angry Woman, and only around 2% are Happy Black. To address these problems, this paper proposes a framework, called GANalyzer, for the analysis and manipulation of the latent space of well-trained GANs. GANalyzer consists of a set of transformation functions designed to manipulate latent vectors for a specific facial attribute such as facial Expression, Age, Gender, and Race. We analyze facial attribute entanglement in the latent space of GANs and apply the proposed transformation for editing the disentangled facial attributes. Our experimental results demonstrate the strength of GANalyzer in editing facial attributes and generating any desired faces. We also create and release a balanced photorealistic human face dataset. Our code is publicly available on GitHub.
 [99] arXiv:2302.00911 (cross-list from stat.ML) [pdf, other]

Title: Conditional expectation for missing data imputation
Authors: Mai Anh Vu, Thu Nguyen, Tu T. Do, Nhan Phan, Pål Halvorsen, Michael A. Riegler, Binh T. Nguyen
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Missing data is common in datasets retrieved in various areas, such as medicine, sports, and finance. In many cases, to enable proper and reliable analyses of such data, the missing values are imputed, and it is necessary that the method used has a low root mean square error (RMSE) between the imputed and the true values. In addition, for some critical applications, it is also often a requirement that the logic behind the imputation is explainable, which is especially difficult for complex methods that are, for example, based on deep learning. This motivates us to introduce a conditional Distribution-based Imputation of Missing Values (DIMV) algorithm. This approach works by finding the conditional distribution of a feature with missing entries given the fully observed features. As will be illustrated in the paper, DIMV (i) gives a low RMSE for the imputed values compared to the state-of-the-art methods under comparison; (ii) is explainable; (iii) can provide an approximated confidence region for the missing values in a given sample; (iv) works for both small and large scale data; (v) in many scenarios, does not require a huge number of parameters as deep learning approaches do and therefore can be used for mobile devices or web browsers; and (vi) is robust to the normally distributed assumption that its theoretical grounds rely on. In addition to DIMV, we also introduce the DPER* algorithm, which improves the speed of DPER for estimating the mean and covariance matrix from the data, and we confirm the speed-up via experiments.
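Imputing a missing feature from its conditional distribution given an observed feature can be sketched for the simplest case of two jointly Gaussian variables, where the conditional mean is mu_y + (cov_xy / var_x) * (x - mu_x). This is an illustration of the general conditional-expectation idea, not the DIMV algorithm itself: the function name `conditional_mean_impute` is hypothetical, and the moments are estimated from complete rows only, which is a simplification.

```python
def conditional_mean_impute(pairs):
    """Fill missing y values (None) with E[y | x] under a bivariate Gaussian.

    `pairs` is a list of (x, y) tuples where x is always observed.
    Assumption: mean and covariance are estimated from complete rows only.
    """
    complete = [(x, y) for x, y in pairs if y is not None]
    n = len(complete)
    if n < 2:
        raise ValueError("need at least two complete rows")

    mean_x = sum(x for x, _ in complete) / n
    mean_y = sum(y for _, y in complete) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in complete) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in complete) / n
    if var_x == 0:
        raise ValueError("x has zero variance; the conditional mean is undefined")

    slope = cov_xy / var_x  # regression coefficient of y on x
    return [
        (x, y if y is not None else mean_y + slope * (x - mean_x))
        for x, y in pairs
    ]


print(conditional_mean_impute([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, None)]))
```

Because the imputed value is a linear function of the observed feature, the logic behind each imputation is easy to inspect, which is the kind of explainability the abstract asks for.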
 [100] arXiv:2302.00919 (cross-list from eess.SP) [pdf, other]

Title: QCM-SGM+: Improved Quantized Compressed Sensing With Score-Based Generative Models for General Sensing Matrices
Comments: arXiv admin note: substantial text overlap with arXiv:2211.13006
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)
In realistic compressed sensing (CS) scenarios, the obtained measurements usually have to be quantized to a finite number of bits before transmission and/or storage, thus posing a challenge in recovery, especially for extremely coarse quantization such as 1-bit sign measurements. Recently, Meng & Kabashima proposed an efficient quantized compressed sensing algorithm called QCS-SGM, which uses score-based generative models as an implicit prior. Thanks to the power of score-based generative models in capturing the rich structure of the prior, QCS-SGM achieves remarkably better performance than previous quantized CS methods. However, QCS-SGM is restricted to (approximately) row-orthogonal sensing matrices, since otherwise the likelihood score becomes intractable. To address this challenging problem, in this paper we propose an improved version of QCS-SGM, called QCS-SGM+, which also works well for general matrices. The key idea is a Bayesian inference perspective on the likelihood score computation, whereby an expectation propagation algorithm is proposed to approximately compute the likelihood score. Experiments on a variety of baseline datasets demonstrate that the proposed QCS-SGM+ outperforms QCS-SGM by a large margin when sensing matrices are far from row-orthogonal.
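The 1-bit sign measurement model mentioned in this abstract can be sketched as y = sign(Ax). This shows only the forward (measurement) step under a noiseless assumption, not the recovery algorithm; the function name `one_bit_measurements` and the convention sign(0) = +1 are assumptions made for illustration.

```python
def one_bit_measurements(sensing_matrix, signal):
    """Return the 1-bit measurements sign(A x) as a list of +1 / -1 values.

    Assumptions: no measurement noise, and sign(0) is taken to be +1.
    """
    measurements = []
    for row in sensing_matrix:
        if len(row) != len(signal):
            raise ValueError("each row of A must match the signal length")
        projection = sum(a * x for a, x in zip(row, signal))
        measurements.append(1 if projection >= 0 else -1)
    return measurements


A = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
x = [2.0, -3.0]
print(one_bit_measurements(A, x))
# Scaling the signal leaves the signs unchanged: amplitude information is lost.
print(one_bit_measurements(A, [10.0 * v for v in x]))
```

The second call illustrates why recovery from such coarse quantization is hard: any positive rescaling of the signal produces identical measurements, so a strong prior is needed to pick out a plausible reconstruction.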
 [101] arXiv:2302.00953 (cross-list from eess.IV) [pdf]

Title: Deep-Learning Tool for Early Identifying Non-Traumatic Intracranial Hemorrhage Etiology based on CT Scan
Authors: Meng Zhao, Yifan Hu, Ruixuan Jiang, Yuanli Zhao, Dong Zhang, Yan Zhang, Rong Wang, Yong Cao, Qian Zhang, Yonggang Ma, Jiaxi Li, Shaochen Yu, Wenjie Li, Ran Zhang, Yefeng Zheng, Shuo Wang, Jizong Zhao
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Background: To develop an artificial intelligence system that can accurately identify acute non-traumatic intracranial hemorrhage (ICH) etiology based on non-contrast CT (NCCT) scans and investigate whether clinicians can benefit from it in a diagnostic setting. Materials and Methods: The deep learning model was developed with 1868 eligible NCCT scans with non-traumatic ICH collected between January 2011 and April 2018. We tested the model on two independent datasets (TT200 and SD98) collected after April 2018. The model's diagnostic performance was compared with clinicians' performance. We further designed a simulated study to compare the clinicians' performance with and without the deep learning system augmentation. Results: The proposed deep learning system achieved an area under the receiver operating characteristic curve of 0.986 (95% CI 0.967-1.000) on aneurysms, 0.952 (0.917-0.987) on hypertensive hemorrhage, 0.950 (0.860-1.000) on arteriovenous malformation (AVM), 0.749 (0.586-0.912) on Moyamoya disease (MMD), 0.837 (0.704-0.969) on cavernous malformation (CM), and 0.839 (0.722-0.959) on other causes in the TT200 dataset. Given a 90% specificity level, the sensitivities of our model were 97.1% and 90.9% for aneurysm and AVM diagnosis, respectively. The model also shows impressive generalizability on the independent dataset SD98. The clinicians achieve significant improvements in the sensitivity, specificity, and accuracy of diagnoses of certain hemorrhage etiologies with the proposed system augmentation. Conclusions: The proposed deep learning algorithms can be an effective tool for early identification of hemorrhage etiologies based on NCCT scans. They may also provide more information for clinicians for triage and further imaging examination selection.
 [102] arXiv:2302.00973 (cross-list from stat.ML) [pdf, other]

Title: A Lightweight CNN Model for Efficient Parkinson's Disease Diagnostics
Authors: Xuechao Wang, Junqing Huang, Marianna Chatzakou, Kadri Medijainen, Pille Taba, Aaro Toomela, Sven Nomm, Michael Ruzhansky
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
In recent years, deep learning methods have achieved great success in various fields due to their strong performance in practical applications. In this paper, we present a lightweight neural network for Parkinson's disease diagnostics, in which a series of hand-drawn data are collected to distinguish Parkinson's disease patients from healthy control subjects. The proposed model consists of a convolutional neural network (CNN) cascaded with long short-term memory (LSTM) to adapt to the characteristics of the collected time-series signals. To make full use of their advantages, a multilayered LSTM model is first used to enrich features, which are then concatenated with raw data and fed into a shallow one-dimensional (1D) CNN model for efficient classification. Experimental results show that the proposed model achieves a high-quality diagnostic result over multiple evaluation metrics with far fewer parameters and operations, outperforming conventional methods such as support vector machine (SVM), random forest (RF), LightGBM (LGB), and CNN-based methods.
 [103] arXiv:2302.00980 (cross-list from cs.CV) [pdf, other]

Title: Domain Generalization Emerges from Dreaming
Comments: 23 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Recent studies have proven that DNNs, unlike human vision, tend to exploit texture information rather than shape. Such texture bias is one of the factors for the poor generalization performance of DNNs. We observe that the texture bias negatively affects not only in-domain generalization but also out-of-distribution generalization, i.e., Domain Generalization. Motivated by the observation, we propose a new framework to reduce the texture bias of a model by a novel optimization-based data augmentation, dubbed Stylized Dream. Our framework utilizes adaptive instance normalization (AdaIN) to augment the style of an original image yet preserve the content. We then adopt a regularization loss to predict consistent outputs between Stylized Dream and original images, which encourages the model to learn shape-based representations. Extensive experiments show that the proposed method achieves state-of-the-art performance in out-of-distribution settings on public benchmark datasets: PACS, VLCS, OfficeHome, TerraIncognita, and DomainNet.
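Adaptive instance normalization, which this abstract relies on, re-scales content features so that their mean and standard deviation match those of style features: AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y). The sketch below applies this to a single channel stored as a flat list; in practice the statistics are computed per channel over the spatial positions of a feature map. The function name `adain` and the `eps` stabilizer value are illustrative assumptions, and nothing here reproduces the Stylized Dream optimization itself.

```python
import math


def adain(content, style, eps=1e-5):
    """Align the mean and standard deviation of `content` to those of `style`.

    Both arguments are flat lists of feature values for one channel.
    `eps` (an assumed value) avoids division by zero for constant features.
    """
    if not content or not style:
        raise ValueError("content and style must be non-empty")

    def mean_and_std(values):
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        return mean, math.sqrt(variance + eps)

    content_mean, content_std = mean_and_std(content)
    style_mean, style_std = mean_and_std(style)
    return [
        style_std * (c - content_mean) / content_std + style_mean
        for c in content
    ]


stylized = adain([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(stylized)
```

The relative ordering of the content values is preserved while their first- and second-order statistics are replaced, which is why the operation is commonly described as changing style while keeping content.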
 [104] arXiv:2302.00985 (cross-list from cs.DS) [pdf, other]

Title: Speed-Oblivious Online Scheduling: Knowing (Precise) Speeds is not Necessary
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
We consider online scheduling on unrelated (heterogeneous) machines in a speed-oblivious setting, where an algorithm is unaware of the exact job-dependent processing speeds. We show strong impossibility results for clairvoyant and non-clairvoyant algorithms and overcome them in models inspired by practical settings: (i) we provide competitive learning-augmented algorithms, assuming that (possibly erroneous) predictions on the speeds are given, and (ii) we provide competitive algorithms for the speed-ordered model, where a single global order of machines according to their unknown job-dependent speeds is known. We prove strong theoretical guarantees and evaluate our findings on a representative heterogeneous multi-core processor. These seem to be the first empirical results for algorithms with predictions that are performed in a non-synthetic environment on real hardware.
 [105] arXiv:2302.00992 (cross-list from cs.NI) [pdf, other]

Title: Exposing the CSI: A Systematic Investigation of CSI-based Wi-Fi Sensing Capabilities and Limitations
Comments: 10 pages, 13 figures, accepted for publication in Proceedings of IEEE PerCom 2023
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Signal Processing (eess.SP)
Thanks to the ubiquitous deployment of Wi-Fi hotspots, channel state information (CSI)-based Wi-Fi sensing can unleash game-changing applications in many fields, such as healthcare, security, and entertainment. However, despite one decade of active research on Wi-Fi sensing, most existing work only considers legacy IEEE 802.11n devices, often in particular and strictly controlled environments. Worse yet, there is a fundamental lack of understanding of the impact on CSI-based sensing of modern Wi-Fi features, such as 160 MHz bandwidth, multiple-input multiple-output (MIMO) transmissions, and increased spectral resolution in IEEE 802.11ax (Wi-Fi 6). This work aims to shed light on the impact of Wi-Fi 6 features on the sensing performance and to create a benchmark for future research on Wi-Fi sensing. To this end, we perform an extensive CSI data collection campaign involving 3 individuals, 3 environments, and 12 activities, using Wi-Fi 6 signals. An anonymized ground truth obtained through video recording accompanies our 80 GB dataset, which contains almost two hours of CSI data from three collectors. We leverage our dataset to dissect the performance of a state-of-the-art sensing framework across different environments and individuals. Our key findings suggest that (i) MIMO transmissions and higher spectral resolution might be more beneficial than larger bandwidth for sensing applications; (ii) there is a pressing need to standardize research on Wi-Fi sensing because the path towards a truly environment-independent framework is still uncertain. To ease the experiments' replicability and address the current lack of Wi-Fi 6 CSI datasets, we release our 80 GB dataset to the community.
 [106] arXiv:2302.00993 (cross-list from stat.ML) [pdf, other]

Title: Unpaired Multi-Domain Causal Representation Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
The goal of causal representation learning is to find a representation of data that consists of causally related latent variables. We consider a setup where one has access to data from multiple domains that potentially share a causal representation. Crucially, observations in different domains are assumed to be unpaired, that is, we only observe the marginal distribution in each domain but not their joint distribution. In this paper, we give sufficient conditions for identifiability of the joint distribution and the shared causal graph in a linear setup. Identifiability holds if we can uniquely recover the joint distribution and the shared causal representation from the marginal distributions in each domain. We transform our identifiability results into a practical method to recover the shared latent causal graph. Moreover, we study how multiple domains reduce errors in falsely detecting shared causal variables in the finite data setting.
 [107] arXiv:2302.00999 (cross-list from math.OC) [pdf, ps, other]

Title: High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance
Authors: Abdurakhmon Sadiev, Marina Danilova, Eduard Gorbunov, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik
Comments: 86 pages
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
During recent years, the interest of the optimization and machine learning communities in high-probability convergence of stochastic optimization methods has been growing. One of the main reasons for this is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, SOTA high-probability non-asymptotic convergence results are derived under strong assumptions such as the boundedness of the gradient noise variance or of the objective's gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has bounded central $\alpha$-th moment for $\alpha \in (1,2]$ in the following setups: (i) smooth non-convex / Polyak-Lojasiewicz / convex / strongly convex / quasi-strongly convex minimization problems, (ii) Lipschitz / star-cocoercive and monotone / quasi-strongly monotone variational inequalities. These results justify the usage of the considered methods for solving problems that do not fit standard functional classes studied in stochastic optimization.
 [108] arXiv:2302.01002 (cross-list from stat.ML) [pdf, other]

Title: Overparameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
We consider the optimisation of large and shallow neural networks via gradient flow, where the output of each hidden node is scaled by some positive parameter. We focus on the case where the node scalings are non-identical, differing from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that, for large neural networks, with high probability, gradient flow converges to a global minimum AND can learn features, unlike in the NTK regime. We also provide experiments on synthetic and real-world datasets illustrating our theoretical results and showing the benefit of such scaling in terms of pruning and transfer learning.
 [109] arXiv:2302.01027 (cross-list from cs.CV) [pdf]

Title: FCB-SwinV2 Transformer for Polyp Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Polyp segmentation within colonoscopy video frames using deep learning models has the potential to automate the workflow of clinicians. This could help improve the early detection rate and characterization of polyps which could progress to colorectal cancer. Recent state-of-the-art deep learning polyp segmentation models have combined the outputs of Fully Convolutional Network architectures and Transformer Network architectures which work in parallel. In this paper we propose modifications to the current state-of-the-art polyp segmentation model FCBFormer. The transformer architecture of the FCBFormer is replaced with a SwinV2 Transformer-UNET, and minor changes to the Fully Convolutional Network architecture are made to create the FCB-SwinV2 Transformer. The performance of the FCB-SwinV2 Transformer is evaluated on the popular colonoscopy segmentation benchmarking datasets Kvasir-SEG and CVC-ClinicDB. Generalizability tests are also conducted. The FCB-SwinV2 Transformer is able to consistently achieve higher mDice scores across all tests conducted and therefore represents new state-of-the-art performance. Issues found with how colonoscopy segmentation model performance is evaluated within the literature are also reported and discussed. One of the most important issues identified is that when evaluating performance on the CVC-ClinicDB dataset it would be preferable to ensure no data leakage from video sequences occurs during the training/validation/test data partition.
 [110] arXiv:2302.01048 (cross-list from cs.SE) [pdf, other]

Title: Teaching MLOps in Higher Education through Project-Based Learning
Comments: Accepted in 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET)
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
Building and maintaining production-grade ML-enabled components is a complex endeavor that goes beyond the current approach of academic education, which focuses on the optimization of ML model performance in the lab. In this paper, we present a project-based learning approach to teaching MLOps, focused on demonstrating and gaining experience with emerging practices and tools to automate the construction of ML-enabled components. We examine the design of a course based on this approach, including laboratory sessions that cover the end-to-end ML component life cycle, from model building to production deployment. Moreover, we report on preliminary results from the first edition of the course. During the present year, an updated version of the same course is being delivered in two independent universities; the related learning outcomes will be evaluated to analyze the effectiveness of project-based learning for this specific subject.
 [111] arXiv:2302.01051 (cross-list from stat.ML) [pdf, other]

Title: Randomized prior wavelet neural operator for uncertainty quantification
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
In this paper, we propose a novel data-driven operator learning framework referred to as the Randomized Prior Wavelet Neural Operator (RP-WNO). The proposed RP-WNO is an extension of the recently proposed wavelet neural operator, which boasts excellent generalizing capabilities but cannot estimate the uncertainty associated with its predictions. RP-WNO, unlike the vanilla WNO, comes with an inherent uncertainty quantification module and hence is expected to be extremely useful for scientists and engineers alike. RP-WNO utilizes randomized prior networks, which can account for prior information and are easier to implement for large, complex deep-learning architectures than their Bayesian counterparts. Four examples have been solved to test the proposed framework, and the results support its efficacy.
 [112] arXiv:2302.01060 (cross-list from cs.RO) [pdf, ps, other]

Title: Physics Constrained Motion Prediction with Uncertainty Quantification
Comments: Submitted to IV 2023
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Predicting the motion of dynamic agents is a critical task for guaranteeing the safety of autonomous systems. A particular challenge is that motion prediction algorithms should obey dynamics constraints and quantify prediction uncertainty as a measure of confidence. We present a physics-constrained approach for motion prediction which uses a surrogate dynamical model to ensure that predicted trajectories are dynamically feasible. We propose a two-step integration consisting of intent and trajectory prediction subject to dynamics constraints. We also construct prediction regions that quantify uncertainty and are tailored for autonomous driving by using conformal prediction, a popular statistical tool. Physics Constrained Motion Prediction achieves a 41% better ADE, 56% better FDE, and 19% better IoU over a baseline in experiments using an autonomous racing dataset.
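Conformal prediction, the statistical tool this abstract mentions, can be sketched in its simplest split form: errors on a held-out calibration set are sorted, and the ceil((n + 1)(1 - alpha))-th smallest one becomes the radius of the prediction region. This is a generic illustration and not the driving-specific region construction of the paper; the function name `conformal_radius` and the use of a scalar error per trajectory are assumptions.

```python
import math


def conformal_radius(calibration_errors, alpha=0.1):
    """Radius r such that |truth - prediction| <= r with coverage >= 1 - alpha.

    Uses the standard split-conformal quantile: the k-th smallest calibration
    error with k = ceil((n + 1) * (1 - alpha)). Assumes calibration and test
    points are exchangeable.
    """
    n = len(calibration_errors)
    if n == 0:
        raise ValueError("need at least one calibration error")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")

    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to certify the requested coverage.
        return float("inf")
    return sorted(calibration_errors)[rank - 1]


errors = list(range(1, 20))  # 19 hypothetical calibration errors: 1, 2, ..., 19
print(conformal_radius(errors, alpha=0.1))
```

The region around a new prediction is then every point within that radius. The infinite radius for very small calibration sets reflects that the guarantee cannot be met with so little data, rather than a failure of the code.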
 [113] arXiv:2302.01067 (cross-list from cs.AI) [pdf, other]

Title: A Survey on Compositional Generalization in Applications
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The field of compositional generalization is currently experiencing a renaissance in AI, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical compositional generalization problem. This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of compositional generalization. Specifically, we introduce a taxonomy of common applications and summarize the state of the art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this burgeoning field.
 [114] arXiv:2302.01075 (cross-list from stat.ML) [pdf, other]

Title: MonoFlow: Rethinking Divergence GANs via the Perspective of Differential Equations
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The conventional understanding of adversarial training in generative adversarial networks (GANs) is that the discriminator is trained to estimate a divergence, and the generator learns to minimize this divergence. We argue that despite the fact that many variants of GANs were developed following this paradigm, the current theoretical understanding of GANs and their practical algorithms are inconsistent. In this paper, we leverage Wasserstein gradient flows, which characterize the evolution of particles in the sample space, to gain theoretical insights and algorithmic inspiration for GANs. We introduce a unified generative modeling framework, MonoFlow: the particle evolution is rescaled via a monotonically increasing mapping of the log density ratio. Under our framework, adversarial training can be viewed as a procedure that first obtains MonoFlow's vector field by training the discriminator, after which the generator learns to draw the particle flow defined by the corresponding vector field. We also reveal the fundamental difference between variational divergence minimization and adversarial training. This analysis helps us to identify what types of generator loss functions can lead to the successful training of GANs and suggests that GANs may have more loss designs beyond the literature (e.g., non-saturated loss), as long as they realize MonoFlow. Consistent empirical studies are included to validate the effectiveness of our framework.
 [115] arXiv:2302.01078 (cross-list from cond-mat.mtrl-sci) [pdf, other]

Title: Computational Discovery of Microstructured Composites with Optimal Strength-Toughness Trade-Offs
Authors: Beichen Li, Bolei Deng, Wan Shou, Tae-Hyun Oh, Yuanming Hu, Yiyue Luo, Liang Shi, Wojciech Matusik
Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
The conflict between strength and toughness is a fundamental problem in engineering materials design. However, systematic discovery of microstructured composites with optimal strength-toughness trade-offs has never been demonstrated due to the discrepancies between simulation and reality and the lack of data-efficient exploration of the entire Pareto front. Here, we report a widely applicable pipeline harnessing physical experiments, numerical simulations, and artificial neural networks to efficiently discover microstructured designs that are simultaneously tough and strong. Using a physics-based simulator with moderate complexity, our strategy runs a data-driven proposal-validation workflow in a nested-loop fashion to bridge the gap between simulation and reality with high sample efficiency. Without any prescribed expert knowledge of materials design, our approach automatically identifies existing toughness enhancement mechanisms that were traditionally discovered through trial-and-error or biomimicry. We provide a blueprint for the computational discovery of optimal designs, which inverts traditional scientific approaches, and is applicable to a wide range of research problems beyond composites, including polymer chemistry, fluid dynamics, meteorology, and robotics.
 [116] arXiv:2302.01089 (cross-list from cs.CV) [pdf, other]

Title: Curriculum Learning for ab initio Deep Learned Refractive Optics
Comments: Automatically design computational lenses from scratch with differentiable ray tracing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Optics (physics.optics)
Deep lens optimization has recently emerged as a new paradigm for designing computational imaging systems; however, it has been limited to either simple optical systems consisting of a single DOE or metalens, or the fine-tuning of compound lenses from good initial designs. Here we present a deep lens design method based on curriculum learning, which is able to learn optical designs of compound lenses ab initio from randomly initialized surfaces, therefore overcoming the need for a good initial design. We demonstrate this approach with the fully automatic design of an extended depth-of-field computational camera with a cellphone-style form factor, highly aspherical surfaces, and a short back focal length.
 [117] arXiv:2302.01144 (cross-list from cs.CV) [pdf, other]

Title: UW-CVGAN: UnderWater Image Enhancement with Capsules Vectors Quantization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
The degradation in underwater images is due to wavelength-dependent light attenuation, scattering, and the diversity of the water types in which they are captured. Deep neural networks take a step in this field, providing autonomous models able to achieve the enhancement of underwater images. We introduce Underwater Capsules Vectors GAN (UW-CVGAN), based on the discrete features quantization paradigm from VQGAN, for this task. The proposed UW-CVGAN combines an encoding network, which compresses the image into its latent representation, with a decoding network, able to reconstruct the enhanced image from the latent representation alone. In contrast with VQGAN, UW-CVGAN achieves feature quantization by exploiting the clustering ability of the capsule layer, making the model completely trainable and easier to manage. The model obtains enhanced underwater images with high quality and fine details. Moreover, the trained encoder is independent of the decoder, so it can be embedded on the collector as a compression algorithm that reduces the memory required for the images by a factor of $3\times$. UW-CVGAN is validated with quantitative and qualitative analysis on benchmark datasets, and we present metrics results compared with the state of the art.
 [118] arXiv:2302.01152 (cross-list from cs.AI) [pdf]

Title: A comparative study of statistical and machine learning models on near-real-time daily emissions prediction
Authors: Xiangqian Li
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Physics and Society (physics.soc-ph)
The rapid ascent in carbon dioxide emissions is a major cause of global warming and climate change, which pose a huge threat to human survival and have a far-reaching influence on the global ecosystem. It is therefore necessary to control carbon dioxide emissions effectively by predicting and analyzing their trend accurately and in a timely manner, so as to provide a reference for emissions mitigation measures. This paper aims to select a suitable model to predict near-real-time daily emissions based on univariate daily time-series data from January 1st, 2020 to September 30th, 2022 for all sectors (Power, Industry, Ground Transport, Residential, Domestic Aviation, International Aviation) in China. We propose six prediction models, including three statistical models: Grey prediction (GM(1,1)), autoregressive integrated moving average (ARIMA) and seasonal autoregressive integrated moving average with exogenous factors (SARIMAX); and three machine learning models: artificial neural network (ANN), random forest (RF) and long short term memory (LSTM). To evaluate the performance of these models, five criteria, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Coefficient of Determination ($R^2$), are introduced and discussed in detail. In the results, the three machine learning models perform better than the three statistical models, and the LSTM model performs best on all five criteria for daily emissions prediction, with an MSE of 3.5179e-04, an RMSE of 0.0187, an MAE of 0.0140, a MAPE of 14.8291% and an $R^2$ of 0.9844.
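The five evaluation criteria named in the abstract above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `regression_metrics` is hypothetical, the formulas are the common textbook definitions (the abstract does not spell them out), and $R^2$ is taken as $1 - SS_{res}/SS_{tot}$.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (in percent) and R^2 as a dict.

    Assumes equal-length, non-empty sequences, no zero in y_true
    (MAPE divides by the true values) and non-constant y_true
    (R^2 divides by the total sum of squares).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


if __name__ == "__main__":
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

Note that RMSE is the square root of MSE by construction, which is consistent with the reported pair above (0.0187 squared is roughly 3.5e-04).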
 [119] arXiv:2302.01170 (cross-list from stat.ML) [pdf, other]

Title: Timewarp: Transferable Acceleration of Molecular Dynamics by Learning Time-Coarsened Dynamics
Authors: Leon Klein, Andrew Y. K. Foong, Tor Erlend Fjelde, Bruno Mlodozeniec, Marc Brockschmidt, Sebastian Nowozin, Frank Noé, Ryota Tomioka
Subjects: Machine Learning (stat.ML); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
Molecular dynamics (MD) simulation is a widely used technique to simulate molecular systems, most commonly at the all-atom resolution where the equations of motion are integrated with timesteps on the order of femtoseconds ($1\textrm{fs}=10^{-15}\textrm{s}$). MD is often used to compute equilibrium properties, which requires sampling from an equilibrium distribution such as the Boltzmann distribution. However, many important processes, such as binding and folding, occur over timescales of milliseconds or beyond, and cannot be efficiently sampled with conventional MD. Furthermore, new MD simulations need to be performed from scratch for each molecular system studied. We present Timewarp, an enhanced sampling method which uses a normalising flow as a proposal distribution in a Markov chain Monte Carlo method targeting the Boltzmann distribution. The flow is trained offline on MD trajectories and learns to make large steps in time, simulating the molecular dynamics of $10^{5} - 10^{6}\:\textrm{fs}$. Crucially, Timewarp is transferable between molecular systems: once trained, we show that it generalises to unseen small peptides (2-4 amino acids), exploring their metastable states and providing wall-clock acceleration when sampling compared to standard MD. Our method constitutes an important step towards developing general, transferable algorithms for accelerating MD.
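The idea of using a learned model as the proposal inside a Markov chain Monte Carlo method targeting the Boltzmann distribution can be illustrated with a generic Metropolis-Hastings step. The sketch below is not the Timewarp implementation: the one-dimensional double-well `energy` and the Gaussian proposal centred at `0.5 * x` are hypothetical stand-ins for a molecular energy function and a trained normalising flow. What carries over is the acceptance rule, which needs both the energy difference and the forward and reverse proposal densities because the proposal is asymmetric.

```python
import math
import random

PROPOSAL_STD = 1.0  # assumed width of the stand-in proposal


def energy(x):
    """Toy double-well potential (hypothetical stand-in for a molecular energy)."""
    return (x * x - 1.0) ** 2


def proposal_mean(x):
    """Stand-in for a learned conditional model: pulls the state toward 0."""
    return 0.5 * x


def sample_proposal(x, rng):
    return rng.gauss(proposal_mean(x), PROPOSAL_STD)


def log_proposal(x_to, x_from):
    """Log density of proposing x_to when the chain is at x_from."""
    z = (x_to - proposal_mean(x_from)) / PROPOSAL_STD
    return -0.5 * z * z - math.log(PROPOSAL_STD * math.sqrt(2.0 * math.pi))


def mh_step(x, rng, kT=1.0):
    """One Metropolis-Hastings step targeting exp(-energy(x) / kT)."""
    x_new = sample_proposal(x, rng)
    log_alpha = (
        -(energy(x_new) - energy(x)) / kT       # Boltzmann ratio
        + log_proposal(x, x_new)                # reverse move density
        - log_proposal(x_new, x)                # forward move density
    )
    if rng.random() < math.exp(min(0.0, log_alpha)):
        return x_new, True
    return x, False


def run_chain(n_steps, x0=1.0, seed=0):
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for _ in range(n_steps):
        x, ok = mh_step(x, rng)
        samples.append(x)
        accepted += ok
    return samples, accepted / n_steps


if __name__ == "__main__":
    samples, rate = run_chain(20000)
    print("acceptance rate:", rate)
    print("sample mean:", sum(samples) / len(samples))
```

Because the accept/reject step corrects for any mismatch between proposal and target, a better proposal changes only how efficiently the chain mixes, not which distribution it samples.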
 [120] arXiv:2302.01190 (cross-list from stat.ML) [pdf, other]

Title: On the Efficacy of Differentially Private Few-shot Image Classification
Authors: Marlon Tobaben, Aliaksandra Shysheya, John Bronskill, Andrew Paverd, Shruti Tople, Santiago Zanella-Beguelin, Richard E Turner, Antti Honkela
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
There has been significant recent progress in training differentially private (DP) models which achieve accuracy that approaches the best non-private models. These DP models are typically pre-trained on large public datasets and then fine-tuned on downstream datasets that are (i) relatively large, and (ii) similar in distribution to the pre-training data. However, in many applications including personalization, it is crucial to perform well in the few-shot setting, as obtaining large amounts of labeled data may be problematic; and on images from a wide variety of domains for use in various specialist settings. To understand under which conditions few-shot DP can be effective, we perform an exhaustive set of experiments that reveals how the accuracy and vulnerability to attack of few-shot DP image classification models are affected as the number of shots per class, privacy level, model architecture, dataset, and subset of learnable parameters in the model vary. We show that to achieve DP accuracy on par with non-private models, the shots per class must be increased as the privacy level increases by as much as 32$\times$ for CIFAR-100 at $\epsilon=1$. We also find that few-shot non-private models are highly susceptible to membership inference attacks. DP provides clear mitigation against the attacks, but a small $\epsilon$ is required to effectively prevent them. Finally, we evaluate DP federated learning systems and establish state-of-the-art performance on the challenging FLAIR federated learning benchmark.
 [121] arXiv:2302.01191 (cross-list from math.OA) [pdf, other]

Title: Noncommutative $C^*$-algebra Net: Learning Neural Networks with Powerful Product Structure in $C^*$-algebra
Subjects: Operator Algebras (math.OA); Machine Learning (cs.LG); Functional Analysis (math.FA)
We propose a new generalization of neural networks with noncommutative $C^*$-algebra. An important feature of $C^*$-algebras is their noncommutative structure of products, but the existing $C^*$-algebra net frameworks have only considered commutative $C^*$-algebras. We show that this noncommutative structure of $C^*$-algebras induces powerful effects in learning neural networks. Our framework has a wide range of applications, such as learning multiple related neural networks simultaneously with interactions and learning invariant features with respect to group actions. We also show the validity of our framework numerically, which illustrates its potential power.
 [122] arXiv:2302.01199 (cross-list from cs.NI) [pdf, other]

Title: Multi-agent Reinforcement Learning with Graph Q-Networks for Antenna Tuning
Comments: 9 pages, 8 figures, to appear in IEEE NOMS 2023
Subjects: Networking and Internet Architecture (cs.NI); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Future generations of mobile networks are expected to contain more and more antennas with growing complexity and more parameters. Optimizing these parameters is necessary for ensuring the good performance of the network. The scale of mobile networks makes it challenging to optimize antenna parameters using manual intervention or hand-engineered strategies. Reinforcement learning is a promising technique to address this challenge but existing methods often use local optimizations to scale to large network deployments. We propose a new multi-agent reinforcement learning algorithm to optimize mobile network configurations globally. By using a value decomposition approach, our algorithm can be trained from a global reward function instead of relying on an ad-hoc decomposition of the network performance across the different cells. The algorithm uses a graph neural network architecture which generalizes to different network topologies and learns coordination behaviors. We empirically demonstrate the performance of the algorithm on an antenna tilt tuning problem and a joint tilt and power control problem in a simulated environment.
 [123] arXiv:2302.01203 (cross-list from cs.GT) [pdf, ps, other]

Title: Online Bidding in Repeated Non-Truthful Auctions under Budget and ROI Constraints
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Online advertising platforms typically use auction mechanisms to allocate ad placements. Advertisers participate in a series of repeated auctions, and must select bids that will maximize their overall rewards while adhering to certain constraints. We focus on the scenario in which the advertiser has budget and return-on-investment (ROI) constraints. We investigate the problem of budget- and ROI-constrained bidding in repeated non-truthful auctions, such as first-price auctions, and present a best-of-both-worlds framework with no-regret guarantees under both stochastic and adversarial inputs. By utilizing the notion of interval regret, we demonstrate that our framework does not require knowledge of specific parameters of the problem which could be difficult to determine in practice. Our proof techniques can be applied to both the adversarial and stochastic cases with minimal modifications, thereby providing a unified perspective on the two problems. In the adversarial setting, we also show that it is possible to loosen the traditional requirement of having a strictly feasible solution to the offline optimization problem at each round.
 [124] arXiv:2302.01217 (cross-list from stat.ML) [pdf, other]

Title: A Theoretical Justification for Image Inpainting using Denoising Diffusion Probabilistic Models
Comments: 30 pages, 5 figures, 1 Table
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST)
We provide a theoretical justification for sample recovery using diffusion based image inpainting in a linear model setting. While most inpainting algorithms require retraining with each new mask, we prove that diffusion based inpainting generalizes well to unseen masks without retraining. We analyze a recently proposed popular diffusion based inpainting algorithm called RePaint (Lugmayr et al., 2022), and show that it has a bias due to misalignment that hampers sample recovery even in a two-state diffusion process. Motivated by our analysis, we propose a modified RePaint algorithm we call RePaint$^+$ that provably recovers the underlying true sample and enjoys a linear rate of convergence. It achieves this by rectifying the misalignment error present in drift and dispersion of the reverse process. To the best of our knowledge, this is the first linear convergence result for a diffusion based image inpainting algorithm.
 [125] arXiv:2302.01226 (cross-list from cs.CV) [pdf, other]

Title: Factor Fields: A Unified Framework for Neural Fields and Beyond
Comments: 13 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
We present Factor Fields, a novel framework for modeling and representing signals. Factor Fields decomposes a signal into a product of factors, each of which is represented by a neural or regular field representation operating on a coordinate transformed input signal. We show that this decomposition yields a unified framework that generalizes several recent signal representations including NeRF, PlenOxels, EG3D, Instant-NGP, and TensoRF. Moreover, the framework allows for the creation of powerful new signal representations, such as the Coefficient-Basis Factorization (CoBaFa) which we propose in this paper. As evidenced by our experiments, CoBaFa leads to improvements over previous fast reconstruction methods in terms of the three critical goals in neural signal representation: approximation quality, compactness and efficiency. Experimentally, we demonstrate that our representation achieves better image approximation quality on 2D image regression tasks, higher geometric quality when reconstructing 3D signed distance fields and higher compactness for radiance field reconstruction tasks compared to previous fast reconstruction methods. Besides, our CoBaFa representation enables generalization by sharing the basis across signals during training, enabling generalization tasks such as image regression with sparse observations and few-shot radiance field reconstruction.
 [126] arXiv:2302.01237 (cross-list from stat.ML) [pdf, other]

Title: Robust Estimation under the Wasserstein Distance
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
We study the problem of robust distribution estimation under the Wasserstein metric, a popular discrepancy measure between probability distributions rooted in optimal transport (OT) theory. We introduce a new outlier-robust Wasserstein distance $\mathsf{W}_p^\varepsilon$ which allows for $\varepsilon$ outlier mass to be removed from its input distributions, and show that minimum distance estimation under $\mathsf{W}_p^\varepsilon$ achieves minimax optimal robust estimation risk. Our analysis is rooted in several new results for partial OT, including an approximate triangle inequality, which may be of independent interest. To address computational tractability, we derive a dual formulation for $\mathsf{W}_p^\varepsilon$ that adds a simple penalty term to the classic Kantorovich dual objective. As such, $\mathsf{W}_p^\varepsilon$ can be implemented via an elementary modification to standard, duality-based OT solvers. Our results are extended to sliced OT, where distributions are projected onto low-dimensional subspaces, and applications to homogeneity and independence testing are explored. We illustrate the virtues of our framework via applications to generative modeling with contaminated datasets.
 [127] arXiv:2302.01241 (cross-list from cs.AI) [pdf, other]

Title: Diagrammatization: Rationalizing with diagrammatic AI explanations for abductive reasoning on hypotheses
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Many visualizations have been developed for explainable AI (XAI), but they often require further reasoning by users to interpret. We argue that XAI should support abductive reasoning (inference to the best explanation) with diagrammatic reasoning to convey hypothesis generation and evaluation. Inspired by Peircean diagrammatic reasoning and the 5-step abduction process, we propose Diagrammatization, an approach to provide diagrammatic, abductive explanations based on domain hypotheses. We implemented DiagramNet for a clinical application to predict diagnoses from heart auscultation, and explain with shape-based murmur diagrams. In modeling studies, we found that DiagramNet not only provides faithful murmur shape explanations, but also has better prediction performance than baseline models. We further demonstrate the usefulness of diagrammatic explanations in a qualitative user study with medical students, showing that clinically-relevant, diagrammatic explanations are preferred over technical saliency map explanations. This work contributes insights into providing domain-conventional abductive explanations for user-centric XAI.
 [128] arXiv:2302.01243 (cross-list from cs.CV) [pdf, other]

Title: Human not in the loop: objective sample difficulty measures for Curriculum Learning
Comments: ISBI 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Curriculum learning is a learning method that trains models in a meaningful order from easier to harder samples. A key step is to devise automatic and objective difficulty measures of samples. In the medical domain, previous work applied domain knowledge from human experts to qualitatively assess classification difficulty of medical images to guide curriculum learning, which requires extra annotation efforts, relies on subjective human experience, and may introduce bias. In this work, we propose a new automated curriculum learning technique using the variance of gradients (VoG) to compute an objective difficulty measure of samples and evaluate its effects on elbow fracture classification from X-ray images. Specifically, we used VoG as a metric to rank each sample in terms of the classification difficulty, where high VoG scores indicate more difficult cases for classification, to guide the curriculum training process. We compared the proposed technique to a baseline (without curriculum learning), a previous method that used human annotations on classification difficulty, and anti-curriculum learning. Our experiment results showed comparable and higher performance for the binary and multiclass bone fracture classification tasks.
 [129] arXiv:2302.01248 (cross-list from stat.ML) [pdf, other]

Title: Avoiding Model Estimation in Robust Markov Decision Processes with a Generative Model
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Robust Markov Decision Processes (MDPs) are receiving more attention for learning a robust policy which is less sensitive to environment changes. A growing number of works analyze the sample efficiency of robust MDPs. However, most works study robust MDPs in a model-based regime, where the transition probability needs to be estimated and requires $\mathcal{O}(\mathcal{S}^2\mathcal{A})$ storage in memory. A common way to solve robust MDPs is to formulate them as a distributionally robust optimization (DRO) problem. However, solving a DRO problem is nontrivial, so prior works typically assume a strong oracle to obtain the optimal solution of the DRO problem easily. To remove the need for an oracle, we first transform the original robust MDPs into an alternative form, which allows us to use stochastic gradient methods to solve the robust MDPs. Moreover, we prove the alternative form still preserves the role of robustness. With this new formulation, we devise a sample-efficient algorithm to solve the robust MDPs in a model-free regime, which lowers the memory requirement to $\mathcal{O}(\mathcal{S}\mathcal{A})$ without using the oracle. Finally, we validate our theoretical findings via numerical experiments and show the efficiency of solving the alternative form of robust MDPs.
 [130] arXiv:2302.01278 (cross-list from math.DS) [pdf, other]

Title: Convolutional Autoencoders, Clustering and POD for Low-dimensional Parametrization of Navier-Stokes Equations
Comments: 16 pages, 12 figures
Subjects: Dynamical Systems (math.DS); Machine Learning (cs.LG)
Simulations of large-scale dynamical systems require expensive computations. Low-dimensional parametrization of high-dimensional states such as Proper Orthogonal Decomposition (POD) can be a solution to lessen the burdens by providing a certain compromise between accuracy and model complexity. However, for very low-dimensional parametrizations (for example for controller design) linear methods like the POD come to their natural limits so that nonlinear approaches will be the methods of choice. In this work we propose a convolutional autoencoder (CAE) consisting of a nonlinear encoder and an affine linear decoder and consider combinations with k-means clustering for improved encoding performance. The proposed set of methods is compared to the standard POD approach in two cylinder-wake scenarios modeled by the incompressible Navier-Stokes equations.
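As a rough illustration of the POD baseline mentioned in the abstract above, the following NumPy sketch computes a POD basis from a snapshot matrix via the singular value decomposition and uses it as a linear encoder with an affine linear decoder. The function names and the synthetic rank-2 data are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np


def pod_basis(snapshots, r):
    """Compute an r-dimensional POD basis.

    snapshots: array of shape (n_state, n_snapshots), one state per column.
    Returns the snapshot mean, the r leading POD modes and all singular values.
    """
    mean = snapshots.mean(axis=1, keepdims=True)
    # Left singular vectors of the centred snapshots are the POD modes.
    modes, sing_vals, _ = np.linalg.svd(snapshots - mean, full_matrices=False)
    return mean, modes[:, :r], sing_vals


def encode(states, mean, basis):
    """Linear encoder: project centred states onto the POD modes."""
    return basis.T @ (states - mean)


def decode(codes, mean, basis):
    """Affine linear decoder: mean plus a linear combination of modes."""
    return mean + basis @ codes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: 30 snapshots of a 50-dimensional state with rank-2 variation.
    snapshots = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 30)) + 3.0
    mean, basis, sing_vals = pod_basis(snapshots, r=2)
    recon = decode(encode(snapshots, mean, basis), mean, basis)
    print("max reconstruction error:", np.abs(recon - snapshots).max())
```

The decay of the singular values indicates how many modes a linear parametrization needs; when that number is too large for the application, a nonlinear encoder such as the CAE described above becomes attractive.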
 [131] arXiv:2302.01302 (cross-list from cs.NE) [pdf, other]

Title: Bayesian Inference on Binary Spiking Networks Leveraging Nanoscale Device Stochasticity
Authors: Prabodh Katti, Nicolas Skatchkovsky, Osvaldo Simeone, Bipin Rajendran, Bashir M. Al-Hashimi
Comments: Submitted and Accepted in ISCAS 2023
Subjects: Neural and Evolutionary Computing (cs.NE); Hardware Architecture (cs.AR); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
Bayesian Neural Networks (BNNs) can overcome the problem of overconfidence that plagues traditional frequentist deep neural networks, and are hence considered to be a key enabler for reliable AI systems. However, conventional hardware realizations of BNNs are resource intensive, requiring the implementation of random number generators for synaptic sampling. Owing to their inherent stochasticity during programming and read operations, nanoscale memristive devices can be directly leveraged for sampling, without the need for additional hardware resources. In this paper, we introduce a novel Phase Change Memory (PCM)-based hardware implementation for BNNs with binary synapses. The proposed architecture consists of separate weight and noise planes, in which PCM cells are configured and operated to represent the nominal values of weights and to generate the required noise for sampling, respectively. Using experimentally observed PCM noise characteristics, for the exemplary Breast Cancer Dataset classification problem, we obtain hardware accuracy and expected calibration error matching that of an 8-bit fixed-point (FxP8) implementation, with projected savings of over 9$\times$ in terms of core area transistor count.
 [132] arXiv:2302.01308 (cross-list from cs.CL) [pdf, other]

Title: What Language Reveals about Perception: Distilling Psychophysical Knowledge from Large Language Models
Comments: 7 pages, 5 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Understanding the extent to which the perceptual world can be recovered from language is a fundamental problem in cognitive science. We reformulate this problem as that of distilling psychophysical information from text and show how this can be done by combining large language models (LLMs) with a classic psychophysical method based on similarity judgments. Specifically, we use the prompt auto-completion functionality of GPT-3, a state-of-the-art LLM, to elicit similarity scores between stimuli and then apply multidimensional scaling to uncover their underlying psychological space. We test our approach on six perceptual domains and show that the elicited judgments strongly correlate with human data and successfully recover well-known psychophysical structures such as the color wheel and pitch spiral. We also explore meaningful divergences between LLM and human representations. Our work showcases how combining state-of-the-art machine models with well-known cognitive paradigms can shed new light on fundamental questions in perception and language research.
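The second stage described in the abstract above, applying multidimensional scaling to pairwise similarity scores, can be sketched with classical (Torgerson) MDS in NumPy. Several things here are assumptions for illustration: the abstract does not say which MDS variant was used, the conversion `max_score - similarity` is only one simple way to turn similarities into dissimilarities, and both function names are hypothetical.

```python
import numpy as np


def similarity_to_distance(sim, max_score):
    """Turn a similarity matrix into dissimilarities (assumed conversion)."""
    sim = np.asarray(sim, dtype=float)
    sym = 0.5 * (sim + sim.T)        # average the two rating directions
    dist = max_score - sym
    np.fill_diagonal(dist, 0.0)      # an item is at distance 0 from itself
    return dist


def classical_mds(dist, n_dims=2):
    """Embed items in n_dims dimensions from a matrix of pairwise distances."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against small negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


if __name__ == "__main__":
    points = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 3.0]])
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    coords = classical_mds(dist, n_dims=2)
    print(coords)  # same configuration up to rotation, reflection and translation
```

The recovered coordinates are only defined up to rotation, reflection and translation, so comparisons with human data are usually made on the pairwise distances or after aligning the two configurations.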
 [133] arXiv:2302.01310 (cross-list from stat.ML) [pdf, other]

Title: Bayesian Optimization of Multiple Objectives with Different Latencies
Comments: 25 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
Multi-objective Bayesian optimization aims to find the Pareto front of optimal trade-offs between a set of expensive objectives while collecting as few samples as possible. In some cases, it is possible to evaluate the objectives separately, and a different latency or evaluation cost can be associated with each objective. This presents an opportunity to learn the Pareto front faster by evaluating the cheaper objectives more frequently. We propose a scalarization based knowledge gradient acquisition function which accounts for the different evaluation costs of the objectives. We prove consistency of the algorithm and show empirically that it significantly outperforms a benchmark algorithm which always evaluates both objectives.
 [134] arXiv:2302.01316 (cross-list from cs.CV) [pdf, other]

Title: Are Diffusion Models Vulnerable to Membership Inference Attacks?
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Diffusion-based generative models have shown great potential for image synthesis, but there is a lack of research on the security and privacy risks they may pose. In this paper, we investigate the vulnerability of diffusion models to Membership Inference Attacks (MIAs), a common privacy concern. Our results indicate that existing MIAs designed for GANs or VAEs are largely ineffective on diffusion models, either due to inapplicable scenarios (e.g., requiring the discriminator of GANs) or inappropriate assumptions (e.g., closer distances between synthetic images and member images). To address this gap, we propose Step-wise Error Comparing Membership Inference (SecMI), a black-box MIA that infers memberships by assessing the matching of forward process posterior estimation at each timestep. SecMI follows the common overfitting assumption in MIA where member samples normally have smaller estimation errors, compared with hold-out samples. We consider both the standard diffusion models, e.g., DDPM, and the text-to-image diffusion models, e.g., Stable Diffusion. Experimental results demonstrate that our methods precisely infer the membership with high confidence in both scenarios across six different datasets.
 [135] arXiv:2302.01327 (cross-list from cs.CV) [pdf, other]

Title: Dual PatchNorm
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We propose Dual PatchNorm: two Layer Normalization layers (LayerNorms), before and after the patch embedding layer in Vision Transformers. We demonstrate that Dual PatchNorm outperforms the result of exhaustive search for alternative LayerNorm placement strategies in the Transformer block itself. In our experiments, incorporating this trivial modification often leads to improved accuracy over well-tuned Vision Transformers and never hurts.
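The placement described in the abstract above (one LayerNorm before and one after the patch embedding) can be sketched in NumPy as follows. This is a simplified illustration and not the authors' code: the LayerNorms here have no learnable scale or shift, the projection weights are random, and the names `patchify` and `dual_patchnorm_embed` are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize each token over its feature axis (no learnable parameters)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def patchify(image, patch):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)            # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)


def dual_patchnorm_embed(image, weight, bias, patch):
    """Patch embedding with a LayerNorm before and after the projection."""
    tokens = patchify(image, patch)
    tokens = layer_norm(tokens)               # LayerNorm on raw patches
    tokens = tokens @ weight + bias           # linear patch embedding
    return layer_norm(tokens)                 # LayerNorm on embedded tokens


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32, 3))
    patch, dim = 8, 64
    weight = rng.standard_normal((patch * patch * 3, dim))
    bias = np.zeros(dim)
    tokens = dual_patchnorm_embed(image, weight, bias, patch)
    print(tokens.shape)  # one row per patch, one column per embedding feature
```

In a real Vision Transformer both LayerNorms would carry learnable scale and bias parameters and the projection would be trained; the sketch only shows where the two normalizations sit relative to the embedding.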
 [136] arXiv:2302.01328 (cross-list from cs.CV) [pdf, other]

Title: $IC^3$: Image Captioning by Committee Consensus
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
If you ask a human to describe an image, they might do so in a thousand different ways. Traditionally, image captioning models are trained to approximate the reference distribution of image captions; however, doing so encourages captions that are viewpoint-impoverished. Such captions often focus on only a subset of the possible details, while ignoring potentially useful information in the scene. In this work, we introduce a simple, yet novel, method: "Image Captioning by Committee Consensus" ($IC^3$), designed to generate a single caption that captures high-level details from several viewpoints. Notably, humans rate captions produced by $IC^3$ at least as helpful as baseline SOTA models more than two thirds of the time, and $IC^3$ captions can improve the performance of SOTA automated recall systems by up to 84%, indicating significant material improvements over existing SOTA approaches for visual description. Our code is publicly available at https://github.com/DavidMChan/captionbycommittee
Replacements for Fri, 3 Feb 23
 [137] arXiv:1905.05095 (replaced) [pdf, other]

Title: Do Kernel and Neural Embeddings Help in Training and Generalization?
Comments: This work is published by Neural Processing Letters
Journal-ref: Neural Processing Letters (2022)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [138] arXiv:1910.13659 (replaced) [pdf, other]

Title: Efficient Privacy-Preserving Stochastic Nonconvex Optimization
Comments: 29 pages, 5 figures, 3 tables. This version corrects a miscalculation in the previous proof, resulting in an improved utility bound for the algorithm
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [139] arXiv:2003.04422 (replaced) [pdf, other]

Title: Correlated Initialization for Correlated Data
Authors: Johannes Schneider
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [140] arXiv:2003.13438 (replaced) [pdf, other]

Title: Analysis of Knowledge Transfer in Kernel Regime
Authors: Arman Rahbar, Ashkan Panahi, Chiranjib Bhattacharyya, Devdatt Dubhashi, Morteza Haghir Chehreghani
Comments: The work is published by CIKM 2022
Journal-ref: ACM International Conference on Information and Knowledge Management, October 2022, pp. 1615-1624
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [141] arXiv:2107.08598 (replaced) [pdf, other]

Title: Learning-To-Ensemble by Contextual Rank Aggregation in E-Commerce
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
 [142] arXiv:2112.02393 (replaced) [pdf, ps, other]

Title: Optimization-Based Separations for Neural Networks
Subjects: Machine Learning (cs.LG)
 [143] arXiv:2202.12183 (replaced) [pdf, other]

Title: Large-scale Stochastic Optimization of NDCG Surrogates for Deep Learning with Provable Convergence
Comments: 32 pages, 12 figures; Accepted by ICML2022
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [144] arXiv:2205.09072 (replaced) [pdf, ps, other]

Title: On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [145] arXiv:2205.15480 (replaced) [pdf, other]

Title: Post-hoc Concept Bottleneck Models
Comments: ICLR 2023 Spotlight (notable-top-25%)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [146] arXiv:2206.00471 (replaced) [pdf, other]

Title: Augmentation Component Analysis: Modeling Similarity via the Augmentation Overlaps
Comments: Accepted to ICLR 2023
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
 [147] arXiv:2206.02001 (replaced) [pdf, other]

Title: Surprising Instabilities in Training Deep Networks and a Theoretical Analysis
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
 [148] arXiv:2207.12141 (replaced) [pdf, other]

Title: Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy
Comments: 16 pages, 5 figures
Subjects: Machine Learning (cs.LG)
 [149] arXiv:2208.08881 (replaced) [pdf]

Title: Modelling the long-term fairness dynamics of data-driven targeted help on job seekers
Journal-ref: Sci Rep 13, 1727 (2023)
Subjects: Machine Learning (cs.LG)
 [150] arXiv:2208.10967 (replaced) [pdf, other]

Title: The Value of Out-of-Distribution Data
Comments: Previous versions of this work have been presented at the Out-of-Distribution Generalization in Computer Vision (OOD-CV) Workshop (ECCV 2022) and the Workshop on Distribution Shifts (NeurIPS 2022)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [151] arXiv:2210.00226 (replaced) [pdf, other]

Title: Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning
Comments: Accepted by ICLR 2023
Subjects: Machine Learning (cs.LG)
 [152] arXiv:2210.00301 (replaced) [pdf, other]

Title: Learning Globally Smooth Functions on Manifolds
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
 [153] arXiv:2210.01407 (replaced) [pdf, other]

Title: Homotopy-based training of NeuralODEs for accurate dynamics discovery
Comments: 19 pages, 17 figures, submitted to ICML2023
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Optimization and Control (math.OC); Applied Physics (physics.app-ph)
 [154] arXiv:2210.02419 (replaced) [pdf, other]

Title: Boundary-Aware Uncertainty for Feature Attribution Explainers
Subjects: Machine Learning (cs.LG)
 [155] arXiv:2210.08347 (replaced) [pdf, other]

Title: Mini-Batch Learning Strategies for modeling long term temporal dependencies: A study in environmental applications
Authors: Shaoming Xu, Ankush Khandelwal, Xiang Li, Xiaowei Jia, Licheng Liu, Jared Willard, Rahul Ghosh, Kelly Cutler, Michael Steinbach, Christopher Duffy, John Nieber, Vipin Kumar
Comments: 1. Add experiments results on LSTM and Transformer. 2. Update Time efficiency table (table 4). 3. Share codes and data
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
 [156] arXiv:2210.10769 (replaced) [pdf, other]

Title: "Why did the Model Fail?": Attributing Model Performance Changes to Distribution Shifts
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [157] arXiv:2210.13277 (replaced) [pdf, other]

Title: Provably Doubly Accelerated Federated Learning: The First Theoretically Successful Combination of Local Training and Communication Compression
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
 [158] arXiv:2210.16913 (replaced) [pdf, other]

Title: Revisiting Simple Regret: Fast Rates for Returning a Good Arm
Subjects: Machine Learning (cs.LG)
 [159] arXiv:2211.01916 (replaced) [pdf, ps, other]

Title: Improved Analysis of Score-based Generative Modeling: User-Friendly Bounds under Minimal Smoothness Assumptions
Comments: 36 pages
Subjects: Machine Learning (cs.LG)
 [160] arXiv:2211.05520 (replaced) [pdf, other]

Title: Unravelling the Performance of Physics-informed Graph Neural Networks for Dynamical Systems
Authors: Abishek Thangamuthu, Gunjan Kumar, Suresh Bishnoi, Ravinder Bhattoo, N M Anoop Krishnan, Sayan Ranu
Comments: Accepted at 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
 [161] arXiv:2212.11080 (replaced) [pdf, other]

Title: Is it worth it? Comparing six deep and classical methods for unsupervised anomaly detection in time series
Comments: 17 Pages, The repository to reproduce the results is available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
 [162] arXiv:2301.02747 (replaced) [pdf, other]

Title: Sample-efficient Surrogate Model for Frequency Response of Linear PDEs using Self-Attentive Complex Polynomials
Authors: Andrew Cohen, Weiping Dou, Jiang Zhu, Slawomir Koziel, Peter Renner, Jan-Ove Mattsson, Xiaomeng Yang, Beidi Chen, Kevin Stone, Yuandong Tian
Subjects: Machine Learning (cs.LG)
 [163] arXiv:2301.05849 (replaced) [pdf, other]

Title: Survey of Knowledge Distillation in Federated Edge Learning
Comments: 13 pages, 1 figure, 2 tables
Subjects: Machine Learning (cs.LG)
 [164] arXiv:2301.09422 (replaced) [pdf, other]

Title: HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks
Authors: Jinqi Xiao, Chengming Zhang, Yu Gong, Miao Yin, Yang Sui, Lizhi Xiang, Dingwen Tao, Bo Yuan
Comments: AAAI23
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
 [165] arXiv:2301.09977 (replaced) [pdf, other]

Title: The Backpropagation algorithm for a math student
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Numerical Analysis (math.NA)
 [166] arXiv:2301.11546 (replaced) [pdf, other]

Title: Adapting Stepsize: A Unified Perspective to Analyze and Improve Gradient-based Methods for Adversarial Attacks
Subjects: Machine Learning (cs.LG)
 [167] arXiv:2301.11673 (replaced) [pdf, other]
 [168] arXiv:2301.11962 (replaced) [pdf, other]

Title: On the Feasibility of Machine Learning Augmented Magnetic Resonance for Point-of-Care Identification of Disease
Authors: Raghav Singhal, Mukund Sudarshan, Anish Mahishi, Sri Kaushik, Luke Ginocchio, Angela Tong, Hersh Chandarana, Daniel K. Sodickson, Rajesh Ranganath, Sumit Chopra
Subjects: Machine Learning (cs.LG)
 [169] arXiv:2301.11964 (replaced) [pdf, other]

Title: Adversarial Networks and Machine Learning for File Classification
Subjects: Machine Learning (cs.LG)
 [170] arXiv:2301.12246 (replaced) [pdf, other]

Title: A Closer Look at Few-shot Classification Again
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
 [171] arXiv:2301.12616 (replaced) [pdf, other]

Title: Active Sequential Two-Sample Testing
Authors: Weizhi Li, Karthikeyan Natesan Ramamurthy, Prad Kadambi, Pouria Saidi, Gautam Dasarathy, Visar Berisha
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
 [172] arXiv:2301.13293 (replaced) [pdf, other]

Title: An adversarial feature learning strategy for debiasing neural networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
 [173] arXiv:2301.13362 (replaced) [pdf, other]

Title: Optimizing DDPM Sampling with Shortcut Fine-Tuning
Subjects: Machine Learning (cs.LG)
 [174] arXiv:2302.00293 (replaced) [pdf, other]

Title: A Survey of Methods, Challenges and Perspectives in Causality
Comments: 40 pages, 37 pages for the main paper and 3 pages for the supplement, 10 figures, submitted to ACM Computing Surveys; enabled hyperlinks in new version
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)
 [175] arXiv:2302.00353 (replaced) [pdf, other]

Title: Towards Label-Efficient Incremental Learning: A Survey
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
 [176] arXiv:2302.00533 (replaced) [pdf, other]

Title: Distillation Policy Optimization
Authors: Jianfei Ma
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
 [177] arXiv:2302.00628 (replaced) [pdf, other]

Title: Training Normalizing Flows with the Precision-Recall Divergence
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [178] arXiv:2104.13669 (replaced) [pdf, other]

Title: Optimal Stopping via Randomized Neural Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Probability (math.PR); Computational Finance (q-fin.CP)
 [179] arXiv:2106.07916 (replaced) [pdf, other]

Title: Encouraging Intra-Class Diversity Through a Reverse Contrastive Loss for Better Single-Source Domain Generalization
Journal-ref: ICCV - Workshop on Adversarial Robustness In the Real World, 2021, Virtual, France
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [180] arXiv:2201.12558 (replaced) [pdf, other]

Title: The KFIoU Loss for Rotated Object Detection
Authors: Xue Yang, Yue Zhou, Gefan Zhang, Jirui Yang, Wentao Wang, Junchi Yan, Xiaopeng Zhang, Qi Tian
Comments: 17 pages, 6 figures, 7 tables, accepted by ICLR 2023, TensorFlow code: this https URL, PyTorch code: this https URL, Jittor code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [181] arXiv:2202.07993 (replaced) [pdf, other]

Title: Planckian Jitter: countering the color-crippling effects of color jitter on self-supervised training
Authors: Simone Zini, Alex Gomez-Villa, Marco Buzzelli, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer
Comments: Accepted at Eleventh International Conference on Learning Representations (ICLR 2023)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [182] arXiv:2203.09346 (replaced) [pdf, other]

Title: Error estimates for physics informed neural networks approximating the Navier-Stokes equations
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
 [183] arXiv:2204.01612 (replaced) [pdf, other]

Title: Neural Estimation of the Rate-Distortion Function With Applications to Operational Source Coding
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [184] arXiv:2204.12723 (replaced) [pdf, ps, other]

Title: Information-theoretic limitations of data-based price discrimination
Comments: In the new version, we have (1) added a simulation and empirical study and (2) fixed some minor issues and improved the clarity
Subjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT); Machine Learning (cs.LG); Econometrics (econ.EM); Theoretical Economics (econ.TH)
 [185] arXiv:2205.07999 (replaced) [pdf, other]

Title: An Exponentially Increasing Stepsize for Parameter Estimation in Statistical Models
Comments: 37 pages. The authors are listed in alphabetical order
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
 [186] arXiv:2205.08459 (replaced) [pdf, other]

Title: Dynamic Recognition of Speakers for Consent Management by Contrastive Embedding Replay
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. The current version includes 36 pages, 8 figures, and 3 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
 [187] arXiv:2205.13134 (replaced) [pdf, other]

Title: Symbolic Physics Learner: Discovering governing equations via Monte Carlo tree search
Comments: 22 pages
Journal-ref: ICLR 2023
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Symbolic Computation (cs.SC); Chaotic Dynamics (nlin.CD); Computational Physics (physics.comp-ph)
 [188] arXiv:2206.08657 (replaced) [pdf, other]

Title: BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
Comments: Accepted by AAAI 2023, Oral
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
 [189] arXiv:2207.12805 (replaced) [pdf, other]

Title: Neural Design for Genetic Perturbation Experiments
Comments: 22 pages main, 15 pages appendix
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Genomics (q-bio.GN)
 [190] arXiv:2208.08609 (replaced) [pdf, other]

Title: A Scalable, Interpretable, Verifiable & Differentiable Logic Gate Convolutional Neural Network Architecture From Truth Tables
Subjects: Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Machine Learning (cs.LG); Symbolic Computation (cs.SC)
 [191] arXiv:2208.10833 (replaced) [pdf, other]

Title: LogLG: Weakly Supervised Log Anomaly Detection via Log-Event Graph Construction
Authors: Hongcheng Guo, Yuhui Guo, Jian Yang, Jiaheng Liu, Zhoujun Li, Tieqiao Zheng, Weichao Hou, Liangfan Zheng, Bo Zhang
Comments: 12 pages
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [192] arXiv:2208.12232 (replaced) [pdf, other]

Title: A survey, review, and future trends of skin lesion segmentation and classification
Comments: This manuscript has been accepted to be published in Computers in Biology and Medicine and has a total of 106 pages (single column and double spacing), 13 figures, and 11 tables
Journal-ref: Computers in biology and medicine (2023): 106624
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [193] arXiv:2209.07399 (replaced) [pdf, other]

Title: A Light Recipe to Train Robust Vision Transformers
Comments: Camera-ready version for SaTML 2023, code available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [194] arXiv:2209.10492 (replaced) [pdf, other]

Title: Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees
Comments: ICLR 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
 [195] arXiv:2209.13446 (replaced) [pdf, other]

Title: Stochastic Optimization for Counterfactual Explanations
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [196] arXiv:2211.09756 (replaced) [pdf, other]

Title: An Advantage Using Feature Selection with a Quantum Annealer
Subjects: Quantum Physics (quant-ph); Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
 [197] arXiv:2211.11259 (replaced) [pdf]

Title: From Traditional Adaptive Data Caching to Adaptive Context Caching: A Survey
Authors: Shakthi Weerasinghe, Arkady Zaslavsky, Seng W. Loke, Alireza Hassani, Amin Abken, Alexey Medvedev
Subjects: Human-Computer Interaction (cs.HC); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
 [198] arXiv:2211.13019 (replaced) [pdf, other]

Title: Safe Optimization of an Industrial Refrigeration Process Using an Adaptive and Explorative Framework
Comments: Under review for IFAC WC 2023. arXiv admin note: substantial text overlap with arXiv:2211.05495
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
 [199] arXiv:2212.14468 (replaced) [pdf, other]

Title: An Instrumental Variable Approach to Confounded Off-Policy Evaluation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
 [200] arXiv:2301.09633 (replaced) [pdf, other]

Title: Prediction-Powered Inference
Authors: Anastasios N. Angelopoulos, Stephen Bates, Clara Fannjiang, Michael I. Jordan, Tijana Zrnic
Comments: Code is available at this https URL
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Methodology (stat.ME)
 [201] arXiv:2301.11118 (replaced) [pdf, other]

Title: Box$^2$EL: Concept and Role Box Embeddings for the Description Logic EL++
Comments: Corrected the GitHub URL and updated baselines
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
 [202] arXiv:2301.12254 (replaced) [pdf, other]

Title: Combinatorial Inference on the Optimal Assortment in Multinomial Logit Models
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [203] arXiv:2301.13418 (replaced) [pdf, other]

Title: BRAIxDet: Learning to Detect Malignant Breast Lesion with Incomplete Annotations
Authors: Yuanhong Chen, Yuyuan Liu, Chong Wang, Michael Elliott, Chun Fung Kwok, Carlos Pena-Solorzano, Yu Tian, Fengbei Liu, Helen Frazer, Davis J. McCarthy, Gustavo Carneiro
Comments: Under Review
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [204] arXiv:2302.00286 (replaced) [pdf, other]

Title: Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training
Authors: Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Ju-Chiang Wang, Yun-Ning Hung, Dorien Herremans
Comments: arXiv admin note: text overlap with arXiv:2206.10805
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)