Machine Learning
New submissions for Wed, 1 Feb 23
 [1] arXiv:2301.13236 [pdf, other]

Title: SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search
Comments: arXiv admin note: text overlap with arXiv:2209.13966
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Despite the popularity of policy gradient methods, they are known to suffer from large variance and high sample complexity. To mitigate this, we introduce SoftTreeMax, a generalization of softmax that takes planning into account. In SoftTreeMax, we extend the traditional logits with the multi-step discounted cumulative reward, topped with the logits of future states. We consider two variants of SoftTreeMax, one for cumulative reward and one for exponentiated reward. For both, we analyze the gradient variance and reveal for the first time the role of a tree expansion policy in mitigating this variance. We prove that the resulting variance decays exponentially with the planning horizon as a function of the expansion policy. Specifically, we show that the closer the resulting state transitions are to uniform, the faster the decay. In a practical implementation, we utilize a parallelized GPU-based simulator for fast and efficient tree search. Our differentiable tree-based policy leverages all gradients at the tree leaves in each environment step instead of the traditional single-sample-based gradient. We then show in simulation how the variance of the gradient is reduced by three orders of magnitude, leading to better sample complexity compared to the standard policy gradient. On Atari, SoftTreeMax demonstrates up to 5x better performance in a faster run time compared to distributed PPO. Lastly, we demonstrate that high reward correlates with lower variance.
 [2] arXiv:2301.13247 [pdf, other]

Title: Online Loss Function Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Loss function learning is a new meta-learning paradigm that aims to automate the essential task of designing a loss function for a machine learning model. Existing techniques for loss function learning have shown promising results, often improving a model's training dynamics and final inference performance. However, a significant limitation of these techniques is that the loss functions are meta-learned in an offline fashion, where the meta-objective only considers the very first few steps of training, which is a significantly shorter time horizon than the one typically used for training deep neural networks. This causes significant bias towards loss functions that perform well at the very start of training but perform poorly at the end of training. To address this issue, we propose a new loss function learning technique for adaptively updating the loss function online after each update to the base model parameters. The experimental results show that our proposed method consistently outperforms the cross-entropy loss and offline loss function learning techniques on a diverse range of neural network architectures and datasets.
 [3] arXiv:2301.13271 [pdf, other]

Title: Probabilistic Neural Data Fusion for Learning from an Arbitrary Number of Multi-fidelity Data Sets
Subjects: Machine Learning (cs.LG)
In many applications in engineering and sciences, analysts have simultaneous access to multiple data sources. In such cases, the overall cost of acquiring information can be reduced via data fusion or multi-fidelity (MF) modeling where one leverages inexpensive low-fidelity (LF) sources to reduce the reliance on expensive high-fidelity (HF) data. In this paper, we employ neural networks (NNs) for data fusion in scenarios where data is very scarce and obtained from an arbitrary number of sources with varying levels of fidelity and cost. We introduce a unique NN architecture that converts MF modeling into a nonlinear manifold learning problem. Our NN architecture inversely learns non-trivial (e.g., non-additive and non-hierarchical) biases of the LF sources in an interpretable and visualizable manifold where each data source is encoded via a low-dimensional distribution. This probabilistic manifold quantifies model form uncertainties such that LF sources with small bias are encoded close to the HF source. Additionally, we endow the output of our NN with a parametric distribution not only to quantify aleatoric uncertainties, but also to reformulate the network's loss function based on strictly proper scoring rules which improve robustness and accuracy on unseen HF data. Through a set of analytic and engineering examples, we demonstrate that our approach provides a high predictive power while quantifying various sources of uncertainty.
 [4] arXiv:2301.13273 [pdf, other]

Title: Near Optimal Private and Robust Linear Regression
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Statistics Theory (math.ST); Machine Learning (stat.ML)
We study the canonical statistical estimation problem of linear regression from $n$ i.i.d. examples under $(\varepsilon,\delta)$-differential privacy when some response variables are adversarially corrupted. We propose a variant of the popular differentially private stochastic gradient descent (DP-SGD) algorithm with two innovations: a full-batch gradient descent to improve sample complexity and a novel adaptive clipping to guarantee robustness. When there is no adversarial corruption, this algorithm improves upon the existing state-of-the-art approach and achieves a near-optimal sample complexity. Under label corruption, this is the first efficient linear regression algorithm to guarantee both $(\varepsilon,\delta)$-DP and robustness. Synthetic experiments confirm the superiority of our approach.
 [5] arXiv:2301.13287 [pdf, other]

Title: MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning
Authors: Kirshnateja Killamsetty, Alexandre V. Evfimievski, Tejaswini Pedapati, Kiran Kate, Lucian Popa, Rishabh Iyer
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Training deep networks and tuning hyperparameters on large datasets is computationally intensive. One of the primary research directions for efficient training is to reduce training costs by selecting well-generalizable subsets of training data. Compared to simple adaptive random subset selection baselines, existing intelligent subset selection approaches are not competitive due to the time-consuming subset selection step, which involves computing model-dependent gradients and feature embeddings and applies greedy maximization of submodular objectives. Our key insight is that removing the reliance on downstream model parameters enables subset selection as a preprocessing step and enables one to train multiple models at no additional cost. In this work, we propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training while enabling superior model convergence and performance by using an easy-to-hard curriculum. Our empirical results indicate that MILO can train models $3\times$ to $10\times$ faster and tune hyperparameters $20\times$ to $75\times$ faster than full-dataset training or tuning without compromising performance.
 [6] arXiv:2301.13289 [pdf, other]

Title: On the Statistical Benefits of Temporal Difference Learning
Comments: 26 pages, 7 figures, submitted to ICML 2023
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Given a dataset on actions and resulting long-term rewards, a direct estimation approach fits value functions that minimize prediction error on the training data. Temporal difference learning (TD) methods instead fit value functions by minimizing the degree of temporal inconsistency between estimates made at successive timesteps. Focusing on finite state Markov chains, we provide a crisp asymptotic theory of the statistical advantages of this approach. First, we show that an intuitive inverse trajectory pooling coefficient completely characterizes the percent reduction in mean-squared error of value estimates. Depending on problem structure, the reduction could be enormous or nonexistent. Next, we prove that there can be dramatic improvements in estimates of the difference in value-to-go for two states: TD's errors are bounded in terms of a novel measure, the problem's trajectory crossing time, which can be much smaller than the problem's time horizon.
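The contrast between direct estimation and TD can be illustrated on a tiny finite-state chain. The sketch below is not taken from the paper: it assumes undiscounted episodic data, uses every-visit return averaging as the direct estimator, and uses batch TD(0) (repeated sweeps to a fixed point) as the TD estimator. The function names and the episode encoding `(state, reward, next_state)` are hypothetical.

```python
from collections import defaultdict

def direct_estimate(episodes):
    """Direct (Monte Carlo) estimate: average the observed return-to-go per state."""
    returns = defaultdict(list)
    for episode in episodes:
        g = 0.0
        # Walk backwards so g accumulates the undiscounted return from each step.
        for state, reward, _next_state in reversed(episode):
            g += reward
            returns[state].append(g)
    return {s: sum(v) / len(v) for s, v in returns.items()}

def batch_td_estimate(episodes, sweeps=1000):
    """Batch TD(0): iterate V(s) = mean(r + V(s')) over all observed transitions."""
    transitions = defaultdict(list)
    for episode in episodes:
        for state, reward, next_state in episode:
            transitions[state].append((reward, next_state))
    values = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        for s, outcomes in transitions.items():
            # Terminal (None) or never-expanded successors are valued at 0.
            values[s] = sum(r + values.get(n, 0.0) for r, n in outcomes) / len(outcomes)
    return values

# Assumed toy data: state A is seen once (then B, total reward 0);
# state B is seen eight times and pays reward 1 in six of them.
episodes = (
    [[("A", 0.0, "B"), ("B", 0.0, None)]]
    + [[("B", 1.0, None)] for _ in range(6)]
    + [[("B", 0.0, None)]]
)
mc = direct_estimate(episodes)
td = batch_td_estimate(episodes)
print(mc["A"], td["A"])  # 0.0 0.75
```

On this data the direct estimate for A uses only the single trajectory that started in A, while TD pools all trajectories that pass through B, which is the kind of pooling effect the abstract refers to.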
 [7] arXiv:2301.13293 [pdf, other]

Title: Sifer: Overcoming simplicity bias in deep networks using a feature sieve
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Simplicity bias is the concerning tendency of deep networks to over-depend on simple, weakly predictive features, to the exclusion of stronger, more complex features. This causes biased, incorrect model predictions in many real-world applications, exacerbated by incomplete training data containing spurious feature-label correlations. We propose a direct, interventional method for addressing simplicity bias in DNNs, which we call the feature sieve. We aim to automatically identify and suppress easily computable spurious features in lower layers of the network, thereby allowing the higher network levels to extract and utilize richer, more meaningful representations. We provide concrete evidence of this differential suppression & enhancement of relevant features on both controlled datasets and real-world images, and report substantial gains on many real-world debiasing benchmarks (11.4% relative gain on Imagenet-A; 3.2% on BAR, etc.). Crucially, we outperform many baselines that incorporate knowledge about known spurious or biased attributes, despite our method not using any such information. We believe that our feature sieve work opens up exciting new research directions in automated adversarial feature extraction & representation learning for deep networks.
 [8] arXiv:2301.13304 [pdf, other]

Title: Understanding Self-Distillation in the Presence of Label Noise
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Self-distillation (SD) is the process of first training a "teacher" model and then using its predictions to train a "student" model with the same architecture. Specifically, the student's objective function is $\xi \cdot \ell(\text{teacher's predictions}, \text{student's predictions}) + (1-\xi) \cdot \ell(\text{given labels}, \text{student's predictions})$, where $\ell$ is some loss function and $\xi \in [0,1]$ is some parameter. Empirically, SD has been observed to provide performance gains in several settings. In this paper, we theoretically characterize the effect of SD in two supervised learning problems with noisy labels. We first analyze SD for regularized linear regression and show that in the high label noise regime, the optimal value of $\xi$ that minimizes the expected error in estimating the ground truth parameter is surprisingly greater than 1. Empirically, we show that $\xi > 1$ works better than $\xi \leq 1$ even with the cross-entropy loss for several classification datasets when 50% or 30% of the labels are corrupted. Further, we quantify when optimal SD is better than optimal regularization. Next, we analyze SD in the case of logistic regression for binary classification with random label corruption and quantify the range of label corruption in which the student outperforms the teacher in terms of accuracy. To our knowledge, this is the first result of its kind for the cross-entropy loss.
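The student's objective stated in the abstract can be written out directly. The sketch below is only an illustration: the squared loss stands in for the generic $\ell$, and the names `squared_loss` and `sd_objective` are hypothetical. Note that nothing in the formula prevents evaluating it at $\xi > 1$, in which case the label term gets a negative weight.

```python
def squared_loss(targets, predictions):
    """Mean squared error, used here as an assumed stand-in for the loss l."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def sd_objective(teacher_preds, labels, student_preds, xi, loss=squared_loss):
    """xi * l(teacher, student) + (1 - xi) * l(labels, student)."""
    return xi * loss(teacher_preds, student_preds) + (1.0 - xi) * loss(labels, student_preds)

# Assumed toy values: the student agrees with the teacher but not with the (noisy) labels.
teacher = [1.0, 1.0]
labels = [0.0, 0.0]
student = [1.0, 1.0]
for xi in (0.0, 0.5, 1.0, 1.5):
    print(xi, sd_objective(teacher, labels, student, xi))
```

With $\xi = 0$ the objective reduces to ordinary training on the given labels, and with $\xi = 1$ it reduces to fitting the teacher alone.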
 [9] arXiv:2301.13310 [pdf, other]

Title: Alternating Updates for Efficient Transformers
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
It is well established that increasing scale in deep transformer networks leads to improved quality and performance. This increase in scale often comes with an increase in compute cost and inference latency. Consequently, research into methods which help realize the benefits of increased scale without leading to an increase in the compute cost becomes important. We introduce Alternating Updates (AltUp), a simple-to-implement method to increase a model's capacity without the computational burden. AltUp enables the widening of the learned representation without increasing the computation time by working on a sub-block of the representation at each layer. Our experiments on various transformer models and language tasks demonstrate the consistent effectiveness of alternating updates on a diverse set of benchmarks. Finally, we present extensions of AltUp to the sequence dimension, and demonstrate how AltUp can be synergistically combined with existing approaches, such as Sparse Mixture-of-Experts models, to obtain efficient models with even higher capacity.
 [10] arXiv:2301.13313 [pdf, other]

Title: Incorporating Recurrent Reinforcement Learning into Model Predictive Control for Adaptive Control in Autonomous Driving
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Model Predictive Control (MPC) is attracting tremendous attention in the autonomous driving task as a powerful control technique. The success of an MPC controller strongly depends on an accurate internal dynamics model. However, the static parameters, usually learned by system identification, often fail to adapt to both internal and external perturbations in real-world scenarios. In this paper, we (1) reformulate the problem as a Partially Observed Markov Decision Process (POMDP) that absorbs the uncertainties into observations and maintains the Markov property in hidden states; (2) learn a recurrent policy that continually adapts the parameters of the dynamics model via Recurrent Reinforcement Learning (RRL) for optimal and adaptive control; and (3) evaluate the proposed algorithm (referred to as MPC-RRL) in the CARLA simulator, where it leads to robust behaviours under a wide range of perturbations.
 [11] arXiv:2301.13318 [pdf, other]

Title: Proxy-based Zero-Shot Entity Linking by Effective Candidate Retrieval
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)
A recent advancement in the domain of biomedical Entity Linking is the development of powerful two-stage algorithms: an initial candidate retrieval stage that generates a shortlist of entities for each mention, followed by a candidate ranking stage. However, the effectiveness of both stages is inextricably dependent on computationally expensive components. Specifically, in candidate retrieval via dense representation retrieval it is important to have hard negative samples, which require repeated forward passes and nearest neighbour searches across the entire entity label set throughout training. In this work, we show that pairing a proxy-based metric learning loss with an adversarial regularizer provides an efficient alternative to hard negative sampling in the candidate retrieval stage. In particular, we show competitive performance on the recall@1 metric, thereby providing the option to leave out the expensive candidate ranking step. Finally, we demonstrate how the model can be used in a zero-shot setting to discover out-of-knowledge-base biomedical entities.
 [12] arXiv:2301.13323 [pdf, other]

Title: Fairness and Accuracy under Domain Generalization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
As machine learning (ML) algorithms are increasingly used in high-stakes applications, concerns have arisen that they may be biased against certain social groups. Although many approaches have been proposed to make ML models fair, they typically rely on the assumption that data distributions in training and deployment are identical. Unfortunately, this is commonly violated in practice and a model that is fair during training may lead to an unexpected outcome during its deployment. Although the problem of designing robust ML models under dataset shifts has been widely studied, most existing works focus only on the transfer of accuracy. In this paper, we study the transfer of both fairness and accuracy under domain generalization where the data at test time may be sampled from never-before-seen domains. We first develop theoretical bounds on the unfairness and expected loss at deployment, and then derive sufficient conditions under which fairness and accuracy can be perfectly transferred via invariant representation learning. Guided by this, we design a learning algorithm such that fair ML models learned with training data still have high fairness and accuracy when deployment environments change. Experiments on real-world data validate the proposed algorithm. Model implementation is available at https://github.com/pth1993/FATDM.
 [13] arXiv:2301.13324 [pdf, other]

Title: V2N Service Scaling with Deep Reinforcement Learning
Comments: Accepted at the 36th IEEE/IFIP Network Operations and Management Symposium (NOMS 2023)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The fifth generation (5G) of wireless networks is set out to meet the stringent requirements of vehicular use cases. Edge computing resources can aid in this direction by moving processing closer to end-users, reducing latency. However, given the stochastic nature of traffic loads and availability of physical resources, appropriate auto-scaling mechanisms need to be employed to support cost-efficient and performant services. To this end, we employ Deep Reinforcement Learning (DRL) for vertical scaling in Edge computing to support vehicular-to-network communications. We address the problem using Deep Deterministic Policy Gradient (DDPG). As DDPG is a model-free off-policy algorithm for learning continuous actions, we introduce a discretization approach to support discrete scaling actions. Thus we address scalability problems inherent to high-dimensional discrete action spaces. Employing a real-world vehicular trace data set, we show that DDPG outperforms existing solutions, reducing (at minimum) the average number of active CPUs by 23% while increasing the long-term reward by 24%.
 [14] arXiv:2301.13326 [pdf, other]

Title: A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Systems and Control (eess.SY)
We investigate the problem of stochastic, combinatorial multi-armed bandits where the learner only has access to bandit feedback and the reward function can be nonlinear. We provide a general framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods that only require bandit feedback, achieving $\mathcal{O}\left(T^{\frac{2}{3}}\log(T)^{\frac{1}{3}}\right)$ expected cumulative $\alpha$-regret dependence on the horizon $T$. The framework only requires the offline algorithms to be robust to small errors in function evaluation. The adaptation procedure does not even require explicit knowledge of the offline approximation algorithm: the offline algorithm can be used as a black-box subroutine.
To demonstrate the utility of the proposed framework, we apply it to multiple problems in submodular maximization, adapting approximation algorithms for cardinality and for knapsack constraints. The new CMAB algorithms for knapsack constraints outperform a full-bandit method developed for the adversarial setting in experiments with real-world data.
 [15] arXiv:2301.13330 [pdf, other]

Title: Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference
Authors: Deepika Bablani, Jeffrey L. Mckinstry, Steven K. Esser, Rathinakumar Appuswamy, Dharmendra S. Modha
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
For effective and efficient deep neural network inference, it is desirable to achieve state-of-the-art accuracy with the simplest networks requiring the least computation, memory, and power. Quantizing networks to lower precision is a powerful technique for simplifying networks. It is generally desirable to quantize as aggressively as possible without incurring significant accuracy degradation. As each layer of a network may have different sensitivity to quantization, mixed precision quantization methods selectively tune the precision of individual layers of a network to achieve a minimum drop in task performance (e.g., accuracy). To estimate the impact of layer precision choice on task performance, two methods are introduced: i) Entropy Approximation Guided Layer selection (EAGL) is fast and uses the entropy of the weight distribution, and ii) Accuracy-aware Layer Precision Selection (ALPS) is straightforward and relies on single epoch fine-tuning after layer precision reduction. Using EAGL and ALPS for layer precision selection, full-precision accuracy is recovered with a mix of 4-bit and 2-bit layers for ResNet50 and ResNet101 classification networks, demonstrating improved performance across the entire accuracy-throughput frontier, and equivalent performance for the PSPNet segmentation network in our own commensurate comparison over leading mixed precision layer selection techniques, while requiring orders of magnitude less compute time to reach a solution.
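The abstract says only that EAGL uses the entropy of a layer's weight distribution; it does not give the estimator. As a rough, assumed illustration of what such a quantity could look like, the sketch below uniformly quantizes a layer's weights to a given bit width and computes the Shannon entropy of the resulting code histogram. The function name, the uniform min-max quantizer, and the toy weights are all hypothetical and are not claimed to match the paper's procedure.

```python
import math
from collections import Counter

def quantized_weight_entropy(weights, bits):
    """Shannon entropy (in bits) of the histogram of uniformly quantized weights."""
    levels = 2 ** bits
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # A constant layer occupies a single code, so its entropy is zero.
        return 0.0
    step = (hi - lo) / (levels - 1)
    # Assumed quantizer: map each weight to the nearest of `levels` evenly spaced codes.
    codes = [round((w - lo) / step) for w in weights]
    n = len(codes)
    return -sum((c / n) * math.log2(c / n) for c in Counter(codes).values())

# Assumed toy layers: one spreads evenly over four codes, one is nearly constant.
spread_layer = [0.0, 1.0, 2.0, 3.0]
peaked_layer = [0.0, 0.0, 0.0, 3.0]
print(quantized_weight_entropy(spread_layer, bits=2))  # 2.0
print(quantized_weight_entropy(peaked_layer, bits=2))
```

Under this reading, a layer whose quantized weights have low entropy uses few of the available codes, which is one plausible signal that it could tolerate lower precision.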
 [16] arXiv:2301.13336 [pdf, other]

Title: The Fair Value of Data Under Heterogeneous Privacy Constraints
Comments: 29 pages, 5 figures
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT)
Modern data aggregation often takes the form of a platform collecting data from a network of users. More than ever, these users are now requesting that the data they provide is protected with a guarantee of privacy. This has led to the study of optimal data acquisition frameworks, where the optimality criterion is typically the maximization of utility for the agent trying to acquire the data. This involves determining how to allocate payments to users for the purchase of their data at various privacy levels. The main goal of this paper is to characterize a fair amount to pay users for their data at a given privacy level. We propose an axiomatic definition of fairness, analogous to the celebrated Shapley value. Two concepts for fairness are introduced. The first treats the platform and users as members of a common coalition and provides a complete description of how to divide the utility among the platform and users. In the second concept, fairness is defined only among users, leading to a potential fairness-constrained mechanism design problem for the platform. We consider explicit examples involving private heterogeneous data and show how these notions of fairness can be applied. To the best of our knowledge, these are the first fairness concepts for data that explicitly consider privacy constraints.
 [17] arXiv:2301.13338 [pdf, other]

Title: Continuous Spatiotemporal Transformers
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Modeling spatiotemporal dynamical systems is a fundamental challenge in machine learning. Transformer models have been very successful in NLP and computer vision where they provide interpretable representations of data. However, a limitation of transformers in modeling continuous dynamical systems is that they are fundamentally discrete time and space models and thus have no guarantees regarding continuous sampling. To address this challenge, we present the Continuous Spatiotemporal Transformer (CST), a new transformer architecture that is designed for the modeling of continuous systems. This new framework guarantees a continuous and smooth output via optimization in Sobolev space. We benchmark CST against traditional transformers as well as other spatiotemporal dynamics modeling methods and achieve superior performance in a number of tasks on synthetic and real systems, including learning brain dynamics from calcium imaging data.
 [18] arXiv:2301.13340 [pdf, other]

Title: Affinity Uncertainty-based Hard Negative Mining in Graph Contrastive Learning
Comments: 11 pages, 4 figures
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Hard negative mining has been shown to be effective in enhancing self-supervised contrastive learning (CL) on diverse data types, including graph contrastive learning (GCL). Existing hardness-aware CL methods typically treat negative instances that are most similar to the anchor instance as hard negatives, which helps improve the CL performance, especially on image data. However, this approach often fails to identify the hard negatives and instead leads to many false negatives on graph data. This is mainly because the learned graph representations are not sufficiently discriminative due to over-smooth representations and/or non-i.i.d. issues in graph data. To tackle this problem, this paper proposes a novel approach that builds a discriminative model on collective affinity information (i.e., two sets of pairwise affinities between the negative instances and the anchor instance) to mine hard negatives in GCL. In particular, the proposed approach evaluates how confident/uncertain the discriminative model is about the affinity of each negative instance to an anchor instance to determine its hardness weight relative to the anchor instance. This uncertainty information is then incorporated into existing GCL loss functions via a weighting term to enhance their performance. The enhanced GCL is theoretically grounded in that the resulting GCL loss is equivalent to a triplet loss with an adaptive margin being exponentially proportional to the learned uncertainty of each negative instance. Extensive experiments on 10 graph datasets show that our approach i) consistently enhances different state-of-the-art GCL methods in both graph and node classification tasks, and ii) significantly improves their robustness against adversarial attacks.
 [19] arXiv:2301.13343 [pdf, other]

Title: Few-Shot Image-to-Semantics Translation for Policy Transfer in Reinforcement Learning
Comments: The 2022 International Joint Conference on Neural Networks (IJCNN2022)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
We investigate policy transfer using image-to-semantics translation to mitigate learning difficulties in vision-based robotics control agents. This problem assumes two environments: a simulator environment with semantics, that is, low-dimensional and essential information, as the state space, and a real-world environment with images as the state space. By learning a mapping from images to semantics, we can transfer a policy, pre-trained in the simulator, to the real world, thereby eliminating real-world on-policy agent interactions to learn, which are costly and risky. In addition, using image-to-semantics mapping is advantageous in terms of the computational efficiency to train the policy and the interpretability of the obtained policy over other types of sim-to-real transfer strategies. To tackle the main difficulty in learning image-to-semantics mapping, namely the human annotation cost for producing a training dataset, we propose two techniques: pair augmentation with the transition function in the simulator environment and active learning. We observed a reduction in the annotation cost without a decline in the performance of the transfer, and the proposed approach outperformed the existing approach without annotation.
 [20] arXiv:2301.13349 [pdf, other]

Title: Unconstrained Dynamic Regret via Sparse Coding
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
Motivated by time series forecasting, we study Online Linear Optimization (OLO) under the coupling of two problem structures: the domain is unbounded, and the performance of an algorithm is measured by its dynamic regret. Handling either of them requires the regret bound to depend on a certain complexity measure of the comparator sequence: specifically, the comparator norm in unconstrained OLO, and the path length in dynamic regret. In contrast to a recent work (Jacobsen & Cutkosky, 2022) that adapts to the combination of these two complexity measures, we propose an alternative complexity measure by recasting the problem into sparse coding. Adaptivity can be achieved by a simple modular framework, which naturally exploits more intricate prior knowledge of the environment. Along the way, we also present a new gradient adaptive algorithm for static unconstrained OLO, designed using novel continuous time machinery. This could be of independent interest.
 [21] arXiv:2301.13362 [pdf, other]

Title: Optimizing DDPM Sampling with Shortcut Fine-Tuning
Subjects: Machine Learning (cs.LG)
In this study, we propose Shortcut Fine-Tuning (SFT), a new approach for addressing the challenge of fast sampling of pretrained Denoising Diffusion Probabilistic Models (DDPMs). SFT advocates for the fine-tuning of DDPM samplers through the direct minimization of Integral Probability Metrics (IPM), instead of learning the backward diffusion process. This enables samplers to discover an alternative and more efficient sampling shortcut, deviating from the backward diffusion process. We also propose a new algorithm that is similar to the policy gradient method for fine-tuning DDPMs by proving that under certain assumptions, the gradient descent of diffusion models is equivalent to the policy gradient approach. Through empirical evaluation, we demonstrate that our fine-tuning method can further enhance existing fast DDPM samplers, resulting in sample quality comparable to or even surpassing that of the full-step model across various datasets.
 [22] arXiv:2301.13370 [pdf, other]

Title: On the Correctness of Automatic Differentiation for Neural Networks with Machine-Representable Parameters
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Recent work has shown that automatic differentiation over the reals is almost always correct in a mathematically precise sense. However, actual programs work with machine-representable numbers (e.g., floating-point numbers), not reals. In this paper, we study the correctness of automatic differentiation when the parameter space of a neural network consists solely of machine-representable numbers. For a neural network with bias parameters, we prove that automatic differentiation is correct at all parameters where the network is differentiable. In contrast, it is incorrect at all parameters where the network is non-differentiable, since it never reports non-differentiability. To better understand this non-differentiable set of parameters, we prove a tight bound on its size, which is linear in the number of non-differentiabilities in activation functions, and provide a simple necessary and sufficient condition for a parameter to be in this set. We further prove that automatic differentiation always computes a Clarke subderivative, even on the non-differentiable set. We also extend these results to neural networks possibly without bias parameters.
 [23] arXiv:2301.13375 [pdf, other]

Title: Optimal Transport Perturbations for Safe Reinforcement Learning with Robustness Guarantees
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Robustness and safety are critical for the trustworthy deployment of deep reinforcement learning in real-world decision-making applications. In particular, we require algorithms that can guarantee robust, safe performance in the presence of general environment disturbances, while making limited assumptions on the data collection process during training. In this work, we propose a safe reinforcement learning framework with robustness guarantees through the use of an optimal transport cost uncertainty set. We provide an efficient, theoretically supported implementation based on Optimal Transport Perturbations, which can be applied in a completely offline fashion using only data collected in a nominal training environment. We demonstrate the robust, safe performance of our approach on a variety of continuous control tasks with safety constraints in the Real-World Reinforcement Learning Suite.
 [24] arXiv:2301.13376 [pdf, other]

Title: Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV)
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference. We leverage weight normalization as a means of constraining parameters during training using accumulator bit width bounds that we derive. We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline. We then show that this reduction translates to increased design efficiency for custom FPGA-based accelerators. Finally, we show that our algorithm not only constrains weights to fit into an accumulator of user-defined bit width, but also increases the sparsity and compressibility of the resulting weights. Across all of our benchmark models trained with 8-bit weights and activations, we observe that constraining the hidden layers of quantized neural networks to fit into 16-bit accumulators yields an average 98.2% sparsity with an estimated compression rate of 46.5x, all while maintaining 99.2% of the floating-point performance.
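To make the idea of an accumulator bit width bound concrete, here is a minimal worst-case sketch. It assumes unsigned activations and integer weights and uses a simple L1-norm argument; the function names are hypothetical and the bound is a generic conservative one, not necessarily the bound derived in the paper.

```python
def min_accumulator_bits(weights: list[int], act_bits: int) -> int:
    """Smallest signed accumulator width that cannot overflow for these weights.

    Assumes unsigned activations in [0, 2**act_bits - 1] and integer weights.
    """
    l1_norm = sum(abs(w) for w in weights)
    # Largest possible magnitude of the dot product.
    worst_case = (2 ** act_bits - 1) * l1_norm
    # A signed P-bit register holds values up to 2**(P - 1) - 1.
    return worst_case.bit_length() + 1


def max_l1_norm(acc_bits: int, act_bits: int) -> int:
    """Largest weight L1 norm that is guaranteed to fit a signed accumulator."""
    return (2 ** (acc_bits - 1) - 1) // (2 ** act_bits - 1)
```

Read in the other direction, `max_l1_norm` gives a budget that a training procedure could enforce on each output channel's weights so that a chosen accumulator width is never exceeded.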
 [25] arXiv:2301.13381 [pdf, other]

Title: When Source-Free Domain Adaptation Meets Learning with Noisy Labels
Authors: Li Yi, Gezheng Xu, Pengcheng Xu, Jiaqi Li, Ruizhi Pu, Charles Ling, A. Ian McLeod, Boyu Wang
Comments: 33 pages, 16 figures, accepted by ICLR 2023
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Recent state-of-the-art source-free domain adaptation (SFDA) methods have focused on learning meaningful cluster structures in the feature space, which have succeeded in adapting the knowledge from source domain to unlabeled target domain without accessing the private source data. However, existing methods rely on the pseudo-labels generated by source models that can be noisy due to domain shift. In this paper, we study SFDA from the perspective of learning with label noise (LLN). Unlike the label noise in the conventional LLN scenario, we prove that the label noise in SFDA follows a different distribution assumption. We also prove that such a difference makes existing LLN methods that rely on their distribution assumptions unable to address the label noise in SFDA. Empirical evidence suggests that only marginal improvements are achieved when applying the existing LLN methods to solve the SFDA problem. On the other hand, although there exists a fundamental difference between the label noise in the two scenarios, we demonstrate theoretically that the early-time training phenomenon (ETP), which has been previously observed in conventional label noise settings, can also be observed in the SFDA problem. Extensive experiments demonstrate significant improvements to existing SFDA algorithms by leveraging ETP to address the label noise in SFDA.
 [26] arXiv:2301.13389 [pdf, other]

Title: Differentially Private Kernel Inducing Points (DP-KIP) for Privacy-preserving Data Distillation
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
While it is tempting to believe that data distillation preserves privacy, distilled data's empirical robustness against known attacks does not imply a provable privacy guarantee. Here, we develop a provably privacy-preserving data distillation algorithm, called differentially private kernel inducing points (DP-KIP). DP-KIP is an instantiation of DP-SGD on kernel ridge regression (KRR). Following a recent work, we use neural tangent kernels and minimize the KRR loss to estimate the distilled datapoints (i.e., kernel inducing points). We provide a computationally efficient JAX implementation of DP-KIP, which we test on several popular image and tabular datasets to show its efficacy in data distillation with differential privacy guarantees.
 [27] arXiv:2301.13392 [pdf, other]

Title: Combinatorial Causal Bandits without Graph Skeleton
Comments: 36 pages, 2 figures
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
In combinatorial causal bandits (CCB), the learning agent chooses a subset of variables in each round to intervene and collects feedback from the observed variables to minimize expected regret or sample complexity. Previous works study this problem in both general causal models and binary generalized linear models (BGLMs). However, all of them require prior knowledge of causal graph structure. This paper studies the CCB problem without the graph structure on binary general causal models and BGLMs. We first provide an exponential lower bound of cumulative regrets for the CCB problem on general causal models. To overcome the exponentially large space of parameters, we then consider the CCB problem on BGLMs. We design a regret minimization algorithm for BGLMs even without the graph skeleton and show that it still achieves $O(\sqrt{T}\ln T)$ expected regret. This asymptotic regret is the same as that of the state-of-the-art algorithms relying on the graph structure. Moreover, we sacrifice the regret to $O(T^{\frac{2}{3}}\ln T)$ to remove the weight gap covered by the asymptotic notation. Finally, we give some discussions and algorithms for pure exploration of the CCB problem without the graph structure.
 [28] arXiv:2301.13393 [pdf, other]

Title: Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits
Comments: 56 pages, 5 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)
Motivated by concerns about making online decisions that incur an undue amount of risk at each time step, in this paper, we formulate the probably anytime-safe stochastic combinatorial semi-bandits problem. In this problem, the agent is given the option to select a subset of size at most $K$ from a set of $L$ ground items. Each item is associated with a certain mean reward as well as a variance that represents its risk. To mitigate the risk that the agent incurs, we require that with probability at least $1-\delta$, over the entire horizon of time $T$, each of the choices that the agent makes should contain items whose sum of variances does not exceed a certain variance budget. We call this the probably anytime-safe constraint. Under this constraint, we design and analyze an algorithm {\sc PASCombUCB} that minimizes the regret over the horizon of time $T$. By developing accompanying information-theoretic lower bounds, we show that under both the problem-dependent and problem-independent paradigms, {\sc PASCombUCB} is almost asymptotically optimal. Our problem setup, the proposed {\sc PASCombUCB} algorithm, and novel analyses are applicable to domains such as recommendation systems and transportation in which an agent is allowed to choose multiple items at a single time step and wishes to control the risk over the whole time horizon.
 [29] arXiv:2301.13395 [pdf, other]

Title: Faster Predict-and-Optimize with Three-Operator Splitting
Subjects: Machine Learning (cs.LG)
In many practical settings, a combinatorial problem must be repeatedly solved with similar, but distinct parameters w. Yet, w is not directly observed; only contextual data d that correlates with w is available. It is tempting to use a neural network to predict w given d, but training such a model requires reconciling the discrete nature of combinatorial optimization with the gradient-based frameworks used to train neural networks. One approach to overcoming this issue is to consider a continuous relaxation of the combinatorial problem.
While existing approaches of this kind have been shown to be highly effective on small problems (10-100 variables), they do not scale well to large problems. In this work, we show how recent results in operator splitting can be used to design such a system which is easy to train and scales effortlessly to problems with thousands of variables.
 [30] arXiv:2301.13397 [pdf, other]

Title: Sequential Strategic Screening
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
We initiate the study of strategic behavior in screening processes with multiple classifiers. We focus on two contrasting settings: a conjunctive setting in which an individual must satisfy all classifiers simultaneously, and a sequential setting in which an individual to succeed must satisfy classifiers one at a time. In other words, we introduce the combination of strategic classification with screening processes.
We show that sequential screening pipelines exhibit new and surprising behavior where individuals can exploit the sequential ordering of the tests to zig-zag between classifiers without having to simultaneously satisfy all of them. We demonstrate an individual can obtain a positive outcome using a limited manipulation budget even when far from the intersection of the positive regions of every classifier. Finally, we consider a learner whose goal is to design a sequential screening process that is robust to such manipulations, and provide a construction for the learner that optimizes a natural objective.
 [31] arXiv:2301.13401 [pdf, other]

Title: Classified as unknown: A novel Bayesian neural network
Comments: 12 pages, 12 figures
Subjects: Machine Learning (cs.LG); Applications (stat.AP)
We establish estimations for the parameters of the output distribution for the softmax activation function using the probit function. As an application, we develop a new efficient Bayesian learning algorithm for fully connected neural networks, where training and predictions are performed within the Bayesian inference framework in closed form. This approach allows sequential learning and requires neither computationally expensive gradient calculation nor Monte Carlo sampling. Our work generalizes the Bayesian algorithm for a single perceptron for binary classification in \cite{H} to multi-layer perceptrons for multi-class classification.
 [32] arXiv:2301.13420 [pdf, other]

Title: Superhuman Fairness
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The fairness of machine learning-based decisions has become an increasingly important focus in the design of supervised machine learning methods. Most fairness approaches optimize a specified trade-off between performance measure(s) (e.g., accuracy, log loss, or AUC) and fairness metric(s) (e.g., demographic parity, equalized odds). This raises the question: are the right performance-fairness trade-offs being specified? We instead recast fair machine learning as an imitation learning task by introducing superhuman fairness, which seeks to simultaneously outperform human decisions on multiple predictive performance and fairness measures. We demonstrate the benefits of this approach given suboptimal decisions.
 [33] arXiv:2301.13441 [pdf, other]

Title: CMLCompiler: A Unified Compiler for Classical Machine Learning
Subjects: Machine Learning (cs.LG)
Classical machine learning (CML) occupies nearly half of machine learning pipelines in production applications. Unfortunately, it fails to utilize the state-of-the-practice devices fully and performs poorly. Without a unified framework, the hybrid deployments of deep learning (DL) and CML also suffer from severe performance and portability issues. This paper presents the design of a unified compiler, called CMLCompiler, for CML inference. We propose two unified abstractions: operator representations and extended computational graphs. The CMLCompiler framework performs the conversion and graph optimization based on two unified abstractions, then outputs an optimized computational graph to DL compilers or frameworks. We implement CMLCompiler on TVM. The evaluation shows CMLCompiler's portability and superior performance. It achieves up to 4.38x speedup on CPU, 3.31x speedup on GPU, and 5.09x speedup on IoT devices, compared to the state-of-the-art solutions: scikit-learn, intel sklearn, and hummingbird. Our performance of CML and DL mixed pipelines achieves up to 3.04x speedup compared with cross-framework implementations.
 [34] arXiv:2301.13442 [pdf, other]

Title: Scaling laws for single-agent reinforcement learning
Comments: 33 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Recent work has shown that, in generative modeling, cross-entropy loss improves smoothly with model size and training compute, following a power law plus constant scaling law. One challenge in extending these results to reinforcement learning is that the main performance objective of interest, mean episode return, need not vary smoothly. To overcome this, we introduce *intrinsic performance*, a monotonic function of the return defined as the minimum compute required to achieve the given return across a family of models of different sizes. We find that, across a range of environments, intrinsic performance scales as a power law in model size and environment interactions. Consequently, as in generative modeling, the optimal model size scales as a power law in the training compute budget. Furthermore, we study how this relationship varies with the environment and with other properties of the training setup. In particular, using a toy MNIST-based environment, we show that varying the "horizon length" of the task mostly changes the coefficient but not the exponent of this relationship.
 [35] arXiv:2301.13443 [pdf, other]

Title: Retiring $Δ$DP: New Distribution-Level Metrics for Demographic Parity
Comments: Under review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Demographic parity is the most widely recognized measure of group fairness in machine learning, which ensures equal treatment of different demographic groups. Numerous works aim to achieve demographic parity by pursuing the commonly used metric $\Delta DP$. Unfortunately, in this paper, we reveal that the fairness metric $\Delta DP$ cannot precisely measure the violation of demographic parity, because it inherently has the following drawbacks: i) zero-value $\Delta DP$ does not guarantee zero violation of demographic parity, ii) $\Delta DP$ values can vary with different classification thresholds. To this end, we propose two new fairness metrics, Area Between Probability density function Curves (ABPC) and Area Between Cumulative density function Curves (ABCC), to precisely measure the violation of demographic parity at the distribution level. The new fairness metrics directly measure the difference between the distributions of the prediction probability for different demographic groups. Thus our proposed new metrics enjoy: i) zero-value ABCC/ABPC guarantees zero violation of demographic parity; ii) ABCC/ABPC guarantees demographic parity even when the classification threshold is adjusted. We further re-evaluate the existing fair models with our proposed fairness metrics and observe different fairness behaviors of those models under the new metrics.
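The two drawbacks listed above can be seen on a tiny example. The sketch below assumes that $\Delta DP$ is the absolute gap in positive-prediction rates at a fixed threshold and that ABCC is the integral over [0, 1] of the absolute difference between the two groups' empirical CDFs of predicted probability; these definitions, the function names, and the toy scores are assumptions for illustration, and the integral is approximated on a midpoint grid.

```python
def positive_rate(scores, threshold):
    """Fraction of samples predicted positive at the given threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def delta_dp(scores_a, scores_b, threshold=0.5):
    """Absolute gap in positive-prediction rates between two groups."""
    return abs(positive_rate(scores_a, threshold)
               - positive_rate(scores_b, threshold))


def empirical_cdf(scores, x):
    """Fraction of scores that are less than or equal to x."""
    return sum(s <= x for s in scores) / len(scores)


def abcc(scores_a, scores_b, grid=10_000):
    """Approximate area between the two empirical CDFs on [0, 1]."""
    total = 0.0
    for i in range(grid):
        x = (i + 0.5) / grid  # midpoint of the i-th grid cell
        total += abs(empirical_cdf(scores_a, x) - empirical_cdf(scores_b, x))
    return total / grid


# Hypothetical predicted probabilities for two demographic groups.
group_a = [0.2, 0.8]
group_b = [0.4, 0.6]
```

With these toy scores, `delta_dp` is zero at threshold 0.5 but non-zero at threshold 0.3, while `abcc` is non-zero regardless of any threshold because the two score distributions differ.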
 [36] arXiv:2301.13446 [pdf, ps, other]

Title: Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
Comments: 43 pages, 1 figure
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study variance-dependent regret bounds for Markov decision processes (MDPs). Algorithms with variance-dependent regret guarantees can automatically exploit environments with low variance (e.g., enjoying constant regret on deterministic MDPs). The existing algorithms are either variance-independent or suboptimal. We first propose two new environment norms to characterize the fine-grained variance properties of the environment. For model-based methods, we design a variant of the MVP algorithm (Zhang et al., 2021a) and use new analysis techniques to show that this algorithm enjoys variance-dependent bounds with respect to our proposed norms. In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs, the first result of its kind. We further initiate the study on model-free algorithms with variance-dependent regret bounds by designing a reference-function-based algorithm with a novel capped-doubling reference update schedule. Lastly, we also provide lower bounds to complement our upper bounds.
 [37] arXiv:2301.13464 [pdf, other]

Title: Training with Mixed-Precision Floating-Point Assignments
Subjects: Machine Learning (cs.LG)
When training deep neural networks, keeping all tensors in high precision (e.g., 32-bit or even 16-bit floats) is often wasteful. However, keeping all tensors in low precision (e.g., 8-bit floats) can lead to unacceptable accuracy loss. Hence, it is important to use a precision assignment, that is, a mapping from all tensors (arising in training) to precision levels (high or low), that keeps most of the tensors in low precision and leads to sufficiently accurate models. We provide a technique that explores this memory-accuracy trade-off by generating precision assignments that (i) use less memory and (ii) lead to more accurate models at the same time, compared to the precision assignments considered by prior work in low-precision floating-point training. Our method typically provides > 2x memory reduction over a baseline precision assignment while preserving training accuracy, and gives further reductions by trading off accuracy. Compared to other baselines which sometimes cause training to diverge, our method provides similar or better memory reduction while avoiding divergence.
 [38] arXiv:2301.13465 [pdf, other]

Title: GDOD: Effective Gradient Descent using Orthogonal Decomposition for Multi-Task Learning
Authors: Xin Dong, Ruize Wu, Chao Xiong, Hai Li, Lei Cheng, Yong He, Shiyou Qian, Jian Cao, Linjian Mo
Journal-ref: Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 2022: 386-395
Subjects: Machine Learning (cs.LG)
Multi-task learning (MTL) aims at solving multiple related tasks simultaneously and has experienced rapid growth in recent years. However, MTL models often suffer from performance degeneration with negative transfer due to learning several tasks simultaneously. Some related work attributes the source of the problem to conflicting gradients. In this case, it is necessary to carefully select useful gradient updates for all tasks. To this end, we propose a novel optimization approach for MTL, named GDOD, which manipulates gradients of each task using an orthogonal basis decomposed from the span of all task gradients. GDOD decomposes gradients into task-shared and task-conflict components explicitly and adopts a general update rule for avoiding interference across all task gradients. This allows guiding the update directions depending on the task-shared components. Moreover, we prove the convergence of GDOD theoretically under both convex and non-convex assumptions. Experimental results on several multi-task datasets not only demonstrate the significant improvement that GDOD brings to existing MTL models but also show that our algorithm outperforms state-of-the-art optimization methods in terms of AUC and Logloss metrics.
 [39] arXiv:2301.13492 [pdf, other]

Title: Company-as-Tribe: Company Financial Risk Assessment on Tribe-Style Graph with Hierarchical Graph Neural Networks
Comments: accepted by SIGKDD2022
Subjects: Machine Learning (cs.LG)
Company financial risk is ubiquitous and early risk assessment for listed companies can avoid considerable losses. Traditional methods mainly focus on the financial statements of companies and lack the complex relationships among them. However, the financial statements are often biased and lagged, making it difficult to identify risks accurately and in a timely manner. To address the challenges, we redefine the problem as company financial risk assessment on tribe-style graph by taking each listed company and its shareholders as a tribe and leveraging financial news to build inter-tribe connections. Such tribe-style graphs present different patterns to distinguish risky companies from normal ones. However, most nodes in the tribe-style graph lack attributes, making it difficult to directly adopt existing graph learning methods (e.g., Graph Neural Networks (GNNs)). In this paper, we propose a novel Hierarchical Graph Neural Network (TH-GNN) for Tribe-style graphs via two levels, with the first level to encode the structure pattern of the tribes with contrastive learning, and the second level to diffuse information based on the inter-tribe relations, achieving effective and efficient risk assessment. Extensive experiments on the real-world company dataset show that our method achieves significant improvements on financial risk assessment over previous competing methods. Also, the extensive ablation studies and visualization comprehensively show the effectiveness of our method.
 [40] arXiv:2301.13501 [pdf, other]

Title: Auxiliary Learning as an Asymmetric Bargaining Game
Subjects: Machine Learning (cs.LG)
Auxiliary learning is an effective method for enhancing the generalization capabilities of trained models, particularly when dealing with small datasets. However, this approach may present several difficulties: (i) optimizing multiple objectives can be more challenging, and (ii) how to balance the auxiliary tasks to best assist the main task is unclear. In this work, we propose a novel approach, named AuxiNash, for balancing tasks in auxiliary learning by formalizing the problem as a generalized bargaining game with asymmetric task bargaining power. Furthermore, we describe an efficient procedure for learning the bargaining power of tasks based on their contribution to the performance of the main task and derive theoretical guarantees for its convergence. Finally, we evaluate AuxiNash on multiple multi-task benchmarks and find that it consistently outperforms competing methods.
 [41] arXiv:2301.13516 [pdf, other]

Title: Recurrences reveal shared causal drivers of complex time series
Authors: William Gilpin
Comments: 8 pages, 5 figures
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Chaotic Dynamics (nlin.CD)
Many experimental time series measurements share an unobserved causal driver. Examples include genes targeted by transcription factors, ocean flows influenced by large-scale atmospheric currents, and motor circuits steered by descending neurons. Reliably inferring this unseen driving force is necessary to understand the intermittent nature of top-down control schemes in diverse biological and engineered systems. Here, we introduce a new unsupervised learning algorithm that uses recurrences in time series measurements to gradually reconstruct an unobserved driving signal. Drawing on the mathematical theory of skew-product dynamical systems, we identify recurrence events shared across response time series, which implicitly define a recurrence graph with glass-like structure. As the amount or quality of observed data improves, this recurrence graph undergoes a percolation transition manifesting as weak ergodicity breaking for random walks on the induced landscape, revealing the shared driver's dynamics, even in the presence of strongly corrupted or noisy measurements. Across several thousand random dynamical systems, we empirically quantify the dependence of reconstruction accuracy on the rate of information transfer from a chaotic driver to the response systems, and we find that effective reconstruction proceeds through gradual approximation of the driver's dominant unstable periodic orbits. Through extensive benchmarks against classical and neural-network-based signal processing techniques, we demonstrate our method's strong ability to extract causal driving signals from diverse real-world datasets spanning neuroscience, genomics, fluid dynamics, and physiology.
 [42] arXiv:2301.13527 [pdf, other]

Title: Real-Time Outlier Detection with Dynamic Process Limits
Comments: 7 pages, 4 figures, 24th International Conference on Process Control
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Anomaly detection methods are part of the systems where rare events may endanger an operation's profitability, safety, and environmental aspects. Although many state-of-the-art anomaly detection methods have been developed to date, their deployment is limited to the operation conditions present during the model training. Online anomaly detection brings the capability to adapt to data drifts and change points that may not be represented during model development, resulting in prolonged service life. This paper proposes an online anomaly detection algorithm for existing real-time infrastructures where low-latency detection is required and novel patterns in data occur unpredictably. The online inverse cumulative distribution-based approach is introduced to eliminate common problems of offline anomaly detectors, while providing dynamic process limits to normal operation. The benefit of the proposed method is the ease of use, fast computation, and deployability as shown in two case studies of real microgrid operation data.
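The general idea of dynamic process limits derived from an inverse cumulative distribution can be sketched with a sliding window whose empirical quantiles serve as the lower and upper limits. The class below is a simplified stand-in written for illustration only; the window size, quantile levels, warm-up length, and all names are assumptions, and it is not the algorithm proposed in the paper.

```python
from collections import deque


class DynamicLimits:
    """Flags samples outside quantile-based limits of a sliding window."""

    def __init__(self, window=100, lower_q=0.05, upper_q=0.95, warmup=10):
        self.buffer = deque(maxlen=window)  # oldest samples drop out automatically
        self.lower_q = lower_q
        self.upper_q = upper_q
        self.warmup = warmup

    def _quantile(self, q):
        # Empirical inverse CDF: pick the order statistic at position q.
        ordered = sorted(self.buffer)
        index = min(int(q * len(ordered)), len(ordered) - 1)
        return ordered[index]

    def limits(self):
        """Current (lower, upper) process limits, or None during warm-up."""
        if len(self.buffer) < self.warmup:
            return None
        return self._quantile(self.lower_q), self._quantile(self.upper_q)

    def update(self, x):
        """Return True if x lies outside the current limits, then store x."""
        current = self.limits()
        is_outlier = current is not None and (x < current[0] or x > current[1])
        self.buffer.append(x)
        return is_outlier
```

Because every sample is appended to the window, the limits keep adapting to drift; a production variant might choose not to absorb flagged samples, which is a design decision left open here.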
 [43] arXiv:2301.13530 [pdf, other]

Title: Domain-Generalizable Multiple-Domain Clustering
Comments: 12 pages, 5 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Accurately clustering high-dimensional measurements is vital for adequately analyzing scientific data. Deep learning machinery has remarkably improved clustering capabilities in recent years due to its ability to extract meaningful representations. In this work, we are given unlabeled samples from multiple source domains, and we aim to learn a shared classifier that assigns the examples to various clusters. Evaluation is done by using the classifier for predicting cluster assignments in a previously unseen domain. This setting generalizes the problem of unsupervised domain generalization to the case in which no supervised learning samples are given (completely unsupervised). Towards this goal, we present an end-to-end model and evaluate its capabilities on several multi-domain image datasets. Specifically, we demonstrate that our model is more accurate than schemes that require fine-tuning using samples from the target domain or some level of supervision.
 [44] arXiv:2301.13565 [pdf, other]

Title: Learning Against Distributional Uncertainty: On the Trade-off Between Robustness and Specificity
Comments: 23 Pages, 3 Figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Trustworthy machine learning aims at combating distributional uncertainties in training data distributions compared to population distributions. Typical treatment frameworks include the Bayesian approach, (min-max) distributionally robust optimization (DRO), and regularization. However, two issues have to be raised: 1) All these methods are biased estimators of the true optimal cost; 2) the prior distribution in the Bayesian method, the radius of the distributional ball in the DRO method, and the regularizer in the regularization method are difficult to specify. This paper studies a new framework that unifies the three approaches and that addresses the two challenges mentioned above. The asymptotic properties (e.g., consistency and asymptotic normalities), non-asymptotic properties (e.g., unbiasedness and generalization error bound), and a Monte-Carlo-based solution method of the proposed model are studied. The new model reveals the trade-off between the robustness to the unseen data and the specificity to the training data.
 [45] arXiv:2301.13572 [pdf, other]

Title: BALANCE: Bayesian Linear Attribution for Root Cause Localization
Authors: Chaoyu Chen, Hang Yu, Zhichao Lei, Jianguo Li, Shaokang Ren, Tingkai Zhang, Silin Hu, Jianchao Wang, Wenhui Shi
Comments: Accepted by SIGMOD 2023; 15 pages
Subjects: Machine Learning (cs.LG); Databases (cs.DB)
Root Cause Analysis (RCA) plays an indispensable role in distributed data system maintenance and operations, as it bridges the gap between fault detection and system recovery. Existing works mainly study multidimensional localization or graph-based root cause localization. This paper opens up the possibilities of exploiting the recently developed framework of explainable AI (XAI) for the purpose of RCA. In particular, we propose BALANCE (BAyesian Linear AttributioN for root CausE localization), which formulates the problem of RCA through the lens of attribution in XAI and seeks to explain the anomalies in the target KPIs by the behavior of the candidate root causes. BALANCE consists of three innovative components. First, we propose a Bayesian multicollinear feature selection (BMFS) model to predict the target KPIs given the candidate root causes in a forward manner while promoting sparsity and concurrently paying attention to the correlation between the candidate root causes. Second, we introduce attribution analysis to compute the attribution score for each candidate in a backward manner. Third, we merge the estimated root causes related to each KPI if there are multiple KPIs. We extensively evaluate the proposed BALANCE method on one synthetic dataset as well as three real-world RCA tasks, that is, bad SQL localization, container fault localization, and fault type diagnosis for Exathlon. Results show that BALANCE outperforms the state-of-the-art (SOTA) methods in terms of accuracy with the least amount of running time, and achieves at least $6\%$ higher accuracy than SOTA methods for real tasks. BALANCE has been deployed to production to tackle real-world RCA problems, and the online results further advocate its usage for real-time diagnosis in distributed data systems.
 [46] arXiv:2301.13573 [pdf, other]

Title: Skill Decision Transformer
Subjects: Machine Learning (cs.LG)
Recent work has shown that Large Language Models (LLMs) can be incredibly effective for offline reinforcement learning (RL) by representing the traditional RL problem as a sequence modelling problem (Chen et al., 2021; Janner et al., 2021). However, many of these methods only optimize for high returns, and may not extract much information from a diverse dataset of trajectories. Generalized Decision Transformers (GDTs) (Furuta et al., 2021) have shown that utilizing future trajectory information, in the form of information statistics, can help extract more information from offline trajectory data. Building upon this, we propose Skill Decision Transformer (Skill DT). Skill DT draws inspiration from hindsight relabelling (Andrychowicz et al., 2017) and skill discovery methods to discover a diverse set of primitive behaviors, or skills. We show that Skill DT can not only perform offline state-marginal matching (SMM), but can also discover descriptive behaviors that can be easily sampled. Furthermore, we show that through purely reward-free optimization, Skill DT is still competitive with supervised offline RL approaches on the D4RL benchmark. The code and videos can be found on our project page: https://github.com/shyamsn97/skilldt
 [47] arXiv:2301.13584 [pdf, other]

Title: Support Exploration Algorithm for Sparse Support Recovery
Authors: Mimoun Mohamed (LIS, I2M), François Malgouyres (IMT), Valentin Emiya (QARMA), Caroline Chaux (IPAL)
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
We introduce a new algorithm promoting sparsity called Support Exploration Algorithm (SEA) and analyze it in the context of support recovery/model selection problems. The algorithm can be interpreted as an instance of the straight-through estimator (STE) applied to the resolution of a sparse linear inverse problem. SEA uses a non-sparse exploratory vector and makes it evolve in the input space to select the sparse support. We exhibit an oracle update rule for the exploratory vector and consider the STE update. The theoretical analysis establishes general sufficient conditions of support recovery. The general conditions are specialized to the case where the matrix $A$ performing the linear measurements satisfies the Restricted Isometry Property (RIP). Experiments show that SEA can efficiently improve the results of any algorithm. Because of its exploratory nature, SEA also performs remarkably well when the columns of $A$ are strongly coherent.
 [48] arXiv:2301.13589 [pdf, other]

Title: Policy Gradient for s-Rectangular Robust Markov Decision Processes
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We present a novel robust policy gradient method (RPG) for s-rectangular robust Markov Decision Processes (MDPs). We are the first to derive the adversarial kernel in closed form and demonstrate that it is a rank-one perturbation of the nominal kernel. This allows us to derive an RPG that is similar to the one used in non-robust MDPs, except with a robust Q-value function and an additional correction term. Both the robust Q-values and the correction terms are efficiently computable, so the time complexity of our method matches that of non-robust MDPs, which is significantly faster than existing black-box methods.
 [49] arXiv:2301.13616 [pdf, other]

Title: Anti-Exploration by Random Network Distillation
Comments: Source code: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Despite the success of Random Network Distillation (RND) in various domains, it was shown to be insufficiently discriminative to serve as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus and discriminativity is not an issue. We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM), resulting in a simple and efficient ensemble-free algorithm based on Soft Actor-Critic. We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.
 [50] arXiv:2301.13618 [pdf, other]

Title: Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning
Authors: Gabriele Castellano, JuanJosé Nieto, Jordi Luque, Ferrán Diego, Carlos Segura, Diego Perino, Flavio Esposito, Fulvio Risso, Aravindh Raman
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Many real-time applications (e.g., Augmented/Virtual Reality, cognitive assistance) rely on Deep Neural Networks (DNNs) to process inference tasks. Edge computing is considered a key infrastructure to deploy such applications, as moving computation close to the data sources enables us to meet stringent latency and throughput requirements. However, the constrained nature of edge networks poses several additional challenges to the management of inference workloads: edge clusters cannot provide unlimited processing power to DNN models, and often a trade-off between network and processing time should be considered when it comes to end-to-end delay requirements. In this paper, we focus on the problem of scheduling inference queries on DNN models in edge networks at short timescales (i.e., a few milliseconds). By means of simulations, we analyze several policies in the realistic network settings and workloads of a large ISP, highlighting the need for a dynamic scheduling policy that can adapt to network conditions and workloads. We therefore design ASET, a Reinforcement Learning-based scheduling algorithm able to adapt its decisions according to the system conditions. Our results show that ASET effectively provides the best performance compared to static policies when scheduling over a distributed pool of edge resources.
 [51] arXiv:2301.13622 [pdf, other]

Title: Learning Data Representations with Joint Diffusion Models
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We introduce a joint diffusion model that simultaneously learns meaningful internal representations fit for both generative and predictive tasks. Joint machine learning models that allow synthesizing and classifying data often offer uneven performance between those tasks or are unstable to train. In this work, we depart from a set of empirical observations that indicate the usefulness of internal representations built by contemporary deep diffusion-based generative models in both generative and predictive settings. We then introduce an extension of the vanilla diffusion model with a classifier that allows for stable joint training with shared parametrization between those objectives. The resulting joint diffusion model offers superior performance across various tasks, including generative modeling, semi-supervised classification, and domain adaptation.
 [52] arXiv:2301.13629 [pdf, other]

Title: DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising Diffusion Models
Subjects: Machine Learning (cs.LG)
Spatiotemporal graph neural networks (STGNN) have emerged as the dominant model for spatiotemporal graph (STG) forecasting. Despite their success, they fail to model intrinsic uncertainties within STG data, which cripples their practicality in downstream tasks for decision-making. To this end, this paper focuses on probabilistic STG forecasting, which is challenging due to the difficulty in modeling uncertainties and complex ST dependencies. In this study, we present the first attempt to generalize the popular denoising diffusion probabilistic models to STGs, leading to a novel non-autoregressive framework called DiffSTG, along with the first denoising network UGnet for STG in the framework. Our approach combines the spatiotemporal learning capabilities of STGNNs with the uncertainty measurements of diffusion models. Extensive experiments validate that DiffSTG reduces the Continuous Ranked Probability Score (CRPS) by 4%-14%, and Root Mean Squared Error (RMSE) by 2%-7% over existing methods on three real-world datasets.
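For readers unfamiliar with the CRPS metric named above, the following is a minimal sketch of the standard sample-based estimator, CRPS ≈ E|X − y| − ½ E|X − X′|, where X and X′ are independent draws from the forecast distribution and y is the observed value. It is an editorial illustration, not the DiffSTG evaluation code; the function name `crps_from_samples` is hypothetical.

```python
from typing import Sequence


def crps_from_samples(samples: Sequence[float], observation: float) -> float:
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    `samples` are draws from a probabilistic forecast (hypothetical input);
    lower values indicate a sharper and better-calibrated forecast.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one forecast sample")
    # Mean absolute error between the forecast samples and the observation.
    accuracy = sum(abs(x - observation) for x in samples) / n
    # Mean absolute difference over all ordered pairs of samples (spread term).
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread
```

With a single sample the estimate reduces to the absolute error, which is why CRPS is often described as a probabilistic generalization of MAE.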
 [53] arXiv:2301.13635 [pdf, other]

Title: Active Learning-based Domain Adaptive Localized Polynomial Chaos Expansion
Subjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
The paper presents a novel methodology to build surrogate models of complicated functions by an active learning-based sequential decomposition of the input random space and construction of localized polynomial chaos expansions, referred to as domain adaptive localized polynomial chaos expansion (DAL-PCE). The approach utilizes sequential decomposition of the input random space into smaller sub-domains approximated by low-order polynomial expansions. This allows approximation of functions with strong nonlinearities, discontinuities, and/or singularities. Decomposition of the input random space and local approximations alleviates the Gibbs phenomenon for these types of problems and confines error to a very small vicinity near the nonlinearity. The global behavior of the surrogate model is therefore significantly better than existing methods, as shown in numerical examples. The whole process is driven by an active learning routine that uses the recently proposed $\Theta$ criterion to assess local variance contributions. The proposed approach balances both exploitation of the surrogate model and exploration of the input random space and thus leads to efficient and accurate approximation of the original mathematical model. The numerical results show the superiority of the DAL-PCE in comparison to (i) a single global polynomial chaos expansion and (ii) the recently proposed stochastic spectral embedding (SSE) method, which was developed as an accurate surrogate model and is based on a similar domain decomposition process. This method represents a general framework upon which further extensions and refinements can be based, and which can be combined with any technique for non-intrusive polynomial chaos expansion construction.
 [54] arXiv:2301.13636 [pdf, other]

Title: Transport with Support: Data-Conditional Diffusion Bridges
Comments: 23 pages, 11 figures
Subjects: Machine Learning (cs.LG)
The dynamic Schrödinger bridge problem provides an appealing setting for solving optimal transport problems by learning non-linear diffusion processes using efficient iterative solvers. Recent works have demonstrated state-of-the-art results (e.g., in modelling single-cell embryo RNA sequences or sampling from complex posteriors) but are limited to learning bridges with only initial and terminal constraints. Our work extends this paradigm by proposing the Iterative Smoothing Bridge (ISB). We integrate Bayesian filtering and optimal control into learning the diffusion process, enabling constrained stochastic processes governed by sparse observations at intermediate stages and terminal constraints. We assess the effectiveness of our method on synthetic and real-world data and show that the ISB generalises well to high-dimensional data, is computationally efficient, and provides accurate estimates of the marginals at intermediate and terminal times.
 [55] arXiv:2301.13637 [pdf, other]

Title: Tricking AI chips into Simulating the Human Brain: A Detailed Performance Analysis
Comments: 11 pages, 4 figures
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
Challenging the Nvidia monopoly, dedicated AI-accelerator chips have begun emerging to tackle the computational challenge that the inference and, especially, the training of modern deep neural networks (DNNs) pose to modern computers. The field has been rife with studies assessing the performance of these contestants across various DNN model types. However, AI experts are aware of the limitations of current DNNs and have been working towards the fourth AI wave which will, arguably, rely on more biologically inspired models, predominantly on spiking neural networks (SNNs). At the same time, GPUs have been heavily used for simulating such models in the field of computational neuroscience, yet AI chips have not been tested on such workloads. The current paper aims at filling this important gap by evaluating multiple cutting-edge AI chips (Graphcore IPU, GroqChip, Nvidia GPU with Tensor Cores and Google TPU) on simulating a highly biologically detailed model of a brain region, the inferior olive (IO). This IO application stress-tests the different AI platforms to highlight architectural trade-offs by varying its compute density, memory requirements and floating-point numerical accuracy. Our performance analysis reveals that the simulation problem maps extremely well onto the GPU and TPU architectures, which for networks of 125,000 cells leads to speedups of 28x and 1,208x, respectively, over CPU runtimes. At this speed, the TPU sets a new record for the largest real-time IO simulation. The GroqChip outperforms both platforms for small networks but, due to implementing some floating-point operations at reduced accuracy, is found not yet usable for brain simulation.
 [56] arXiv:2301.13642 [pdf, other]

Title: An Efficient Solution to s-Rectangular Robust Markov Decision Processes
Comments: arXiv admin note: substantial text overlap with arXiv:2205.14327
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
We present an efficient robust value iteration for s-rectangular robust Markov Decision Processes (MDPs) with a time complexity comparable to standard (non-robust) MDPs, which is significantly faster than any existing method. We do so by deriving the optimal robust Bellman operator in concrete forms using our $L_p$ water-filling lemma. We unveil the exact form of the optimal policies, which turn out to be novel threshold policies with the probability of playing an action proportional to its advantage.
 [57] arXiv:2301.13644 [pdf, other]

Title: Exploring QSAR Models for Activity-Cliff Prediction
Comments: Submitted to Journal of Cheminformatics
Subjects: Machine Learning (cs.LG); Biomolecules (q-bio.BM); Machine Learning (stat.ML)
Pairs of similar compounds that only differ by a small structural modification but exhibit a large difference in their binding affinity for a given target are known as activity cliffs (ACs). It has been hypothesised that quantitative structure-activity relationship (QSAR) models struggle to predict ACs and that ACs thus form a major source of prediction error. However, a study to explore the AC-prediction power of modern QSAR methods and its relationship to general QSAR-prediction performance is lacking. We systematically construct nine distinct QSAR models by combining three molecular representation methods (extended-connectivity fingerprints, physicochemical-descriptor vectors and graph isomorphism networks) with three regression techniques (random forests, k-nearest neighbours and multilayer perceptrons); we then use each resulting model to classify pairs of similar compounds as ACs or non-ACs and to predict the activities of individual molecules in three case studies: dopamine receptor D2, factor Xa, and SARS-CoV-2 main protease. We observe low AC-sensitivity amongst the tested models when the activities of both compounds are unknown, but a substantial increase in AC-sensitivity when the actual activity of one of the compounds is given. Graph isomorphism features are found to be competitive with or superior to classical molecular representations for AC-classification and can thus be employed as baseline AC-prediction models or simple compound-optimisation tools. For general QSAR-prediction, however, extended-connectivity fingerprints still consistently deliver the best performance. Our results provide strong support for the hypothesis that QSAR methods indeed frequently fail to predict ACs. We propose twin-network training for deep learning models as a potential future pathway to increase AC-sensitivity and thus overall QSAR performance.
 [58] arXiv:2301.13671 [pdf, ps, other]

Title: Enhancing Hyper-To-Real Space Projections Through Euclidean Norm Meta-Heuristic Optimization
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The continuous growth of computational power in recent decades has made solving several optimization problems significant to humankind a tractable task; however, tackling some of them remains a challenge due to the overwhelming number of candidate solutions to be evaluated, even when using sophisticated algorithms. In such a context, a set of nature-inspired stochastic methods, called meta-heuristic optimization, can provide robust approximate solutions to different kinds of problems with a small computational burden, such as derivative-free real function optimization. Nevertheless, these methods may converge to inadequate solutions if the function landscape is too harsh, e.g., enclosing too many local optima. Previous works addressed this issue by employing a hypercomplex representation of the search space, like quaternions, where the landscape becomes smoother and supposedly easier to optimize. Under this approach, meta-heuristic computations happen in the hypercomplex space, whereas variables are mapped back to the real domain before function evaluation. Although this latter operation is performed by the Euclidean norm, we have found that after the optimization procedure has finished, it is usually possible to obtain even better solutions by employing the Minkowski $p$-norm instead and fine-tuning $p$ through an auxiliary sub-problem with negligible additional cost and no hyperparameters. Such behavior was observed in eight well-established benchmarking functions, thus fostering a new research direction for hypercomplex meta-heuristic optimization.
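To make the projection step concrete, here is a minimal sketch of mapping one hypercomplex decision variable (for example, the four components of a quaternion) to a real value with the Minkowski $p$-norm; $p = 2$ recovers the Euclidean norm. The function name `minkowski_norm` is hypothetical, and any rescaling to variable bounds or the auxiliary search over $p$ described in the abstract is deliberately left out.

```python
from typing import Sequence


def minkowski_norm(components: Sequence[float], p: float) -> float:
    """Map the components of one hypercomplex variable to a real number.

    Computes (sum_i |c_i|^p)^(1/p); p = 2 is the Euclidean norm.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(c) ** p for c in components) ** (1.0 / p)


if __name__ == "__main__":
    quaternion = (1.0, 2.0, 2.0, 4.0)  # illustrative values only
    # Compare the Euclidean projection with two other choices of p.
    for p in (1.0, 2.0, 3.0):
        print(p, minkowski_norm(quaternion, p))
```

Because the norm is non-increasing in $p$ for a fixed vector, tuning $p$ after optimization shifts every projected variable in a controlled, one-dimensional way.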
 [59] arXiv:2301.13694 [pdf, other]

Title: Are Defenses for Graph Neural Networks Robust?
Comments: 34 pages, 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
Subjects: Machine Learning (cs.LG)
A cursory reading of the literature suggests that we have made a lot of progress in designing effective adversarial defenses for Graph Neural Networks (GNNs). Yet, the standard methodology has a serious flaw: virtually all of the defenses are evaluated against non-adaptive attacks, leading to overly optimistic robustness estimates. We perform a thorough robustness analysis of 7 of the most popular defenses spanning the entire spectrum of strategies, i.e., aimed at improving the graph, the architecture, or the training. The results are sobering: most defenses show no or only marginal improvement compared to an undefended baseline. We advocate using custom adaptive attacks as a gold standard and we outline the lessons we learned from successfully designing such attacks. Moreover, our diverse collection of perturbed graphs forms a (black-box) unit test offering a first glance at a model's robustness.
 [60] arXiv:2301.13703 [pdf, other]

Title: Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
Comments: 18 pages, 14 figures
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn)
Understanding when the noise in stochastic gradient descent (SGD) affects generalization of deep neural networks remains a challenge, complicated by the fact that networks can operate in distinct training regimes. Here we study how the magnitude of this noise $T$ affects performance as the size of the training set $P$ and the scale of initialization $\alpha$ are varied. For gradient descent, $\alpha$ is a key parameter that controls whether the network is 'lazy' ($\alpha\gg 1$) or instead learns features ($\alpha\ll 1$). For classification of MNIST and CIFAR-10 images, our central results are as follows. (i) We obtain phase diagrams for performance in the $(\alpha,T)$ plane. They show that SGD noise can be detrimental or instead useful depending on the training regime. Moreover, although increasing $T$ or decreasing $\alpha$ both allow the net to escape the lazy regime, these changes can have opposite effects on performance. (ii) Most importantly, we find that key dynamical quantities (including the total variations of weights during training) depend on both $T$ and $P$ as power laws, and the characteristic temperature $T_c$, where the noise of SGD starts affecting performance, is a power law of $P$. These observations indicate that a key effect of SGD noise occurs late in training, by affecting the stopping process whereby all data are fitted. We argue that due to SGD noise, nets must develop a stronger 'signal', i.e., larger informative weights, to fit the data, leading to a longer training time. The same effect occurs for a larger training set size $P$. We confirm this view in the perceptron model, where signal and noise can be precisely measured. Interestingly, exponents characterizing the effect of SGD depend on the density of data near the decision boundary, as we explain.
 [61] arXiv:2301.13732 [pdf, other]

Title: Preserving local densities in low-dimensional embeddings
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Low-dimensional embeddings and visualizations are an indispensable tool for analysis of high-dimensional data. State-of-the-art methods, such as t-SNE and UMAP, excel in unveiling local structures hidden in high-dimensional data and are therefore routinely applied in standard analysis pipelines in biology. We show, however, that these methods fail to reconstruct local properties, such as relative differences in densities (Fig. 1), and that apparent differences in cluster size can arise from computational artifacts caused by differing sample sizes (Fig. 2). Providing a theoretical analysis of this issue, we then suggest dtSNE, which approximately conserves local densities. In an extensive study on synthetic benchmark and real-world data comparing against five state-of-the-art methods, we empirically show that dtSNE provides similar global reconstruction, but yields much more accurate depictions of local distances and relative densities.
 [62] arXiv:2301.13733 [pdf]

Title: A Bayesian Generative Adversarial Network (GAN) to Generate Synthetic Time-Series Data, Application in Combined Sewer Flow Prediction
Authors: Amin E. Bakhshipour, Alireza Koochali, Ulrich Dittmer, Ali Haghighi, Sheraz Ahmad, Andreas Dengel
Comments: Accepted in WDSA/CCWI 2022 Conference
Subjects: Machine Learning (cs.LG)
Despite various breakthroughs in machine learning and data analysis techniques for improving smart operation and management of urban water infrastructures, some key limitations obstruct this progress. Among these shortcomings, the absence of freely available data due to data privacy or high costs of data gathering and the lack of adequate rare or extreme events in the available data play a crucial role. Here, Generative Adversarial Networks (GANs) can help overcome these challenges. In machine learning, generative models are a class of methods capable of learning a data distribution to generate artificial data. In this study, we developed a GAN model to generate synthetic time series to balance our limited recorded time series data and improve the accuracy of a data-driven model for combined sewer flow prediction. We considered the sewer system of a small town in Germany as the test case. Precipitation and inflow to the storage tanks are used for the data-driven model development. The aim is to predict the flow using precipitation data and examine the impact of data augmentation using synthetic data on model performance. Results show that the GAN can successfully generate synthetic time series from the real data distribution, which enables more accurate peak flow prediction. However, the model without data augmentation works better for dry weather prediction. Therefore, an ensemble model is suggested to combine the advantages of both models.
 [63] arXiv:2301.13734 [pdf, other]

Title: Improving Monte Carlo Evaluation with Offline Data
Subjects: Machine Learning (cs.LG)
Monte Carlo (MC) methods are the most widely used methods to estimate the performance of a policy. Given a policy of interest, MC methods give estimates by repeatedly running this policy to collect samples and taking the average of the outcomes. Samples collected during this process are called online samples. To get an accurate estimate, MC methods consume a massive number of online samples. When online samples are expensive, e.g., in online recommendation and inventory management, we want to reduce the number of online samples while achieving the same estimate accuracy. To this end, we use off-policy MC methods that evaluate the policy of interest by running a different policy called the behavior policy. We design a tailored behavior policy such that the variance of the off-policy MC estimator is provably smaller than that of the ordinary MC estimator. Importantly, this tailored behavior policy can be efficiently learned from existing offline data, i.e., previously logged data, which are much cheaper than online samples. With reduced variance, our off-policy MC method requires fewer online samples to evaluate the performance of a policy compared with the ordinary MC method. Moreover, our off-policy MC estimator is always unbiased.
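The variance argument above can be illustrated in the simplest possible setting: a one-step problem (a bandit) with known, deterministic, positive rewards. The sketch below compares ordinary MC with importance-sampled off-policy MC and uses the textbook variance-minimizing behavior policy, proportional to π(a)·|r(a)|. This is an editorial illustration under those assumptions, not the paper's learned behavior policy; all function names are hypothetical.

```python
import random
from typing import List, Sequence


def on_policy_mc(target: Sequence[float], rewards: Sequence[float],
                 n: int, rng: random.Random) -> float:
    """Ordinary MC: sample actions from the target policy and average rewards."""
    actions = rng.choices(range(len(target)), weights=target, k=n)
    return sum(rewards[a] for a in actions) / n


def off_policy_mc(target: Sequence[float], behavior: Sequence[float],
                  rewards: Sequence[float], n: int, rng: random.Random) -> float:
    """Off-policy MC: sample from the behavior policy and reweight each
    reward by the importance ratio target(a) / behavior(a)."""
    actions = rng.choices(range(len(behavior)), weights=behavior, k=n)
    return sum(target[a] / behavior[a] * rewards[a] for a in actions) / n


def variance_minimizing_behavior(target: Sequence[float],
                                 rewards: Sequence[float]) -> List[float]:
    """Textbook choice b(a) proportional to target(a) * |r(a)| (assumes known rewards)."""
    scores = [p * abs(r) for p, r in zip(target, rewards)]
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.7, 0.2, 0.1]    # illustrative target policy
    rewards = [1.0, 5.0, 10.0]  # illustrative deterministic rewards
    behavior = variance_minimizing_behavior(target, rewards)
    print("on-policy :", on_policy_mc(target, rewards, 100, rng))
    print("off-policy:", off_policy_mc(target, behavior, rewards, 100, rng))
```

With deterministic positive rewards, every reweighted sample equals the true value, so the off-policy estimator has zero variance; in practice the rewards are unknown, which is why the behavior policy must be estimated from data.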
 [64] arXiv:2301.13737 [pdf, other]

Title: Self-Consistent Velocity Matching of Probability Flows
Subjects: Machine Learning (cs.LG)
We present a discretization-free scalable framework for solving a large class of mass-conserving partial differential equations (PDEs), including the time-dependent Fokker-Planck equation and the Wasserstein gradient flow. The main observation is that the time-varying velocity field of the PDE solution needs to be self-consistent: it must satisfy a fixed-point equation involving the flow characterized by the same velocity field. By parameterizing the flow as a time-dependent neural network, we propose an end-to-end iterative optimization framework called self-consistent velocity matching to solve this class of PDEs. Compared to existing approaches, our method does not suffer from temporal or spatial discretization, covers a wide range of PDEs, and scales to high dimensions. Experimentally, our method recovers analytical solutions accurately when they are available and achieves comparable or better performance in high dimensions with less training time compared to recent large-scale JKO-based methods that are designed for solving a more restrictive family of PDEs.
 [65] arXiv:2301.13748 [pdf, other]

Title: Archetypal Analysis++: Rethinking the Initialization Strategy
Comments: 20 pages, 13 figures, preprint
Subjects: Machine Learning (cs.LG)
Archetypal analysis is a matrix factorization method with convexity constraints. Due to local minima, a good initialization is essential. Frequently used initialization methods either yield suboptimal starting points or are prone to getting stuck in poor local minima. In this paper, we propose archetypal analysis++ (AA++), a probabilistic initialization strategy for archetypal analysis that sequentially samples points based on their influence on the objective, similar to $k$-means++. In fact, we argue that $k$-means++ already approximates the proposed initialization method. Furthermore, we suggest adapting an efficient Monte Carlo approximation of $k$-means++ to AA++. In an extensive empirical evaluation on 13 real-world data sets of varying sizes and dimensionalities and considering two pre-processing strategies, we show that AA++ almost consistently outperforms all baselines, including the most frequently used ones.
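Since the abstract positions AA++ relative to $k$-means++, a minimal sketch of standard $k$-means++ seeding (D² sampling) may help: each new point is drawn with probability proportional to its squared distance to the nearest point already chosen. This is the classical $k$-means++ procedure, not AA++ itself, whose sampling weights depend on the archetypal analysis objective; the function names are hypothetical.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed_indices(points: Sequence[Sequence[float]], k: int,
                          rng: random.Random) -> List[int]:
    """Return the indices of k seed points chosen by k-means++ (D^2) sampling."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # First seed: uniformly at random.
    chosen = [rng.randrange(len(points))]
    # Squared distance from every point to its nearest chosen seed.
    nearest = [squared_distance(p, points[chosen[0]]) for p in points]
    while len(chosen) < k:
        if sum(nearest) == 0.0:
            raise ValueError("fewer than k distinct points")
        # Sample the next seed with probability proportional to `nearest`;
        # already-chosen points have weight zero and cannot be redrawn.
        idx = rng.choices(range(len(points)), weights=nearest, k=1)[0]
        chosen.append(idx)
        nearest = [min(d, squared_distance(p, points[idx]))
                   for p, d in zip(points, nearest)]
    return chosen
```

Usage: `kmeanspp_seed_indices(data, k=3, rng=random.Random(0))` returns three distinct row indices that tend to be spread across the data.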
 [66] arXiv:2301.13757 [pdf, other]

Title: Toward Efficient Gradient-Based Value Estimation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Gradient-based methods for value estimation in reinforcement learning have favorable stability properties, but they are typically much slower than Temporal Difference (TD) learning methods. We study the root causes of this slowness and show that Mean Square Bellman Error (MSBE) is an ill-conditioned loss function in the sense that its Hessian has a large condition number. To resolve the adverse effect of poor conditioning of MSBE on gradient-based methods, we propose a low-complexity batch-free proximal method that approximately follows the Gauss-Newton direction and is asymptotically robust to parameterization. Our main algorithm, called RANS, is efficient in the sense that it is significantly faster than the residual gradient methods while having almost the same computational complexity, and is competitive with TD on the classic problems that we tested.
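The ill-conditioning claim can be checked by hand on a toy example. For a tabular value function $v$ and a fixed policy, the unweighted MSBE is $\|(I - \gamma P)v - r\|^2$, so its Hessian is $2(I - \gamma P)^\top (I - \gamma P)$. The sketch below assumes a two-state chain in which each state moves to either state with probability 0.5 (an editorial choice, not an example from the paper) and computes the Hessian's condition number in closed form; it comes out as $1/(1-\gamma)^2$. The function name is hypothetical.

```python
import math


def msbe_hessian_condition_number(gamma: float) -> float:
    """Condition number of the tabular MSBE Hessian on a toy two-state chain.

    Assumes 0 <= gamma < 1 and uniform state weighting.
    """
    # Toy chain: from either state, move to each state with probability 0.5.
    p = [[0.5, 0.5], [0.5, 0.5]]
    # A = I - gamma * P, so that MSBE(v) = ||A v - r||^2.
    a = [[(1.0 if i == j else 0.0) - gamma * p[i][j] for j in range(2)]
         for i in range(2)]
    # Hessian H = 2 * A^T A (independent of v and r).
    h = [[2.0 * sum(a[k][i] * a[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = 0.5 * (h[0][0] + h[1][1])
    radius = math.hypot(0.5 * (h[0][0] - h[1][1]), h[0][1])
    return (mean + radius) / (mean - radius)


if __name__ == "__main__":
    for gamma in (0.9, 0.99):
        print(gamma, msbe_hessian_condition_number(gamma))
```

On this chain the condition number grows from about 100 at $\gamma = 0.9$ to about 10,000 at $\gamma = 0.99$, which gives a feel for why plain gradient descent on MSBE can be slow for long horizons.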
 [67] arXiv:2301.13764 [pdf, other]

Title: Semi-Supervised Classification with Graph Convolutional Kernel Machines
Subjects: Machine Learning (cs.LG)
We present a deep Graph Convolutional Kernel Machine (GCKM) for semi-supervised node classification in graphs. First, we introduce an unsupervised kernel machine propagating the node features in a one-hop neighbourhood. Then, we specify a semi-supervised classification kernel machine through the lens of the Fenchel-Young inequality. The deep graph convolutional kernel machine is obtained by stacking multiple shallow kernel machines. After showing that the unsupervised and semi-supervised layers correspond to an eigenvalue problem and a linear system on the aggregated node features, respectively, we derive an efficient end-to-end training algorithm in the dual variables. Numerical experiments demonstrate that our approach is competitive with state-of-the-art graph neural networks for homophilous and heterophilous benchmark datasets. Notably, GCKM achieves superior performance when very few labels are available.
 [68] arXiv:2301.13767 [pdf, other]

Title: Multicalibration as Boosting for Regression
Comments: Code available here: this https URL
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
We study the connection between multicalibration and boosting for squared error regression. First we prove a useful characterization of multicalibration in terms of a "swap regret"-like condition on squared error. Using this characterization, we give an exceedingly simple algorithm that can be analyzed both as a boosting algorithm for regression and as a multicalibration algorithm for a class H that makes use only of a standard squared error regression oracle for H. We give a weak learning assumption on H that ensures convergence to Bayes optimality without the need to make any realizability assumptions, giving us an agnostic boosting algorithm for regression. We then show that our weak learning assumption on H is both necessary and sufficient for multicalibration with respect to H to imply Bayes optimality. We also show that if H satisfies our weak learning condition relative to another class C, then multicalibration with respect to H implies multicalibration with respect to C. Finally, we investigate the empirical performance of our algorithm experimentally using an open source implementation that we make available. Our code repository can be found at https://github.com/Declancharrison/LevelSetBoosting.
 [69] arXiv:2301.13799 [pdf, other]

Title: Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks
Comments: Preprint
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
From natural language processing to genome sequencing, large-scale machine learning models are bringing advances to a broad range of fields. Many of these models are too large to be trained on a single machine, and instead must be distributed across multiple devices. This has motivated the research of new compute and network systems capable of handling such tasks. In particular, recent work has focused on developing management schemes which decide how to allocate distributed resources such that some overall objective, such as minimising the job completion time (JCT), is optimised. However, such studies omit explicit consideration of how much a job should be distributed, usually assuming that maximum distribution is desirable. In this work, we show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate. To address this, we propose PAC-ML (partitioning for asynchronous computing with machine learning). PAC-ML leverages a graph neural network and reinforcement learning to learn how much to partition computation graphs such that the number of jobs which meet arbitrary user-defined JCT requirements is maximised. In experiments with five real deep learning computation graphs on a recently proposed optical architecture across four user-defined JCT requirement distributions, we demonstrate PAC-ML achieving up to 56.2% lower blocking rates in dynamic job arrival settings than the canonical maximum parallelisation strategy used by most prior works.
 [70] arXiv:2301.13816 [pdf, other]

Title: Execution-based Code Generation using Deep Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)
The utilization of programming language (PL) models, pre-trained on large-scale code corpora, as a means of automating software engineering processes has demonstrated considerable potential in streamlining various code generation tasks such as code completion, code translation, and program synthesis. However, current approaches mainly rely on supervised fine-tuning objectives borrowed from text generation, neglecting specific sequence-level features of code, including but not limited to compilability as well as syntactic and functional correctness. To address this limitation, we propose PPOCoder, a new framework for code generation that combines pre-trained PL models with Proximal Policy Optimization (PPO) deep reinforcement learning and employs execution feedback as the external source of knowledge into the model optimization. PPOCoder is transferable across different code generation tasks and PLs. Extensive experiments on three code generation tasks demonstrate the effectiveness of our proposed approach compared to SOTA methods, improving the success rate of compilation and functional correctness over different PLs. Our code can be found at https://github.com/reddylabcoderesearch/PPOCoder .
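To illustrate what "execution feedback" could look like as a scalar reward, here is a minimal sketch for Python candidates: the generated source is compiled, executed, and checked against a caller-supplied test. The three reward levels (-1.0, 0.0, +1.0), the function name `execution_reward`, and the `check` callback are all assumptions made for this illustration; the abstract does not specify PPOCoder's actual reward design.

```python
from typing import Any, Callable, Dict


def execution_reward(source: str,
                     check: Callable[[Dict[str, Any]], bool]) -> float:
    """Score a generated program from compile and execution feedback.

    Hypothetical reward levels: -1.0 if the source does not compile,
    0.0 if it compiles but raises or fails the check, +1.0 if it passes.
    """
    try:
        code = compile(source, "<candidate>", "exec")
    except (SyntaxError, ValueError):
        return -1.0  # not compilable
    namespace: Dict[str, Any] = {}
    try:
        # WARNING: exec runs arbitrary code; a real system needs a sandbox.
        exec(code, namespace)
        passed = check(namespace)
    except Exception:
        return 0.0  # compiled, but failed at run time
    return 1.0 if passed else 0.0
```

Usage: `execution_reward("def add(a, b): return a + b", lambda ns: ns["add"](2, 3) == 5)` scores a candidate against one functional test; in an RL loop such a scalar would be attached to the whole generated sequence.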
 [71] arXiv:2301.13821 [pdf, ps, other]

Title: Complete Neural Networks for Euclidean Graphs
Subjects: Machine Learning (cs.LG)
We propose a 2WLlike geometric graph isomorphism test and prove it is complete when applied to Euclidean Graphs in $\mathbb{R}^3$. We then use recent results on multiset embeddings to devise an efficient geometric GNN model with equivalent separation power. We verify empirically that our GNN model is able to separate particularly challenging synthetic examples, and demonstrate its usefulness for a chemical property prediction problem.
 [72] arXiv:2301.13833 [pdf, other]

Title: A Mathematical Model for Curriculum Learning
Subjects: Machine Learning (cs.LG)
Curriculum learning (CL), that is, training using samples that are generated and presented in a meaningful order, was introduced in the machine learning context around a decade ago. While CL has been extensively used and analysed empirically, there has been very little mathematical justification for its advantages. We introduce a CL model for learning the class of $k$-parities on $d$ bits of a binary string with a neural network trained by stochastic gradient descent (SGD). We show that a wise choice of training examples, involving two or more product distributions, significantly reduces the computational cost of learning this class of functions, compared to learning under the uniform distribution. We conduct experiments to support our analysis. Furthermore, we show that for another class of functions, namely the `Hamming mixtures', CL strategies involving a bounded number of product distributions are not beneficial, while we conjecture that CL with unboundedly many curriculum steps can learn this class efficiently.
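To make the function class in the abstract above concrete, here is a minimal sketch (not taken from the paper) of a k-parity on d bits together with a sampler for a product distribution. The +1/-1 bit encoding, the function names `parity` and `sample_product`, and the example support set are all assumptions chosen for illustration.

```python
import random


def parity(x, support):
    """Return the parity of the coordinates of x listed in `support`.

    Bits are encoded as +1/-1 (an assumed convention), so the parity is
    simply the product of the selected coordinates.
    """
    result = 1
    for i in support:
        result *= x[i]
    return result


def sample_product(d, p, rng):
    """Draw d independent +1/-1 bits, each equal to +1 with probability p.

    p = 0.5 gives the uniform distribution; any other p gives a biased
    product distribution.
    """
    return [1 if rng.random() < p else -1 for _ in range(d)]


rng = random.Random(0)
support = [0, 2, 5]                       # hypothetical 3-parity on d = 8 bits
x_uniform = sample_product(8, 0.5, rng)   # sample under the uniform distribution
x_biased = sample_product(8, 0.9, rng)    # sample under a biased product distribution
print(parity(x_uniform, support), parity(x_biased, support))
```

Under this encoding a biased product distribution makes the parity label correlate with individual input bits, which is one intuition for why the order of training distributions can matter; the paper's actual curriculum is not reproduced here.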
 [73] arXiv:2301.13845 [pdf, other]

Title: Interpreting Robustness Proofs of Deep Neural Networks
Subjects: Machine Learning (cs.LG)
In recent years numerous methods have been developed to formally verify the robustness of deep neural networks (DNNs). Though the proposed techniques are effective in providing mathematical guarantees about DNN behavior, it is not clear whether the proofs generated by these methods are human-interpretable. In this paper, we bridge this gap by developing new concepts, algorithms, and representations to generate human-understandable interpretations of the proofs. Leveraging the proposed method, we show that the robustness proofs of standard DNNs rely on spurious input features, while the proofs of DNNs trained to be provably robust filter out even the semantically meaningful features. The proofs for the DNNs combining adversarial and provably robust training are the most effective at selectively filtering out spurious features as well as relying on human-understandable input features.
 [74] arXiv:2301.13857 [pdf, other]

Title: Learning in POMDPs is Sample-Efficient with Hindsight Observability
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability. However, in many realistic problems, more information is either revealed or can be computed during some point of the learning process. Motivated by diverse applications ranging from robotics to data center scheduling, we formulate a hindsight-observable setting as a POMDP where the latent states are revealed to the learner in hindsight and only during training. We introduce new algorithms for the tabular and function approximation settings that are provably sample-efficient with hindsight observability, even in POMDPs that would otherwise be statistically intractable. We give a lower bound showing that the tabular algorithm is optimal in its dependence on latent state and observation cardinalities.
 [75] arXiv:2301.13862 [pdf, other]

Title: Salient Conditional Diffusion for Defending Against Backdoor Attacks
Comments: 12 pages, 5 figures
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
We propose a novel algorithm, Salient Conditional Diffusion (Sancdifi), a state-of-the-art defense against backdoor attacks. Sancdifi uses a denoising diffusion probabilistic model (DDPM) to degrade an image with noise and then recover said image using the learned reverse diffusion. Critically, we compute saliency map-based masks to condition our diffusion, allowing for stronger diffusion on the most salient pixels by the DDPM. As a result, Sancdifi is highly effective at diffusing out triggers in data poisoned by backdoor attacks. At the same time, it reliably recovers salient features when applied to clean data. This performance is achieved without requiring access to the model parameters of the Trojan network, meaning Sancdifi operates as a black-box defense.
 [76] arXiv:2301.13867 [pdf, other]

Title: Mathematical Capabilities of ChatGPT
Authors: Simon Frieder, Luca Pinchetti, Ryan-Rhys Griffiths, Tommaso Salvatori, Thomas Lukasiewicz, Philipp Christian Petersen, Alexis Chevalier, Julius Berner
Comments: The GHOSTS dataset will be available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
We investigate the mathematical capabilities of ChatGPT by testing it on publicly available datasets, as well as hand-crafted ones, and measuring its performance against other models trained on a mathematical corpus, such as Minerva. We also test whether ChatGPT can be a useful assistant to professional mathematicians by emulating various use cases that come up in the daily professional activities of mathematicians (question answering, theorem searching). In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-language mathematics, used to benchmark language models, only cover elementary mathematics. We address this issue by introducing a new dataset: GHOSTS. It is the first natural-language dataset made and curated by working researchers in mathematics that (1) aims to cover graduate-level mathematics and (2) provides a holistic overview of the mathematical capabilities of language models. We benchmark ChatGPT on GHOSTS and evaluate performance against fine-grained criteria. We make this new dataset publicly available to assist a community-driven comparison of ChatGPT with (future) large language models in terms of advanced mathematical comprehension. We conclude that contrary to many positive reports in the media (a potential case of selection bias), ChatGPT's mathematical abilities are significantly below those of an average mathematics graduate student. Our results show that ChatGPT often understands the question but fails to provide correct solutions. Hence, if your goal is to use it to pass a university exam, you would be better off copying from your average peer!
 [77] arXiv:2301.13868 [pdf, other]

Title: PADL: Language-Directed Physics-Based Character Control
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Graphics (cs.GR)
Developing systems that can synthesize natural and life-like motions for simulated characters has long been a focus for computer animation. But in order for these systems to be useful for downstream applications, they must not only produce high-quality motions, but also provide an accessible and versatile interface through which users can direct a character's behaviors. Natural language provides a simple-to-use and expressive medium for specifying a user's intent. Recent breakthroughs in natural language processing (NLP) have demonstrated effective use of language-based interfaces for applications such as image generation and program synthesis. In this work, we present PADL, which leverages recent innovations in NLP in order to take steps towards developing language-directed controllers for physics-based character animation. PADL allows users to issue natural language commands for specifying both high-level tasks and low-level skills that a character should perform. We present an adversarial imitation learning approach for training policies to map high-level language commands to low-level controls that enable a character to perform the desired task and skill specified by a user's commands. Furthermore, we propose a multi-task aggregation method that leverages a language-based multiple-choice question-answering approach to determine high-level task objectives from language commands. We show that our framework can be applied to effectively direct a simulated humanoid character to perform a diverse array of complex motor skills.
Cross-lists for Wed, 1 Feb 23
 [78] arXiv:2301.11916 (cross-list from cs.CL) [pdf, other]

Title: Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In recent years, pretrained large language models have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. However, existing literature has highlighted the sensitivity of this capability to the selection of few-shot demonstrations. The underlying mechanisms by which this capability arises from regular language model pretraining objectives remain poorly understood. In this study, we aim to examine the in-context learning phenomenon through a Bayesian lens, viewing large language models as topic models that implicitly infer task-related information from demonstrations. On this premise, we propose an algorithm for selecting optimal demonstrations from a set of annotated data and demonstrate a significant 12.5% improvement relative to the random selection baseline, averaged over eight GPT-2 and GPT-3 models on eight different real-world text classification datasets. Our empirical findings support our hypothesis that large language models implicitly infer a latent concept variable.
 [79] arXiv:2301.12762 (cross-list from cs.IR) [pdf]

Title: Causality-based CTR Prediction using Graph Neural Networks
Comments: 40 pages, 6 figures, 5 tables
Journal-ref: Information Processing & Management 60.1 (2023): 103137
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
As a prevalent problem in online advertising, click-through rate (CTR) prediction has attracted considerable attention from both academia and industry. Recent studies have established CTR prediction models in the graph neural network (GNN) framework. However, most GNN-based models handle feature interactions in a complete graph while ignoring causal relationships among features, which results in a large drop in performance on out-of-distribution data. This paper is dedicated to developing a causality-based CTR prediction model in the GNN framework (CausalGNN) integrating representations of feature graph, user graph and ad graph in the context of online advertising. In our model, a structured representation learning method (GraphFwFM) is designed to capture high-order representations on the feature graph based on causal discovery among field features in gated graph neural networks (GGNNs), and GraphSAGE is employed to obtain graph representations of users and ads. Experiments conducted on three public datasets demonstrate the superiority of CausalGNN in AUC and Logloss and the effectiveness of GraphFwFM in capturing high-order representations on the causal feature graph.
 [80] arXiv:2301.13246 (cross-list from cs.SE) [pdf, other]

Title: Conversational Automated Program Repair
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
Automated Program Repair (APR) can help developers automatically generate patches for bugs. Due to the impressive performance obtained using Large Pre-Trained Language Models (LLMs) on many code-related tasks, researchers have started to directly use LLMs for APR. However, prior approaches simply repeatedly sample the LLM given the same constructed input/prompt created from the original buggy code, which not only leads to generating the same incorrect patches repeatedly but also misses the critical information in test cases. To address these limitations, we propose conversational APR, a new paradigm for program repair that alternates between patch generation and validation in a conversational manner. In conversational APR, we iteratively build the input to the model by combining previously generated patches with validation feedback. As such, we leverage the long-term context window of LLMs to not only avoid generating previously incorrect patches but also incorporate validation feedback to help the model understand the semantic meaning of the program under test. We evaluate 10 different LLMs, including the newly developed ChatGPT model, to demonstrate the improvement of conversational APR over the prior LLM-for-APR approach.
 [81] arXiv:2301.13261 (cross-list from cs.AI) [pdf, other]

Title: Emergence of Maps in the Memories of Blind Navigation Agents
Comments: Accepted to ICLR 2023
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask if machines (specifically, artificial intelligence (AI) navigation agents) also build implicit (or 'mental') maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural networks achieving strong performance, and (b) strengthen the evidence of mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether they be biological or artificial. Unlike animal navigation, we can judiciously design the agent's perceptual system and control the learning paradigm to nullify alternative navigation mechanisms. Specifically, we train 'blind' agents, with sensing limited to only egomotion and no other sensing of any kind, to perform PointGoal navigation ('go to $\Delta x$, $\Delta y$') via reinforcement learning. Our agents are composed of navigation-agnostic components (fully connected and recurrent neural networks), and our experimental setup provides no inductive bias towards mapping. Despite these harsh conditions, we find that blind agents are (1) surprisingly effective navigators in new environments (~95% success); (2) they utilize memory over long horizons (remembering ~1,000 steps of past experience in an episode); (3) this memory enables them to exhibit intelligent behavior (following walls, detecting collisions, taking shortcuts); (4) there is emergence of maps and collision detection neurons in the representations of the environment built by a blind agent as it navigates; and (5) the emergent maps are selective and task dependent (e.g. the agent 'forgets' exploratory detours). Overall, this paper presents no new techniques for the AI audience, but a surprising finding, an insight, and an explanation.
 [82] arXiv:2301.13262 (cross-list from physics.flu-dyn) [pdf, other]

Title: Temporal Consistency Loss for Physics-Informed Neural Networks
Subjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG)
Physics-informed neural networks (PINNs) have been widely used to solve partial differential equations in a forward and inverse manner using deep neural networks. However, training these networks can be challenging for multiscale problems. While statistical methods can be employed to scale the regression loss on data, it is generally challenging to scale the loss terms for equations. This paper proposes a method for scaling the mean squared loss terms in the objective function used to train PINNs. Instead of using automatic differentiation to calculate the temporal derivative, we use backward Euler discretization. This provides us with a scaling term for the equations. In this work, we consider the two- and three-dimensional Navier-Stokes equations and determine the kinematic viscosity using the spatio-temporal data on the velocity and pressure fields. We first consider numerical datasets to test our method. We test the sensitivity of our method to the time step size, the number of timesteps, noise in the data, and spatial resolution. Finally, we use the velocity field obtained using Particle Image Velocimetry (PIV) experiments to generate a reference pressure field. We then test our framework using the velocity and reference pressure field.
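The idea of replacing an automatic-differentiation time derivative with a backward Euler difference, and of recovering a physical coefficient from data, can be illustrated on a much simpler problem than the one in the abstract. The sketch below is an assumption-laden toy, not the authors' method: it uses the scalar decay equation du/dt = -k u as a stand-in for the Navier-Stokes equations, with the coefficient k playing the role of the kinematic viscosity, and the function names are hypothetical.

```python
def backward_euler_residual(u_prev, u_next, dt, k):
    """Residual of du/dt = -k*u at the new time level, with the time
    derivative replaced by the backward Euler difference
    (u_next - u_prev) / dt."""
    return (u_next - u_prev) / dt + k * u_next


def estimate_k(samples, dt):
    """Least-squares estimate of k: the value minimising the sum of
    squared backward Euler residuals over consecutive samples."""
    num = 0.0
    den = 0.0
    for u_prev, u_next in zip(samples, samples[1:]):
        dudt = (u_next - u_prev) / dt
        num += -dudt * u_next
        den += u_next * u_next
    return num / den


# Synthetic data generated by the backward Euler scheme itself, so the
# residual vanishes at the true coefficient (toy setting, no noise).
k_true, dt = 2.0, 0.1
samples = [1.0]
for _ in range(20):
    samples.append(samples[-1] / (1.0 + k_true * dt))

print(estimate_k(samples, dt))
```

Because the residual is linear in k, the minimiser has a closed form here; in a PINN the same residual would instead enter a loss that is minimised jointly over network weights and the unknown coefficient.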
 [83] arXiv:2301.13269 (cross-list from stat.ML) [pdf, other]

Title: Structure Learning and Parameter Estimation for Graphical Models via Penalized Maximum Likelihood Methods
Authors: Maryia Shpak (Maria Curie-Sklodowska University in Lublin)
Comments: PhD Thesis. arXiv admin note: text overlap with arXiv:1207.1401, arXiv:1207.1402, arXiv:2002.00269, arXiv:1504.05006 by other authors
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Probabilistic graphical models (PGMs) provide a compact and flexible framework to model very complex real-life phenomena. They combine probability theory, which deals with uncertainty, with a logical structure represented by a graph, which allows one to cope with the computational complexity and also interpret and communicate the obtained knowledge. In the thesis, we consider two different types of PGMs: Bayesian networks (BNs), which are static, and continuous time Bayesian networks (CTBNs), which, as the name suggests, have a temporal component. We are interested in recovering their true structure, which is the first step in learning any PGM. This is a challenging task, which is interesting in itself from the causal point of view, for the purposes of interpretation of the model and the decision-making process. All approaches for structure learning in the thesis are united by the same idea of maximum likelihood estimation with the LASSO penalty. The problem of structure learning is reduced to the problem of finding non-zero coefficients in the LASSO estimator for a generalized linear model. In the case of CTBNs, we consider the problem both for complete and incomplete data. We support the theoretical results with experiments.
 [84] arXiv:2301.13279 (cross-list from cs.AI) [pdf, other]

Title: Learning Coordination Policies over Heterogeneous Graphs for Human-Robot Teams via Recurrent Neural Schedule Propagation
Comments: 8 pages, 2 figures, 3 Tables
Journal-ref: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Robotics (cs.RO)
As human-robot collaboration increases in the workforce, it becomes essential for human-robot teams to coordinate efficiently and intuitively. Traditional approaches for human-robot scheduling either utilize exact methods that are intractable for large-scale problems and struggle to account for stochastic, time-varying human task performance, or application-specific heuristics that require expert domain knowledge to develop. We propose a deep learning-based framework, called HybridNet, combining a heterogeneous graph-based encoder with a recurrent schedule propagator for scheduling stochastic human-robot teams under upper- and lower-bound temporal constraints. The HybridNet's encoder leverages Heterogeneous Graph Attention Networks to model the initial environment and team dynamics while accounting for the constraints. By formulating task scheduling as a sequential decision-making process, the HybridNet's recurrent neural schedule propagator leverages Long Short-Term Memory (LSTM) models to propagate forward consequences of actions to carry out fast schedule generation, removing the need to interact with the environment between every task-agent pair selection. The resulting scheduling policy network provides a computationally lightweight yet highly expressive model that is end-to-end trainable via Reinforcement Learning algorithms. We develop a virtual task scheduling environment for mixed human-robot teams in a multi-round setting, capable of modeling the stochastic learning behaviors of human workers. Experimental results showed that HybridNet outperformed other human-robot scheduling solutions across problem sizes for both deterministic and stochastic human performance, with faster runtime compared to pure-GNN-based schedulers.
 [85] arXiv:2301.13303 (cross-list from stat.ML) [pdf, other]

Title: Variational sparse inverse Cholesky approximation for latent Gaussian processes via double Kullback-Leibler minimization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
To achieve scalable and accurate inference for latent Gaussian processes, we propose a variational approximation based on a family of Gaussian distributions whose covariance matrices have sparse inverse Cholesky (SIC) factors. We combine this variational approximation of the posterior with a similar and efficient SIC-restricted Kullback-Leibler-optimal approximation of the prior. We then focus on a particular SIC ordering and nearest-neighbor-based sparsity pattern resulting in highly accurate prior and posterior approximations. For this setting, our variational approximation can be computed via stochastic gradient descent in polylogarithmic time per iteration. We provide numerical comparisons showing that the proposed double-Kullback-Leibler-optimal Gaussian-process approximation (DKLGP) can sometimes be vastly more accurate than alternative approaches such as inducing-point and mean-field approximations at similar computational complexity.
 [86] arXiv:2301.13306 (cross-list from cs.GT) [pdf, ps, other]

Title: Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
We study a game between autobidding algorithms that compete in an online advertising platform. Each autobidder is tasked with maximizing its advertiser's total value over multiple rounds of a repeated auction, subject to budget and/or return-on-investment constraints. We propose a gradient-based learning algorithm that is guaranteed to satisfy all constraints and achieves vanishing individual regret. Our algorithm uses only bandit feedback and can be used with the first- or second-price auction, as well as with any "intermediate" auction format. Our main result is that when these autobidders play against each other, the resulting expected liquid welfare over all rounds is at least half of the expected optimal liquid welfare achieved by any allocation. This holds whether or not the bidding dynamics converge to an equilibrium and regardless of the correlation structure between advertiser valuations.
 [87] arXiv:2301.13314 (cross-list from math.OC) [pdf, ps, other]

Title: Single-Loop Switching Subgradient Methods for Non-Smooth Weakly Convex Optimization with Non-Smooth Convex Constraints
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
In this paper, we consider a general non-convex constrained optimization problem, where the objective function is weakly convex and the constraint function is convex while they can both be non-smooth. This class of problems arises from many applications in machine learning such as fairness-aware supervised learning. To solve this problem, we consider the classical switching subgradient method by Polyak (1965), which is an intuitive and easily implementable first-order method. Before this work, its iteration complexity was only known for convex optimization. We prove its oracle complexity for finding a nearly stationary point when the objective function is non-convex. The analysis is derived separately for the cases where the constraint function is deterministic and stochastic. Compared to existing methods, especially the double-loop methods, the switching subgradient method can be applied to non-smooth problems and only has a single loop, which saves the effort of tuning the number of inner iterations.
 [88] arXiv:2301.13331 (cross-list from cs.AI) [pdf, other]

Title: Fast Resolution Agnostic Neural Techniques to Solve Partial Differential Equations
Authors: Hrishikesh Viswanath, Md Ashiqur Rahman, Abhijeet Vyas, Andrey Shor, Beatriz Medeiros, Stephanie Hernandez, Suhas Eswarappa Prameela, Aniket Bera
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
Numerical approximations of partial differential equations (PDEs) are routinely employed to formulate the solution of physics, engineering and mathematical problems involving functions of several variables, such as the propagation of heat or sound, fluid flow, elasticity, electrostatics, electrodynamics, and more. While this has led to solving many complex phenomena, there are still significant limitations. Conventional approaches such as Finite Element Methods (FEMs) and Finite Difference Methods (FDMs) require considerable time and are computationally expensive. In contrast, machine learning-based methods such as neural networks are faster once trained, but tend to be restricted to a specific discretization. This article aims to provide a comprehensive summary of conventional methods and recent machine learning-based methods to approximate PDEs numerically. Furthermore, we highlight several key architectures centered around the neural operator, a novel and fast approach (1000x) to learning the solution operator of a PDE. We will note how these new computational approaches can bring immense advantages in tackling many problems in fundamental and applied physics.
 [89] arXiv:2301.13348 (cross-list from stat.ML) [pdf, other]

Title: A Reinforcement Learning Framework for Dynamic Mediation Analysis
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
Mediation analysis learns the causal effect transmitted via mediator variables between treatments and outcomes and receives increasing attention in various scientific domains to elucidate causal relations. Most existing works focus on point-exposure studies where each subject only receives one treatment at a single time point. However, there are a number of applications (e.g., mobile health) where the treatments are sequentially assigned over time and the dynamic mediation effects are of primary interest. Proposing a reinforcement learning (RL) framework, we are the first to evaluate dynamic mediation effects in settings with infinite horizons. We decompose the average treatment effect into an immediate direct effect, an immediate mediation effect, a delayed direct effect, and a delayed mediation effect. Upon the identification of each effect component, we further develop robust and semi-parametrically efficient estimators under the RL framework to infer these causal effects. The superior performance of the proposed method is demonstrated through extensive numerical studies, theoretical results, and an analysis of a mobile health dataset.
 [90] arXiv:2301.13356 (cross-list from cs.CV) [pdf, other]

Title: Inference Time Evidences of Adversarial Attacks for Forensic on Transformers
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Vision Transformers (ViTs) are becoming a very popular paradigm for vision tasks as they achieve state-of-the-art performance on image classification. However, although early works implied that this network structure had increased robustness against adversarial attacks, some works argue ViTs are still vulnerable. This paper presents our first attempt toward detecting adversarial attacks during inference time using the network's input and outputs as well as latent features. We design four quantifications (or derivatives) of input, output, and latent vectors of ViT-based models that provide a signature of the inference, which could be beneficial for the attack detection, and empirically study their behavior over clean samples and adversarial samples. The results demonstrate that the quantifications from input (images) and output (posterior probabilities) are promising for distinguishing clean and adversarial samples, while latent vectors offer less discriminative power, though they give some insights on how adversarial perturbations work.
 [91] arXiv:2301.13368 (cross-list from stat.ME) [pdf, other]

Title: Misspecification-robust Sequential Neural Likelihood
Comments: 21 pages, 5 figures
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
Simulation-based inference (SBI) techniques are now an essential tool for the parameter estimation of mechanistic and simulatable models with intractable likelihoods. Statistical approaches to SBI such as approximate Bayesian computation and Bayesian synthetic likelihood have been well studied in the well-specified and misspecified settings. However, most implementations are inefficient in that many model simulations are wasted. Neural approaches such as sequential neural likelihood (SNL) have been developed that exploit all model simulations to build a surrogate of the likelihood function. However, SNL approaches have been shown to perform poorly under model misspecification. In this paper, we develop a new method for SNL that is robust to model misspecification and can identify areas where the model is deficient. We demonstrate the usefulness of the new approach on several illustrative examples.
 [92] arXiv:2301.13371 (cross-list from stat.ML) [pdf, other]

Title: Demystifying Disagreement-on-the-Line in High Dimensions
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Evaluating the performance of machine learning models under distribution shift is challenging, especially when we only have unlabeled data from the shifted (target) domain, along with labeled data from the original (source) domain. Recent work suggests that the notion of disagreement, the degree to which two models trained with different randomness differ on the same input, is key to tackling this problem. Experimentally, disagreement and prediction error have been shown to be strongly connected, which has been used to estimate model performance. Experiments have led to the discovery of the disagreement-on-the-line phenomenon, whereby the classification error under the target domain is often a linear function of the classification error under the source domain; and whenever this property holds, disagreement under the source and target domain follow the same linear relation. In this work, we develop a theoretical foundation for analyzing disagreement in high-dimensional random features regression; and study under what conditions the disagreement-on-the-line phenomenon occurs in our setting. Experiments on CIFAR-10-C, Tiny ImageNet-C, and Camelyon17 are consistent with our theory and support the universality of the theoretical findings.
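The notion of disagreement described in the abstract above needs no labels, which is what makes it usable on an unlabeled target domain. As a minimal sketch for classification (the function name `disagreement` and the example predictions are hypothetical, and the paper itself analyses a regression setting that is not reproduced here), it can be computed as the fraction of inputs on which two models predict different classes:

```python
def disagreement(preds_a, preds_b):
    """Fraction of inputs on which two models' predicted labels differ.

    `preds_a` and `preds_b` are equal-length sequences of predicted
    labels for the same inputs, in the same order.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        raise ValueError("prediction lists must be non-empty")
    differing = sum(a != b for a, b in zip(preds_a, preds_b))
    return differing / len(preds_a)


# Hypothetical predictions of two independently trained models on the
# same four unlabeled inputs.
model_a = [0, 1, 2, 1]
model_b = [0, 2, 2, 0]
print(disagreement(model_a, model_b))
```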
 [93] arXiv:2301.13380 (cross-list from cs.SD) [pdf, other]

Title: Automated Time-frequency Domain Audio Crossfades using Graph Cuts
Journal-ref: Late Breaking/Demo at the 20th International Society for Music Information Retrieval, Delft, The Netherlands, 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
The problem of transitioning smoothly from one audio clip to another arises in many music consumption scenarios, especially as music consumption has moved from professionally curated and live-streamed radios to personal playback devices and services. We present the first steps toward a new method of automatically transitioning from one audio clip to another by discretizing the frequency spectrum into bins and then finding transition times for each bin. We phrase the problem as one of graph flow optimization, specifically min-cut/max-flow.
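The min-cut/max-flow formulation mentioned above relies on the fact that the value of a maximum source-to-sink flow equals the weight of a minimum cut. The following is a generic Edmonds-Karp sketch of that computation, offered only as background: the abstract does not describe how the time-frequency graph is built, so the graph layout, node names, and the `max_flow` function here are assumptions for illustration.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    `capacity` maps u -> {v: capacity of edge u->v}. Returns the value of
    the maximum flow, which equals the weight of a minimum source-sink cut.
    """
    if source == sink:
        raise ValueError("source and sink must differ")
    # Residual capacities, with reverse edges initialised to zero.
    residual = {}
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)
    if source not in residual or sink not in residual:
        return 0
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push the bottleneck amount of flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical toy graph; edge weights would encode transition costs.
graph = {
    "s": {"a": 3, "b": 2},
    "a": {"b": 1, "t": 2},
    "b": {"t": 3},
}
print(max_flow(graph, "s", "t"))
```

In a crossfade setting one would expect nodes to correspond to (frequency bin, time) cells and the cut to separate cells played from the first clip from those played from the second, but that construction is a guess rather than something stated in the abstract.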
 [94] arXiv:2301.13383 (crosslist from cs.SD) [pdf, other]

Title: An Comparative Analysis of Different Pitch and Metrical Grid Encoding Methods in the Task of Sequential Music GenerationComments: This is a draft before submitted to TISMIR as a journal paperSubjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Pitch and meter are two fundamental music features for symbolic music generation tasks, where researchers usually choose different encoding methods depending on specific goals. However, the advantages and drawbacks of different encoding methods have not been frequently discussed. This paper presents an integrated analysis of the influence of two low-level features, pitch and meter, on the performance of a token-based sequential music generation model. First, the commonly used MIDI number encoding and a less used class-octave encoding are compared. Second, a dense intra-bar metric grid is imposed on the encoded sequence as auxiliary features. Different complexities and resolutions of the metric grid are compared. For complexity, the single-token approach and the multiple-token approach are compared; for grid resolution, 0 (ablation), 1 (bar-level), 4 (downbeat-level), 12 (8th-triplet-level), up to 64 (64th-note-grid-level) are compared; for duration resolution, 4, 8, 12 and 16 subdivisions per beat are compared. All encodings are tested on separately trained Transformer-XL models for a melody generation task. Regarding distribution similarity of several objective evaluation metrics to the test dataset, results suggest that the class-octave encoding significantly outperforms the taken-for-granted MIDI encoding on pitch-related metrics; finer grids and multiple-token grids improve the rhythmic quality, but also suffer from overfitting at an early training stage. Results display a general phenomenon of overfitting from two aspects, the pitch embedding space and the test loss of the single-token grid encoding. From a practical perspective, we both demonstrate the feasibility and raise the concern of the easy overfitting problem of using smaller networks and lower embedding dimensions on the generation task. The findings can also contribute to future models in terms of feature engineering.
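The difference between the two pitch encodings compared in this abstract can be sketched as follows. This is an illustrative assumption about how a class-octave encoding might split a MIDI note number into two tokens; the abstract does not specify the token layout, and the octave numbering used here (MIDI 60 is C4) is one common convention, not necessarily the authors' choice.

```python
def midi_to_class_octave(midi_number):
    """Split a MIDI note number (0-127) into (pitch_class, octave)."""
    if not 0 <= midi_number <= 127:
        raise ValueError("MIDI note numbers range from 0 to 127")
    pitch_class = midi_number % 12    # 0 = C, 1 = C#, ..., 11 = B
    octave = midi_number // 12 - 1    # assumed convention: MIDI 60 is C4
    return pitch_class, octave


def class_octave_to_midi(pitch_class, octave):
    """Inverse mapping, back to a single MIDI note number."""
    return (octave + 1) * 12 + pitch_class


# A short hypothetical melody: C4, D4, E4, C5.
melody = [60, 62, 64, 72]
encoded = [midi_to_class_octave(note) for note in melody]
print(encoded)
```

With the single MIDI number, C4 and C5 are unrelated tokens; with the class-octave split they share the same pitch-class token, which is one plausible reason such an encoding could help on pitch-related metrics.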
 [95] arXiv:2301.13387 (crosslist from qbio.GN) [pdf, other]

Title: Deep Learning for ReferenceFree Geolocation for Poplar TreesComments: Accepted at NeurIPS 2022 AI for Science WorkshopSubjects: Genomics (qbio.GN); Machine Learning (cs.LG)
A core task in precision agriculture is the identification of climatic and ecological conditions that are advantageous for a given crop. The most succinct approach is geolocation, which is concerned with locating the native region of a given sample based on its genetic makeup. Here, we investigate genomic geolocation of Populus trichocarpa, or poplar, which has been identified by the US Department of Energy as a fast-rotation biofuel crop to be harvested nationwide. In particular, we approach geolocation from a reference-free perspective, circumventing the need for compute-intensive processes such as variant calling and alignment. Our model, MashNet, predicts latitude and longitude for poplar trees from randomly sampled, unaligned sequence fragments. We show that our model performs comparably to Locator, a state-of-the-art method based on aligned whole-genome sequence data. MashNet achieves an error of 34.0 km^2 compared to Locator's 22.1 km^2. MashNet allows growers to quickly and efficiently identify natural varieties that will be most productive in their growth environment based on genotype. This paper explores geolocation for precision agriculture while providing a framework and data source for further development by the machine learning community.
 [96] arXiv:2301.13388 (crosslist from cs.HC) [pdf, other]

Title: Large Music Recommendation Studies for Small TeamsJournalref: Late Breaking/Demo, Proc. of the 22nd Int. Society for Music Information Retrieval Conf., Online, 2021Subjects: HumanComputer Interaction (cs.HC); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Running live music recommendation studies without direct industry partnerships can be a prohibitively daunting task, especially for small teams. In order to help future researchers interested in such evaluations, we present a number of struggles we faced in the process of generating our own such evaluation system alongside potential solutions. These problems span the topics of users, data, computation, and application architecture.
 [97] arXiv:2301.13413 (crosslist from cs.RO) [pdf, other]

Title: Fine Robotic Manipulation without Force/Torque SensorSubjects: Robotics (cs.RO); Machine Learning (cs.LG)
Force Sensing and Force Control are essential to many industrial applications. Typically, a 6-axis Force/Torque (F/T) sensor is mounted between the robot's wrist and the end-effector in order to measure the forces and torques exerted by the environment onto the robot (the external wrench). Although a typical 6-axis F/T sensor can provide highly accurate measurements, it is expensive and vulnerable to drift and external impacts. Existing methods aiming at estimating the external wrench using only the robot's internal signals are limited in scope: for example, wrench estimation accuracy was mostly validated in free-space motions and simple contacts as opposed to tasks like assembly that require high-precision force control. Here we present a Neural Network-based method and argue that by devoting particular attention to the training data structure, it is possible to accurately estimate the external wrench in a wide range of scenarios based solely on internal signals. As an illustration, we demonstrate a pin insertion experiment with 100-micron clearance and a hand-guiding experiment, both performed without external F/T sensors or joint torque sensors. Our result opens the possibility of equipping the existing 2.7 million industrial robots with Force Sensing and Force Control capabilities without any additional hardware.
 [98] arXiv:2301.13415 (crosslist from cs.AI) [pdf, other]

Title: LogAI: A Library for Log Analytics and IntelligenceComments: 17 pages, 7 figures, technical report for open source code, paper release with codeSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
Software and system logs record runtime information about processes executing within a system. These logs have become the most critical and ubiquitous forms of observability data that help developers understand system behavior, monitor system health and resolve issues. However, the volume of logs generated can be enormous (on the order of petabytes per day), especially for complex distributed systems such as cloud platforms, search engines and social media. This has propelled a lot of research on developing AI-based log analytics and intelligence solutions that can process huge volumes of raw logs and generate insights. In order to enable users to perform multiple types of AI-based log analysis tasks in a uniform manner, we introduce LogAI (https://github.com/salesforce/logai), a one-stop open source library for log analytics and intelligence. LogAI supports tasks such as log summarization, log clustering and log anomaly detection. It adopts the OpenTelemetry data model to enable compatibility with different log management platforms. LogAI provides a unified model interface and popular time-series, statistical learning and deep learning models. Alongside this, LogAI also provides an out-of-the-box GUI for users to conduct interactive analysis. With LogAI, we can also easily benchmark popular deep learning algorithms for log anomaly detection without putting in redundant effort to process the logs. We have open-sourced LogAI to cater to a wide range of applications benefiting both academic research and industrial prototyping.
 [99] arXiv:2301.13418 (crosslist from cs.CV) [pdf, other]

Title: BRAIxDet: Learning to Detect Malignant Breast Lesion with Incomplete AnnotationsAuthors: Yuanhong Chen, Yuyuan Liu, Chong Wang, Michael Elliott, Chun Fung Kwok, Carlos Peña-Solorzano, Yu Tian, Fengbei Liu, Helen Frazer, Davis J. McCarthy, Gustavo CarneiroSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Methods to detect malignant lesions from screening mammograms are usually trained with fully annotated datasets, where images are labelled with the localisation and classification of cancerous lesions. However, real-world screening mammogram datasets commonly have a subset that is fully annotated and another subset that is weakly annotated with just the global classification (i.e., without lesion localisation). Given the large size of such datasets, researchers usually face a dilemma with the weakly annotated subset: to not use it or to fully annotate it. The first option will reduce detection accuracy because it does not use the whole dataset, and the second option is too expensive given that the annotation needs to be done by expert radiologists. In this paper, we propose a middle-ground solution for the dilemma, which is to formulate the training as a weakly and semi-supervised learning problem that we refer to as malignant breast lesion detection with incomplete annotations. To address this problem, our new method comprises two stages, namely: 1) pre-training a multi-view mammogram classifier with weak supervision from the whole dataset, and 2) extending the trained classifier to become a multi-view detector that is trained with semi-supervised student-teacher learning, where the training set contains fully and weakly annotated mammograms. We provide extensive detection results on two real-world screening mammogram datasets containing incomplete annotations, and show that our proposed approach achieves state-of-the-art results in the detection of malignant breast lesions with incomplete annotations.
 [100] arXiv:2301.13428 (crosslist from cs.CV) [pdf, other]

Title: Contrast and Clustering: Learning Neighborhood Pair Representation for Sourcefree Domain AdaptationComments: conference paperSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Domain adaptation has attracted a great deal of attention in the machine learning community, but it requires access to source data, which often raises concerns about data privacy. We are thus motivated to address these issues and propose a simple yet efficient method. This work treats domain adaptation as an unsupervised clustering problem and trains the target model without access to the source data. Specifically, we propose a loss function called contrast and clustering (CaC), where a positive pair term pulls neighbors belonging to the same class together in the feature space to form clusters, while a negative pair term pushes samples of different classes apart. In addition, extended neighbors are taken into account by querying the nearest neighbor indexes in the memory bank to mine for more valuable negative pairs. Extensive experiments on three common benchmarks, VisDA, Office-Home and Office-31, demonstrate that our method achieves state-of-the-art performance. The code will be made publicly available at https://github.com/yukilulu/CaC.
 [101] arXiv:2301.13445 (crosslist from cs.CV) [pdf, other]

Title: A Survey of Explainable AI in Deep Visual Modeling: Methods and MetricsAuthors: Naveed AkhtarComments: Short accessible survey (9pgs)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Deep visual models have widespread applications in high-stakes domains. Hence, their black-box nature is currently attracting considerable interest from the research community. We present the first survey in Explainable AI that focuses on the methods and metrics for interpreting deep visual models. Covering the landmark contributions along with the state of the art, we not only provide a taxonomic organization of the existing techniques, but also excavate a range of evaluation metrics and collate them as measures of different properties of model explanations. Alongside an insightful discussion of the current trends, we also discuss the challenges and future avenues for this research direction.
 [102] arXiv:2301.13447 (crosslist from eess.SY) [pdf, other]

Title: A DataDriven Modeling and Control Framework for PhysicsBased Building EmulatorsSubjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
We present a data-driven modeling and control framework for physics-based building emulators. Our approach comprises: (a) Offline training of differentiable surrogate models that speed up model evaluations, provide cheap gradients, and have good predictive accuracy for the receding horizon in Model Predictive Control (MPC) and (b) Formulating and solving nonlinear building HVAC MPC problems. We extensively verify the modeling and control performance using multiple surrogate models and optimization frameworks for different available test cases in the Building Optimization Testing Framework (BOPTEST). The framework is compatible with other modeling techniques and customizable with different control formulations. The modularity makes the approach future-proof for test cases currently in development for physics-based building emulators and provides a path toward prototyping predictive controllers in large buildings.
 [103] arXiv:2301.13459 (crosslist from cs.CV) [pdf, other]

Title: Learning Generalized Hybrid Proximity Representation for Image RecognitionComments: The paper has been accepted by the IEEE ICTAI 2022Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
Recently, deep metric learning techniques have received attention, as the learned distance representations are useful for capturing the similarity relationships among samples and further improve the performance of various supervised or unsupervised learning tasks. We propose a novel supervised metric learning method that can learn the distance metrics in both geometric and probabilistic space for image recognition. In contrast to previous metric learning methods, which usually focus on learning the distance metrics in Euclidean space, our proposed method is able to learn a better distance representation in a hybrid approach. To achieve this, we propose a Generalized Hybrid Metric Loss (GHM-Loss) to learn the general hybrid proximity features from the image data by controlling the trade-off between geometric proximity and probabilistic proximity. To evaluate the effectiveness of our method, we first provide theoretical derivations and proofs of the proposed loss function, then we perform extensive experiments on two public datasets to show the advantage of our method compared to other state-of-the-art metric learning methods.
 [104] arXiv:2301.13462 (crosslist from physics.aoph) [pdf, other]

Title: Towards Learned Emulation of Interannual Water Isotopologue Variations in General Circulation ModelsSubjects: Atmospheric and Oceanic Physics (physics.aoph); Machine Learning (cs.LG)
Simulating abundances of stable water isotopologues, i.e. molecules differing in their isotopic composition, within climate models allows for comparisons with proxy data and, thus, for testing hypotheses about past climate and validating climate models under varying climatic conditions. However, many models are run without explicitly simulating water isotopologues. We investigate the possibility of replacing the explicit physics-based simulation of oxygen isotopic composition in precipitation with machine learning methods. These methods estimate isotopic composition at each time step for given fields of surface temperature and precipitation amount. We implement convolutional neural networks (CNNs) based on the successful U-Net architecture and test whether a spherical network architecture outperforms the naive approach of treating Earth's latitude-longitude grid as a flat image. Conducting a case study on a last millennium run with the iHadCM3 climate model, we find that roughly 40% of the temporal variance in the isotopic composition is explained by the emulations on interannual and monthly timescales, with spatially varying emulation quality. A modified version of the standard U-Net architecture for flat images yields results that are as good as the predictions by the spherical CNN. We test generalization to last millennium runs of other climate models and find that while the tested deep learning methods yield the best results on iHadCM3 data, the performance drops when predicting on other models and is comparable to simple pixel-wise linear regression. An extended choice of predictor variables and improved robustness of the learned climate-oxygen isotope relationships should be explored in future work.
 [105] arXiv:2301.13476 (crosslist from cs.SE) [pdf, other]

Title: An investigation of challenges encountered when specifying training data and runtime monitors for safety critical ML applicationsSubjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
Context and motivation: The development and operation of critical software that contains machine learning (ML) models requires diligence and established processes. In particular, the training data used during the development of ML models have a major influence on the later behaviour of the system. Runtime monitors are used to provide guarantees for that behaviour. Question / problem: We see major uncertainty in how to specify training data and runtime monitoring for critical ML models and, by this, how to specify the final functionality of the system. In this interview-based study we investigate the underlying challenges behind these difficulties. Principal ideas/results: Based on ten interviews with practitioners who develop ML models for critical applications in the automotive and telecommunication sectors, we identified 17 underlying challenges in 6 challenge groups that relate to the challenge of specifying training data and runtime monitoring. Contribution: The article provides a list of the identified underlying challenges related to the difficulties practitioners experience when specifying training data and runtime monitoring for ML models. Furthermore, interconnections between the challenges were found, and based on these connections, recommendations are proposed to overcome the root causes of the challenges.
 [106] arXiv:2301.13486 (crosslist from stat.ML) [pdf, other]

Title: Robust Linear Regression: Gradientdescent, Earlystopping, and BeyondSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
In this work we study the robustness to adversarial attacks of early-stopping strategies on gradient-descent (GD) methods for linear regression. More precisely, we show that early-stopped GD is optimally robust (up to an absolute constant) against Euclidean-norm adversarial attacks. However, we show that this strategy can be arbitrarily suboptimal in the case of general Mahalanobis attacks. This observation is compatible with recent findings in the case of classification \cite{Vardi2022GradientMP} that show that GD provably converges to non-robust models. To alleviate this issue, we propose to apply instead a GD scheme on a transformation of the data adapted to the attack. This data transformation amounts to applying feature-dependent learning rates, and we show that this modified GD is able to handle any Mahalanobis attack, as well as more general attacks under some conditions. Unfortunately, choosing such adapted transformations can be hard for general attacks. To the rescue, we design a simple and tractable estimator whose adversarial risk is optimal up to within a multiplicative constant of 1.1124 in the population regime, and works for any norm.
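To make the setting of this abstract concrete, here is a minimal sketch of early-stopped gradient descent for linear regression, together with the standard closed form for the worst-case squared error of a linear model under a Euclidean-norm perturbation of radius r, namely (|x·w − y| + r‖w‖₂)². The toy data, step size, and stopping times are hypothetical, and the code is not the paper's estimator or analysis; it only shows that stopping earlier yields a smaller-norm weight vector, which is the quantity the attack term scales with.

```python
import math


def predict(w, row):
    return sum(wj * xj for wj, xj in zip(w, row))


def early_stopped_gd(X, y, lr, num_steps):
    """Run num_steps of gradient descent on mean squared error, starting from zero."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(num_steps):
        residuals = [predict(w, row) - yi for row, yi in zip(X, y)]
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, X))
                for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def l2_norm(w):
    return math.sqrt(sum(wj * wj for wj in w))


def adversarial_risk(w, X, y, radius):
    """Mean worst-case squared error when each input may move by up to
    `radius` in Euclidean norm (closed form for linear models)."""
    norm = l2_norm(w)
    return sum((abs(predict(w, row) - yi) + radius * norm) ** 2
               for row, yi in zip(X, y)) / len(X)


# Hypothetical noiseless data generated by w* = [1, 2].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]

w_early = early_stopped_gd(X, y, lr=0.1, num_steps=5)
w_late = early_stopped_gd(X, y, lr=0.1, num_steps=500)
print(l2_norm(w_early), l2_norm(w_late))
print(adversarial_risk(w_early, X, y, 0.5), adversarial_risk(w_late, X, y, 0.5))
```

Which stopping time gives the lower adversarial risk depends on the attack radius and the data, so the stopping time acts as the tuning knob that trades fit against norm.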
 [107] arXiv:2301.13506 (crosslist from cs.SE) [pdf, other]

Title: DNN Explanation for Safety Analysis: an Empirical Evaluation of Clusteringbased ApproachesComments: 10 Tables, 14 FiguresSubjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
The adoption of deep neural networks (DNNs) in safety-critical contexts is often prevented by the lack of effective means to explain their results, especially when they are erroneous. In our previous work, we proposed a white-box approach (HUDD) and a black-box approach (SAFE) to automatically characterize DNN failures. They both identify clusters of similar images from a potentially large set of images leading to DNN failures. However, the analysis pipelines for HUDD and SAFE were instantiated in specific ways according to common practices, deferring the analysis of other pipelines to future work. In this paper, we report on an empirical evaluation of 99 different pipelines for root cause analysis of DNN failures. They combine transfer learning, autoencoders, heatmaps of neuron relevance, dimensionality reduction techniques, and different clustering algorithms. Our results show that the best pipeline combines transfer learning, DBSCAN, and UMAP. It leads to clusters almost exclusively capturing images of the same failure scenario, thus facilitating root cause analysis. Further, it generates distinct clusters for each root cause of failure, thus enabling engineers to detect all the unsafe scenarios. Interestingly, these results hold even for failure scenarios that are only observed in a small percentage of the failing images.
 [108] arXiv:2301.13507 (crosslist from cs.IR) [pdf, ps, other]

Title: An Analysis of Classification Approaches for Hit Song Prediction using Engineered Metadata Features with Lyrics and Audio FeaturesSubjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Hit song prediction, one of the emerging fields in music information retrieval (MIR), remains a considerable challenge. Being able to understand what makes a given song a hit is clearly beneficial to the whole music industry. Previous approaches to hit song prediction have focused on using audio features of a record. This study aims to improve the prediction of the top 10 hits among Billboard Hot 100 songs using additional metadata, including song audio features provided by Spotify, song lyrics, and novel metadata-based features (title topic, popularity continuity and genre class). Five machine learning approaches are applied: k-nearest neighbours, Naive Bayes, Random Forest, Logistic Regression and Multilayer Perceptron. Our results show that Random Forest (RF) and Logistic Regression (LR) with all features (including novel features, song audio features and lyrics features) outperform other models, achieving 89.1% and 87.2% accuracy, and 0.91 and 0.93 AUC, respectively. Our findings also demonstrate the utility of our novel music metadata features, which contributed most to the models' discriminative performance.
 [109] arXiv:2301.13514 (crosslist from cs.CV) [pdf, other]

Title: Fourier Sensitivity and Regularization of Computer Vision ModelsComments: Published in TMLR, this https URLJournalref: TMLR 2022Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Recent work has empirically shown that deep neural networks latch on to the Fourier statistics of training data and show increased sensitivity to Fourier-basis directions in the input. Understanding and modifying this Fourier-sensitivity of computer vision models may help improve their robustness. Hence, in this paper we study the frequency sensitivity characteristics of deep neural networks using a principled approach. We first propose a basis trick, proving that unitary transformations of the input-gradient of a function can be used to compute its gradient in the basis induced by the transformation. Using this result, we propose a general measure of any differentiable model's Fourier-sensitivity using the unitary Fourier-transform of its input-gradient. When applied to deep neural networks, we find that computer vision models are consistently sensitive to particular frequencies dependent on the dataset, training method and architecture. Based on this measure, we further propose a Fourier-regularization framework to modify the Fourier-sensitivities and frequency bias of models. Using our proposed regularizer-family, we demonstrate that deep neural networks obtain improved classification accuracy on robustness evaluations.
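The idea of measuring sensitivity through the unitary Fourier transform of the input-gradient can be sketched on a one-dimensional toy problem. Everything below is an illustrative assumption: the paper works with images and differentiable models, whereas this sketch uses a hand-written unitary DFT, a finite-difference gradient in place of automatic differentiation, and a hypothetical toy model. It is not the authors' exact measure.

```python
import cmath
import math


def unitary_dft(values):
    """Discrete Fourier transform scaled by 1/sqrt(n), so it preserves the L2 norm."""
    n = len(values)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(v * cmath.exp(-2j * math.pi * k * t / n)
                        for t, v in enumerate(values))
            for k in range(n)]


def input_gradient(f, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def fourier_sensitivity(f, x):
    """Magnitude of each Fourier coefficient of the input-gradient of f at x."""
    return [abs(c) for c in unitary_dft(input_gradient(f, x))]


def toy_model(x):
    # Hypothetical model that responds to alternating-sign patterns,
    # i.e. to the highest frequency of the input.
    return sum(((-1) ** i) * xi for i, xi in enumerate(x))


sensitivity = fourier_sensitivity(toy_model, [0.0] * 8)
print([round(s, 3) for s in sensitivity])
```

Because the transform is unitary, the squared sensitivities sum to the squared norm of the input-gradient, so the measure redistributes gradient energy over frequencies rather than rescaling it.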
 [110] arXiv:2301.13524 (crosslist from quantph) [pdf, other]

Title: Quantum contextual bandits and recommender systems for quantum dataComments: 15 pages, 9 figuresSubjects: Quantum Physics (quantph); Information Retrieval (cs.IR); Machine Learning (cs.LG)
We study a recommender system for quantum data using the linear contextual bandit framework. In each round, a learner receives an observable (the context) and has to recommend from a finite set of unknown quantum states (the actions) which one to measure. The learner has the goal of maximizing the reward in each round, that is the outcome of the measurement on the unknown state. Using this model we formulate the low energy quantum state recommendation problem where the context is a Hamiltonian and the goal is to recommend the state with the lowest energy. For this task, we study two families of contexts: the Ising model and a generalized cluster model. We observe that if we interpret the actions as different phases of the models then the recommendation is done by classifying the correct phase of the given Hamiltonian and the strategy can be interpreted as an online quantum phase classifier.
 [111] arXiv:2301.13532 (crosslist from stat.ML) [pdf, other]

Title: Populationwise Labeling of Sulcal Graphs using Multigraph MatchingAuthors: Rohit Yadav (AMU, INT, LIS), FrançoisXavier Dupé (LIS, QARMA), S. Takerkart (INT), Guillaume Auzias (INT)Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Population-wise matching of the cortical folds is necessary to identify biomarkers of neurological or psychiatric disorders. The difficulty comes from the massive inter-individual variations in the morphology and spatial organization of the folds. This task is challenging at both methodological and conceptual levels. In the widely used registration-based techniques, these variations are considered as noise and the matching of folds is only implicit. Alternative approaches are based on the extraction and explicit identification of the cortical folds. In particular, representing cortical folding patterns as graphs of sulcal basins, termed sulcal graphs, makes it possible to formalize the task as a graph-matching problem. In this paper, we propose to address the problem of sulcal graph matching directly at the population level using multi-graph matching techniques. First, we motivate the relevance of the multi-graph matching framework in this context. We then introduce a procedure to generate populations of artificial sulcal graphs, which allows us to benchmark several state-of-the-art multi-graph matching methods. Our results on both artificial and real data demonstrate the effectiveness of multi-graph matching techniques in obtaining a population-wise consistent labeling of cortical folds at the sulcal basin level.
 [112] arXiv:2301.13536 (crosslist from cs.NI) [pdf, other]

Title: Low Complexity Adaptive Machine Learning Approaches for EndtoEnd Latency PredictionAuthors: Pierre Larrenie (LIGM), JeanFrançois Bercher (LIGM), Olivier Venard (ESYCOM), Iyad LahsenCherif (INPT)Journalref: 5th International Conference on Machine Learning for Networking (MLN'2022), Nov 2022, Paris, FranceSubjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
Software Defined Networks have opened the door to statistical and AI-based techniques to improve the efficiency of networking, especially to ensure a certain Quality of Service (QoS) for specific applications by routing packets with awareness of the nature of the content (VoIP, video, files, etc.) and its needs (latency, bandwidth, etc.) so as to use network resources efficiently. Monitoring and predicting various Key Performance Indicators (KPIs) at any level may address such problems while preserving network bandwidth. The question addressed in this work is the design of efficient, low-cost adaptive algorithms for KPI estimation, monitoring and prediction. We focus on end-to-end latency prediction, for which we illustrate our approaches and results on data obtained from a public generator provided after the recent international challenge on GNN [12]. In this paper, we improve our previously proposed low-cost estimators [6] by adding the adaptive dimension, and show that performance is minimally affected while gaining the ability to track varying networks.
 [113] arXiv:2301.13545 (crosslist from cs.RO) [pdf, other]

Title: Holistic Graphbased Motion PredictionComments: Accepted on ICRA 2023Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Motion prediction for automated vehicles in complex environments is a difficult task that must be mastered if automated vehicles are to be used in arbitrary situations. Many factors influence the future motion of traffic participants, ranging from traffic rules and the interaction between participants to the personal habits of human drivers. Therefore, we present a novel approach for graph-based prediction based on a heterogeneous holistic graph representation that combines temporal information, properties and relations between traffic participants as well as relations with static elements like the road network. The information is encoded through different types of nodes and edges, both of which are enriched with arbitrary features. We evaluated the approach on the INTERACTION and Argoverse datasets and conducted an informative ablation study to demonstrate the benefit of different types of information for the motion prediction quality.
 [114] arXiv:2301.13549 (crosslist from cs.CV) [pdf, other]

Title: Review of methods for automatic cerebral microbleeds detectionAuthors: Maria Ferlin, Zuzanna Klawikowska, Michał Grochowski, Małgorzata Grzywińska, Edyta SzurowskaComments: 32 pages, 6 figures, 3 tables, 174 referencesSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY)
Cerebral microbleed detection is an important and challenging task. With the growing popularity of MRI, the ability to detect cerebral microbleeds also rises. Unfortunately, for radiologists, it is a time-consuming and laborious procedure. For this reason, various solutions to automate this process have been proposed for several years, but none of them is currently used in medical practice. In this context, the need to systematize the existing knowledge and best practices has been recognized as a factor facilitating the imminent synthesis of a real CMB detection system practically applicable in medicine. To the best of our knowledge, all available publications regarding automatic cerebral microbleed detection have been gathered, described, and assessed in this paper in order to distinguish the current research state and provide a starting point for future studies.
 [115] arXiv:2301.13569 (crosslist from cs.CV) [pdf, other]

Title: NPMatch: Towards a New Probabilistic Model for SemiSupervised LearningComments: An journal version of our previous ICML 2022 paper arXiv:2207.01066 . Codes are available at: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Semi-supervised learning (SSL) has been widely explored in recent years, and it is an effective way of leveraging unlabeled data to reduce the reliance on labeled data. In this work, we adjust neural processes (NPs) to the semi-supervised image classification task, resulting in a new method named NP-Match. NP-Match is suited to this task for two reasons. Firstly, NP-Match implicitly compares data points when making predictions, and as a result, the prediction of each unlabeled data point is affected by the labeled data points that are similar to it, which improves the quality of pseudo-labels. Secondly, NP-Match is able to estimate uncertainty that can be used as a tool for selecting unlabeled samples with reliable pseudo-labels. Compared with uncertainty-based SSL methods implemented with Monte-Carlo (MC) dropout, NP-Match estimates uncertainty with much less computational overhead, which can save time at both the training and the testing phases. We conducted extensive experiments on five public datasets under three semi-supervised image classification settings, namely, standard semi-supervised image classification, imbalanced semi-supervised image classification, and multi-label semi-supervised image classification. NP-Match outperforms state-of-the-art (SOTA) approaches or achieves competitive results on them, which shows the effectiveness of NP-Match and its potential for SSL. The code is available at https://github.com/JianfWang/NPMatch
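The general pattern of using an uncertainty estimate to keep only reliable pseudo-labels can be sketched as below. This is a generic illustration and not NP-Match itself: the abstract does not say which uncertainty measure or threshold is used, so predictive entropy, the threshold value, and the toy probabilities are all assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_pseudo_labels(prob_rows, max_entropy):
    """Return (sample_index, pseudo_label) pairs for confident predictions only."""
    selected = []
    for index, probs in enumerate(prob_rows):
        if entropy(probs) <= max_entropy:
            label = max(range(len(probs)), key=probs.__getitem__)
            selected.append((index, label))
    return selected


# Hypothetical class probabilities for three unlabeled samples.
predictions = [
    [0.97, 0.02, 0.01],   # confident
    [0.40, 0.35, 0.25],   # uncertain, should be filtered out
    [0.05, 0.90, 0.05],   # confident
]

chosen = select_pseudo_labels(predictions, max_entropy=0.5)
print(chosen)
```

Only the selected pairs would then be added to the training signal; the cost of obtaining the uncertainty estimate is what the abstract contrasts with MC dropout.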
 [116] arXiv:2301.13576 (crosslist from cs.AI) [pdf, other]

Title: Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022Authors: PierreEtienne Martin (MPIEVA), Jordan Calandre (MIA), Boris Mansencal (LaBRI), Jenny BenoisPineau (LaBRI), Renaud Péteri (MIA), Laurent Mascarilla (MIA), Julien MorlierComments: MediaEval 2022 Workshop, Jan 2023, Bergen, Norway. arXiv admin note: substantial text overlap with arXiv:2112.11384Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); HumanComputer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM)
Sports video analysis is a widespread research topic. Its applications are very diverse, such as event detection during a match, video summarization, or fine-grained movement analysis of athletes. As part of the MediaEval 2022 benchmarking initiative, this task aims at detecting and classifying subtle movements from sports videos. We focus on recordings of table tennis matches. Conducted since 2019, this task provides a classification challenge from untrimmed videos recorded under natural conditions with known temporal boundaries for each stroke. Since 2021, the task also provides a stroke detection challenge from unannotated, untrimmed videos. This year, the training, validation, and test sets are enhanced to ensure that all strokes are represented in each dataset. The dataset is now similar to the one used in [1, 2]. This research is intended to build tools for coaches and athletes who want to further evaluate their sport performances.
 [117] arXiv:2301.13667 (crosslist from cs.RO) [pdf, other]

Title: Collision-aware In-hand 6D Object Pose Estimation using Multiple Vision-based Tactile SensorsComments: Accepted for publication at 2023 IEEE International Conference on Robotics and Automation (ICRA)Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
In this paper, we address the problem of estimating the in-hand 6D pose of an object in contact with multiple vision-based tactile sensors. We reason on the possible spatial configurations of the sensors along the object surface. Specifically, we filter contact hypotheses using geometric reasoning and a Convolutional Neural Network (CNN), trained on simulated object-agnostic images, to promote those that better comply with the actual tactile images from the sensors. We use the selected sensor configurations to optimize over the space of 6D poses using a Gradient Descent-based approach. We finally rank the obtained poses by penalizing those that are in collision with the sensors. We carry out experiments in simulation using the DIGIT vision-based sensor with several objects from the standard YCB model set. The results demonstrate that our approach estimates object poses that are compatible with actual object-sensor contacts in $87.5\%$ of cases while reaching an average positional error on the order of $2$ centimeters. Our analysis also includes qualitative results of experiments with a real DIGIT sensor.
 [118] arXiv:2301.13668 (crosslist from cs.CL) [pdf]

Title: Automated Sentiment and Hate Speech Analysis of Facebook Data by Employing Multilingual Transformer ModelsSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
In recent years, there has been a heightened consensus within academia and in the public discourse that Social Media Platforms (SMPs) amplify the spread of hateful and negative-sentiment content. Researchers have identified how hateful content, political propaganda, and targeted messaging contributed to real-world harms including insurrections against democratically elected governments, genocide, and breakdown of social cohesion due to heightened negative discourse towards certain communities in parts of the world. To counter these issues, SMPs have created semi-automated systems that can help identify toxic speech. In this paper we analyse the statistical distribution of hateful and negative-sentiment content within a representative Facebook dataset (n = 604,703) scraped from 648 public Facebook pages which identify themselves as proponents (and followers) of far-right Hindutva actors. These pages were identified manually using keyword searches on Facebook and on CrowdTangle and classified as far-right Hindutva pages based on page names, page descriptions, and discourses shared on these pages. We employ state-of-the-art, open-source XLM-T multilingual transformer-based language models to perform sentiment and hate speech analysis of the textual content shared on these pages over a period of 5.5 years. The results show the statistical distributions of the predicted sentiment and hate speech labels, the top actors, and the top page categories. We further discuss the benchmark performances and limitations of these pretrained language models.
 [119] arXiv:2301.13669 (crosslist from quantph) [pdf, other]

Title: Reinforcement learning and decision making via single-photon quantum walksComments: 10+6 pages, 6+5 figures, 2 tables. F. Flamini and M. Krumm contributed equally to this workSubjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Variational quantum algorithms represent a promising approach to quantum machine learning where classical neural networks are replaced by parametrized quantum circuits. Here, we present a variational approach to quantize projective simulation (PS), a reinforcement learning model aimed at interpretable artificial intelligence. Decision making in PS is modeled as a random walk on a graph describing the agent's memory. To implement the quantized model, we consider quantum walks of single photons in a lattice of tunable Mach-Zehnder interferometers. We propose variational algorithms tailored to reinforcement learning tasks, and we show, using an example from transfer learning, that the quantized PS learning model can outperform its classical counterpart. Finally, we discuss the role of quantum interference for training and decision making, paving the way for realizations of interpretable quantum learning agents.
 [120] arXiv:2301.13674 (crosslist from eess.IV) [pdf, other]

Title: Improved distinct bone segmentation in upperbody CT through multiresolution networksAuthors: Eva Schnider, Julia Wolleb, Antal Huck, Mireille Toranelli, Georg Rauter, Magdalena MüllerGerbl, Philippe C. CattinComments: Under submissionSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Purpose: Automated distinct bone segmentation from CT scans is widely used in planning and navigation workflows. U-Net variants are known to provide excellent results in supervised semantic segmentation. However, in distinct bone segmentation from upper-body CTs a large field of view and a computationally taxing 3D architecture are required. This leads to low-resolution results lacking detail or localisation errors due to missing spatial context when using high-resolution inputs.
Methods: We propose to solve this problem by using end-to-end trainable segmentation networks that combine several 3D U-Nets working at different resolutions. Our approach, which extends and generalizes HookNet and MRN, captures spatial information at a lower resolution and skips the encoded information to the target network, which operates on smaller high-resolution inputs. We evaluated our proposed architecture against single-resolution networks and performed an ablation study on information concatenation and the number of context networks.
Results: Our proposed best network achieves a median DSC of 0.86 taken over all 125 segmented bone classes and reduces the confusion among similar-looking bones in different locations. These results outperform our previously published 3D U-Net baseline results on the task and distinct-bone segmentation results reported by other groups.
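For readers unfamiliar with the metric quoted in the Results, the following is a minimal sketch of how a per-class Dice similarity coefficient (DSC) and its median over classes can be computed. It is an illustration only, not the authors' evaluation code: the flat label lists, the function names `dice` and `median_dice`, and the convention of returning 1.0 when a class is absent from both prediction and target are all assumptions made here.

```python
from statistics import median


def dice(pred, target, label):
    """Dice similarity coefficient for one class label.

    pred and target are equally long flat sequences of integer class labels
    (for example, a flattened segmentation volume).
    """
    pred_count = sum(1 for p in pred if p == label)
    target_count = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    if pred_count + target_count == 0:
        # Assumed convention: a class absent from both masks counts as a perfect match.
        return 1.0
    return 2.0 * overlap / (pred_count + target_count)


def median_dice(pred, target, labels):
    """Median of the per-class Dice scores over the given labels."""
    return median(dice(pred, target, label) for label in labels)


# Toy example with three classes (hypothetical data).
pred = [0, 1, 1, 2, 2, 2]
target = [0, 1, 2, 2, 2, 2]
print(median_dice(pred, target, [0, 1, 2]))
```

Taking the median rather than the mean keeps a few very poorly segmented classes from dominating the summary score.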
Conclusion: The presented multi-resolution 3D U-Nets address current shortcomings in bone segmentation from upper-body CT scans by allowing for capturing a larger field of view while avoiding the cubic growth of the input pixels and intermediate computations that quickly outgrow the computational capacities in 3D. The approach thus improves the accuracy and efficiency of distinct bone segmentation from upper-body CT.
 [121] arXiv:2301.13688 (crosslist from cs.AI) [pdf, other]

Title: The Flan Collection: Designing Data and Methods for Effective Instruction TuningAuthors: Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam RobertsSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+) performance in all settings. In further experiments, we show Flan-T5 requires less finetuning to converge higher and faster than T5 on single downstream tasks, motivating instruction-tuned models as more computationally-efficient starting checkpoints for new tasks. Finally, to accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available at https://github.com/google-research/FLAN/tree/main/flan/v2.
 [122] arXiv:2301.13691 (crosslist from cs.AI) [pdf, other]

Title: Time Series Forecasting via SemiAsymmetric Convolutional Architecture with Global Atrous Sliding WindowAuthors: Yuanpeng HeComments: 13pages,8 figuresSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
This paper addresses the problem of time series forecasting. Although some exquisitely designed models achieve excellent prediction performance, how to extract more useful information and make accurate predictions is still an open issue. Most modern models focus only on a short range of information, which is a serious limitation for time series forecasting, where long-term characteristics must be captured. The main concern of this work is therefore to further mine the relationship between local and global information contained in time series to produce more precise predictions. To this end, we make three main contributions that are experimentally verified to have performance advantages. Firstly, the original time series is transformed into a difference sequence, which serves as input to the proposed model. Secondly, we introduce the global atrous sliding window into the forecasting model, which references the concept of fuzzy time series to associate relevant global information with temporal data within a time period and utilizes a central-bidirectional atrous algorithm to capture underlying-related features to ensure validity and consistency of captured data. Thirdly, a variation of the widely used asymmetric convolution, called semi-asymmetric convolution, is devised to more flexibly extract relationships in adjacent elements and corresponding associated global features with adjustable ranges of convolution in the vertical and horizontal directions. The proposed model achieves state-of-the-art results on most of the time series datasets considered, compared with competitive modern models.
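The first contribution listed above, feeding the model a difference sequence instead of the raw series, can be sketched as follows. This is a generic first-order differencing example and not the authors' preprocessing code; the function names `difference` and `undifference` and the toy values are assumptions made for illustration.

```python
def difference(series):
    """First-order difference: d[t] = x[t + 1] - x[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Invert the differencing by cumulative summation from the first value."""
    restored = [first_value]
    for d in diffs:
        restored.append(restored[-1] + d)
    return restored


# Toy series (hypothetical data).
series = [10, 12, 15, 14]
diffs = difference(series)
restored = undifference(series[0], diffs)
print(diffs, restored)
```

Forecasts made in the differenced domain are mapped back to the original scale with the same cumulative sum, starting from the last observed value.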
 [123] arXiv:2301.13710 (crosslist from stat.ML) [pdf, other]

Title: On the Initialisation of Wide Low-Rank Feedforward Neural NetworksSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The edge-of-chaos dynamics of wide randomly initialized low-rank feedforward networks are analyzed. Formulae for the optimal weight and bias variances are extended from the full-rank to the low-rank setting and are shown to follow from multiplicative scaling. The principal second-order effect, the variance of the input-output Jacobian, is derived and shown to increase as the rank-to-width ratio decreases. These results inform practitioners how to randomly initialize feedforward networks with a reduced number of learnable parameters while in the same ambient dimension, allowing reductions in the computational cost and memory constraints of the associated network.
 [124] arXiv:2301.13724 (crosslist from stat.ML) [pdf, other]

Title: The passive symmetries of machine learningAuthors: Soledad Villar (JHU), David W. Hogg (NYU, MPIA, Flatiron), Weichi Yao (NYU), George A. Kevrekidis (JHU, LANL), Bernhard Schölkopf (MPIIS)Subjects: Machine Learning (stat.ML); Instrumentation and Methods for Astrophysics (astroph.IM); Machine Learning (cs.LG); Mathematical Physics (mathph); Data Analysis, Statistics and Probability (physics.dataan)
Any representation of data involves arbitrary investigator choices. Because those choices are external to the data-generating process, each choice leads to an exact symmetry, corresponding to the group of transformations that takes one possible representation to another. These are the passive symmetries; they include coordinate freedom, gauge symmetry and units covariance, all of which have led to important results in physics. Our goal is to understand the implications of passive symmetries for machine learning: Which passive symmetries play a role (e.g., permutation symmetry in graph neural networks)? What are dos and don'ts in machine learning practice? We assay conditions under which passive symmetries can be implemented as group equivariances. We also discuss links to causal modeling, and argue that the implementation of passive symmetries is particularly valuable when the goal of the learning problem is to generalize out of sample. While this paper is purely conceptual, we believe that it can have a significant impact on helping machine learning make the transition that took place for modern physics in the first half of the twentieth century.
 [125] arXiv:2301.13728 (crosslist from physics.fludyn) [pdf, ps, other]

Title: Convolutional autoencoder for the spatiotemporal latent representation of turbulenceSubjects: Fluid Dynamics (physics.fludyn); Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)
Turbulence is characterised by chaotic dynamics and a high-dimensional state space, which make the phenomenon challenging to predict. However, turbulent flows are often characterised by coherent spatiotemporal structures, such as vortices or large-scale modes, which can help obtain a latent description of turbulent flows. Current approaches, however, are often limited by either the need to use some form of thresholding on quantities defining the isosurfaces to which the flow structures are associated or the linearity of traditional modal flow decomposition approaches, such as those based on proper orthogonal decomposition. This problem is exacerbated in flows that exhibit extreme events, which are rare and sudden changes in a turbulent state. The goal of this paper is to obtain an efficient and accurate reduced-order latent representation of a turbulent flow that exhibits extreme events. Specifically, we employ a three-dimensional multiscale convolutional autoencoder (CAE) to obtain such a latent representation. We apply it to a three-dimensional turbulent flow. We show that the multiscale CAE is efficient, requiring less than 10% of the degrees of freedom needed by proper orthogonal decomposition for compressing the data, and is able to accurately reconstruct flow states related to extreme events. The proposed deep learning architecture opens opportunities for nonlinear reduced-order modeling of turbulent flows from data.
 [126] arXiv:2301.13731 (crosslist from stat.ML) [pdf, other]

Title: A relaxed proximal gradient descent algorithm for convergent plug-and-play with proximal denoiserSubjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Optimization and Control (math.OC)
This paper presents a new convergent Plug-and-Play (PnP) algorithm. PnP methods are efficient iterative algorithms for solving image inverse problems formulated as the minimization of the sum of a data-fidelity term and a regularization term. PnP methods perform regularization by plugging a pre-trained denoiser into a proximal algorithm, such as Proximal Gradient Descent (PGD). To ensure convergence of PnP schemes, many works study specific parametrizations of deep denoisers. However, existing results require either unverifiable or suboptimal hypotheses on the denoiser, or assume restrictive conditions on the parameters of the inverse problem. Observing that these limitations can be due to the proximal algorithm in use, we study a relaxed version of the PGD algorithm for minimizing the sum of a convex function and a weakly convex one. When plugged with a relaxed proximal denoiser, we show that the proposed PnP-$\alpha$PGD algorithm converges for a wider range of regularization parameters, thus allowing more accurate image restoration.
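To make the idea of a relaxed proximal gradient step concrete, here is a minimal sketch on a toy problem. It is not the paper's PnP-$\alpha$PGD algorithm: it uses the textbook averaged iteration x ← (1 − α)x + α·prox(x − τ∇f(x)), both terms are convex here for simplicity, and an analytic soft-thresholding operator stands in for the denoiser. The function names, the step size `tau`, the relaxation weight `alpha`, and the toy data are all assumptions.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |.| applied to a single coordinate."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def relaxed_pgd(b, lam, tau=0.5, alpha=0.8, iters=200):
    """Minimize 0.5 * ||x - b||^2 + lam * ||x||_1 with a relaxed proximal gradient loop."""
    x = [0.0] * len(b)
    for _ in range(iters):
        # Gradient step on the data-fidelity term, whose gradient is x - b.
        z = [xi - tau * (xi - bi) for xi, bi in zip(x, b)]
        # Proximal step on the regularizer (stand-in for a plugged-in denoiser).
        p = [soft_threshold(zi, tau * lam) for zi in z]
        # Relaxation: blend the previous iterate with the proximal gradient update.
        x = [(1.0 - alpha) * xi + alpha * pi for xi, pi in zip(x, p)]
    return x


# Toy data (hypothetical); the exact minimizer is soft_threshold(b_i, lam) per coordinate.
solution = relaxed_pgd([3.0, -0.2, 1.5], lam=1.0)
print(solution)
```

Setting `alpha=1.0` recovers the unrelaxed proximal gradient iteration, which makes it easy to compare the two on the same problem.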
 [127] arXiv:2301.13741 (crosslist from cs.CV) [pdf, other]

Title: UPop: Unified and Progressive Pruning for Compressing Vision-Language TransformersComments: 16 pages, 5 figures, 13 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Real-world data contains a vast amount of multimodal information, among which vision and language are the two most representative modalities. Moreover, increasingly heavier models, e.g., Transformers, have attracted the attention of researchers to model compression. However, how to compress multimodal models, especially vision-language Transformers, is still under-explored. This paper proposes Unified and Progressive Pruning (UPop) as a universal vision-language Transformer compression framework, which incorporates 1) unifiedly searching multimodal subnets in a continuous optimization space from the original model, which enables automatic assignment of pruning ratios among compressible modalities and structures; 2) progressively searching and retraining the subnet, which maintains convergence between the search and retrain to attain higher compression ratios. Experiments on multiple generative and discriminative vision-language tasks, including Visual Reasoning, Image Caption, Visual Question Answer, Image-Text Retrieval, Text-Image Retrieval, and Image Classification, demonstrate the effectiveness and versatility of the proposed UPop framework.
 [128] arXiv:2301.13743 (crosslist from cs.CV) [pdf, other]

Title: Zero-shot-Learning Cross-Modality Data Translation Through Mutual Information Guided Stochastic DiffusionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cross-modality data translation has attracted great interest in image computing. Deep generative models (e.g., GANs) show performance improvement in tackling those problems. Nevertheless, as a fundamental challenge in image translation, the problem of Zero-shot-Learning Cross-Modality Data Translation with fidelity remains unanswered. This paper proposes a new unsupervised zero-shot-learning method named Mutual Information guided Diffusion cross-modality data translation Model (MIDiffusion), which learns to translate the unseen source data to the target domain. The MIDiffusion leverages a score-matching-based generative model, which learns the prior knowledge in the target domain. We propose a differentiable local-wise-MI-Layer ($LMI$) for conditioning the iterative denoising sampling. The $LMI$ captures the identical cross-modality features in the statistical domain for the diffusion guidance; thus, our method does not require retraining when the source domain is changed, as it does not rely on any direct mapping between the source and target domains. This advantage is critical for applying cross-modality data translation methods in practice, as a reasonable amount of source domain dataset is not always available for supervised training. We empirically show the advanced performance of MIDiffusion in comparison with an influential group of generative models, including adversarial-based and other score-matching-based models.
 [129] arXiv:2301.13749 (crosslist from stat.CO) [pdf, ps, other]

Title: Multi-fidelity covariance estimation in the log-Euclidean geometryComments: 27 pages, 10 figures, code supplementSubjects: Computation (stat.CO); Machine Learning (cs.LG); Numerical Analysis (math.NA)
We introduce a multi-fidelity estimator of covariance matrices that employs the log-Euclidean geometry of the symmetric positive-definite manifold. The estimator fuses samples from a hierarchy of data sources of differing fidelities and costs for variance reduction while guaranteeing definiteness, in contrast with previous approaches. The new estimator makes covariance estimation tractable in applications where simulation or data collection is expensive; to that end, we develop an optimal sample allocation scheme that minimizes the mean-squared error of the estimator given a fixed budget. Guaranteed definiteness is crucial to metric learning, data assimilation, and other downstream tasks. Evaluations of our approach using data from physical applications (heat conduction, fluid dynamics) demonstrate more accurate metric learning and speedups of more than one order of magnitude compared to benchmarks.
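The claim that working in the log-Euclidean geometry guarantees definiteness can be illustrated with a small sketch. The code below is not the authors' estimator: it only contrasts a generic control-variate-style combination A + (B − C) formed directly on matrices with the same combination formed on matrix logarithms and mapped back with the matrix exponential. The 2×2 restriction, the helper names, and the three toy matrices are assumptions chosen so that the example needs only the standard library.

```python
import math


def eig_sym2(m):
    """Eigenvalues (largest first) and rotation angle of a symmetric 2x2 matrix."""
    (a, b), (_, c) = m
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    return (mean + radius, mean - radius), theta


def apply_fn(m, fn):
    """Apply a scalar function to a symmetric 2x2 matrix through its eigenvalues."""
    (l1, l2), theta = eig_sym2(m)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    f1, f2 = fn(l1), fn(l2)
    off_diag = (f1 - f2) * cos_t * sin_t
    return [[f1 * cos_t ** 2 + f2 * sin_t ** 2, off_diag],
            [off_diag, f1 * sin_t ** 2 + f2 * cos_t ** 2]]


def combine(matrices, weights):
    """Entrywise weighted sum of 2x2 matrices."""
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights)) for j in range(2)]
            for i in range(2)]


def min_eig(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    return eig_sym2(m)[0][1]


# Three symmetric positive-definite toy matrices (hypothetical data).
A = [[1.0, 0.3], [0.3, 0.2]]
B = [[1.0, 0.2], [0.2, 0.3]]
C = [[1.2, 0.1], [0.1, 0.8]]
weights = [1.0, 1.0, -1.0]

# Combining the matrices directly can leave the positive-definite cone.
euclidean = combine([A, B, C], weights)
# Combining their logarithms and exponentiating always yields a positive-definite matrix.
logs = [apply_fn(m, math.log) for m in (A, B, C)]
log_euclidean = apply_fn(combine(logs, weights), math.exp)

print(min_eig(euclidean), min_eig(log_euclidean))
```

The negative weight is what breaks definiteness in the direct combination; in the logarithmic domain any real weights are admissible because the exponential of a symmetric matrix has strictly positive eigenvalues.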
 [130] arXiv:2301.13755 (crosslist from cs.AI) [pdf, other]

Title: Retrosynthetic Planning with Dual Value NetworksAuthors: Guoqing Liu, Di Xue, Shufang Xie, Yingce Xia, Austin Tripp, Krzysztof Maziarz, Marwin Segler, Tao Qin, Zongzhang Zhang, TieYan LiuSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Retrosynthesis, which aims to find a route to synthesize a target molecule from commercially available starting materials, is a critical task in drug discovery and materials design. Recently, the combination of ML-based single-step reaction predictors with multi-step planners has led to promising results. However, the single-step predictors are mostly trained offline to optimize the single-step accuracy, without considering complete routes. Here, we leverage reinforcement learning (RL) to improve the single-step predictor, by using a tree-shaped MDP to optimize complete routes while retaining single-step accuracy. Desirable routes should be both synthesizable and of low cost. We propose an online training algorithm, called Planning with Dual Value Networks (PDVN), in which two value networks predict the synthesizability and cost of molecules, respectively. To maintain the single-step accuracy, we design a two-branch network structure for the single-step predictor. On the widely-used USPTO dataset, our PDVN algorithm improves the search success rate of existing multi-step planners (e.g., increasing the success rate from 85.79% to 98.95% for Retro*, and reducing the number of model calls by half while solving 99.47% of molecules for RetroGraph). Furthermore, PDVN finds shorter synthesis routes (e.g., reducing the average route length from 5.76 to 4.83 for Retro*, and from 5.63 to 4.78 for RetroGraph).
 [131] arXiv:2301.13758 (crosslist from cs.AI) [pdf, other]

Title: Learning, Fast and Slow: A Goal-Directed Memory-Based Approach for Dynamic EnvironmentsComments: 22 pagesSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Model-based next state prediction and state value prediction are slow to converge. To address these challenges, we do the following: i) Instead of a neural network, we do model-based planning using a parallel memory retrieval system (which we term the slow mechanism); ii) Instead of learning state values, we guide the agent's actions using goal-directed exploration, by using a neural network to choose the next action given the current state and the goal state (which we term the fast mechanism). The goal-directed exploration is trained online using hippocampal replay of visited states and future imagined states every single time step, leading to fast and efficient training. Empirical studies show that our proposed method has a 92% solve rate across 100 episodes in a dynamically changing grid world, significantly outperforming state-of-the-art actor-critic mechanisms such as PPO (54%), TRPO (50%) and A2C (24%). Ablation studies demonstrate that both mechanisms are crucial. We posit that the future of Reinforcement Learning (RL) will be to model goals and sub-goals for various tasks, and plan them out in a goal-directed memory-based approach.
 [132] arXiv:2301.13778 (crosslist from stat.ML) [pdf, other]

Title: Differentially Private Distributed Bayesian Linear Regression with MCMCComments: 20 pages, 3 figures, code available at: this https URLSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
We propose a novel Bayesian inference framework for distributed differentially private linear regression. We consider a distributed setting where multiple parties hold parts of the data and share certain summary statistics of their portions in privacy-preserving noise. We develop a novel generative statistical model for privately shared statistics, which exploits a useful distributional relation between the summary statistics of linear regression. Bayesian estimation of the regression coefficients is conducted mainly using Markov chain Monte Carlo algorithms, while we also provide a fast version to perform Bayesian estimation in one iteration. The proposed methods have computational advantages over their competitors. We provide numerical results on both real and simulated data, which demonstrate that the proposed algorithms provide well-rounded estimation and prediction.
 [133] arXiv:2301.13786 (crosslist from eess.IV) [pdf, other]

Title: Deep learningbased lung segmentation and automatic regional template in chest Xray images for pediatric tuberculosisAuthors: Daniel CapellánMartín, Juan J. GómezValverde, Ramon SanchezJacob, David BermejoPeláez, Lara GarcíaDelgado, Elisa LópezVarela, Maria J. LedesmaCarbayoComments: This work has been accepted at the SPIE Medical Imaging 2023, Image Processing conferenceSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Tuberculosis (TB) is still considered a leading cause of death and a substantial threat to global child health. Both TB infection and disease are curable using antibiotics. However, most children who die of TB are never diagnosed or treated. In clinical practice, experienced physicians assess TB by examining chest X-rays (CXR). Pediatric CXR has specific challenges compared to adult CXR, which makes TB diagnosis in children more difficult. Computer-aided diagnosis systems supported by Artificial Intelligence have shown performance comparable to experienced radiologist TB readings, which could ease mass TB screening and reduce clinical burden. We propose a multi-view deep learning-based solution which, by following a proposed template, aims to automatically regionalize and extract lung and mediastinal regions of interest from pediatric CXR images where key TB findings may be present. Experimental results have shown accurate region extraction, which can be used for further analysis to confirm TB finding presence and severity assessment. Code publicly available at https://github.com/danicapellan/pTB_LungRegionExtractor.
 [134] arXiv:2301.13791 (crosslist from stat.ML) [pdf, other]

Title: Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit FeedbackComments: 42 pages including AppendixSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We consider the linear contextual multi-class multi-period packing problem (LMMP) where the goal is to pack items such that the total vector of consumption is below a given budget vector and the total value is as large as possible. We consider the setting where the reward and the consumption vector associated with each action is a class-dependent linear function of the context, and the decision-maker receives bandit feedback. LMMP includes linear contextual bandits with knapsacks and online revenue management as special cases. We establish a new, more efficient estimator which guarantees a faster convergence rate, and consequently, a lower regret in such problems. We propose a bandit policy that is a closed-form function of said estimated parameters. When the contexts are non-degenerate, the regret of the proposed policy is sublinear in the context dimension, the number of classes, and the time horizon $T$ when the budget grows at least as $\sqrt{T}$. We also resolve an open problem posed in Agrawal & Devanur (2016), and extend the result to a multi-class setting. Our numerical experiments clearly demonstrate that the performance of our policy is superior to other benchmarks in the literature.
 [135] arXiv:2301.13803 (crosslist from cs.CV) [pdf, other]

Title: Fairness-aware Vision Transformer via Debiased Self-AttentionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Vision Transformer (ViT) has recently gained significant interest in solving computer vision (CV) problems due to its capability of extracting informative features and modeling long-range dependencies through the self-attention mechanism. To fully realize the advantages of ViT in real-world applications, recent works have explored the trustworthiness of ViT, including its robustness and explainability. However, another desideratum, fairness, has not yet been adequately addressed in the literature. We establish that the existing fairness-aware algorithms (primarily designed for CNNs) do not perform well on ViT. This motivates our novel framework based on Debiased Self-Attention (DSA). DSA is a fairness-through-blindness approach that forces ViT to eliminate spurious features correlated with the sensitive attributes for bias mitigation. Notably, adversarial examples are leveraged to locate and mask the spurious features in the input image patches. In addition, DSA utilizes an attention weights alignment regularizer in the training objective to encourage learning informative features for target prediction. Importantly, our DSA framework leads to improved fairness guarantees over prior works on multiple prediction tasks without compromising target prediction performance.
 [136] arXiv:2301.13807 (crosslist from cs.SE) [pdf, other]

Title: Identifying the Hazard Boundary of ML-enabled Autonomous Systems Using Cooperative Co-Evolutionary SearchSubjects: Software Engineering (cs.SE); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
In Machine Learning (ML)-enabled autonomous systems (MLASs), it is essential to identify the hazard boundary of ML Components (MLCs) in the MLAS under analysis. Given that such a boundary captures the conditions in terms of MLC behavior and system context that can lead to hazards, it can then be used to, for example, build a safety monitor that can trigger predefined fallback mechanisms at runtime when reaching the hazard boundary. However, determining such a hazard boundary for an ML component is challenging. This is due to the space combining system contexts (i.e., scenarios) and MLC behaviors (i.e., inputs and outputs) being far too large for exhaustive exploration and even to handle using conventional metaheuristics, such as genetic algorithms. Additionally, the high computational cost of simulations required to determine any MLAS safety violations makes the problem even more challenging. Furthermore, it is unrealistic to consider a region in the problem space deterministically safe or unsafe due to the uncontrollable parameters in simulations and the non-linear behaviors of ML models (e.g., deep neural networks) in the MLAS under analysis. To address the challenges, we propose MLCSHE (ML Component Safety Hazard Envelope), a novel method based on a Cooperative Co-Evolutionary Algorithm (CCEA), which aims to tackle a high-dimensional problem by decomposing it into two lower-dimensional search subproblems. Moreover, we take a probabilistic view of safe and unsafe regions and define a novel fitness function to measure the distance from the probabilistic hazard boundary and thus drive the search effectively. We evaluate the effectiveness and efficiency of MLCSHE on a complex Autonomous Vehicle (AV) case study. Our evaluation results show that MLCSHE is significantly more effective and efficient compared to a standard genetic algorithm and random search.
 [137] arXiv:2301.13819 (cross-list from cs.CL) [pdf, other]

Title: Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain DiagnosisSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
ChatGPT has demonstrated exceptional proficiency in natural language conversation; for example, it can answer a wide range of questions that no previous large language model could. Thus, we would like to push its limits and explore its ability to answer causal discovery questions by using a medical benchmark (Tu et al. 2019) in causal discovery.
 [138] arXiv:2301.13823 (cross-list from cs.CL) [pdf, other]

Title: Grounding Language Models to Images for Multimodal GenerationComments: Project page: this https URLSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We propose an efficient method to ground pretrained text-only language models to the visual domain, enabling them to process and generate arbitrarily interleaved image-and-text data. Our method leverages the abilities of language models learnt from large-scale text-only pretraining, such as in-context learning and free-form text generation. We keep the language model frozen, and fine-tune input and output linear layers to enable cross-modality interactions. This allows our model to process arbitrarily interleaved image-and-text inputs, and generate free-form text interleaved with retrieved images. We achieve strong zero-shot performance on grounded tasks such as contextual image retrieval and multimodal dialogue, and showcase compelling interactive abilities. Our approach works with any off-the-shelf language model and paves the way towards an effective, general solution for leveraging pretrained language models in visually grounded settings.
 [139] arXiv:2301.13826 (cross-list from cs.CV) [pdf, other]

Title: Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Graphics (cs.GR); Machine Learning (cs.LG)
Recent text-to-image generative models have demonstrated an unparalleled ability to generate diverse and creative imagery guided by a target text prompt. While revolutionary, current state-of-the-art diffusion models may still fail in generating images that fully convey the semantics in the given text prompt. We analyze the publicly available Stable Diffusion model and assess the existence of catastrophic neglect, where the model fails to generate one or more of the subjects from the input prompt. Moreover, we find that in some cases the model also fails to correctly bind attributes (e.g., colors) to their corresponding subjects. To help mitigate these failure cases, we introduce the concept of Generative Semantic Nursing (GSN), where we seek to intervene in the generative process on the fly during inference time to improve the faithfulness of the generated images. Using an attention-based formulation of GSN, dubbed Attend-and-Excite, we guide the model to refine the cross-attention units to attend to all subject tokens in the text prompt and strengthen, or excite, their activations, encouraging the model to generate all subjects described in the text prompt. We compare our approach to alternative approaches and demonstrate that it conveys the desired concepts more faithfully across a range of text prompts.
 [140] arXiv:2301.13838 (cross-list from cs.CR) [pdf, other]

Title: Image Shortcut Squeezing: Countering Perturbative Availability Poisons with CompressionComments: Our code is available at this https URLSubjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Perturbative availability poisoning (PAP) adds small changes to images to prevent their use for model training. Current research adopts the belief that practical and effective approaches to countering such poisons do not exist. In this paper, we argue that it is time to abandon this belief. We present extensive experiments showing that 12 state-of-the-art PAP methods are vulnerable to Image Shortcut Squeezing (ISS), which is based on simple compression. For example, on average, ISS restores the CIFAR-10 model accuracy to $81.73\%$, surpassing the previous best preprocessing-based countermeasures by $37.97\%$ absolute. ISS also (slightly) outperforms adversarial training and has higher generalizability to unseen perturbation norms and also higher efficiency. Our investigation reveals that the property of PAP perturbations depends on the type of surrogate model used for poison generation, and it explains why a specific ISS compression yields the best performance for a specific type of PAP perturbation. We further test stronger, adaptive poisoning, and show it falls short of being an ideal defense against ISS. Overall, our results demonstrate the importance of considering various (simple) countermeasures to ensure the meaningfulness of analysis carried out during the development of availability poisons.
 [141] arXiv:2301.13848 (cross-list from cs.CL) [pdf, other]

Title: Benchmarking Large Language Models for News SummarizationAuthors: Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B. HashimotoSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood. By conducting a human evaluation on ten LLMs across different pretraining methods, prompts, and model scales, we make two important observations. First, we find instruction tuning, and not model size, is the key to the LLM's zero-shot summarization capability. Second, existing studies have been limited by low-quality references, leading to underestimates of human performance and lower few-shot and fine-tuning performance. To better evaluate LLMs, we perform human evaluation over high-quality summaries we collect from freelance writers. Despite major stylistic differences such as the amount of paraphrasing, we find that LLM summaries are judged to be on par with human-written summaries.
 [142] arXiv:2301.13850 (cross-list from math.ST) [pdf, ps, other]

Title: Gaussian Noise is Nearly Instance Optimal for Private Unbiased Mean EstimationSubjects: Statistics Theory (math.ST); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
We investigate unbiased high-dimensional mean estimators in differential privacy. We consider differentially private mechanisms whose expected output equals the mean of the input dataset, for every dataset drawn from a fixed convex domain $K$ in $\mathbb{R}^d$. In the setting of concentrated differential privacy, we show that, for every input, such an unbiased mean estimator introduces approximately at least as much error as a mechanism that adds Gaussian noise with a carefully chosen covariance. This is true when the error is measured with respect to $\ell_p$ error for any $p \ge 2$. We extend this result to local differential privacy, and to approximate differential privacy, but for the latter the error lower bound holds either for a dataset or for a neighboring dataset. We also extend our results to mechanisms that take i.i.d. samples from a distribution over $K$ and are unbiased with respect to the mean of the distribution.
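As a rough illustration of the baseline this abstract refers to, the sketch below adds zero-mean Gaussian noise to an empirical mean and checks that the release is unbiased on average. It is a minimal sketch under stated assumptions, not the paper's mechanism: the function name `noisy_mean`, the toy data, and the isotropic noise scale `sigma` are hypothetical, the noise is not calibrated to any privacy budget, and the "carefully chosen covariance" from the abstract is not reproduced.

```python
import random
import statistics

def noisy_mean(data, sigma, rng):
    """Return the empirical mean of `data` plus zero-mean Gaussian noise per coordinate.

    Assumption: isotropic noise with an arbitrary `sigma`; no privacy calibration.
    """
    d = len(data[0])
    mean = [sum(row[j] for row in data) / len(data) for j in range(d)]
    return [m + rng.gauss(0.0, sigma) for m in mean]

rng = random.Random(0)
data = [[0.2, 0.8], [0.4, 0.6], [0.9, 0.1]]  # hypothetical toy dataset; true mean is (0.5, 0.5)

# Because the noise has zero mean, averaging many independent releases
# should approach the true empirical mean.
trials = [noisy_mean(data, sigma=0.5, rng=rng) for _ in range(20000)]
avg_release = [statistics.fmean(t[j] for t in trials) for j in range(2)]
```

Each individual release is noisy, but `avg_release` should land close to the true mean in both coordinates, which is the unbiasedness property the abstract builds on.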
 [143] arXiv:2301.13852 (cross-list from cs.CL) [pdf, other]

Title: ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated TextComments: 11 pages, 8 figures, 2 tablesSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
ChatGPT has the ability to generate grammatically flawless and seemingly human replies to different types of questions from various domains. The number of its users and of its applications is growing at an unprecedented rate. Unfortunately, use and abuse come hand in hand. In this paper, we study whether a machine learning model can be effectively trained to accurately distinguish between original human and seemingly human (that is, ChatGPT-generated) text, especially when this text is short. Furthermore, we employ an explainable artificial intelligence framework to gain insight into the reasoning behind the model trained to differentiate between ChatGPT-generated and human-generated text. The goal is to analyze the model's decisions and determine if any specific patterns or characteristics can be identified. Our study focuses on short online reviews, conducting two experiments comparing human-generated and ChatGPT-generated text. The first experiment involves ChatGPT text generated from custom queries, while the second experiment involves text generated by rephrasing original human-generated reviews. We fine-tune a Transformer-based model and use it to make predictions, which are then explained using SHAP. We compare our model with a perplexity score-based approach and find that disambiguation between human and ChatGPT-generated reviews is more challenging for the ML model when using rephrased text. However, our proposed approach still achieves an accuracy of 79%. Using explainability, we observe that ChatGPT's writing is polite, without specific details, using fancy and atypical vocabulary, impersonal, and typically it does not express feelings.
 [144] arXiv:2301.13856 (cross-list from stat.ML) [pdf, other]

Title: Simplex Random FeaturesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels by geometrical correlation of random projection vectors. We prove that SimRFs provide the smallest possible mean square error (MSE) on unbiased estimates of these kernels among the class of weight-independent geometrically-coupled positive random feature (PRF) mechanisms, substantially outperforming the previously most accurate Orthogonal Random Features at no observable extra cost. We present a more computationally expensive SimRFs+ variant, which we prove is asymptotically optimal in the broader family of weight-dependent geometrical coupling schemes (which permit correlations between random vector directions and norms). In extensive empirical studies, we show consistent gains provided by SimRFs in settings including pointwise kernel estimation, nonparametric classification and scalable Transformers.
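For readers unfamiliar with positive random features, the sketch below shows the plain i.i.d. baseline for the softmax kernel $\exp(x^\top y)$, using the identity $\exp(x^\top y) = \mathbb{E}_{w \sim \mathcal{N}(0, I)}[\exp(w^\top x - \|x\|^2/2)\exp(w^\top y - \|y\|^2/2)]$. It is a minimal sketch, not the SimRFs construction: the projection vectors here are sampled independently rather than geometrically coupled, and the helper name `positive_features` and the test vectors are hypothetical.

```python
import math
import random

def positive_features(x, ws):
    """Map x to positive random features for the softmax kernel exp(x . y).

    Assumption: `ws` holds i.i.d. standard Gaussian vectors (no simplex coupling).
    """
    half_sq_norm = 0.5 * sum(v * v for v in x)
    m = len(ws)
    return [
        math.exp(sum(wi * xi for wi, xi in zip(w, x)) - half_sq_norm) / math.sqrt(m)
        for w in ws
    ]

rng = random.Random(0)
d, m = 3, 20000
ws = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]

x = [0.3, -0.2, 0.1]   # hypothetical inputs
y = [0.1, 0.4, -0.3]

# The inner product of the two feature maps is an unbiased kernel estimate.
estimate = sum(a * b for a, b in zip(positive_features(x, ws), positive_features(y, ws)))
exact = math.exp(sum(a * b for a, b in zip(x, y)))
```

With enough features, `estimate` should be close to `exact`. The abstract's claim concerns how coupling the directions of the vectors in `ws` lowers the mean square error of this kind of estimator.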
Replacements for Wed, 1 Feb 23
 [145] arXiv:1903.07120 (replaced) [pdf, other]

Title: Stabilize Deep ResNet with A Sharp Scaling Factor $τ$Comments: Journal version (Published in Machine Learning Journal), 26 pagesJournalref: Machine Learning, 111(9), 3359-3392 (2022)Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [146] arXiv:2012.12311 (replaced) [pdf]

Title: Video Influencers: Unboxing the MystiqueComments: 45 pages, Online AppendixSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
 [147] arXiv:2106.02600 (replaced) [pdf, other]

Title: Causal Graph Discovery from Self and Mutually Exciting Time SeriesComments: See v2 for a previous workshop paper on Interpretable ML in Healthcare (IMLH) at ICML 2021, titled "Causal Graph Recovery for Sepsis-Associated Derangements via Interpretable Hawkes Networks". Also, see arXiv:2301.11336 for a short conference version with more experiments of our proposed method to learn "strict" DAGsSubjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Applications (stat.AP); Methodology (stat.ME)
 [148] arXiv:2108.08647 (replaced) [src]

Title: Multi-Center Federated Learning: Clients Clustering for Better PersonalizationComments: I have a duplicated arXiv versions for this paper that are 2108.08647 and 2005.01026. The 2108.08647 is a wrong version. I need use the 2005.01026 that has the most of citations of this work. I will withdraw this submission from 2108.08647, and then resubmit to 2005.01026. Sorry for causing confusionJournalref: World Wide Web 26 (2023) 481-500Subjects: Machine Learning (cs.LG)
 [149] arXiv:2109.11808 (replaced) [pdf, other]

Title: A Dynamic Programming Algorithm for Finding an Optimal Sequence of Informative MeasurementsJournalref: Entropy, 25, 251 (2023)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Optimization and Control (math.OC)
 [150] arXiv:2109.12390 (replaced) [pdf, other]

Title: Model reduction for the material point method via an implicit neural representation of the deformation mapSubjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Graphics (cs.GR); Numerical Analysis (math.NA)
 [151] arXiv:2112.12047 (replaced) [pdf, other]

Title: Generating Synthetic Mixed-type Longitudinal Electronic Health Records for Artificial Intelligent ApplicationsComments: Main article (22 pages, 7 figures); Appendix (15 pages, 8 figures)Subjects: Machine Learning (cs.LG)
 [152] arXiv:2202.06924 (replaced) [pdf, other]

Title: Do Gradient Inversion Attacks Make Federated Learning Unsafe?Authors: Ali Hatamizadeh, Hongxu Yin, Pavlo Molchanov, Andriy Myronenko, Wenqi Li, Prerna Dogra, Andrew Feng, Mona G. Flores, Jan Kautz, Daguang Xu, Holger R. RothComments: Revised version; Accepted to IEEE Transactions on Medical Imaging; Improved and reformatted version of this https URL; Added NVFlare referenceSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
 [153] arXiv:2202.13100 (replaced) [pdf, other]

Title: SemSup: Semantic Supervision for Simple and Scalable Zero-shot GeneralizationSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
 [154] arXiv:2203.17193 (replaced) [pdf, other]

Title: Learning from many trajectoriesSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [155] arXiv:2205.07246 (replaced) [pdf, other]

Title: FreeMatch: Self-adaptive Thresholding for Semi-supervised LearningAuthors: Yidong Wang, Hao Chen, Qiang Heng, Wenxin Hou, Yue Fan, Zhen Wu, Jindong Wang, Marios Savvides, Takahiro Shinozaki, Bhiksha Raj, Bernt Schiele, Xing XieComments: Accepted by ICLR 2023. Code: this https URLSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
 [156] arXiv:2205.11736 (replaced) [pdf, other]

Title: Towards a Defense Against Federated Backdoor Attacks Under Continuous TrainingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
 [157] arXiv:2205.13114 (replaced) [pdf, other]

Title: Contextual Pandora's BoxAuthors: Alexia Atsidakou, Constantine Caramanis, Evangelia Gergatsouli, Orestis Papadigenopoulos, Christos TzamosSubjects: Machine Learning (cs.LG)
 [158] arXiv:2205.13421 (replaced) [pdf, other]

Title: Bias in Machine Learning Models Can Be Significantly Mitigated by Careful Training: Evidence from Neuroimaging StudiesJournalref: Proceedings of the National Academy of Sciences 120.6 (2023)Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)
 [159] arXiv:2205.14116 (replaced) [pdf, other]

Title: Don't Explain Noise: Robust Counterfactuals for Randomized EnsemblesSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
 [160] arXiv:2205.14938 (replaced) [pdf, other]

Title: Spectral Maps for Learning on SubgraphsSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
 [161] arXiv:2206.07912 (replaced) [pdf, other]

Title: Double Sampling Randomized SmoothingComments: ICML 2022; minor typos fixed; minor data corrected on Page 42 (no influence on conclusions)Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Statistics Theory (math.ST)
 [162] arXiv:2206.08890 (replaced) [pdf, other]

Title: Disentangling Model Multiplicity in Deep LearningComments: 13 pages, 6 figuresSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
 [163] arXiv:2206.10206 (replaced) [pdf, other]

Title: Personalized Subgraph Federated LearningSubjects: Machine Learning (cs.LG)
 [164] arXiv:2206.10815 (replaced) [pdf, other]

Title: FedBC: Calibrating Global and Local Models via Federated Learning Beyond ConsensusAuthors: Amrit Singh Bedi, Chen Fan, Alec Koppel, Anit Kumar Sahu, Brian M. Sadler, Furong Huang, Dinesh ManochaSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
 [165] arXiv:2207.02016 (replaced) [pdf, other]

Title: Robust Reinforcement Learning in Continuous Control Tasks with Uncertainty Set RegularizationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
 [166] arXiv:2207.03928 (replaced) [pdf, other]

Title: Accelerating Material Design with the Generative Toolkit for Scientific DiscoveryAuthors: Matteo Manica, Jannis Born, Joris Cadow, Dimitrios Christofidellis, Ashish Dave, Dean Clarke, Yves Gaetan Nana Teukam, Giorgio Giannone, Samuel C. Hoffman, Matthew Buchan, Vijil Chenthamarakshan, Timothy Donovan, Hsiang Han Hsu, Federico Zipoli, Oliver Schilter, Akihiro Kishimoto, Lisa Hamada, Inkit Padhi, Karl Wehden, Lauren McHugh, Alexy Khrabrov, Payel Das, Seiji Takeda, John R. SmithComments: 15 pages, 2 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
 [167] arXiv:2207.05022 (replaced) [pdf, other]

Title: STI: Turbocharge NLP Inference at the Edge via Elastic PipeliningComments: ASPLOS'23Subjects: Machine Learning (cs.LG)
 [168] arXiv:2207.10541 (replaced) [pdf, other]

Title: Optimal precision for GANsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [169] arXiv:2207.11727 (replaced) [pdf, other]

Title: Can we achieve robustness from data alone?Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
 [170] arXiv:2208.10483 (replaced) [pdf, other]

Title: Prioritizing Samples in Reinforcement Learning with Reducible LossComments: DeepRL Workshop, NeurIPS 2022Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
 [171] arXiv:2209.02606 (replaced) [pdf, other]

Title: Unifying Generative Models with GFlowNets and BeyondComments: expanded version of the ICML 2022 workshop paperSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [172] arXiv:2209.06203 (replaced) [pdf, other]

Title: Normalizing Flows for Interventional Density EstimationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME)
 [173] arXiv:2209.07932 (replaced) [pdf, other]

Title: Fine-tuning or top-tuning? Transfer learning with pretrained features and fast kernel methodsSubjects: Machine Learning (cs.LG)
 [174] arXiv:2210.03175 (replaced) [pdf, other]

Title: Weak Proxies are Sufficient and Preferable for Fairness with Missing Sensitive AttributesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
 [175] arXiv:2210.04296 (replaced) [pdf, other]

Title: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck EquationAuthors: Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano ErmonSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
 [176] arXiv:2210.04345 (replaced) [pdf, other]

Title: LieGG: Studying Learned Lie Group GeneratorsSubjects: Machine Learning (cs.LG)
 [177] arXiv:2210.05577 (replaced) [pdf, other]

Title: What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness?Comments: NeurIPS 2022; added link to GitHub repositorySubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
 [178] arXiv:2210.05974 (replaced) [pdf, other]

Title: Clustering the Sketch: A Novel Approach to Embedding Table CompressionSubjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
 [179] arXiv:2210.08942 (replaced) [pdf, other]

Title: Meta-Learning via Classifier(-free) Diffusion GuidanceSubjects: Machine Learning (cs.LG)
 [180] arXiv:2210.11794 (replaced) [pdf, other]

Title: Diffuser: Efficient Transformers with Multi-hop Attention Diffusion for Long SequencesSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
 [181] arXiv:2210.12057 (replaced) [pdf, ps, other]

Title: Efficient Global Planning in Large MDPs via Stochastic Primal-Dual OptimizationComments: 23 pages including reference and appendixSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
 [182] arXiv:2211.01852 (replaced) [pdf, other]

Title: Revisiting Hyperparameter Tuning with Differential PrivacyComments: ML Safety Workshop of NeurIPS'22 Accepted PaperSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
 [183] arXiv:2211.06007 (replaced) [pdf, other]

Title: Continuous Soft Pseudo-Labeling in ASRSubjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
 [184] arXiv:2211.07675 (replaced) [pdf, ps, other]

Title: On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network ParametrizationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
 [185] arXiv:2211.14594 (replaced) [pdf, other]

Title: Direct-Effect Risk Minimization for Domain GeneralizationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
 [186] arXiv:2211.16553 (replaced) [pdf, other]

Title: Hierarchically Clustered PCA, LLE, and CCA via a Convex Clustering PenaltyComments: 11 pages, 4 figures, 3 tablesSubjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
 [187] arXiv:2212.02742 (replaced) [pdf, other]

Title: A Learning Based Hypothesis Test for Harmful Covariate ShiftSubjects: Machine Learning (cs.LG)
 [188] arXiv:2212.07346 (replaced) [pdf, other]

Title: Learning useful representations for shifting tasks and distributionsComments: 20 pagesSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
 [189] arXiv:2212.08379 (replaced) [pdf, other]

Title: GeneFormer: Learned Gene Compression using Transformer-based Context ModelingSubjects: Machine Learning (cs.LG); Genomics (q-bio.GN)
 [190] arXiv:2212.13556 (replaced) [pdf, other]

Title: Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex OptimizationAuthors: Mahdi Haghifam, Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund, Daniel M. Roy, Gintare Karolina DziugaiteComments: 49 pages, 2 figures. To appear, Proc. International Conference on Algorithmic Learning Theory (ALT), 2023Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [191] arXiv:2301.05911 (replaced) [pdf, other]

Title: Day-Ahead PV Power Forecasting Based on MSTL-TFTSubjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
 [192] arXiv:2301.06695 (replaced) [pdf, ps, other]

Title: Quantifying and Managing Impacts of Concept Drifts on IoT Traffic Inference in Residential ISP NetworksAuthors: Arman Pashamokhtari, Norihiro Okui, Masataka Nakahara, Ayumu Kubota, Gustavo Batista, Hassan Habibi GharakheiliComments: Submitted to IEEE IoT JournalSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Networking and Internet Architecture (cs.NI)
 [193] arXiv:2301.08203 (replaced) [pdf, other]

Title: An SDE for Modeling SAM: Theory and InsightsAuthors: Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, Frank Norbert Proske, Hans Kersting, Aurelien LucchiSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
 [194] arXiv:2301.09044 (replaced) [pdf, other]

Title: Learning to Reject with a Fixed Predictor: Application to DecontextualizationSubjects: Machine Learning (cs.LG)
 [195] arXiv:2301.11308 (replaced) [pdf, other]

Title: Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time SeriesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [196] arXiv:2301.12056 (replaced) [pdf, other]

Title: Variational Latent Branching Model for Off-Policy EvaluationComments: Accepted to ICLR 2023Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [197] arXiv:2301.12386 (replaced) [pdf, other]

Title: Learning to reject meets OOD detection: Are all abstentions created equal?Subjects: Machine Learning (cs.LG)
 [198] arXiv:2301.12616 (replaced) [pdf, other]

Title: Active Sequential Two-Sample TestingAuthors: Weizhi Li, Karthikeyan Natesan Ramamurthy, Prad Kadambi, Pouria Saidi, Gautam Dasarathy, Visar BerishaSubjects: Machine Learning (cs.LG); Methodology (stat.ME)
 [199] arXiv:2301.12929 (replaced) [pdf, other]

Title: Can Persistent Homology provide an efficient alternative for Evaluation of Knowledge Graph Completion Methods?Authors: Anson Bastos, Kuldeep Singh, Abhishek Nadgeri, Johannes Hoffart, Toyotaro Suzumura, Manish SinghComments: To appear in proceedings of The Web Conference 2023 (WWW'23)Subjects: Machine Learning (cs.LG); Algebraic Topology (math.AT)
 [200] arXiv:2301.12935 (replaced) [src]

Title: ERA-Solver: Error-Robust Adams Solver for Fast Sampling of Diffusion Probabilistic ModelsComments: Typo ErrorSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
 [201] arXiv:2301.13142 (replaced) [pdf]

Title: Self-Compressing Neural NetworksComments: Accepted submission to 2023 DL-Hardware Co-Design for AI AccelerationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
 [202] arXiv:2009.10622 (replaced) [pdf, ps, other]

Title: An $l_1$-oracle inequality for the Lasso in high-dimensional mixtures of experts modelsComments: Added more explanationsSubjects: Statistics Theory (math.ST); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
 [203] arXiv:2103.09603 (replaced) [pdf, other]

Title: DoubleML - An Object-Oriented Implementation of Double Machine Learning in RComments: 42 pages, 8 Figures, 1 Table; Updated version for DoubleML > 0.5.0Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Econometrics (econ.EM)
 [204] arXiv:2104.03509 (replaced) [pdf]

Title: Py-Feat: Python Facial Expression Analysis ToolboxSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
 [205] arXiv:2106.02702 (replaced) [pdf, other]

Title: Subgroup Fairness in Two-Sided MarketsSubjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Systems and Control (eess.SY)
 [206] arXiv:2110.05887 (replaced) [pdf, other]

Title: Discovery of Single Independent Latent VariableComments: Published as a conference paper at Neurips 2022Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [207] arXiv:2112.04417 (replaced) [pdf, other]

Title: What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability MethodsSubjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
 [208] arXiv:2201.06463 (replaced) [pdf, other]

Title: Bayesian Calibration of Imperfect Computer Models using Physics-Informed PriorsComments: 48 pages, 21 figuresSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
 [209] arXiv:2201.09592 (replaced) [pdf, other]

Title: Unsupervised Music Source Separation Using Differentiable Parametric Source ModelsComments: Revised version of the submissionSubjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
 [210] arXiv:2202.07835 (replaced) [pdf, other]

Title: SecGNN: Privacy-Preserving Graph Neural Network Training and Inference as a Cloud ServiceComments: Accepted in IEEE Transactions on Services Computing (TSC)Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
 [211] arXiv:2206.02795 (replaced) [pdf]

Title: Forecasting COVID-19 cases using Statistical Models and Ontology-based Semantic Modelling: A real time data analytics approachSubjects: Populations and Evolution (q-bio.PE); Machine Learning (cs.LG)
 [212] arXiv:2206.05576 (replaced) [pdf, other]

Title: Optimal Solutions for Joint Beamforming and Antenna Selection: From Branch and Bound to Graph Neural Imitation LearningSubjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG)
 [213] arXiv:2206.09241 (replaced) [pdf, other]

Title: An Empirical Study of Quantum Dynamics as a Ground State Problem with Neural Quantum StatesComments: 20 pages, 4 figuresSubjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
 [214] arXiv:2208.01003 (replaced) [pdf, other]

Title: What can be learnt with wide convolutional neural networks?Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [215] arXiv:2208.06987 (replaced) [pdf, other]

Title: A Unified Causal View of Domain Invariant Representation LearningSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [216] arXiv:2208.08609 (replaced) [pdf, other]

Title: A Scalable, Interpretable, Verifiable & Differentiable Logic Gate Convolutional Neural Network Architecture From Truth TablesSubjects: Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Machine Learning (cs.LG); Symbolic Computation (cs.SC)
 [217] arXiv:2208.14960 (replaced) [pdf, other]

Title: Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces I: the compact caseSubjects: Methodology (stat.ME); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
 [218] arXiv:2209.06300 (replaced) [pdf, other]

Title: PINCH: An Adversarial Extraction Attack Framework for Deep Learning ModelsComments: 19 pages, 13 figures, 5 tablesSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [219] arXiv:2209.09626 (replaced) [pdf, other]

Title: Sequence Learning using Equilibrium PropagationSubjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [220] arXiv:2209.13565 (replaced) [pdf, other]

Title: Neural parameter calibration for large-scale multi-agent modelsSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
 [221] arXiv:2209.14074 (replaced) [pdf, other]

Title: Recipro-CAM: Gradient-free reciprocal class activation mapSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [222] arXiv:2210.00079 (replaced) [pdf, other]

Title: Causal Estimation for Text Data with (Apparent) Overlap ViolationsSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [223] arXiv:2210.00313 (replaced) [pdf, other]

Title: CRISP: Curriculum based Sequential Neural Decoders for Polar Code FamilyAuthors: S Ashwin Hebbar, Viraj Nadkarni, Ashok Vardhan Makkuva, Suma Bhat, Sewoong Oh, Pramod ViswanathComments: 22 pages, 23 figuresSubjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [224] arXiv:2210.01891 (replaced) [pdf, other]

Title: Adaptively Weighted Data Augmentation Consistency Regularization for Robust Optimization under Concept ShiftSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [225] arXiv:2210.02129 (replaced) [pdf, other]

Title: Personalized Decentralized Bilevel Optimization over Random Directed NetworksComments: Under reviewSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [226] arXiv:2211.11865 (replaced) [pdf, other]

Title: Bayesian Learning for Neural Networks: an algorithmic surveySubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [227] arXiv:2211.14236 (replaced) [pdf, other]

Title: Strategyproof Decision-Making in Panel Data Settings and Beyond
Subjects: Econometrics (econ.EM); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
 [228] arXiv:2212.04325 (replaced) [pdf, ps, other]

Title: Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers
Comments: submitted to ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
 [229] arXiv:2212.07231 (replaced) [pdf, ps, other]

Title: Cutting Plane Selection with Analytic Centers and Multiregression
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
 [230] arXiv:2212.07383 (replaced) [pdf, other]

Title: Sequential Kernelized Independence Testing
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)
 [231] arXiv:2212.09412 (replaced) [pdf, other]

Title: Difformer: Empowering Diffusion Models on the Embedding Space for Text Generation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [232] arXiv:2301.06582 (replaced) [pdf, other]

Title: Antenna Array Calibration Via Gaussian Process Models
Comments: International ITG 26th Workshop on Smart Antennas and 13th Conference on Systems, Communications, and Coding
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
 [233] arXiv:2301.06916 (replaced) [pdf, other]

Title: Automated speech- and text-based classification of neuropsychiatric conditions in a multidiagnostic setting
Authors: Lasse Hansen, Roberta Rocca, Arndis Simonsen, Alberto Parola, Vibeke Bliksted, Nicolai Ladegaard, Dan Bang, Kristian Tylén, Ethan Weed, Søren Dinesen Østergaard, Riccardo Fusaroli
Comments: 24 pages, 5 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Applications (stat.AP)
 [234] arXiv:2301.07854 (replaced) [pdf, other]

Title: FE-TCM: Filter-Enhanced Transformer Click Model for Web Search
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
 [235] arXiv:2301.11482 (replaced) [pdf, ps, other]

Title: Diffusion Denoising for Low-Dose-CT Model
Authors: Runyi Li
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [236] arXiv:2301.11719 (replaced) [pdf, other]

Title: Incorporating Knowledge into Document Summarization: an Application of Prefix-Tuning on GPT-2
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [237] arXiv:2301.12231 (replaced) [pdf, other]

Title: Rateless Autoencoder Codes: Trading off Decoding Delay and Reliability
Authors: Vukan Ninkovic, Dejan Vukobratovic, Christian Häger, Henk Wymeersch, Alexandre Graell i Amat
Comments: 6 pages, 7 figures, to appear at IEEE ICC 2023
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
 [238] arXiv:2301.12254 (replaced) [pdf, other]

Title: Inference on the Optimal Assortment in the Multinomial Logit Model
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [239] arXiv:2301.12623 (replaced) [pdf, other]

Title: FedPass: Privacy-Preserving Vertical Federated Deep Learning with Adaptive Obfuscation
Comments: 6 figures, 9 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
 [240] arXiv:2301.13152 (replaced) [pdf, other]

Title: STEEL: Singularity-aware Reinforcement Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Econometrics (econ.EM); Methodology (stat.ME)