We gratefully acknowledge support from
the Simons Foundation and member institutions.

Machine Learning

New submissions

[ total of 265 entries: 1-265 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Tue, 25 Jan 22

[1]  arXiv:2201.08848 [pdf, other]
Title: Lensing Machines: Representing Perspective in Latent Variable Models
Comments: Presented at The Ninth Advances in Cognitive Systems (ACS) Conference 2021 (arXiv:2201.06134)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)

Many datasets represent a combination of different ways of looking at the same data that lead to different generalizations. For example, a corpus with examples generated by different people may be mixtures of many perspectives and can be viewed with different perspectives by others. It isnt always possible to represent the viewpoints by a clean separation, in advance, of examples representing each viewpoint and train a separate model for each viewpoint. We introduce lensing, a mixed initiative technique to extract lenses or mappings between machine learned representations and perspectives of human experts, and to generate lensed models that afford multiple perspectives of the same dataset. We apply lensing for two classes of latent variable models: a mixed membership model, a matrix factorization model in the context of two mental health applications, and we capture and imbue the perspectives of clinical psychologists into these models. Our work shows the benefits of the machine learning practitioner formally incorporating the perspective of a knowledgeable domain expert into their models rather than estimating unlensed models themselves in isolation.

[2]  arXiv:2201.08877 [pdf, other]
Title: Variational Autoencoder based Metamodeling for Multi-Objective Topology Optimization of Electrical Machines
Comments: 4 pages, 15 this article will appear in the proceedings of the IEEE transactions on magnetics as a conference paper
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)

Conventional magneto-static finite element analysis of electrical machine models is time-consuming and computationally expensive. Since each machine topology has a distinct set of parameters, design optimization is commonly performed independently. This paper presents a novel method for predicting Key Performance Indicators (KPIs) of differently parameterized electrical machine topologies at the same time by mapping a high dimensional integrated design parameters in a lower dimensional latent space using a variational autoencoder. After training, via a latent space, the decoder and multi-layer neural network will function as meta-models for sampling new designs and predicting associated KPIs, respectively. This enables parameter-based concurrent multi-topology optimization.

[3]  arXiv:2201.08896 [pdf, other]
Title: Environment Generation for Zero-Shot Compositional Reinforcement Learning
Comments: Published in NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Many real-world problems are compositional - solving them requires completing interdependent sub-tasks, either in series or in parallel, that can be represented as a dependency graph. Deep reinforcement learning (RL) agents often struggle to learn such complex tasks due to the long time horizons and sparse rewards. To address this problem, we present Compositional Design of Environments (CoDE), which trains a Generator agent to automatically build a series of compositional tasks tailored to the RL agent's current skill level. This automatic curriculum not only enables the agent to learn more complex tasks than it could have otherwise, but also selects tasks where the agent's performance is weak, enhancing its robustness and ability to generalize zero-shot to unseen tasks at test-time. We analyze why current environment generation techniques are insufficient for the problem of generating compositional tasks, and propose a new algorithm that addresses these issues. Our results assess learning and generalization across multiple compositional tasks, including the real-world problem of learning to navigate and interact with web pages. We learn to generate environments composed of multiple pages or rooms, and train RL agents capable of completing wide-range of complex tasks in those environments. We contribute two new benchmark frameworks for generating compositional tasks, compositional MiniGrid and gMiniWoB for web navigation.CoDE yields 4x higher success rate than the strongest baseline, and demonstrates strong performance of real websites learned on 3500 primitive tasks.

[4]  arXiv:2201.08905 [pdf, other]
Title: Optimal Dynamic Regret in Proper Online Learning with Strongly Convex Losses and Beyond
Comments: To appear at AISTATS 2022
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)

We study the framework of universal dynamic regret minimization with strongly convex losses. We answer an open problem in Baby and Wang 2021 by showing that in a proper learning setup, Strongly Adaptive algorithms can achieve the near optimal dynamic regret of $\tilde O(d^{1/3} n^{1/3}\text{TV}[u_{1:n}]^{2/3} \vee d)$ against any comparator sequence $u_1,\ldots,u_n$ simultaneously, where $n$ is the time horizon and $\text{TV}[u_{1:n}]$ is the Total Variation of comparator. These results are facilitated by exploiting a number of new structures imposed by the KKT conditions that were not considered in Baby and Wang 2021 which also lead to other improvements over their results such as: (a) handling non-smooth losses and (b) improving the dimension dependence on regret. Further, we also derive near optimal dynamic regret rates for the special case of proper online learning with exp-concave losses and an $L_\infty$ constrained decision set.

[5]  arXiv:2201.08924 [pdf, other]
Title: Nearest Class-Center Simplification through Intermediate Layers
Subjects: Machine Learning (cs.LG)

Recent advances in theoretical Deep Learning have introduced geometric properties that occur during training, past the Interpolation Threshold -- where the training error reaches zero. We inquire into the phenomena coined Neural Collapse in the intermediate layers of the networks, and emphasize the innerworkings of Nearest Class-Center Mismatch inside the deepnet. We further show that these processes occur both in vision and language model architectures. Lastly, we propose a Stochastic Variability-Simplification Loss (SVSL) that encourages better geometrical features in intermediate layers, and improves both train metrics and generalization.

[6]  arXiv:2201.08984 [pdf, other]
Title: PiCO: Contrastive Label Disambiguation for Partial Label Learning
Comments: Accepted to ICLR 2022
Subjects: Machine Learning (cs.LG)

Partial label learning (PLL) is an important problem that allows each training example to be labeled with a coarse candidate set, which well suits many real-world data annotation scenarios with label ambiguity. Despite the promise, the performance of PLL often lags behind the supervised counterpart. In this work, we bridge the gap by addressing two key research challenges in PLL -- representation learning and label disambiguation -- in one coherent framework. Specifically, our proposed framework PiCO consists of a contrastive learning module along with a novel class prototype-based label disambiguation algorithm. PiCO produces closely aligned representations for examples from the same classes and facilitates label disambiguation. Theoretically, we show that these two components are mutually beneficial, and can be rigorously justified from an expectation-maximization (EM) algorithm perspective. Extensive experiments demonstrate that PiCO significantly outperforms the current state-of-the-art approaches in PLL and even achieves comparable results to fully supervised learning. Code and data available: https://github.com/hbzju/PiCO.

[7]  arXiv:2201.08987 [pdf]
Title: Survival Prediction of Children Undergoing Hematopoietic Stem Cell Transplantation Using Different Machine Learning Classifiers by Performing Chi-squared Test and Hyper-parameter Optimization: A Retrospective Analysis
Comments: 25 pages, 14 figures, 38 tables
Subjects: Machine Learning (cs.LG)

Bone Marrow Transplant, a gradational rescue for a wide range of disorders emanating from the bone marrow, is an efficacious surgical treatment. Several risk factors, such as post-transplant illnesses, new malignancies, and even organ damage, can impair long-term survival. Therefore, technologies like Machine Learning are deployed for investigating the survival prediction of BMT receivers along with the influences that limit their resilience. In this study, an efficient survival classification model is presented in a comprehensive manner, incorporating the Chi-squared feature selection method to address the dimensionality problem and Hyper Parameter Optimization (HPO) to increase accuracy. A synthetic dataset is generated by imputing the missing values, transforming the data using dummy variable encoding, and compressing the dataset from 59 features to the 11 most correlated features using Chi-squared feature selection. The dataset was split into train and test sets at a ratio of 80:20, and the hyperparameters were optimized using Grid Search Cross-Validation. Several supervised ML methods were trained in this regard, like Decision Tree, Random Forest, Logistic Regression, K-Nearest Neighbors, Gradient Boosting Classifier, Ada Boost, and XG Boost. The simulations have been performed for both the default and optimized hyperparameters by using the original and reduced synthetic dataset. After ranking the features using the Chi-squared test, it was observed that the top 11 features with HPO, resulted in the same accuracy of prediction (94.73%) as the entire dataset with default parameters. Moreover, this approach requires less time and resources for predicting the survivability of children undergoing BMT. Hence, the proposed approach may aid in the development of a computer-aided diagnostic system with satisfactory accuracy and minimal computation time by utilizing medical data records.

[8]  arXiv:2201.09020 [pdf, other]
Title: Bi-CLKT: Bi-Graph Contrastive Learning based Knowledge Tracing
Comments: 12pages, 2 figures
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

The goal of Knowledge Tracing (KT) is to estimate how well students have mastered a concept based on their historical learning of related exercises. The benefit of knowledge tracing is that students' learning plans can be better organised and adjusted, and interventions can be made when necessary. With the recent rise of deep learning, Deep Knowledge Tracing (DKT) has utilised Recurrent Neural Networks (RNNs) to accomplish this task with some success. Other works have attempted to introduce Graph Neural Networks (GNNs) and redefine the task accordingly to achieve significant improvements. However, these efforts suffer from at least one of the following drawbacks: 1) they pay too much attention to details of the nodes rather than to high-level semantic information; 2) they struggle to effectively establish spatial associations and complex structures of the nodes; and 3) they represent either concepts or exercises only, without integrating them. Inspired by recent advances in self-supervised learning, we propose a Bi-Graph Contrastive Learning based Knowledge Tracing (Bi-CLKT) to address these limitations. Specifically, we design a two-layer contrastive learning scheme based on an "exercise-to-exercise" (E2E) relational subgraph. It involves node-level contrastive learning of subgraphs to obtain discriminative representations of exercises, and graph-level contrastive learning to obtain discriminative representations of concepts. Moreover, we designed a joint contrastive loss to obtain better representations and hence better prediction performance. Also, we explored two different variants, using RNN and memory-augmented neural networks as the prediction layer for comparison to obtain better representations of exercises and concepts respectively. Extensive experiments on four real-world datasets show that the proposed Bi-CLKT and its variants outperform other baseline models.

[9]  arXiv:2201.09044 [pdf, other]
Title: Good Classification Measures and How to Find Them
Subjects: Machine Learning (cs.LG); Discrete Mathematics (cs.DM); Probability (math.PR)

Several performance measures can be used for evaluating classification results: accuracy, F-measure, and many others. Can we say that some of them are better than others, or, ideally, choose one measure that is best in all situations? To answer this question, we conduct a systematic analysis of classification performance measures: we formally define a list of desirable properties and theoretically analyze which measures satisfy which properties. We also prove an impossibility theorem: some desirable properties cannot be simultaneously satisfied. Finally, we propose a new family of measures satisfying all desirable properties except one. This family includes the Matthews Correlation Coefficient and a so-called Symmetric Balanced Accuracy that was not previously used in classification literature. We believe that our systematic approach gives an important tool to practitioners for adequately evaluating classification results.

[10]  arXiv:2201.09046 [pdf, other]
Title: Differentially Private SGDA for Minimax Problems
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Stochastic gradient descent ascent (SGDA) and its variants have been the workhorse for solving minimax problems. However, in contrast to the well-studied stochastic gradient descent (SGD) with differential privacy (DP) constraints, there is little work on understanding the generalization (utility) of SGDA with DP constraints. In this paper, we use the algorithmic stability approach to establish the generalization (utility) of DP-SGDA in different settings. In particular, for the convex-concave setting, we prove that the DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases. To our best knowledge, this is the first-ever-known result for DP-SGDA in the non-smooth case. We further provide its utility analysis in the nonconvex-strongly-concave setting which is the first-ever-known result in terms of the primal population risk. The convergence and generalization results for this nonconvex setting are new even in the non-private setting. Finally, numerical experiments are conducted to demonstrate the effectiveness of DP-SGDA for both convex and nonconvex cases.

[11]  arXiv:2201.09051 [pdf, other]
Title: On the Robustness of Counterfactual Explanations to Adverse Perturbations
Subjects: Machine Learning (cs.LG)

Counterfactual explanations (CEs) are a powerful means for understanding how decisions made by algorithms can be changed. Researchers have proposed a number of desiderata that CEs should meet to be practically useful, such as requiring minimal effort to enact, or complying with causal models. We consider a further aspect to improve the usability of CEs: robustness to adverse perturbations, which may naturally happen due to unfortunate circumstances. Since CEs typically prescribe a sparse form of intervention (i.e., only a subset of the features should be changed), we provide two definitions of robustness, which concern, respectively, the features to change and to keep as they are. These definitions are workable in that they can be incorporated as penalty terms in the loss functions that are used for discovering CEs. To experiment with the proposed definitions of robustness, we create and release code where five data sets (commonly used in the field of fair and explainable machine learning) have been enriched with feature-specific annotations that can be used to sample meaningful perturbations. Our experiments show that CEs are often not robust and, if adverse perturbations take place, the intervention they prescribe may require a much larger cost than anticipated, or even become impossible. However, accounting for robustness in the search process, which can be done rather easily, allows discovering robust CEs systematically. Robust CEs are resilient to adverse perturbations: additional intervention to contrast perturbations is much less costly than for non-robust CEs. Our code is available at: https://github.com/marcovirgolin/robust-counterfactuals

[12]  arXiv:2201.09065 [pdf, other]
Title: Online Attentive Kernel-Based Temporal Difference Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

With rising uncertainty in the real world, online Reinforcement Learning (RL) has been receiving increasing attention due to its fast learning capability and improving data efficiency. However, online RL often suffers from complex Value Function Approximation (VFA) and catastrophic interference, creating difficulty for the deep neural network to be applied to an online RL algorithm in a fully online setting. Therefore, a simpler and more adaptive approach is introduced to evaluate value function with the kernel-based model. Sparse representations are superior at handling interference, indicating that competitive sparse representations should be learnable, non-prior, non-truncated and explicit when compared with current sparse representation methods. Moreover, in learning sparse representations, attention mechanisms are utilized to represent the degree of sparsification, and a smooth attentive function is introduced into the kernel-based VFA. In this paper, we propose an Online Attentive Kernel-Based Temporal Difference (OAKTD) algorithm using two-timescale optimization and provide convergence analysis of our proposed algorithm. Experimental evaluations showed that OAKTD outperformed several Online Kernel-based Temporal Difference (OKTD) learning algorithms in addition to the Temporal Difference (TD) learning algorithm with Tile Coding on public Mountain Car, Acrobot, CartPole and Puddle World tasks.

[13]  arXiv:2201.09069 [pdf, other]
Title: Neuronal Correlation: a Central Concept in Neural Network
Subjects: Machine Learning (cs.LG)

This paper proposes to study neural networks through neuronal correlation, a statistical measure of correlated neuronal activity on the penultimate layer. We show that neuronal correlation can be efficiently estimated via weight matrix, can be effectively enforced through layer structure, and is a strong indicator of generalisation ability of the network. More importantly, we show that neuronal correlation significantly impacts on the accuracy of entropy estimation in high-dimensional hidden spaces. While previous estimation methods may be subject to significant inaccuracy due to implicit assumption on neuronal independence, we present a novel computational method to have an efficient and authentic computation of entropy, by taking into consideration the neuronal correlation. In doing so, we install neuronal correlation as a central concept of neural network.

[14]  arXiv:2201.09071 [pdf, other]
Title: Towards Sustainable Deep Learning for Wireless Fingerprinting Localization
Comments: 6 pages, 6 sections, accepted to ICC 2022
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

Location based services, already popular with end users, are now inevitably becoming part of new wireless infrastructures and emerging business processes. The increasingly popular Deep Learning (DL) artificial intelligence methods perform very well in wireless fingerprinting localization based on extensive indoor radio measurement data. However, with the increasing complexity these methods become computationally very intensive and energy hungry, both for their training and subsequent operation. Considering only mobile users, estimated to exceed 7.4billion by the end of 2025, and assuming that the networks serving these users will need to perform only one localization per user per hour on average, the machine learning models used for the calculation would need to perform 65*10^12 predictions per year. Add to this equation tens of billions of other connected devices and applications that rely heavily on more frequent location updates, and it becomes apparent that localization will contribute significantly to carbon emissions unless more energy-efficient models are developed and used. This motivated our work on a new DL-based architecture for indoor localization that is more energy efficient compared to related state-of-the-art approaches while showing only marginal performance degradation. A detailed performance evaluation shows that the proposed model producesonly 58 % of the carbon footprint while maintaining 98.7 % of the overall performance compared to state of the art model external to our group. Additionally, we elaborate on a methodology to calculate the complexity of the DL model and thus the CO2 footprint during its training and operation.

[15]  arXiv:2201.09086 [pdf, other]
Title: Joint Learning of Hierarchical Community Structure and Node Representations: An Unsupervised Approach
Comments: Accepted at the DLG-AAAI'21 workshop at AAAI 2021
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Graph representation learning has demonstrated improved performance in tasks such as link prediction and node classification across a range of domains. Research has shown that many natural graphs can be organized in hierarchical communities, leading to approaches that use these communities to improve the quality of node representations. However, these approaches do not take advantage of the learned representations to also improve the quality of the discovered communities and establish an iterative and joint optimization of representation learning and community discovery. In this work, we present Mazi, an algorithm that jointly learns the hierarchical community structure and the node representations of the graph in an unsupervised fashion. To account for the structure in the node representations, Mazi generates node representations at each level of the hierarchy, and utilizes them to influence the node representations of the original graph. Further, the communities at each level are discovered by simultaneously maximizing the modularity metric and minimizing the distance between the representations of a node and its community. Using multi-label node classification and link prediction tasks, we evaluate our method on a variety of synthetic and real-world graphs and demonstrate that Mazi outperforms other hierarchical and non-hierarchical methods.

[16]  arXiv:2201.09101 [pdf, other]
Title: HiSTGNN: Hierarchical Spatio-temporal Graph Neural Networks for Weather Forecasting
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Weather Forecasting is an attractive challengeable task due to its influence on human life and complexity in atmospheric motion. Supported by massive historical observed time series data, the task is suitable for data-driven approaches, especially deep neural networks. Recently, the Graph Neural Networks (GNNs) based methods have achieved excellent performance for spatio-temporal forecasting. However, the canonical GNNs-based methods only individually model the local graph of meteorological variables per station or the global graph of whole stations, lacking information interaction between meteorological variables in different stations. In this paper, we propose a novel Hierarchical Spatio-Temporal Graph Neural Network (HiSTGNN) to model cross-regional spatio-temporal correlations among meteorological variables in multiple stations. An adaptive graph learning layer and spatial graph convolution are employed to construct self-learning graph and study hidden dependency among nodes of variable-level and station-level graph. For capturing temporal pattern, the dilated inception as the backbone of gate temporal convolution is designed to model long and various meteorological trends. Moreover, a dynamic interaction learning is proposed to build bidirectional information passing in hierarchical graph. Experimental results on three real-world meteorological datasets demonstrate the superior performance of HiSTGNN beyond 7 baselines and it reduces the errors by 4.2% to 11.6% especially compared to state-of-the-art weather forecasting method.

[17]  arXiv:2201.09104 [pdf, other]
Title: Bag of Tricks for Natural Policy Gradient Reinforcement Learning
Subjects: Machine Learning (cs.LG)

Natural policy gradient methods are popular reinforcement learning methods that improve the stability of policy gradient methods by preconditioning the gradient with the inverse of the Fisher-information matrix. However, leveraging natural policy gradient methods in an optimal manner can be very challenging as many implementation details must be set to achieve optimal performance. To the best of the authors' knowledge, there has not been a study that has investigated strategies for setting these details for natural policy gradient methods to achieve high performance in a comprehensive and systematic manner. To address this, we have implemented and compared strategies that impact performance in natural policy gradient reinforcement learning across five different second-order approximations. These include varying batch sizes and optimizing the critic network using the natural gradient. Furthermore, insights about the fundamental trade-offs when optimizing for performance (stability, sample efficiency, and computation time) were generated. Experimental results indicate that the proposed collection of strategies for performance optimization can improve results by 86% to 181% across the MuJuCo control benchmark, with TENGraD exhibiting the best approximation performance amongst the tested approximations. Code in this study is available at https://github.com/gebob19/natural-policy-gradient-reinforcement-learning.

[18]  arXiv:2201.09113 [pdf, other]
Title: Predicting Physics in Mesh-reduced Space with Temporal Attention
Subjects: Machine Learning (cs.LG)

Graph-based next-step prediction models have recently been very successful in modeling complex high-dimensional physical systems on irregular meshes. However, due to their short temporal attention span, these models suffer from error accumulation and drift. In this paper, we propose a new method that captures long-term dependencies through a transformer-style temporal attention model. We introduce an encoder-decoder structure to summarize features and create a compact mesh representation of the system state, to allow the temporal model to operate on a low-dimensional mesh representations in a memory efficient manner. Our method outperforms a competitive GNN baseline on several complex fluid dynamics prediction tasks, from sonic shocks to vascular flow. We demonstrate stable rollouts without the need for training noise and show perfectly phase-stable predictions even for very long sequences. More broadly, we believe our approach paves the way to bringing the benefits of attention-based sequence models to solving high-dimensional complex physics tasks.

[19]  arXiv:2201.09145 [pdf, other]
Title: glassoformer: a query-sparse transformer for post-fault power grid voltage prediction
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

We propose GLassoformer, a novel and efficient transformer architecture leveraging group Lasso regularization to reduce the number of queries of the standard self-attention mechanism. Due to the sparsified queries, GLassoformer is more computationally efficient than the standard transformers. On the power grid post-fault voltage prediction task, GLassoformer shows remarkably better prediction than many existing benchmark algorithms in terms of accuracy and stability.

[20]  arXiv:2201.09172 [pdf, other]
Title: An Attention-based ConvLSTM Autoencoder with Dynamic Thresholding for Unsupervised Anomaly Detection in Multivariate Time Series
Comments: 12 pages, 9 figures, thesis: this https URL
Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE); Machine Learning (stat.ML)

As a substantial amount of multivariate time series data is being produced by the complex systems in Smart Manufacturing, improved anomaly detection frameworks are needed to reduce the operational risks and the monitoring burden placed on the system operators. However, building such frameworks is challenging, as a sufficiently large amount of defective training data is often not available and frameworks are required to capture both the temporal and contextual dependencies across different time steps while being robust to noise. In this paper, we propose an unsupervised Attention-based Convolutional Long Short-Term Memory (ConvLSTM) Autoencoder with Dynamic Thresholding (ACLAE-DT) framework for anomaly detection and diagnosis in multivariate time series. The framework starts by pre-processing and enriching the data, before constructing feature images to characterize the system statuses across different time steps by capturing the inter-correlations between pairs of time series. Afterwards, the constructed feature images are fed into an attention-based ConvLSTM autoencoder, which aims to encode the constructed feature images and capture the temporal behavior, followed by decoding the compressed knowledge representation to reconstruct the feature images input. The reconstruction errors are then computed and subjected to a statistical-based, dynamic thresholding mechanism to detect and diagnose the anomalies. Evaluation results conducted on real-life manufacturing data demonstrate the performance strengths of the proposed approach over state-of-the-art methods under different experimental settings.

[21]  arXiv:2201.09191 [pdf, other]
Title: Revisiting Pooling through the Lens of Optimal Transport
Subjects: Machine Learning (cs.LG)

Pooling is one of the most significant operations in many machine learning models and tasks, whose implementation, however, is often empirical in practice. In this paper, we develop a novel and solid algorithmic pooling framework through the lens of optimal transport. In particular, we demonstrate that most existing pooling methods are equivalent to solving some specializations of an unbalanced optimal transport (UOT) problem. Making the parameters of the UOT problem learnable, we unify most existing pooling methods in the same framework, and accordingly, propose a generalized pooling layer called \textit{UOT-Pooling} for neural networks. Moreover, we implement the UOT-Pooling with two different architectures, based on the Sinkhorn scaling algorithm and the Bregman ADMM algorithm, respectively, and study their stability and efficiency quantitatively. We test our UOT-Pooling layers in two application scenarios, including multi-instance learning (MIL) and graph embedding. For state-of-the-art models of these two tasks, we can improve their performance by replacing conventional pooling layers with our UOT-Pooling layers.

[22]  arXiv:2201.09196 [pdf, other]
Title: Learning to Predict Gradients for Semi-Supervised Continual Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

A key challenge for machine intelligence is to learn new visual concepts without forgetting the previously acquired knowledge. Continual learning is aimed towards addressing this challenge. However, there is a gap between existing supervised continual learning and human-like intelligence, where human is able to learn from both labeled and unlabeled data. How unlabeled data affects learning and catastrophic forgetting in the continual learning process remains unknown. To explore these issues, we formulate a new semi-supervised continual learning method, which can be generically applied to existing continual learning models. Specifically, a novel gradient learner learns from labeled data to predict gradients on unlabeled data. Hence, the unlabeled data could fit into the supervised continual learning method. Different from conventional semi-supervised settings, we do not hypothesize that the underlying classes, which are associated to the unlabeled data, are known to the learning process. In other words, the unlabeled data could be very distinct from the labeled data. We evaluate the proposed method on mainstream continual learning, adversarial continual learning, and semi-supervised learning tasks. The proposed method achieves state-of-the-art performance on classification accuracy and backward transfer in the continual learning setting while achieving desired performance on classification accuracy in the semi-supervised learning setting. This implies that the unlabeled images can enhance the generalizability of continual learning models on the predictive ability on unseen data and significantly alleviate catastrophic forgetting. The code is available at \url{https://github.com/luoyan407/grad_prediction.git}.

[23]  arXiv:2201.09199 [pdf, other]
Title: Deep Learning on Attributed Sequences
Authors: Zhongfang Zhuang
Comments: PhD thesis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent research in feature learning has been extended to sequence data, where each instance consists of a sequence of heterogeneous items with a variable length. However, in many real-world applications, the data exists in the form of attributed sequences, which is composed of a set of fixed-size attributes and variable-length sequences with dependencies between them. In the attributed sequence context, feature learning remains challenging due to the dependencies between sequences and their associated attributes. In this dissertation, we focus on analyzing and building deep learning models for four new problems on attributed sequences. Our extensive experiments on real-world datasets demonstrate that the proposed solutions significantly improve the performance of each task over the state-of-the-art methods on attributed sequences.

[24]  arXiv:2201.09202 [pdf, other]
Title: One-Shot Learning on Attributed Sequences
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

One-shot learning has become an important research topic in the last decade with many real-world applications. The goal of one-shot learning is to classify unlabeled instances when there is only one labeled example per class. Conventional problem setting of one-shot learning mainly focuses on the data that is already in feature space (such as images). However, the data instances in real-world applications are often more complex and feature vectors may not be available. In this paper, we study the problem of one-shot learning on attributed sequences, where each instance is composed of a set of attributes (e.g., user profile) and a sequence of categorical items (e.g., clickstream). This problem is important for a variety of real-world applications ranging from fraud prevention to network intrusion detection. This problem is more challenging than conventional one-shot learning since there are dependencies between attributes and sequences. We design a deep learning framework OLAS to tackle this problem. The proposed OLAS utilizes a twin network to generalize the features from pairwise attributed sequence examples. Empirical results on real-world datasets demonstrate the proposed OLAS can outperform the state-of-the-art methods under a rich variety of parameter settings.

[25]  arXiv:2201.09209 [pdf, other]
Title: Weight Expansion: A New Perspective on Dropout and Generalization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

While dropout is known to be a successful regularization technique, insights into the mechanisms that lead to this success are still lacking. We introduce the concept of \emph{weight expansion}, an increase in the signed volume of a parallelotope spanned by the column or row vectors of the weight covariance matrix, and show that weight expansion is an effective means of increasing the generalization in a PAC-Bayesian setting. We provide a theoretical argument that dropout leads to weight expansion and extensive empirical support for the correlation between dropout and weight expansion. To support our hypothesis that weight expansion can be regarded as an \emph{indicator} of the enhanced generalization capability endowed by dropout, and not just as a mere by-product, we have studied other methods that achieve weight expansion (resp.\ contraction), and found that they generally lead to an increased (resp.\ decreased) generalization ability. This suggests that dropout is an attractive regularizer, because it is a computationally cheap method for obtaining weight expansion. This insight justifies the role of dropout as a regularizer, while paving the way for identifying regularizers that promise improved generalization through weight expansion.

[26]  arXiv:2201.09210 [pdf, other]
Title: Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs
Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL)

Imperative programming allows users to implement their deep neural networks (DNNs) easily and has become an essential part of recent deep learning (DL) frameworks. Recently, several systems have been proposed to combine the usability of imperative programming with the optimized performance of symbolic graph execution. Such systems convert imperative Python DL programs to optimized symbolic graphs and execute them. However, they cannot fully support the usability of imperative programming. For example, if an imperative DL program contains a Python feature with no corresponding symbolic representation (e.g., third-party library calls or unsupported dynamic control flows) they fail to execute the program. To overcome this limitation, we propose Terra, an imperative-symbolic co-execution system that can handle any imperative DL programs while achieving the optimized performance of symbolic graph execution. To achieve this, Terra builds a symbolic graph by decoupling DL operations from Python features. Then, Terra conducts the imperative execution to support all Python features, while delegating the decoupled operations to the symbolic execution. We evaluated the performance improvement and coverage of Terra with ten imperative DL programs for several DNN architectures. The results show that Terra can speed up the execution of all ten imperative DL programs, whereas AutoGraph, one of the state-of-the-art systems, fails to execute five of them.

[27]  arXiv:2201.09223 [pdf, other]
Title: A Generalized Weighted Optimization Method for Computational Learning and Inversion
Comments: 25 pages, 2 figures; paper accepted by ICLR 2022
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)

The generalization capacity of various machine learning models exhibits different phenomena in the under- and over-parameterized regimes. In this paper, we focus on regression models such as feature regression and kernel regression and analyze a generalized weighted least-squares optimization method for computational learning and inversion with noisy data. The highlight of the proposed framework is that we allow weighting in both the parameter space and the data space. The weighting scheme encodes both a priori knowledge on the object to be learned and a strategy to weight the contribution of different data points in the loss function. Here, we characterize the impact of the weighting scheme on the generalization error of the learning method, where we derive explicit generalization errors for the random Fourier feature model in both the under- and over-parameterized regimes. For more general feature maps, error bounds are provided based on the singular values of the feature matrix. We demonstrate that appropriate weighting from prior knowledge can improve the generalization capability of the learned model.

[28]  arXiv:2201.09329 [pdf, other]
Title: ULSA: Unified Language of Synthesis Actions for Representation of Synthesis Protocols
Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)

Applying AI power to predict syntheses of novel materials requires high-quality, large-scale datasets. Extraction of synthesis information from scientific publications is still challenging, especially for extracting synthesis actions, because of the lack of a comprehensive labeled dataset using a solid, robust, and well-established ontology for describing synthesis procedures. In this work, we propose the first Unified Language of Synthesis Actions (ULSA) for describing ceramics synthesis procedures. We created a dataset of 3,040 synthesis procedures annotated by domain experts according to the proposed ULSA scheme. To demonstrate the capabilities of ULSA, we built a neural network-based model to map arbitrary ceramics synthesis paragraphs into ULSA and used it to construct synthesis flowcharts for synthesis procedures. Analysis for the flowcharts showed that (a) ULSA covers essential vocabulary used by researchers when describing synthesis procedures and (b) it can capture important features of synthesis protocols. This work is an important step towards creating a synthesis ontology and a solid foundation for autonomous robotic synthesis.

[29]  arXiv:2201.09332 [pdf, other]
Title: Investigating Expressiveness of Transformer in Spectral Domain for Graphs
Subjects: Machine Learning (cs.LG)

Transformers have been proven to be inadequate for graph representation learning. To understand this inadequacy, there is need to investigate if spectral analysis of transformer will reveal insights on its expressive power. Similar studies already established that spectral analysis of Graph neural networks (GNNs) provides extra perspectives on their expressiveness. In this work, we systematically study and prove the link between the spatial and spectral domain in the realm of the transformer. We further provide a theoretical analysis that the spatial attention mechanism in the transformer cannot effectively capture the desired frequency response, thus, inherently limiting its expressiveness in spectral space. Therefore, we propose FeTA, a framework that aims to perform attention over the entire graph spectrum analogous to the attention in spatial space. Empirical results suggest that FeTA provides homogeneous performance gain against vanilla transformer across all tasks on standard benchmarks and can easily be extended to GNN based models with low-pass characteristics (e.g., GAT). Furthermore, replacing the vanilla transformer model with FeTA in recently proposed position encoding schemes has resulted in comparable or better performance than transformer and GNN baselines.

[30]  arXiv:2201.09353 [pdf, other]
Title: Distributed Bandits with Heterogeneous Agents
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

This paper tackles a multi-agent bandit setting where $M$ agents cooperate together to solve the same instance of a $K$-armed stochastic bandit problem. The agents are \textit{heterogeneous}: each agent has limited access to a local subset of arms and the agents are asynchronous with different gaps between decision-making rounds. The goal for each agent is to find its optimal local arm, and agents can cooperate by sharing their observations with others. While cooperation between agents improves the performance of learning, it comes with an additional complexity of communication between agents. For this heterogeneous multi-agent setting, we propose two learning algorithms, \ucbo and \AAE. We prove that both algorithms achieve order-optimal regret, which is $O\left(\sum_{i:\tilde{\Delta}_i>0} \log T/\tilde{\Delta}_i\right)$, where $\tilde{\Delta}_i$ is the minimum suboptimality gap between the reward mean of arm $i$ and any local optimal arm. In addition, a careful selection of the valuable information for cooperation, \AAE achieves a low communication complexity of $O(\log T)$. Last, numerical experiments verify the efficiency of both algorithms.

[31]  arXiv:2201.09366 [pdf, other]
Title: Optimal transport for causal discovery
Subjects: Machine Learning (cs.LG); Methodology (stat.ME)

Approaches based on Functional Causal Models (FCMs) have been proposed to determine causal direction between two variables, by properly restricting model classes; however, their performance is sensitive to the model assumptions, which makes it difficult for practitioners to use. In this paper, we provide a novel dynamical-system view of FCMs and propose a new framework for identifying causal direction in the bivariate case. We first show the connection between FCMs and optimal transport, and then study optimal transport under the constraints of FCMs. Furthermore, by exploiting the dynamical interpretation of optimal transport under the FCM constraints, we determine the corresponding underlying dynamical process of the static cause-effect pair data under the least action principle. It provides a new dimension for describing static causal discovery tasks, while enjoying more freedom for modeling the quantitative causal influences. In particular, we show that Additive Noise Models (ANMs) correspond to volume-preserving pressureless flows. Consequently, based on their velocity field divergence, we introduce a criterion to determine causal direction. With this criterion, we propose a novel optimal transport-based algorithm for ANMs which is robust to the choice of models and extend it to post-noninear models. Our method demonstrated state-of-the-art results on both synthetic and causal discovery benchmark datasets.

[32]  arXiv:2201.09369 [pdf, other]
Title: Efficient and Robust Classification for Sparse Attacks
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

In the past two decades we have seen the popularity of neural networks increase in conjunction with their classification accuracy. Parallel to this, we have also witnessed how fragile the very same prediction models are: tiny perturbations to the inputs can cause misclassification errors throughout entire datasets. In this paper, we consider perturbations bounded by the $\ell_0$--norm, which have been shown as effective attacks in the domains of image-recognition, natural language processing, and malware-detection. To this end, we propose a novel defense method that consists of "truncation" and "adversarial training". We then theoretically study the Gaussian mixture setting and prove the asymptotic optimality of our proposed classifier. Motivated by the insights we obtain, we extend these components to neural network classifiers. We conduct numerical experiments in the domain of computer vision using the MNIST and CIFAR datasets, demonstrating significant improvement for the robust classification error of neural networks.

[33]  arXiv:2201.09391 [pdf, other]
Title: Partition-Based Active Learning for Graph Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We study the problem of semi-supervised learning with Graph Neural Networks (GNNs) in an active learning setup. We propose GraphPart, a novel partition-based active learning approach for GNNs. GraphPart first splits the graph into disjoint partitions and then selects representative nodes within each partition to query. The proposed method is motivated by a novel analysis of the classification error under realistic smoothness assumptions over the graph and the node features. Extensive experiments on multiple benchmark datasets demonstrate that the proposed method outperforms existing active learning methods for GNNs under a wide range of annotation budget constraints. In addition, the proposed method does not introduce additional hyperparameters, which is crucial for model training, especially in the active learning setting where a labeled validation set may not be available.

[34]  arXiv:2201.09394 [pdf, other]
Title: An integrated recurrent neural network and regression model with spatial and climatic couplings for vector-borne disease dynamics
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Populations and Evolution (q-bio.PE); Quantitative Methods (q-bio.QM)

We developed an integrated recurrent neural network and nonlinear regression spatio-temporal model for vector-borne disease evolution. We take into account climate data and seasonality as external factors that correlate with disease transmitting insects (e.g. flies), also spill-over infections from neighboring regions surrounding a region of interest. The climate data is encoded to the model through a quadratic embedding scheme motivated by recommendation systems. The neighboring regions' influence is modeled by a long short-term memory neural network. The integrated model is trained by stochastic gradient descent and tested on leish-maniasis data in Sri Lanka from 2013-2018 where infection outbreaks occurred. Our model outperformed ARIMA models across a number of regions with high infections, and an associated ablation study renders support to our modeling hypothesis and ideas.

[35]  arXiv:2201.09398 [pdf, other]
Title: Towards Private Learning on Decentralized Graphs with Local Differential Privacy
Comments: in submission
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Many real-world networks are inherently decentralized. For example, in social networks, each user maintains a local view of a social graph, such as a list of friends and her profile. It is typical to collect these local views of social graphs and conduct graph learning tasks. However, learning over graphs can raise privacy concerns as these local views often contain sensitive information.
In this paper, we seek to ensure private graph learning on a decentralized network graph. Towards this objective, we propose {\em Solitude}, a new privacy-preserving learning framework based on graph neural networks (GNNs), with formal privacy guarantees based on edge local differential privacy. The crux of {\em Solitude} is a set of new delicate mechanisms that can calibrate the introduced noise in the decentralized graph collected from the users. The principle behind the calibration is the intrinsic properties shared by many real-world graphs, such as sparsity. Unlike existing work on locally private GNNs, our new framework can simultaneously protect node feature privacy and edge privacy, and can seamlessly incorporate with any GNN with privacy-utility guarantees. Extensive experiments on benchmarking datasets show that {\em Solitude} can retain the generalization capability of the learned GNN while preserving the users' data privacy under given privacy budgets.

[36]  arXiv:2201.09418 [pdf, ps, other]
Title: Approximation bounds for norm constrained neural networks with applications to regression and GANs
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)

This paper studies the approximation capacity of ReLU neural networks with norm constraint on the weights. We prove upper and lower bounds on the approximation error of these networks for smooth function classes. The lower bound is derived through the Rademacher complexity of neural networks, which may be of independent interest. We apply these approximation bounds to analyze the convergence of regression using norm constrained neural networks and distribution estimation by GANs. In particular, we obtain convergence rates for over-parameterized neural networks. It is also shown that GANs can achieve optimal rate of learning probability distributions, when the discriminator is a properly chosen norm constrained neural network.

[37]  arXiv:2201.09433 [pdf, other]
Title: Active Learning Polynomial Threshold Functions
Subjects: Machine Learning (cs.LG); Computational Complexity (cs.CC); Machine Learning (stat.ML)

We initiate the study of active learning polynomial threshold functions (PTFs). While traditional lower bounds imply that even univariate quadratics cannot be non-trivially actively learned, we show that allowing the learner basic access to the derivatives of the underlying classifier circumvents this issue and leads to a computationally efficient algorithm for active learning degree-$d$ univariate PTFs in $\tilde{O}(d^3\log(1/\varepsilon\delta))$ queries. We also provide near-optimal algorithms and analyses for active learning PTFs in several average case settings. Finally, we prove that access to derivatives is insufficient for active learning multivariate PTFs, even those of just two variables.

[38]  arXiv:2201.09441 [pdf, other]
Title: Federated Unlearning with Knowledge Distillation
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Federated Learning (FL) is designed to protect the data privacy of each client during the training process by transmitting only models instead of the original data. However, the trained model may memorize certain information about the training data. With the recent legislation on right to be forgotten, it is crucially essential for the FL model to possess the ability to forget what it has learned from each client. We propose a novel federated unlearning method to eliminate a client's contribution by subtracting the accumulated historical updates from the model and leveraging the knowledge distillation method to restore the model's performance without using any data from the clients. This method does not have any restrictions on the type of neural networks and does not rely on clients' participation, so it is practical and efficient in the FL system. We further introduce backdoor attacks in the training process to help evaluate the unlearning effect. Experiments on three canonical datasets demonstrate the effectiveness and efficiency of our method.

[39]  arXiv:2201.09457 [pdf, other]
Title: Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)

We propose the homotopic policy mirror descent (HPMD) method for solving discounted, infinite horizon MDPs with finite state and action space, and study its policy convergence. We report three properties that seem to be new in the literature of policy gradient methods: (1) The policy first converges linearly, then superlinearly with order $\gamma^{-2}$ to the set of optimal policies, after $\mathcal{O}(\log(1/\Delta^*))$ number of iterations, where $\Delta^*$ is defined via a gap quantity associated with the optimal state-action value function; (2) HPMD also exhibits last-iterate convergence, with the limiting policy corresponding exactly to the optimal policy with the maximal entropy for every state. No regularization is added to the optimization objective and hence the second observation arises solely as an algorithmic property of the homotopic policy gradient method. (3) For the stochastic HPMD method, we demonstrate a better than $\mathcal{O}(|\mathcal{S}| |\mathcal{A}| / \epsilon^2)$ sample complexity for small optimality gap $\epsilon$, when assuming a generative model for policy evaluation.

[40]  arXiv:2201.09460 [pdf, other]
Title: Probability Distribution on Rooted Trees
Comments: arXiv admin note: substantial text overlap with arXiv:2109.12825
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The hierarchical and recursive expressive capability of rooted trees is applicable to represent statistical models in various areas, such as data compression, image processing, and machine learning. On the other hand, such hierarchical expressive capability causes a problem in tree selection to avoid overfitting. One unified approach to solve this is a Bayesian approach, on which the rooted tree is regarded as a random variable and a direct loss function can be assumed on the selected model or the predicted value for a new data point. However, all the previous studies on this approach are based on the probability distribution on full trees, to the best of our knowledge. In this paper, we propose a generalized probability distribution for any rooted trees in which only the maximum number of child nodes and the maximum depth are fixed. Furthermore, we derive recursive methods to evaluate the characteristics of the probability distribution without any approximations.

[41]  arXiv:2201.09483 [pdf, other]
Title: A Machine Learning Framework for Distributed Functional Compression over Wireless Channels in IoT
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT); Signal Processing (eess.SP); Machine Learning (stat.ML)

IoT devices generating enormous data and state-of-the-art machine learning techniques together will revolutionize cyber-physical systems. In many diverse fields, from autonomous driving to augmented reality, distributed IoT devices compute specific target functions without simple forms like obstacle detection, object recognition, etc. Traditional cloud-based methods that focus on transferring data to a central location either for training or inference place enormous strain on network resources. To address this, we develop, to the best of our knowledge, the first machine learning framework for distributed functional compression over both the Gaussian Multiple Access Channel (GMAC) and orthogonal AWGN channels. Due to the Kolmogorov-Arnold representation theorem, our machine learning framework can, by design, compute any arbitrary function for the desired functional compression task in IoT. Importantly the raw sensory data are never transferred to a central node for training or inference, thus reducing communication. For these algorithms, we provide theoretical convergence guarantees and upper bounds on communication. Our simulations show that the learned encoders and decoders for functional compression perform significantly better than traditional approaches, are robust to channel condition changes and sensor outages. Compared to the cloud-based scenario, our algorithms reduce channel use by two orders of magnitude.

[42]  arXiv:2201.09531 [pdf, other]
Title: Communication-Efficient Stochastic Zeroth-Order Optimization for Federated Learning
Comments: 13 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

Federated learning (FL), as an emerging edge artificial intelligence paradigm, enables many edge devices to collaboratively train a global model without sharing their private data. To enhance the training efficiency of FL, various algorithms have been proposed, ranging from first-order to second-order methods. However, these algorithms cannot be applied in scenarios where the gradient information is not available, e.g., federated black-box attack and federated hyperparameter tuning. To address this issue, in this paper we propose a derivative-free federated zeroth-order optimization (FedZO) algorithm featured by performing multiple local updates based on stochastic gradient estimators in each communication round and enabling partial device participation. Under the non-convex setting, we derive the convergence performance of the FedZO algorithm and characterize the impact of the numbers of local iterates and participating edge devices on the convergence. To enable communication-efficient FedZO over wireless networks, we further propose an over-the-air computation (AirComp) assisted FedZO algorithm. With an appropriate transceiver design, we show that the convergence of AirComp-assisted FedZO can still be preserved under certain signal-to-noise ratio conditions. Simulation results demonstrate the effectiveness of the FedZO algorithm and validate the theoretical observations.

[43]  arXiv:2201.09534 [pdf, other]
Title: PaRT: Parallel Learning Towards Robust and Transparent AI
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This paper takes a parallel learning approach for robust and transparent AI. A deep neural network is trained in parallel on multiple tasks, where each task is trained only on a subset of the network resources. Each subset consists of network segments, that can be combined and shared across specific tasks. Tasks can share resources with other tasks, while having independent task-related network resources. Therefore, the trained network can share similar representations across various tasks, while also enabling independent task-related representations. The above allows for some crucial outcomes. (1) The parallel nature of our approach negates the issue of catastrophic forgetting. (2) The sharing of segments uses network resources more efficiently. (3) We show that the network does indeed use learned knowledge from some tasks in other tasks, through shared representations. (4) Through examination of individual task-related and shared representations, the model offers transparency in the network and in the relationships across tasks in a multi-task setting. Evaluation of the proposed approach against complex competing approaches such as Continual Learning, Neural Architecture Search, and Multi-task learning shows that it is capable of learning robust representations. This is the first effort to train a DL model on multiple tasks in parallel. Our code is available at https://github.com/MahsaPaknezhad/PaRT

[44]  arXiv:2201.09562 [pdf, other]
Title: Scalable Safe Exploration for Global Optimization of Dynamical Systems
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for complex systems while giving safety and optimality guarantees. Our experiments on a robot arm that would be prohibitive for GoSafe demonstrate that GoSafeOpt safely finds remarkably better policies than competing safe learning methods for high-dimensional domains.

[45]  arXiv:2201.09568 [pdf, other]
Title: Pearl: Parallel Evolutionary and Reinforcement Learning Library
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Reinforcement learning is increasingly finding success across domains where the problem can be represented as a Markov decision process. Evolutionary computation algorithms have also proven successful in this domain, exhibiting similar performance to the generally more complex reinforcement learning. Whilst there exist many open-source reinforcement learning and evolutionary computation libraries, no publicly available library combines the two approaches for enhanced comparison, cooperation, or visualization. To this end, we have created Pearl (https://github.com/LondonNode/Pearl), an open source Python library designed to allow researchers to rapidly and conveniently perform optimized reinforcement learning, evolutionary computation and combinations of the two. The key features within Pearl include: modular and expandable components, opinionated module settings, Tensorboard integration, custom callbacks and comprehensive visualizations.

[46]  arXiv:2201.09603 [pdf, other]
Title: Performance analysis of Electrical Machines based on Electromagnetic System Characterization using Deep Learning
Comments: 13 pages, This article will appear as a contributed paper in scientific journal
Subjects: Machine Learning (cs.LG)

The numerical optimization of an electrical machine entails computationally intensive and time-consuming magneto-static finite element (FE) simulation. Generally, this FE-simulation involves varying input geometry, electrical, and material parameters of an electrical machine. The result of the FE simulation characterizes the electromagnetic behavior of the electrical machine. It usually includes nonlinear iron losses and electromagnetic torque and flux at different time-steps for an electrical cycle at each operating point (varying electrical input phase current and control angle). In this paper, we present a novel data-driven deep learning (DL) approach to approximate the electromagnetic behavior of an electrical machine by predicting intermediate measures that include non-linear iron losses, a non-negligible fraction ($\frac{1}{6}$ of a whole electrical period) of the electromagnetic torque and flux at different time-steps for each operating point. The remaining time-steps of the electromagnetic flux and torque for an electrical cycle are estimated by exploiting the magnetic state symmetry of the electrical machine. Then these calculations, along with the system parameters, are fed as input to the physics-based analytical models to estimate characteristic maps and key performance indicators (KPIs) such as material cost, maximum torque, power, torque ripple, etc. The key idea is to train the proposed multi-branch deep neural network (DNN) step by step on a large volume of stored FE data in a supervised manner. Preliminary results exhibit that the predictions of intermediate measures and the subsequent computations of KPIs are close to the ground truth for a new machine design in the input design space. In the end, the quantitative analysis validates that the hybrid approach is more accurate than the existing DNN-based direct prediction of KPIs, which avoids electromagnetic calculations.

[47]  arXiv:2201.09635 [pdf, other]
Title: Adversarially Guided Subgoal Generation for Hierarchical Reinforcement Learning
Subjects: Machine Learning (cs.LG)

Hierarchical reinforcement learning (HRL) proposes to solve difficult tasks by performing decision-making and control at successively higher levels of temporal abstraction. However, off-policy training in HRL often suffers from the problem of non-stationary high-level decision making since the low-level policy is constantly changing. In this paper, we propose a novel HRL approach for mitigating the non-stationarity by adversarially enforcing the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy. In practice, the adversarial learning can be implemented by training a simple discriminator network concurrently with the high-level policy which determines the compatibility level of subgoals. Experiments with state-of-the-art algorithms show that our approach significantly improves learning efficiency and overall performance of HRL in various challenging continuous control tasks.

[48]  arXiv:2201.09636 [pdf, other]
Title: Neural Implicit Surfaces in Higher Dimension
Subjects: Machine Learning (cs.LG); Graphics (cs.GR)

This work investigates the use of neural networks admitting high-order derivatives for modeling dynamic variations of smooth implicit surfaces. For this purpose, it extends the representation of differentiable neural implicit surfaces to higher dimensions, which opens up mechanisms that allow to exploit geometric transformations in many settings, from animation and surface evolution to shape morphing and design galleries.
The problem is modeled by a $k$-parameter family of surfaces $S_c$, specified as a neural network function $f : \mathbb{R}^3 \times \mathbb{R}^k \rightarrow \mathbb{R}$, where $S_c$ is the zero-level set of the implicit function $f(\cdot, c) : \mathbb{R}^3 \rightarrow \mathbb{R} $, with $c \in \mathbb{R}^k$, with variations induced by the control variable $c$. In that context, restricted to each coordinate of $\mathbb{R}^k$, the underlying representation is a neural homotopy which is the solution of a general partial differential equation.

[49]  arXiv:2201.09637 [pdf, other]
Title: DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations
Comments: 54 pages, 11 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)

AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. In spite of its extensive use in many fields, such as ADMET prediction, virtual screening, protein folding and generative chemistry, little has been explored in terms of the out-of-distribution (OOD) learning problem with \emph{noise}, which is inevitable in real world AIDD applications.
In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction, which involves both macromolecule (protein target) and small-molecule (drug compound). In contrast to only providing fixed datasets, DrugOOD offers automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations and rigorous benchmarking of state-of-the-art OOD algorithms. Since the molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for \emph{graph OOD learning} problems. Extensive empirical studies have shown a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need to develop better schemes that can allow for OOD generalization under noise for AIDD.

[50]  arXiv:2201.09644 [pdf, other]
Title: Multiscale Generative Models: Improving Performance of a Generative Model Using Feedback from Other Dependent Generative Models
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Realistic fine-grained multi-agent simulation of real-world complex systems is crucial for many downstream tasks such as reinforcement learning. Recent work has used generative models (GANs in particular) for providing high-fidelity simulation of real-world systems. However, such generative models are often monolithic and miss out on modeling the interaction in multi-agent systems. In this work, we take a first step towards building multiple interacting generative models (GANs) that reflects the interaction in real world. We build and analyze a hierarchical set-up where a higher-level GAN is conditioned on the output of multiple lower-level GANs. We present a technique of using feedback from the higher-level GAN to improve performance of lower-level GANs. We mathematically characterize the conditions under which our technique is impactful, including understanding the transfer learning nature of our set-up. We present three distinct experiments on synthetic data, time series data, and image domain, revealing the wide applicability of our technique.

[51]  arXiv:2201.09653 [pdf, other]
Title: The Paradox of Choice: Using Attention in Hierarchical Reinforcement Learning
Comments: 20 pages, 15 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Decision-making AI agents are often faced with two important challenges: the depth of the planning horizon, and the branching factor due to having many choices. Hierarchical reinforcement learning methods aim to solve the first problem, by providing shortcuts that skip over multiple time steps. To cope with the breadth, it is desirable to restrict the agent's attention at each step to a reasonable number of possible choices. The concept of affordances (Gibson, 1977) suggests that only certain actions are feasible in certain states. In this work, we model "affordances" through an attention mechanism that limits the available choices of temporally extended options. We present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options. We investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices. We identify and empirically illustrate the settings in which the paradox of choice arises, i.e. when having fewer but more meaningful choices improves the learning speed and performance of a reinforcement learning agent.

[52]  arXiv:2201.09656 [pdf, other]
Title: A singular Riemannian geometry approach to Deep Neural Networks I. Theoretical foundations
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Metric Geometry (math.MG)

Deep Neural Networks are widely used for solving complex problems in several scientific areas, such as speech recognition, machine translation, image analysis. The strategies employed to investigate their theoretical properties mainly rely on Euclidean geometry, but in the last years new approaches based on Riemannian geometry have been developed. Motivated by some open problems, we study a particular sequence of maps between manifolds, with the last manifold of the sequence equipped with a Riemannian metric. We investigate the structures induced trough pullbacks on the other manifolds of the sequence and on some related quotients. In particular, we show that the pullbacks of the final Riemannian metric to any manifolds of the sequence is a degenerate Riemannian metric inducing a structure of pseudometric space, we show that the Kolmogorov quotient of this pseudometric space yields a smooth manifold, which is the base space of a particular vertical bundle. We investigate the theoretical properties of the maps of such sequence, eventually we focus on the case of maps between manifolds implementing neural networks of practical interest and we present some applications of the geometric framework we introduced in the first part of the paper.

[53]  arXiv:2201.09671 [pdf, other]
Title: Analyzing Multispectral Satellite Imagery of South American Wildfires Using CNNs and Unsupervised Learning
Authors: Christopher Sun
Comments: 15 pages, 12 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Since severe droughts are occurring more frequently and lengthening the dry season in the Amazon Rainforest, it is important to respond to active wildfires promptly and to forecast them before they become inextinguishable. Though computer vision researchers have applied algorithms on large databases to automatically detect wildfires, current models are computationally expensive and are not versatile enough for the low technology conditions of regions in South America. This comprehensive deep learning study first trains a Fully Convolutional Neural Network with skip connections on multispectral Landsat 8 images of Ecuador and the Galapagos. The model uses Green and Short-wave Infrared bands as inputs to predict each image's corresponding pixel-level binary fire mask. This model achieves a 0.962 validation F2 score and a 0.932 F2 score on test data from Guyana and Suriname. Afterward, image segmentation is conducted on the Cirrus Cloud band using K-Means Clustering to simplify continuous pixel values into three discrete classes representing the degree of cirrus cloud contamination. Two additional Convolutional Neural Networks are trained to classify the presence of a wildfire in a patch of land using these segmented cirrus images. The "experimental" model trained on the segmented inputs achieves 96.5% binary accuracy and has smoother learning curves than the "control model" that is not given the segmented inputs. This proof of concept reveals that feature simplification can improve the performance of wildfire detection models. Overall, the software built in this study is useful for early and accurate detection of wildfires in South America.

[54]  arXiv:2201.09676 [pdf, ps, other]
Title: Accelerate Model Parallel Training by Using Efficient Graph Traversal Order in Device Placement
Comments: This work has been submitted for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Modern neural networks require long training to reach decent performance on massive datasets. One common approach to speed up training is model parallelization, where large neural networks are split across multiple devices. However, different device placements of the same neural network lead to different training times. Most of the existing device placement solutions treat the problem as sequential decision-making by traversing neural network graphs and assigning their neurons to different devices. This work studies the impact of graph traversal order on device placement. In particular, we empirically study how different graph traversal order leads to different device placement, which in turn affects the training execution time. Our experiment results show that the best graph traversal order depends on the type of neural networks and their computation graphs features. In this work, we also provide recommendations on choosing graph traversal order in device placement for various neural network families to improve the training time in model parallelization.

[55]  arXiv:2201.09679 [pdf]
Title: A Review of Deep Transfer Learning and Recent Advancements
Comments: 13 pages, 2 figures, 1 table
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

A successful deep learning model is dependent on extensive training data and processing power and time (known as training costs). There exist many tasks without enough number of labeled data to train a deep learning model. Further, the demand is rising for running deep learning models on edge devices with limited processing capacity and training time. Deep transfer learning (DTL) methods are the answer to tackle such limitations, e.g., fine-tuning a pre-trained model on a massive semi-related dataset proved to be a simple and effective method for many problems. DTLs handle limited target data concerns as well as drastically reduce the training costs. In this paper, the definition and taxonomy of deep transfer learning is reviewed. Then we focus on the sub-category of network-based DTLs since it is the most common types of DTLs that have been applied to various applications in the last decade.

[56]  arXiv:2201.09686 [pdf, other]
Title: Learning Sparse and Continuous Graph Structures for Multivariate Time Series Forecasting
Comments: 8 pages, in submission to IJCAI 2022
Subjects: Machine Learning (cs.LG)

Accurate forecasting of multivariate time series is an extensively studied subject in finance, transportation, and computer science. Fully mining the correlation and causation between the variables in a multivariate time series exhibits noticeable results in improving the performance of a time series model. Recently, some models have explored the dependencies between variables through end-to-end graph structure learning without the need for pre-defined graphs. However, most current models do not incorporate the trade-off between effectiveness and flexibility and lack the guidance of domain knowledge in the design of graph learning algorithms. Besides, they have issues generating sparse graph structures, which pose challenges to end-to-end learning. In this paper, we propose Learning Sparse and Continuous Graphs for Forecasting (LSCGF), a novel deep learning model that joins graph learning and forecasting. Technically, LSCGF leverages the spatial information into convolutional operation and extracts temporal dynamics using the diffusion convolution recurrent network. At the same time, we propose a brand new method named Smooth Sparse Unit (SSU) to learn sparse and continuous graph adjacency matrix. Extensive experiments on three real-world datasets demonstrate that our model achieves state-of-the-art performances with minor trainable parameters.

[57]  arXiv:2201.09698 [pdf, other]
Title: Graph Neural Diffusion Networks for Semi-supervised Learning
Comments: 11 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Graph Convolutional Networks (GCN) is a pioneering model for graph-based semi-supervised learning. However, GCN does not perform well on sparsely-labeled graphs. Its two-layer version cannot effectively propagate the label information to the whole graph structure (i.e., the under-smoothing problem) while its deep version over-smoothens and is hard to train (i.e., the over-smoothing problem). To solve these two issues, we propose a new graph neural network called GND-Nets (for Graph Neural Diffusion Networks) that exploits the local and global neighborhood information of a vertex in a single layer. Exploiting the shallow network mitigates the over-smoothing problem while exploiting the local and global neighborhood information mitigates the under-smoothing problem. The utilization of the local and global neighborhood information of a vertex is achieved by a new graph diffusion method called neural diffusions, which integrate neural networks into the conventional linear and nonlinear graph diffusions. The adoption of neural networks makes neural diffusions adaptable to different datasets. Extensive experiments on various sparsely-labeled graphs verify the effectiveness and efficiency of GND-Nets compared to state-of-the-art approaches.

[58]  arXiv:2201.09699 [pdf, other]
Title: EASY: Ensemble Augmented-Shot Y-shaped Learning: State-Of-The-Art Few-Shot Classification with Simple Ingredients
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Image and Video Processing (eess.IV)

Few-shot learning aims at leveraging knowledge learned by one or more deep learning models, in order to obtain good classification performance on new problems, where only a few labeled samples per class are available. Recent years have seen a fair number of works in the field, introducing methods with numerous ingredients. A frequent problem, though, is the use of suboptimally trained models to extract knowledge, leading to interrogations on whether proposed approaches bring gains compared to using better initial models without the introduced ingredients. In this work, we propose a simple methodology, that reaches or even beats state of the art performance on multiple standardized benchmarks of the field, while adding almost no hyperparameters or parameters to those used for training the initial deep learning models on the generic dataset. This methodology offers a new baseline on which to propose (and fairly compare) new techniques or adapt existing ones.

[59]  arXiv:2201.09725 [pdf]
Title: Machine Learning Algorithms for Prediction of Penetration Depth and Geometrical Analysis of Weld in Friction Stir Spot Welding Process
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Nowadays, manufacturing sectors harness the power of machine learning and data science algorithms to make predictions for the optimization of mechanical and microstructure properties of fabricated mechanical components. The application of these algorithms reduces the experimental cost beside leads to reduce the time of experiments. The present research work is based on the prediction of penetration depth using Supervised Machine Learning algorithms such as Support Vector Machines (SVM), Random Forest Algorithm, and Robust Regression algorithm. A Friction Stir Spot Welding (FSSW) was used to join two elements of AA1230 aluminum alloys. The dataset consists of three input parameters: Rotational Speed (rpm), Dwelling Time (seconds), and Axial Load (KN), on which the machine learning models were trained and tested. It observed that the Robust Regression machine learning algorithm outperformed the rest of the algorithms by resulting in the coefficient of determination of 0.96. The research work also highlights the application of image processing techniques to find the geometrical features of the weld formation.

[60]  arXiv:2201.09736 [pdf, other]
Title: Tensor and Matrix Low-Rank Value-Function Approximation in Reinforcement Learning
Comments: 12 pages, 6 figures, 1 table
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Value-function (VF) approximation is a central problem in Reinforcement Learning (RL). Classical non-parametric VF estimation suffers from the curse of dimensionality. As a result, parsimonious parametric models have been adopted to approximate VFs in high-dimensional spaces, with most efforts being focused on linear and neural-network-based approaches. Differently, this paper puts forth a a parsimonious non-parametric approach, where we use stochastic low-rank algorithms to estimate the VF matrix in an online and model-free fashion. Furthermore, as VFs tend to be multi-dimensional, we propose replacing the classical VF matrix representation with a tensor (multi-way array) representation and, then, use the PARAFAC decomposition to design an online model-free tensor low-rank algorithm. Different versions of the algorithms are proposed, their complexity is analyzed, and their performance is assessed numerically using standardized RL environments.

[61]  arXiv:2201.09737 [pdf, other]
Title: RamanNet: A generalized neural network architecture for Raman Spectrum Analysis
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Raman spectroscopy provides a vibrational profile of the molecules and thus can be used to uniquely identify different kind of materials. This sort of fingerprinting molecules has thus led to widespread application of Raman spectrum in various fields like medical dignostics, forensics, mineralogy, bacteriology and virology etc. Despite the recent rise in Raman spectra data volume, there has not been any significant effort in developing generalized machine learning methods for Raman spectra analysis. We examine, experiment and evaluate existing methods and conjecture that neither current sequential models nor traditional machine learning models are satisfactorily sufficient to analyze Raman spectra. Both has their perks and pitfalls, therefore we attempt to mix the best of both worlds and propose a novel network architecture RamanNet. RamanNet is immune to invariance property in CNN and at the same time better than traditional machine learning models for the inclusion of sparse connectivity. Our experiments on 4 public datasets demonstrate superior performance over the much complex state-of-the-art methods and thus RamanNet has the potential to become the defacto standard in Raman spectra data analysis

[62]  arXiv:2201.09739 [pdf, other]
Title: Dense Air Quality Maps Using Regressive Facility Location Based Drive By Sensing
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Currently, fixed static sensing is a primary way to monitor environmental data like air quality in cities. However, to obtain a dense spatial coverage, a large number of static monitors are required, thereby making it a costly option. Dense spatiotemporal coverage can be achieved using only a fraction of static sensors by deploying them on the moving vehicles, known as the drive by sensing paradigm. The redundancy present in the air quality data can be exploited by processing the sparsely sampled data to impute the remaining unobserved data points using the matrix completion techniques. However, the accuracy of imputation is dependent on the extent to which the moving sensors capture the inherent structure of the air quality matrix. Therefore, the challenge is to pick those set of paths (using vehicles) that perform representative sampling in space and time. Most works in the literature for vehicle subset selection focus on maximizing the spatiotemporal coverage by maximizing the number of samples for different locations and time stamps which is not an effective representative sampling strategy. We present regressive facility location-based drive by sensing, an efficient vehicle selection framework that incorporates the smoothness in neighboring locations and autoregressive time correlation while selecting the optimal set of vehicles for effective spatiotemporal sampling. We show that the proposed drive by sensing problem is submodular, thereby lending itself to a greedy algorithm but with performance guarantees. We evaluate our framework on selecting a subset from the fleet of public transport in Delhi, India. We illustrate that the proposed method samples the representative spatiotemporal data against the baseline methods, reducing the extrapolation error on the simulated air quality data. Our method, therefore, has the potential to provide cost effective dense air quality maps.

[63]  arXiv:2201.09746 [pdf, other]
Title: Reinforcement Learning Textbook
Authors: Sergey Ivanov
Comments: The text is in Russian
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)

This textbook covers principles behind main modern deep reinforcement learning algorithms that achieved breakthrough results in many domains from game AI to robotics. All required theory is explained with proofs using unified notation and emphasize on the differences between different types of algorithms and the reasons why they are constructed the way they are.

[64]  arXiv:2201.09750 [pdf, other]
Title: Online AutoML: An adaptive AutoML framework for online learning
Comments: 25 pages, 8 figures. Submitted to Machine Learning S.I.: Automating Data Science
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Automated Machine Learning (AutoML) has been used successfully in settings where the learning task is assumed to be static. In many real-world scenarios, however, the data distribution will evolve over time, and it is yet to be shown whether AutoML techniques can effectively design online pipelines in dynamic environments. This study aims to automate pipeline design for online learning while continuously adapting to data drift. For this purpose, we design an adaptive Online Automated Machine Learning (OAML) system, searching the complete pipeline configuration space of online learners, including preprocessing algorithms and ensembling techniques. This system combines the inherent adaptation capabilities of online learners with the fast automated pipeline (re)optimization capabilities of AutoML. Focusing on optimization techniques that can adapt to evolving objectives, we evaluate asynchronous genetic programming and asynchronous successive halving to optimize these pipelines continually. We experiment on real and artificial data streams with varying types of concept drift to test the performance and adaptation capabilities of the proposed system. The results confirm the utility of OAML over popular online learning algorithms and underscore the benefits of continuous pipeline redesign in the presence of data drift.

[65]  arXiv:2201.09753 [pdf]
Title: Evaluation of data imputation strategies in complex, deeply-phenotyped data sets: the case of the EU-AIMS Longitudinal European Autism Project
Comments: 22 pages, 3 figures, 3 tables
Subjects: Machine Learning (cs.LG); Applications (stat.AP); Computation (stat.CO)

An increasing number of large-scale multi-modal research initiatives has been conducted in the typically developing population, as well as in psychiatric cohorts. Missing data is a common problem in such datasets due to the difficulty of assessing multiple measures on a large number of participants. The consequences of missing data accumulate when researchers aim to explore relationships between multiple measures. Here we aim to evaluate different imputation strategies to fill in missing values in clinical data from a large (total N=764) and deeply characterised (i.e. range of clinical and cognitive instruments administered) sample of N=453 autistic individuals and N=311 control individuals recruited as part of the EU-AIMS Longitudinal European Autism Project (LEAP) consortium. In particular we consider a total of 160 clinical measures divided in 15 overlapping subsets of participants. We use two simple but common univariate strategies, mean and median imputation, as well as a Round Robin regression approach involving four independent multivariate regression models including a linear model, Bayesian Ridge regression, as well as several non-linear models, Decision Trees, Extra Trees and K-Neighbours regression. We evaluate the models using the traditional mean square error towards removed available data, and consider in addition the KL divergence between the observed and the imputed distributions. We show that all of the multivariate approaches tested provide a substantial improvement compared to typical univariate approaches. Further, our analyses reveal that across all 15 data-subsets tested, an Extra Trees regression approach provided the best global results. This allows the selection of a unique model to impute missing data for the LEAP project and deliver a fixed set of imputed clinical data to be used by researchers working with the LEAP dataset in the future.

[66]  arXiv:2201.09765 [pdf, ps, other]
Title: Generative Planning for Temporally Coordinated Exploration in Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Standard model-free reinforcement learning algorithms optimize a policy that generates the action to be taken in the current time step in order to maximize expected future return. While flexible, it faces difficulties arising from the inefficient exploration due to its single step nature. In this work, we present Generative Planning method (GPM), which can generate actions not only for the current step, but also for a number of future steps (thus termed as generative planning). This brings several benefits to GPM. Firstly, since GPM is trained by maximizing value, the plans generated from it can be regarded as intentional action sequences for reaching high value regions. GPM can therefore leverage its generated multi-step plans for temporally coordinated exploration towards high value regions, which is potentially more effective than a sequence of actions generated by perturbing each action at single step level, whose consistent movement decays exponentially with the number of exploration steps. Secondly, starting from a crude initial plan generator, GPM can refine it to be adaptive to the task, which, in return, benefits future explorations. This is potentially more effective than commonly used action-repeat strategy, which is non-adaptive in its form of plans. Additionally, since the multi-step plan can be interpreted as the intent of the agent from now to a span of time period into the future, it offers a more informative and intuitive signal for interpretation. Experiments are conducted on several benchmark environments and the results demonstrated its effectiveness compared with several baseline methods.

[67]  arXiv:2201.09774 [pdf, other]
Title: Hiding Behind Backdoors: Self-Obfuscation Against Generative Models
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Attack vectors that compromise machine learning pipelines in the physical world have been demonstrated in recent research, from perturbations to architectural components. Building on this work, we illustrate the self-obfuscation attack: attackers target a pre-processing model in the system, and poison the training set of generative models to obfuscate a specific class during inference. Our contribution is to describe, implement and evaluate a generalized attack, in the hope of raising awareness regarding the challenge of architectural robustness within the machine learning community.

[68]  arXiv:2201.09785 [pdf, other]
Title: Unifying and Boosting Gradient-Based Training-Free Neural Architecture Search
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Neural architecture search (NAS) has gained immense popularity owing to its ability to automate neural architecture design. A number of training-free metrics are recently proposed to realize NAS without training, hence making NAS more scalable. Despite their competitive empirical performances, a unified theoretical understanding of these training-free metrics is lacking. As a consequence, (a) the relationships among these metrics are unclear, (b) there is no theoretical guarantee for their empirical performances and transferability, and (c) there may exist untapped potential in training-free NAS, which can be unveiled through a unified theoretical understanding. To this end, this paper presents a unified theoretical analysis of gradient-based training-free NAS, which allows us to (a) theoretically study their relationships, (b) theoretically guarantee their generalization performances and transferability, and (c) exploit our unified theoretical understanding to develop a novel framework named hybrid NAS (HNAS) which consistently boosts training-free NAS in a principled way. Interestingly, HNAS is able to enjoy the advantages of both training-free (i.e., superior search efficiency) and training-based (i.e., remarkable search effectiveness) NAS, which we have demonstrated through extensive experiments.

[69]  arXiv:2201.09798 [pdf, other]
Title: IMO^3: Interactive Multi-Objective Off-Policy Optimization
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)

Most real-world optimization problems have multiple objectives. A system designer needs to find a policy that trades off these objectives to reach a desired operating point. This problem has been studied extensively in the setting of known objective functions. We consider a more practical but challenging setting of unknown objective functions. In industry, this problem is mostly approached with online A/B testing, which is often costly and inefficient. As an alternative, we propose interactive multi-objective off-policy optimization (IMO^3). The key idea in our approach is to interact with a system designer using policies evaluated in an off-policy fashion to uncover which policy maximizes her unknown utility function. We theoretically show that IMO^3 identifies a near-optimal policy with high probability, depending on the amount of feedback from the designer and training data for off-policy estimation. We demonstrate its effectiveness empirically on multiple multi-objective optimization problems.

[70]  arXiv:2201.09802 [pdf, other]
Title: Constrained Policy Optimization via Bayesian World Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Improving sample-efficiency and safety are crucial challenges when deploying reinforcement learning in high-stakes real world applications. We propose LAMBDA, a novel model-based approach for policy optimization in safety critical tasks modeled via constrained Markov decision processes. Our approach utilizes Bayesian world models, and harnesses the resulting uncertainty to maximize optimistic upper bounds on the task objective, as well as pessimistic upper bounds on the safety constraints. We demonstrate LAMBDA's state of the art performance on the Safety-Gym benchmark suite in terms of sample efficiency and constraint violation.

[71]  arXiv:2201.09818 [pdf, ps, other]
Title: Optimal SQ Lower Bounds for Learning Halfspaces with Massart Noise
Subjects: Machine Learning (cs.LG); Computational Complexity (cs.CC); Statistics Theory (math.ST); Machine Learning (stat.ML)

We give tight statistical query (SQ) lower bounds for learnining halfspaces in the presence of Massart noise. In particular, suppose that all labels are corrupted with probability at most $\eta$. We show that for arbitrary $\eta \in [0,1/2]$ every SQ algorithm achieving misclassification error better than $\eta$ requires queries of superpolynomial accuracy or at least a superpolynomial number of queries. Further, this continues to hold even if the information-theoretically optimal error $\mathrm{OPT}$ is as small as $\exp\left(-\log^c(d)\right)$, where $d$ is the dimension and $0 < c < 1$ is an arbitrary absolute constant, and an overwhelming fraction of examples are noiseless. Our lower bound matches known polynomial time algorithms, which are also implementable in the SQ framework. Previously, such lower bounds only ruled out algorithms achieving error $\mathrm{OPT} + \epsilon$ or error better than $\Omega(\eta)$ or, if $\eta$ is close to $1/2$, error $\eta - o_\eta(1)$, where the term $o_\eta(1)$ is constant in $d$ but going to 0 for $\eta$ approaching $1/2$.
As a consequence, we also show that achieving misclassification error better than $1/2$ in the $(A,\alpha)$-Tsybakov model is SQ-hard for $A$ constant and $\alpha$ bounded away from 1.

[72]  arXiv:2201.09828 [pdf, other]
Title: MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis
Comments: Accepted, ICASSP 2022
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations (late/mid fusion) or low level sensory inputs (early fusion). Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived, i.e. cognition affects perception. These top-down interactions are not captured in current deep learning models. In this work we propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training. The proposed mechanism extracts high-level representations for each modality and uses these representations to mask the sensory inputs, allowing the model to perform top-down feature masking. We apply the proposed model for multimodal sentiment recognition on CMU-MOSEI. Our method shows consistent improvements over the well established MulT and over our strong late fusion baseline, achieving state-of-the-art results.

[73]  arXiv:2201.09830 [pdf, other]
Title: Learning Graph Augmentations to Learn Graph Representations
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Devising augmentations for graph contrastive learning is challenging due to their irregular structure, drastic distribution shifts, and nonequivalent feature spaces across datasets. We introduce LG2AR, Learning Graph Augmentations to Learn Graph Representations, which is an end-to-end automatic graph augmentation framework that helps encoders learn generalizable representations on both node and graph levels. LG2AR consists of a probabilistic policy that learns a distribution over augmentations and a set of probabilistic augmentation heads that learn distributions over augmentation parameters. We show that LG2AR achieves state-of-the-art results on 18 out of 20 graph-level and node-level benchmarks compared to previous unsupervised models under both linear and semi-supervised evaluation protocols. The source code will be released here: https://github.com/kavehhassani/lg2ar

[74]  arXiv:2201.09857 [pdf, other]
Title: TOPS: Transition-based VOlatility-controlled Policy Search and its Global Convergence
Subjects: Machine Learning (cs.LG)

Risk-averse problems receive far less attention than risk-neutral control problems in reinforcement learning, and existing risk-averse approaches are challenging to deploy to real-world applications. One primary reason is that such risk-averse algorithms often learn from consecutive trajectories with a certain length, which significantly increases the potential danger of causing dangerous failures in practice. This paper proposes Transition-based VOlatility-controlled Policy Search (TOPS), a novel algorithm that solves risk-averse problems by learning from (possibly non-consecutive) transitions instead of only consecutive trajectories. By using an actor-critic scheme with an overparameterized two-layer neural network, our algorithm finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy gradient, with effectiveness comparable to the state-of-the-art convergence rate of risk-neutral policy-search methods. The algorithm is evaluated on challenging Mujoco robot simulation tasks under the mean-variance evaluation metric. Both theoretical analysis and experimental results demonstrate a state-of-the-art level of risk-averse policy search methods.

[75]  arXiv:2201.09871 [pdf, other]
Title: On Evaluation Metrics for Graph Generative Models
Comments: Published as a conference paper at ICLR 2022
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In image generation, generative models can be evaluated naturally by visually inspecting model outputs. However, this is not always the case for graph generative models (GGMs), making their evaluation challenging. Currently, the standard process for evaluating GGMs suffers from three critical limitations: i) it does not produce a single score which makes model selection challenging, ii) in many cases it fails to consider underlying edge and node features, and iii) it is prohibitively slow to perform. In this work, we mitigate these issues by searching for scalar, domain-agnostic, and scalable metrics for evaluating and ranking GGMs. To this end, we study existing GGM metrics and neural-network-based metrics emerging from generative models of images that use embeddings extracted from a task-specific network. Motivated by the power of certain Graph Neural Networks (GNNs) to extract meaningful graph representations without any training, we introduce several metrics based on the features extracted by an untrained random GNN. We design experiments to thoroughly test metrics on their ability to measure the diversity and fidelity of generated graphs, as well as their sample and computational efficiency. Depending on the quantity of samples, we recommend one of two random-GNN-based metrics that we show to be more expressive than pre-existing metrics. While we focus on applying these metrics to GGM evaluation, in practice this enables the ability to easily compute the dissimilarity between any two sets of graphs regardless of domain. Our code is released at: https://github.com/uoguelph-mlrg/GGM-metrics.

[76]  arXiv:2201.09874 [pdf, other]
Title: CVAE-H: Conditionalizing Variational Autoencoders via Hypernetworks and Trajectory Forecasting for Autonomous Driving
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

The task of predicting stochastic behaviors of road agents in diverse environments is a challenging problem for autonomous driving. To best understand scene contexts and produce diverse possible future states of the road agents adaptively in different environments, a prediction model should be probabilistic, multi-modal, context-driven, and general. We present Conditionalizing Variational AutoEncoders via Hypernetworks (CVAE-H); a conditional VAE that extensively leverages hypernetwork and performs generative tasks for high-dimensional problems like the prediction task. We first evaluate CVAE-H on simple generative experiments to show that CVAE-H is probabilistic, multi-modal, context-driven, and general. Then, we demonstrate that the proposed model effectively solves a self-driving prediction problem by producing accurate predictions of road agents in various environments.

Cross-lists for Tue, 25 Jan 22

[77]  arXiv:2201.08860 (cross-list from cs.CL) [pdf, other]
Title: GreaseLM: Graph REASoning Enhanced Language Models for Question Answering
Comments: Published at ICLR 2022. All code, data, and pretrained models are available at this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Answering complex questions about textual narratives requires reasoning over both stated context and the world knowledge that underlies it. However, pretrained language models (LM), the foundation of most modern QA systems, do not robustly represent latent relationships between concepts, which is necessary for reasoning. While knowledge graphs (KG) are often used to augment LMs with structured representations of world knowledge, it remains an open question how to effectively fuse and reason over the KG representations and the language context, which provides situational constraints and nuances. In this work, we propose GreaseLM, a new model that fuses encoded representations from pretrained LMs and graph neural networks over multiple layers of modality interaction operations. Information from both modalities propagates to the other, allowing language context representations to be grounded by structured world knowledge, and allowing linguistic nuances (e.g., negation, hedging) in the context to inform the graph representations of knowledge. Our results on three benchmarks in the commonsense reasoning (i.e., CommonsenseQA, OpenbookQA) and medical question answering (i.e., MedQA-USMLE) domains demonstrate that GreaseLM can more reliably answer questions that require reasoning over both situational constraints and structured knowledge, even outperforming models 8x larger.

[78]  arXiv:2201.08862 (cross-list from hep-lat) [pdf, other]
Title: Stochastic normalizing flows as non-equilibrium transformations
Comments: 28 pages, 8 figures
Subjects: High Energy Physics - Lattice (hep-lat); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Machine Learning (stat.ML)

Normalizing flows are a class of deep generative models that provide a promising route to sample lattice field theories more efficiently than conventional Monte~Carlo simulations. In this work we show that the theoretical framework of stochastic normalizing flows, in which neural-network layers are combined with Monte~Carlo updates, is the same that underlies out-of-equilibrium simulations based on Jarzynski's equality, which have been recently deployed to compute free-energy differences in lattice gauge theories. We lay out a strategy to optimize the efficiency of this extended class of generative models and present examples of applications.

[79]  arXiv:2201.08865 (cross-list from eess.IV) [pdf, other]
Title: On the in vivo recognition of kidney stones using machine learning
Comments: Paper submitted to Computer Methods and Programs in Biomedicine (CMPB)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Determining the type of kidney stones allows urologists to prescribe a treatment to avoid recurrence of renal lithiasis. An automated in-vivo image-based classification method would be an important step towards an immediate identification of the kidney stone type required as a first phase of the diagnosis. In the literature it was shown on ex-vivo data (i.e., in very controlled scene and image acquisition conditions) that an automated kidney stone classification is indeed feasible. This pilot study compares the kidney stone recognition performances of six shallow machine learning methods and three deep-learning architectures which were tested with in-vivo images of the four most frequent urinary calculi types acquired with an endoscope during standard ureteroscopies. This contribution details the database construction and the design of the tested kidney stones classifiers. Even if the best results were obtained by the Inception v3 architecture (weighted precision, recall and F1-score of 0.97, 0.98 and 0.97, respectively), it is also shown that choosing an appropriate colour space and texture features allows a shallow machine learning method to approach closely the performances of the most promising deep-learning methods (the XGBoost classifier led to weighted precision, recall and F1-score values of 0.96). This paper is the first one that explores the most discriminant features to be extracted from images acquired during ureteroscopies.

[80]  arXiv:2201.08878 (cross-list from quant-ph) [pdf, other]
Title: Tensor Ring Parametrized Variational Quantum Circuits for Large Scale Quantum Machine Learning
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)

Quantum Machine Learning (QML) is an emerging research area advocating the use of quantum computing for advancement in machine learning. Since the discovery of the capability of Parametrized Variational Quantum Circuits (VQC) to replace Artificial Neural Networks, they have been widely adopted to different tasks in Quantum Machine Learning. However, despite their potential to outperform neural networks, VQCs are limited to small scale applications given the challenges in scalability of quantum circuits. To address this shortcoming, we propose an algorithm that compresses the quantum state within the circuit using a tensor ring representation. Using the input qubit state in the tensor ring representation, single qubit gates maintain the tensor ring representation. However, the same is not true for two qubit gates in general, where an approximation is used to have the output as a tensor ring representation. Using this approximation, the storage and computational time increases linearly in the number of qubits and number of layers, as compared to the exponential increase with exact simulation algorithms. This approximation is used to implement the tensor ring VQC. The training of the parameters of tensor ring VQC is performed using a gradient descent based algorithm, where efficient approaches for backpropagation are used. The proposed approach is evaluated on two datasets: Iris and MNIST for the classification task to show the improved accuracy using more number of qubits. We achieve a test accuracy of 83.33\% on Iris dataset and a maximum of 99.30\% and 76.31\% on binary and ternary classification of MNIST dataset using various circuit architectures. The results from the IRIS dataset outperform the results on VQC implemented on Qiskit, and being scalable, demonstrates the potential for VQCs to be used for large scale Quantum Machine Learning applications.

[81]  arXiv:2201.08893 (cross-list from cs.CV) [pdf, other]
Title: Signal Strength and Noise Drive Feature Preference in CNN Image Classifiers
Comments: Accepted at SVRHM 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Feature preference in Convolutional Neural Network (CNN) image classifiers is integral to their decision making process, and while the topic has been well studied, it is still not understood at a fundamental level. We test a range of task relevant feature attributes (including shape, texture, and color) with varying degrees of signal and noise in highly controlled CNN image classification experiments using synthetic datasets to determine feature preferences. We find that CNNs will prefer features with stronger signal strength and lower noise irrespective of whether the feature is texture, shape, or color. This provides guidance for a predictive model for task relevant feature preferences, demonstrates pathways for bias in machine models that can be avoided with careful controls on experimental setup, and suggests that comparisons between how humans and machines prefer task relevant features in vision classification tasks should be revisited. Code to reproduce experiments in this paper can be found at \url{https://github.com/mwolff31/signal_preference}.

[82]  arXiv:2201.08894 (cross-list from q-bio.BM) [pdf]
Title: Reinforcement Learning for Personalized Drug Discovery and Design for Complex Diseases: A Systems Pharmacology Perspective
Comments: 19 pages, 1 figure
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)

Many multi-genic systematic diseases such as Alzheimer's disease and majority of cancers do not have effective treatments yet. Systems pharmacology is a potentially effective approach to designing personalized therapies for untreatable complexed diseases. In this article, we review the potential of reinforcement learning in systems pharmacology-oriented drug discovery and design. In spite of successful application of advanced reinforcement learning techniques to target-based drug discovery, new reinforcement learning techniques are needed to boost generalizability and transferability of reinforcement learning in partially observed and changing environments, optimize multi-objective reward functions for system-level molecular phenotype readouts and generalize predictive models for out-of-distribution data. A synergistic integration of reinforcement learning with other machine learning techniques and related fields such as biophysics and quantum computing is needed to achieve the ultimate goal of systems pharmacology-oriented de novo drug design for personalized medicine.

[83]  arXiv:2201.08903 (cross-list from stat.ML) [pdf, ps, other]
Title: Universal Online Learning with Unbounded Losses: Memory Is All You Need
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

We resolve an open problem of Hanneke on the subject of universally consistent online learning with non-i.i.d. processes and unbounded losses. The notion of an optimistically universal learning rule was defined by Hanneke in an effort to study learning theory under minimal assumptions. A given learning rule is said to be optimistically universal if it achieves a low long-run average loss whenever the data generating process makes this goal achievable by some learning rule. Hanneke posed as an open problem whether, for every unbounded loss, the family of processes admitting universal learning are precisely those having a finite number of distinct values almost surely. In this paper, we completely resolve this problem, showing that this is indeed the case. As a consequence, this also offers a dramatically simpler formulation of an optimistically universal learning rule for any unbounded loss: namely, the simple memorization rule already suffices. Our proof relies on constructing random measurable partitions of the instance space and could be of independent interest for solving other open questions. We extend the results to the non-realizable setting thereby providing an optimistically universal Bayes consistent learning rule.

[84]  arXiv:2201.08909 (cross-list from eess.SY) [pdf]
Title: Uncertainty-Cognizant Model Predictive Control for Energy Management of Residential Buildings with PVT and Thermal Energy Storage
Comments: Index terms: Stochastic Model predictive control (MPC), building energy management systems (BEMSs), renewable energy resources (RES), thermal energy storage system (TESS), Mixed-integer linear stochastic optimization
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

The building sector accounts for almost 40 percent of the global energy consumption. This reveals a great opportunity to exploit renewable energy resources in buildings to achieve the climate target. In this context, this paper offers a building energy system embracing a heat pump, a thermal energy storage system along with grid-connected photovoltaic thermal (PVT) collectors to supply both electric and thermal energy demands of the building with minimum operating cost. To this end, the paper develops a stochastic model predictive control (MPC) strategy to optimally determine the set-point of the whole building energy system while accounting for the uncertainties associated with the PVT energy generation. This system enables the building to 1-shift its electric demand from high-peak to off-peak hours and 2- sell electricity to the grid to make energy arbitrage.

[85]  arXiv:2201.08932 (cross-list from stat.ML) [pdf, other]
Title: Overcoming Oversmoothness in Graph Convolutional Networks via Hybrid Scattering Networks
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Geometric deep learning (GDL) has made great strides towards generalizing the design of structure-aware neural network architectures from traditional domains to non-Euclidean ones, such as graphs. This gave rise to graph neural network (GNN) models that can be applied to graph-structured datasets arising, for example, in social networks, biochemistry, and material science. Graph convolutional networks (GCNs) in particular, inspired by their Euclidean counterparts, have been successful in processing graph data by extracting structure-aware features. However, current GNN models (and GCNs in particular) are known to be constrained by various phenomena that limit their expressive power and ability to generalize to more complex graph datasets. Most models essentially rely on low-pass filtering of graph signals via local averaging operations, thus leading to oversmoothing. Here, we propose a hybrid GNN framework that combines traditional GCN filters with band-pass filters defined via the geometric scattering transform. We further introduce an attention framework that allows the model to locally attend over the combined information from different GNN filters at the node level. Our theoretical results establish the complementary benefits of the scattering filters to leverage structural information from the graph, while our experiments show the benefits of our method on various learning tasks.

[86]  arXiv:2201.08951 (cross-list from cs.CV) [pdf, other]
Title: Visual Representation Learning with Self-Supervised Attention for Low-Label High-data Regime
Comments: Accepted to ICASSP-2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Self-supervision has shown outstanding results for natural language processing, and more recently, for image recognition. Simultaneously, vision transformers and its variants have emerged as a promising and scalable alternative to convolutions on various computer vision tasks. In this paper, we are the first to question if self-supervised vision transformers (SSL-ViTs) can be adapted to two important computer vision tasks in the low-label, high-data regime: few-shot image classification and zero-shot image retrieval. The motivation is to reduce the number of manual annotations required to train a visual embedder, and to produce generalizable, semantically meaningful and robust embeddings. For few-shot image classification we train SSL-ViTs without any supervision, on external data, and use this trained embedder to adapt quickly to novel classes with limited number of labels. For zero-shot image retrieval, we use SSL-ViTs pre-trained on a large dataset without any labels and fine-tune them with several metric learning objectives. Our self-supervised attention representations outperforms the state-of-the-art on several public benchmarks for both tasks, namely miniImageNet and CUB200 for few-shot image classification by up-to 6%-10%, and Stanford Online Products, Cars196 and CUB200 for zero-shot image retrieval by up-to 4%-11%. Code is available at \url{https://github.com/AutoVision-cloud/SSL-ViT-lowlabel-highdata}.

[87]  arXiv:2201.08955 (cross-list from eess.IV) [pdf, other]
Title: Modality Bank: Learn multi-modality images across data centers without sharing medical data
Comments: arXiv admin note: substantial text overlap with arXiv:2012.08604
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Multi-modality images have been widely used and provide comprehensive information for medical image analysis. However, acquiring all modalities among all institutes is costly and often impossible in clinical settings. To leverage more comprehensive multi-modality information, we propose a privacy secured decentralized multi-modality adaptive learning architecture named ModalityBank. Our method could learn a set of effective domain-specific modulation parameters plugged into a common domain-agnostic network. We demonstrate by switching different sets of configurations, the generator could output high-quality images for a specific modality. Our method could also complete the missing modalities across all data centers, thus could be used for modality completion purposes. The downstream task trained from the synthesized multi-modality samples could achieve higher performance than learning from one real data center and achieve close-to-real performance compare with all real images.

[88]  arXiv:2201.08956 (cross-list from stat.ML) [pdf, other]
Title: The Many Faces of Adversarial Risk
Comments: A version of this paper was presented at NeurIPS 2021
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)

Adversarial risk quantifies the performance of classifiers on adversarially perturbed data. Numerous definitions of adversarial risk -- not all mathematically rigorous and differing subtly in the details -- have appeared in the literature. In this paper, we revisit these definitions, make them rigorous, and critically examine their similarities and differences. Our technical tools derive from optimal transport, robust statistics, functional analysis, and game theory. Our contributions include the following: generalizing Strassen's theorem to the unbalanced optimal transport setting with applications to adversarial classification with unequal priors; showing an equivalence between adversarial robustness and robust hypothesis testing with $\infty$-Wasserstein uncertainty sets; proving the existence of a pure Nash equilibrium in the two-player game between the adversary and the algorithm; and characterizing adversarial risk by the minimum Bayes error between a pair of distributions belonging to the $\infty$-Wasserstein uncertainty sets. Our results generalize and deepen recently discovered connections between optimal transport and adversarial robustness and reveal new connections to Choquet capacities and game theory.

[89]  arXiv:2201.08968 (cross-list from cs.RO) [pdf, other]
Title: Mechanical Search on Shelves using a Novel "Bluction" Tool
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Shelves are common in homes, warehouses, and commercial settings due to their storage efficiency. However, this efficiency comes at the cost of reduced visibility and accessibility. When looking from a side (lateral) view of a shelf, most objects will be fully occluded, resulting in a constrained lateral-access mechanical search problem. To address this problem, we introduce: (1) a novel bluction tool, which combines a thin pushing blade and suction cup gripper, (2) an improved LAX-RAY simulation pipeline and perception model that combines ray-casting with 2D Minkowski sums to efficiently generate target occupancy distributions, and (3) a novel SLAX-RAY search policy, which optimally reduces target object distribution support area using the bluction tool. Experimental data from 2000 simulated shelf trials and 18 trials with a physical Fetch robot equipped with the bluction tool suggest that using suction grasping actions improves the success rate over the highest performing push-only policy by 26% in simulation and 67% in physical environments.

[90]  arXiv:2201.09032 (cross-list from cs.SD) [pdf, other]
Title: NAS-VAD: Neural Architecture Search for Voice Activity Detection
Comments: Submitted to IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

The need for automatic design of deep neural networks has led to the emergence of neural architecture search (NAS), which has generated models outperforming manually-designed models. However, most existing NAS frameworks are designed for image processing tasks, and lack structures and operations effective for voice activity detection (VAD) tasks. To discover improved VAD models through automatic design, we present the first work that proposes a NAS framework optimized for the VAD task. The proposed NAS-VAD framework expands the existing search space with the attention mechanism while incorporating the compact macro-architecture with fewer cells. The experimental results show that the models discovered by NAS-VAD outperform the existing manually-designed VAD models in various synthetic and real-world datasets. Our code and models are available at https://github.com/daniel03c1/NAS_VAD.

[91]  arXiv:2201.09042 (cross-list from cs.CV) [pdf, other]
Title: Uncertainty-aware deep learning methods for robust diabetic retinopathy classification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Automatic classification of diabetic retinopathy from retinal images has been widely studied using deep neural networks with impressive results. However, there is a clinical need for estimation of the uncertainty in the classifications, a shortcoming of modern neural networks. Recently, approximate Bayesian deep learning methods have been proposed for the task but the studies have only considered the binary referable/non-referable diabetic retinopathy classification applied to benchmark datasets. We present novel results by systematically investigating a clinical dataset and a clinically relevant 5-class classification scheme, in addition to benchmark datasets and the binary classification scheme. Moreover, we derive a connection between uncertainty measures and classifier risk, from which we develop a new uncertainty measure. We observe that the previously proposed entropy-based uncertainty measure generalizes to the clinical dataset on the binary classification scheme but not on the 5-class scheme, whereas our new uncertainty measure generalizes to the latter case.

[92]  arXiv:2201.09054 (cross-list from math.AT) [pdf, ps, other]
Title: Topological data analysis and clustering
Comments: This article is intended to be a chapter for a book
Subjects: Algebraic Topology (math.AT); Machine Learning (cs.LG)

Clustering is one of the most common tasks of Machine Learning. In this paper we examine how ideas from topology can be used to improve clustering techniques.

[93]  arXiv:2201.09058 (cross-list from q-fin.TR) [pdf, other]
Title: DeepScalper: A Risk-Aware Deep Reinforcement Learning Framework for Intraday Trading with Micro-level Market Embedding
Subjects: Trading and Market Microstructure (q-fin.TR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Reinforcement learning (RL) techniques have shown great success in quantitative investment tasks, such as portfolio management and algorithmic trading. Especially, intraday trading is one of the most profitable and risky tasks because of the intraday behaviors of the financial market that reflect billions of rapidly fluctuating values. However, it is hard to apply existing RL methods to intraday trading due to the following three limitations: 1) overlooking micro-level market information (e.g., limit order book); 2) only focusing on local price fluctuation and failing to capture the overall trend of the whole trading day; 3) neglecting the impact of market risk. To tackle these limitations, we propose DeepScalper, a deep reinforcement learning framework for intraday trading. Specifically, we adopt an encoder-decoder architecture to learn robust market embedding incorporating both macro-level and micro-level market information. Moreover, a novel hindsight reward function is designed to provide the agent a long-term horizon for capturing the overall price trend. In addition, we propose a risk-aware auxiliary task by predicting future volatility, which helps the agent take market risk into consideration while maximizing profit. Finally, extensive experiments on two stock index futures and four treasury bond futures demonstrate that DeepScalper achieves significant improvement against many state-of-the-art approaches.

[94]  arXiv:2201.09061 (cross-list from cs.CV) [pdf, other]
Title: Explore the Expression: Facial Expression Generation using Auxiliary Classifier Generative Adversarial Network
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)

Facial expressions are a form of non-verbal communication that humans perform seamlessly for meaningful transfer of information. Most of the literature addresses the facial expression recognition aspect however, with the advent of Generative Models, it has become possible to explore the affect space in addition to mere classification of a set of expressions. In this article, we propose a generative model architecture which robustly generates a set of facial expressions for multiple character identities and explores the possibilities of generating complex expressions by combining the simple ones.

[95]  arXiv:2201.09075 (cross-list from cs.NI) [pdf, ps, other]
Title: Dynamic Channel Access via Meta-Reinforcement Learning
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Systems and Control (eess.SY)

In this paper, we address the channel access problem in a dynamic wireless environment via meta-reinforcement learning. Spectrum is a scarce resource in wireless communications, especially with the dramatic increase in the number of devices in networks. Recently, inspired by the success of deep reinforcement learning (DRL), extensive studies have been conducted in addressing wireless resource allocation problems via DRL. However, training DRL algorithms usually requires a massive amount of data collected from the environment for each specific task and the well-trained model may fail if there is a small variation in the environment. In this work, in order to address these challenges, we propose a meta-DRL framework that incorporates the method of Model-Agnostic Meta-Learning (MAML). In the proposed framework, we train a common initialization for similar channel selection tasks. From the initialization, we show that only a few gradient descents are required for adapting to different tasks drawn from the same distribution. We demonstrate the performance improvements via simulation results.

[96]  arXiv:2201.09079 (cross-list from cs.CV) [pdf, other]
Title: Implicit Bias of Projected Subgradient Method Gives Provable Robust Recovery of Subspaces of Unknown Codimension
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Robust subspace recovery (RSR) is a fundamental problem in robust representation learning. Here we focus on a recently proposed RSR method termed Dual Principal Component Pursuit (DPCP) approach, which aims to recover a basis of the orthogonal complement of the subspace and is amenable to handling subspaces of high relative dimension. Prior work has shown that DPCP can provably recover the correct subspace in the presence of outliers, as long as the true dimension of the subspace is known. We show that DPCP can provably solve RSR problems in the {\it unknown} subspace dimension regime, as long as orthogonality constraints -- adopted in previous DPCP formulations -- are relaxed and random initialization is used instead of spectral one. Namely, we propose a very simple algorithm based on running multiple instances of a projected sub-gradient descent method (PSGM), with each problem instance seeking to find one vector in the null space of the subspace. We theoretically prove that under mild conditions this approach will succeed with high probability. In particular, we show that 1) all of the problem instances will converge to a vector in the nullspace of the subspace and 2) the ensemble of problem instance solutions will be sufficiently diverse to fully span the nullspace of the subspace thus also revealing its true unknown codimension. We provide empirical results that corroborate our theoretical results and showcase the remarkable implicit rank regularization behavior of PSGM algorithm that allows us to perform RSR without being aware of the subspace dimension.

[97]  arXiv:2201.09119 (cross-list from cs.CL) [pdf, other]
Title: A Causal Lens for Controllable Text Generation
Comments: NeurIPS 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)

Controllable text generation concerns two fundamental tasks of wide applications, namely generating text of given attributes (i.e., attribute-conditional generation), and minimally editing existing text to possess desired attributes (i.e., text attribute transfer). Extensive prior work has largely studied the two problems separately, and developed different conditional models which, however, are prone to producing biased text (e.g., various gender stereotypes). This paper proposes to formulate controllable text generation from a principled causal perspective which models the two tasks with a unified framework. A direct advantage of the causal formulation is the use of rich causality tools to mitigate generation biases and improve control. We treat the two tasks as interventional and counterfactual causal inference based on a structural causal model, respectively. We then apply the framework to the challenging practical setting where confounding factors (that induce spurious correlations) are observable only on a small fraction of data. Experiments show significant superiority of the causal approach over previous conditional models for improved control accuracy and reduced bias.

[98]  arXiv:2201.09130 (cross-list from cs.AI) [pdf]
Title: Artificial Intelligence for Suicide Assessment using Audiovisual Cues: A Review
Comments: Manuscirpt submitted to Arificial Intelligence Reviews
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Death by suicide is the seventh of the leading death cause worldwide. The recent advancement in Artificial Intelligence (AI), specifically AI application in image and voice processing, has created a promising opportunity to revolutionize suicide risk assessment. Subsequently, we have witnessed fast-growing literature of researches that applies AI to extract audiovisual non-verbal cues for mental illness assessment. However, the majority of the recent works focus on depression, despite the evident difference between depression signs and suicidal behavior non-verbal cues. In this paper, we review the recent works that study suicide ideation and suicide behavior detection through audiovisual feature analysis, mainly suicidal voice/speech acoustic features analysis and suicidal visual cues.

[99]  arXiv:2201.09132 (cross-list from eess.IV) [pdf, other]
Title: CNN-based regularisation for CT image reconstructions
Authors: Attila Juhos
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

X-ray computed tomographic infrastructures are medical imaging modalities that rely on the acquisition of rays crossing examined objects while measuring their intensity decrease. Physical measurements are post-processed by mathematical reconstruction algorithms that may offer weaker or top-notch consistency guarantees on the computed volumetric field. Superior results are provided on the account of an abundance of low-noise measurements being supplied. Nonetheless, such a scanning process would expose the examined body to an undesirably large-intensity and long-lasting ionising radiation, imposing severe health risks. One main objective of the ongoing research is the reduction of the number of projections while keeping the quality performance stable. Due to the under-sampling, the noise occurring inherently because of photon-electron interactions is now supplemented by reconstruction artifacts. Recently, deep learning methods, especially fully convolutional networks have been extensively investigated and proven to be efficient in filtering such deviations. In this report algorithms are presented that take as input a slice of a low-quality reconstruction of the volume in question and aim to map it to the reconstruction that is considered ideal, the ground truth. Above that, the first system comprises two additional elements: firstly, it ensures the consistency with the measured sinogram, secondly it adheres to constraints proposed in classical compressive sampling theory. The second one, inspired by classical ways of solving the inverse problem of reconstruction, takes an iterative approach to regularise the hypothesis in the direction of the correct result.

[100]  arXiv:2201.09134 (cross-list from quant-ph) [pdf, other]
Title: Data-Centric Machine Learning in Quantum Information Science
Subjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We propose a series of data-centric heuristics for improving the performance of machine learning systems when applied to problems in quantum information science. In particular, we consider how systematic engineering of training sets can significantly enhance the accuracy of pre-trained neural networks used for quantum state reconstruction without altering the underlying architecture. We find that it is not always optimal to engineer training sets to exactly match the expected distribution of a target scenario, and instead, performance can be further improved by biasing the training set to be slightly more mixed than the target. This is due to the heterogeneity in the number of free variables required to describe states of different purity, and as a result, overall accuracy of the network improves when training sets of a fixed size focus on states with the least constrained free variables. For further clarity, we also include a "toy model" demonstration of how spurious correlations can inadvertently enter synthetic data sets used for training, how the performance of systems trained with these correlations can degrade dramatically, and how the inclusion of even relatively few counterexamples can effectively remedy such problems.

[101]  arXiv:2201.09135 (cross-list from cs.CV) [pdf, other]
Title: MIDAS: Deep learning human action intention prediction from natural eye movement patterns
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Eye movements have long been studied as a window into the attentional mechanisms of the human brain and made accessible as novelty style human-machine interfaces. However, not everything that we gaze upon, is something we want to interact with; this is known as the Midas Touch problem for gaze interfaces. To overcome the Midas Touch problem, present interfaces tend not to rely on natural gaze cues, but rather use dwell time or gaze gestures. Here we present an entirely data-driven approach to decode human intention for object manipulation tasks based solely on natural gaze cues. We run data collection experiments where 16 participants are given manipulation and inspection tasks to be performed on various objects on a table in front of them. The subjects' eye movements are recorded using wearable eye-trackers allowing the participants to freely move their head and gaze upon the scene. We use our Semantic Fovea, a convolutional neural network model to obtain the objects in the scene and their relation to gaze traces at every frame. We then evaluate the data and examine several ways to model the classification task for intention prediction. Our evaluation shows that intention prediction is not a naive result of the data, but rather relies on non-linear temporal processing of gaze cues. We model the task as a time series classification problem and design a bidirectional Long-Short-Term-Memory (LSTM) network architecture to decode intentions. Our results show that we can decode human intention of motion purely from natural gaze cues and object relative position, with $91.9\%$ accuracy. Our work demonstrates the feasibility of natural gaze as a Zero-UI interface for human-machine interaction, i.e., users will only need to act naturally, and do not need to interact with the interface itself or deviate from their natural eye movement patterns.

[102]  arXiv:2201.09139 (cross-list from cs.CV) [pdf, other]
Title: Dual-Flattening Transformers through Decomposed Row and Column Queries for Semantic Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

It is critical to obtain high resolution features with long range dependency for dense prediction tasks such as semantic segmentation. To generate high-resolution output of size $H\times W$ from a low-resolution feature map of size $h\times w$ ($hw\ll HW$), a naive dense transformer incurs an intractable complexity of $\mathcal{O}(hwHW)$, limiting its application on high-resolution dense prediction. We propose a Dual-Flattening Transformer (DFlatFormer) to enable high-resolution output by reducing complexity to $\mathcal{O}(hw(H+W))$ that is multiple orders of magnitude smaller than the naive dense transformer. Decomposed queries are presented to retrieve row and column attentions tractably through separate transformers, and their outputs are combined to form a dense feature map at high resolution. To this end, the input sequence fed from an encoder is row-wise and column-wise flattened to align with decomposed queries by preserving their row and column structures, respectively. Row and column transformers also interact with each other to capture their mutual attentions with the spatial crossings between rows and columns. We also propose to perform attentions through efficient grouping and pooling to further reduce the model complexity. Extensive experiments on ADE20K and Cityscapes datasets demonstrate the superiority of the proposed dual-flattening transformer architecture with higher mIoUs.

[103]  arXiv:2201.09146 (cross-list from cs.CL) [pdf, other]
Title: Question rewriting? Assessing its importance for conversational question answering
Comments: Submitted manuscript (without anonymized content) accepted to the 44th European Conference on Information Retrieval (ECIR) 2022. This preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections. The Version of Record of this contribution is published in [insert volume title], and is available online at this https URL[insert DOI]
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

In conversational question answering, systems must correctly interpret the interconnected interactions and generate knowledgeable answers, which may require the retrieval of relevant information from a background repository. Recent approaches to this problem leverage neural language models, although different alternatives can be considered in terms of modules for (a) representing user questions in context, (b) retrieving the relevant background information, and (c) generating the answer. This work presents a conversational question answering system designed specifically for the Search-Oriented Conversational AI (SCAI) shared task, and reports on a detailed analysis of its question rewriting module. In particular, we considered different variations of the question rewriting module to evaluate the influence on the subsequent components, and performed a careful analysis of the results obtained with the best system configuration. Our system achieved the best performance in the shared task and our analysis emphasizes the importance of the conversation context representation for the overall system performance.

[104]  arXiv:2201.09149 (cross-list from cs.MA) [pdf, other]
Title: Multi-Agent Adversarial Attacks for Multi-Channel Communications
Subjects: Multiagent Systems (cs.MA); Information Theory (cs.IT); Machine Learning (cs.LG)

Recently Reinforcement Learning (RL) has been applied as an anti-adversarial remedy in wireless communication networks. However, studying the RL-based approaches from the adversary's perspective has received little attention. Additionally, RL-based approaches in an anti-adversary or adversarial paradigm mostly consider single-channel communication (either channel selection or single channel power control), while multi-channel communication is more common in practice. In this paper, we propose a multi-agent adversary system (MAAS) for modeling and analyzing adversaries in a wireless communication scenario by careful design of the reward function under realistic communication scenarios. In particular, by modeling the adversaries as learning agents, we show that the proposed MAAS is able to successfully choose the transmitted channel(s) and their respective allocated power(s) without any prior knowledge of the sender strategy. Compared to the single-agent adversary (SAA), multi-agents in MAAS can achieve significant reduction in signal-to-noise ratio (SINR) under the same power constraints and partial observability, while providing improved stability and a more efficient learning process. Moreover, through empirical studies we show that the results in simulation are close to the ones in communication in reality, a conclusion that is pivotal to the validity of performance of agents evaluated in simulations.

[105]  arXiv:2201.09151 (cross-list from cs.CY) [pdf, other]
Title: External Stability Auditing to Test the Validity of Personality Prediction in AI Hiring
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Automated hiring systems are among the fastest-developing of all high-stakes AI systems. Among these are algorithmic personality tests that use insights from psychometric testing, and promise to surface personality traits indicative of future success based on job seekers' resumes or social media profiles. We interrogate the validity of such systems using stability of the outputs they produce, noting that reliability is a necessary, but not a sufficient, condition for validity. Our approach is to (a) develop a methodology for an external audit of stability of predictions made by algorithmic personality tests, and (b) instantiate this methodology in an audit of two systems, Humantic AI and Crystal. Crucially, rather than challenging or affirming the assumptions made in psychometric testing -- that personality is a meaningful and measurable construct, and that personality traits are indicative of future success on the job -- we frame our methodology around testing the underlying assumptions made by the vendors of the algorithmic personality tests themselves.
In our audit of Humantic AI and Crystal, we find that both systems show substantial instability with respect to key facets of measurement, and so cannot be considered valid testing instruments. For example, Crystal frequently computes different personality scores if the same resume is given in PDF vs. in raw text format, violating the assumption that the output of an algorithmic personality test is stable across job-irrelevant variations in the input. Among other notable findings is evidence of persistent -- and often incorrect -- data linkage by Humantic AI.

[106]  arXiv:2201.09152 (cross-list from cs.CV) [pdf, other]
Title: Generative Adversarial Network Applications in Creating a Meta-Universe
Comments: Computational Science and Computational Intelligence; 2021 International Conference on IEEE CPS (IEEE XPLORE, Scopus), IEEE, 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Generative Adversarial Networks (GANs) are machine learning methods that are used in many important and novel applications. For example, in imaging science, GANs are effectively utilized in generating image datasets, photographs of human faces, image and video captioning, image-to-image translation, text-to-image translation, video prediction, and 3D object generation to name a few. In this paper, we discuss how GANs can be used to create an artificial world. More specifically, we discuss how GANs help to describe an image utilizing image/video captioning methods and how to translate the image to a new image using image-to-image translation frameworks in a theme we desire. We articulate how GANs impact creating a customized world.

[107]  arXiv:2201.09178 (cross-list from cs.AI) [pdf, other]
Title: Dichotomic Pattern Mining with Applications to Intent Prediction from Semi-Structured Clickstream Datasets
Comments: The AAAI-22 Workshop on Knowledge Discovery from Unstructured Data in Financial Services (KDF@AAAI'22)
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)

We introduce a pattern mining framework that operates on semi-structured datasets and exploits the dichotomy between outcomes. Our approach takes advantage of constraint reasoning to find sequential patterns that occur frequently and exhibit desired properties. This allows the creation of novel pattern embeddings that are useful for knowledge extraction and predictive modeling. Finally, we present an application on customer intent prediction from digital clickstream data. Overall, we show that pattern embeddings play an integrator role between semi-structured data and machine learning models, improve the performance of the downstream task and retain interpretability.

[108]  arXiv:2201.09186 (cross-list from cs.CR) [pdf, other]
Title: pvCNN: Privacy-Preserving and Verifiable Convolutional Neural Network Testing
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

This paper proposes a new approach for privacy-preserving and verifiable convolutional neural network (CNN) testing, enabling a CNN model developer to convince a user of the truthful CNN performance over non-public data from multiple testers, while respecting model privacy. To balance the security and efficiency issues, three new efforts are done by appropriately integrating homomorphic encryption (HE) and zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK) primitives with the CNN testing. First, a CNN model to be tested is strategically partitioned into a private part kept locally by the model developer, and a public part outsourced to an outside server. Then, the private part runs over HE-protected test data sent by a tester and transmits its outputs to the public part for accomplishing subsequent computations of the CNN testing. Second, the correctness of the above CNN testing is enforced by generating zk-SNARK based proofs, with an emphasis on optimizing proving overhead for two-dimensional (2-D) convolution operations, since the operations dominate the performance bottleneck during generating proofs. We specifically present a new quadratic matrix programs (QMPs)-based arithmetic circuit with a single multiplication gate for expressing 2-D convolution operations between multiple filters and inputs in a batch manner. Third, we aggregate multiple proofs with respect to a same CNN model but different testers' test data (i.e., different statements) into one proof, and ensure that the validity of the aggregated proof implies the validity of the original multiple proofs. Lastly, our experimental results demonstrate that our QMPs-based zk-SNARK performs nearly 13.9$\times$faster than the existing QAPs-based zk-SNARK in proving time, and 17.6$\times$faster in Setup time, for high-dimension matrix multiplication.

[109]  arXiv:2201.09189 (cross-list from cs.AR) [pdf, other]
Title: Hardware/Software Co-Programmable Framework for Computational SSDs to Accelerate Deep Learning Service on Large-Scale Graphs
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)

Graph neural networks (GNNs) process large-scale graphs consisting of a hundred billion edges. In contrast to traditional deep learning, unique behaviors of the emerging GNNs are engaged with a large set of graphs and embedding data on storage, which exhibits complex and irregular preprocessing.
We propose a novel deep learning framework on large graphs, HolisticGNN, that provides an easy-to-use, near-storage inference infrastructure for fast, energy-efficient GNN processing. To achieve the best end-to-end latency and high energy efficiency, HolisticGNN allows users to implement various GNN algorithms and directly executes them where the actual data exist in a holistic manner. It also enables RPC over PCIe such that the users can simply program GNNs through a graph semantic library without any knowledge of the underlying hardware or storage configurations.
We fabricate HolisticGNN's hardware RTL and implement its software on an FPGA-based computational SSD (CSSD). Our empirical evaluations show that the inference time of HolisticGNN outperforms GNN inference services using high-performance modern GPUs by 7.1x while reducing energy consumption by 33.2x, on average.

[110]  arXiv:2201.09193 (cross-list from cs.CV) [pdf, other]
Title: Learning to Minimize the Remainder in Supervised Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The learning process of deep learning methods usually updates the model's parameters in multiple iterations. Each iteration can be viewed as the first-order approximation of Taylor's series expansion. The remainder, which consists of higher-order terms, is usually ignored in the learning process for simplicity. This learning scheme empowers various multimedia based applications, such as image retrieval, recommendation system, and video search. Generally, multimedia data (e.g., images) are semantics-rich and high-dimensional, hence the remainders of approximations are possibly non-zero. In this work, we consider the remainder to be informative and study how it affects the learning process. To this end, we propose a new learning approach, namely gradient adjustment learning (GAL), to leverage the knowledge learned from the past training iterations to adjust vanilla gradients, such that the remainders are minimized and the approximations are improved. The proposed GAL is model- and optimizer-agnostic, and is easy to adapt to the standard learning framework. It is evaluated on three tasks, i.e., image classification, object detection, and regression, with state-of-the-art models and optimizers. The experiments show that the proposed GAL consistently enhances the evaluated models, whereas the ablation studies validate various aspects of the proposed GAL. The code is available at \url{https://github.com/luoyan407/gradient_adjustment.git}.

[111]  arXiv:2201.09194 (cross-list from stat.ME) [pdf, other]
Title: Distributed Learning of Generalized Linear Causal Networks
Comments: 27 pages, 3 tables, 3 figures
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Computation (stat.CO)

We consider the task of learning causal structures from data stored on multiple machines, and propose a novel structure learning method called distributed annealing on regularized likelihood score (DARLS) to solve this problem. We model causal structures by a directed acyclic graph that is parameterized with generalized linear models, so that our method is applicable to various types of data. To obtain a high-scoring causal graph, DARLS simulates an annealing process to search over the space of topological sorts, where the optimal graphical structure compatible with a sort is found by a distributed optimization method. This distributed optimization relies on multiple rounds of communication between local and central machines to estimate the optimal structure. We establish its convergence to a global optimizer of the overall score that is computed on all data across local machines. To the best of our knowledge, DARLS is the first distributed method for learning causal graphs with such theoretical guarantees. Through extensive simulation studies, DARLS has shown competing performance against existing methods on distributed data, and achieved comparable structure learning accuracy and test-data likelihood with competing methods applied to pooled data across all local machines. In a real-world application for modeling protein-DNA binding networks with distributed ChIP-Sequencing data, DARLS also exhibits higher predictive power than other methods, demonstrating a great advantage in estimating causal networks from distributed data.

[112]  arXiv:2201.09213 (cross-list from cs.CV) [pdf, ps, other]
Title: FN-Net:Remove the Outliers by Filtering the Noise
Authors: Kai Lv
Comments: 6 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Establishing the correspondence between two images is an important research direction of computer vision. When estimating the relationship between two images, it is often disturbed by outliers. In this paper, we propose a convolutional neural network that can filter the noise of outliers. It can output the probability that the pair of feature points is an inlier and regress the essential matrix representing the relative pose of the camera. The outliers are mainly caused by the noise introduced by the previous processing. The outliers rejection can be treated as a problem of noise elimination, and the soft threshold function has a very good effect on noise reduction. Therefore, we designed an adaptive denoising module based on soft threshold function to remove noise components in the outliers, to reduce the probability that the outlier is predicted to be an inlier. Experimental results on the YFCC100M dataset show that our method exceeds the state-of-the-art in relative pose estimation.

[113]  arXiv:2201.09243 (cross-list from cs.CR) [pdf, other]
Title: Increasing the Cost of Model Extraction with Calibrated Proof of Work
Comments: Published as a conference paper at ICLR 2022
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In model extraction attacks, adversaries can steal a machine learning model exposed via a public API by repeatedly querying it and adjusting their own model based on obtained predictions. To prevent model stealing, existing defenses focus on detecting malicious queries, truncating, or distorting outputs, thus necessarily introducing a tradeoff between robustness and model utility for legitimate users. Instead, we propose to impede model extraction by requiring users to complete a proof-of-work before they can read the model's predictions. This deters attackers by greatly increasing (even up to 100x) the computational effort needed to leverage query access for model extraction. Since we calibrate the effort required to complete the proof-of-work to each query, this only introduces a slight overhead for regular users (up to 2x). To achieve this, our calibration applies tools from differential privacy to measure the information revealed by a query. Our method requires no modification of the victim model and can be applied by machine learning practitioners to guard their publicly exposed models against being easily stolen.

[114]  arXiv:2201.09245 (cross-list from eess.SY) [pdf, other]
Title: Fast Transient Stability Prediction Using Grid-informed Temporal and Topological Embedding Deep Neural Network
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Transient stability prediction is critically essential to the fast online assessment and maintaining the stable operation in power systems. The wide deployment of phasor measurement units (PMUs) promotes the development of data-driven approaches for transient stability assessment. This paper proposes the temporal and topological embedding deep neural network (TTEDNN) model to forecast transient stability with the early transient dynamics. The TTEDNN model can accurately and efficiently predict the transient stability by extracting the temporal and topological features from the time-series data of the early transient dynamics. The grid-informed adjacency matrix is used to incorporate the power grid structural and electrical parameter information. The transient dynamics simulation environments under the single-node and multiple-node perturbations are used to test the performance of the TTEDNN model for the IEEE 39-bus and IEEE 118-bus power systems. The results show that the TTEDNN model has the best and most robust prediction performance. Furthermore, the TTEDNN model also demonstrates the transfer capability to predict the transient stability in the more complicated transient dynamics simulation environments.

[115]  arXiv:2201.09263 (cross-list from cs.GR) [pdf, other]
Title: Differential Geometry in Neural Implicits
Subjects: Graphics (cs.GR); Machine Learning (cs.LG)

We introduce a neural implicit framework that bridges discrete differential geometry of triangle meshes and continuous differential geometry of neural implicit surfaces. It exploits the differentiable properties of neural networks and the discrete geometry of triangle meshes to approximate them as the zero-level sets of neural implicit functions.
To train a neural implicit function, we propose a loss function that allows terms with high-order derivatives, such as the alignment between the principal directions, to learn more geometric details. During training, we consider a non-uniform sampling strategy based on the discrete curvatures of the triangle mesh to access points with more geometric details. This sampling implies faster learning while preserving geometric accuracy.
We present the analytical differential geometry formulas for neural surfaces, such as normal vectors and curvatures. We use them to render the surfaces using sphere tracing. Additionally, we propose a network optimization based on singular value decomposition to reduce the number of parameters.

[116]  arXiv:2201.09267 (cross-list from stat.ML) [pdf, other]
Title: Spectral, Probabilistic, and Deep Metric Learning: Tutorial and Survey
Comments: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This is a tutorial and survey paper on metric learning. Algorithms are divided into spectral, probabilistic, and deep metric learning. We first start with the definition of distance metric, Mahalanobis distance, and generalized Mahalanobis distance. In spectral methods, we start with methods using scatters of data, including the first spectral metric learning, relevant methods to Fisher discriminant analysis, Relevant Component Analysis (RCA), Discriminant Component Analysis (DCA), and the Fisher-HSIC method. Then, large-margin metric learning, imbalanced metric learning, locally linear metric adaptation, and adversarial metric learning are covered. We also explain several kernel spectral methods for metric learning in the feature space. We also introduce geometric metric learning methods on the Riemannian manifolds. In probabilistic methods, we start with collapsing classes in both input and feature spaces and then explain the neighborhood component analysis methods, Bayesian metric learning, information theoretic methods, and empirical risk minimization in metric learning. In deep learning methods, we first introduce reconstruction autoencoders and supervised loss functions for metric learning. Then, Siamese networks and its various loss functions, triplet mining, and triplet sampling are explained. Deep discriminant analysis methods, based on Fisher discriminant analysis, are also reviewed. Finally, we introduce multi-modal deep metric learning, geometric metric learning by neural networks, and few-shot metric learning.

[117]  arXiv:2201.09280 (cross-list from cs.HC) [pdf, other]
Title: SpiroMask: Measuring Lung Function Using Consumer-Grade Masks
Comments: 33 pages
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

According to the World Health Organisation (WHO), 235 million people suffer from respiratory illnesses and four million deaths annually. Regular lung health monitoring can lead to prognoses about deteriorating lung health conditions. This paper presents our system SpiroMask that retrofits a microphone in consumer-grade masks (N95 and cloth masks) for continuous lung health monitoring. We evaluate our approach on 48 participants (including 14 with lung health issues) and find that we can estimate parameters such as lung volume and respiration rate within the approved error range by the American Thoracic Society (ATS). Further, we show that our approach is robust to sensor placement inside the mask.

[118]  arXiv:2201.09314 (cross-list from eess.IV) [pdf]
Title: Perceptual cGAN for MRI Super-resolution
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Capturing high-resolution magnetic resonance (MR) images is a time consuming process, which makes it unsuitable for medical emergencies and pediatric patients. Low-resolution MR imaging, by contrast, is faster than its high-resolution counterpart, but it compromises on fine details necessary for a more precise diagnosis. Super-resolution (SR), when applied to low-resolution MR images, can help increase their utility by synthetically generating high-resolution images with little additional time. In this paper, we present a SR technique for MR images that is based on generative adversarial networks (GANs), which have proven to be quite useful in generating sharp-looking details in SR. We introduce a conditional GAN with perceptual loss, which is conditioned upon the input low-resolution image, which improves the performance for isotropic and anisotropic MRI super-resolution.

[119]  arXiv:2201.09352 (cross-list from cs.CV) [pdf, other]
Title: Out of Distribution Detection on ImageNet-O
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Out of distribution (OOD) detection is a crucial part of making machine learning systems robust. The ImageNet-O dataset is an important tool in testing the robustness of ImageNet trained deep neural networks that are widely used across a variety of systems and applications. We aim to perform a comparative analysis of OOD detection methods on ImageNet-O, a first of its kind dataset with a label distribution different than that of ImageNet, that has been created to aid research in OOD detection for ImageNet models. As this dataset is fairly new, we aim to provide a comprehensive benchmarking of some of the current state of the art OOD detection methods on this novel dataset. This benchmarking covers a variety of model architectures, settings where we haves prior access to the OOD data versus when we don't, predictive score based approaches, deep generative approaches to OOD detection, and more.

[120]  arXiv:2201.09354 (cross-list from cs.CV) [pdf, other]
Title: Survey and Systematization of 3D Object Detection Models and Methods
Comments: submitted to CVIU
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This paper offers a comprehensive survey of recent developments in 3D object detection covering the full pipeline from input data, over data representation and feature extraction to the actual detection modules. We include basic concepts, focus our survey on a broad spectrum of different approaches arising in the last ten years and propose a systematization which offers a practical framework to compare those approaches on the methods level.

[121]  arXiv:2201.09359 (cross-list from cs.NE) [pdf, other]
Title: Imposing Connectome-Derived Topology on an Echo State Network
Comments: 6 pages, 9 figures
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

Can connectome-derived constraints inform computation? In this paper we investigate the contribution of a fruit fly connectome's topology on the performance of an Echo State Network (ESN) -- a subset of Reservoir Computing which is state of the art in chaotic time series prediction. Specifically, we replace the reservoir layer of a classical ESN -- normally a fixed, random graph represented as a 2-d matrix -- with a particular (female) fruit fly connectome-derived connectivity matrix. We refer to this experimental class of models (with connectome-derived reservoirs) as "Fruit Fly ESNs" (FFESNs). We train and validate the FFESN on a chaotic time series prediction task; here we consider four sets of trials with different training input sizes (small, large) and train-validate splits (two variants). We compare the validation performance (Mean-Squared Error) of all of the best FFESN models to a class of control model ESNs (simply referred to as "ESNs"). Overall, for all four sets of trials we find that the FFESN either significantly outperforms (and has lower variance than) the ESN; or simply has lower variance than the ESN.

[122]  arXiv:2201.09360 (cross-list from eess.IV) [pdf, other]
Title: POTHER: Patch-Voted Deep Learning-based Chest X-ray Bias Analysis for COVID-19 Detection
Comments: Submitted to International Conference on Computational Science 2022 in London
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

A critical step in the fight against COVID-19, which continues to have a catastrophic impact on peoples lives, is the effective screening of patients presented in the clinics with severe COVID-19 symptoms. Chest radiography is one of the promising screening approaches. Many studies reported detecting COVID-19 in chest X-rays accurately using deep learning. A serious limitation of many published approaches is insufficient attention paid to explaining decisions made by deep learning models. Using explainable artificial intelligence methods, we demonstrate that model decisions may rely on confounding factors rather than medical pathology. After an analysis of potential confounding factors found on chest X-ray images, we propose a novel method to minimise their negative impact. We show that our proposed method is more robust than previous attempts to counter confounding factors such as ECG leads in chest X-rays that often influence model classification decisions. In addition to being robust, our method achieves results comparable to the state-of-the-art. The source code and pre-trained weights are publicly available (https://github.com/tomek1911/POTHER).

[123]  arXiv:2201.09367 (cross-list from cs.GR) [pdf, other]
Title: Sketch2PQ: Freeform Planar Quadrilateral Mesh Design via a Single Sketch
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The freeform architectural modeling process often involves two important stages: concept design and digital modeling. In the first stage, architects usually sketch the overall 3D shape and the panel layout on a physical or digital paper briefly. In the second stage, a digital 3D model is created using the sketching as the reference. The digital model needs to incorporate geometric requirements for its components, such as planarity of panels due to consideration of construction costs, which can make the modeling process more challenging. In this work, we present a novel sketch-based system to bridge the concept design and digital modeling of freeform roof-like shapes represented as planar quadrilateral (PQ) meshes. Our system allows the user to sketch the surface boundary and contour lines under axonometric projection and supports the sketching of occluded regions. In addition, the user can sketch feature lines to provide directional guidance to the PQ mesh layout. Given the 2D sketch input, we propose a deep neural network to infer in real-time the underlying surface shape along with a dense conjugate direction field, both of which are used to extract the final PQ mesh. To train and validate our network, we generate a large synthetic dataset that mimics architect sketching of freeform quadrilateral patches. The effectiveness and usability of our system are demonstrated with quantitative and qualitative evaluation as well as user studies.

[124]  arXiv:2201.09370 (cross-list from cs.CR) [pdf, other]
Title: Are Your Sensitive Attributes Private? Novel Model Inversion Attribute Inference Attacks on Classification Models
Comments: Conditionally accepted to USENIX Security 2022. This is not the camera-ready version. arXiv admin note: substantial text overlap with arXiv:2012.03404
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Increasing use of machine learning (ML) technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand if these ML technologies are introducing leakage of sensitive and proprietary training data. In this paper, we focus on model inversion attacks where the adversary knows non-sensitive attributes about records in the training data and aims to infer the value of a sensitive attribute unknown to the adversary, using only black-box access to the target classification model. We first devise a novel confidence score-based model inversion attribute inference attack that significantly outperforms the state-of-the-art. We then introduce a label-only model inversion attack that relies only on the model's predicted labels but still matches our confidence score-based attack in terms of attack effectiveness. We also extend our attacks to the scenario where some of the other (non-sensitive) attributes of a target record are unknown to the adversary. We evaluate our attacks on two types of machine learning models, decision tree and deep neural network, trained on three real datasets. Moreover, we empirically demonstrate the disparate vulnerability of model inversion attacks, i.e., specific groups in the training dataset (grouped by gender, race, etc.) could be more vulnerable to model inversion attacks.

[125]  arXiv:2201.09377 (cross-list from cs.CL) [pdf, other]
Title: An Application of Pseudo-Log-Likelihoods to Natural Language Scoring
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Language models built using semi-supervised machine learning on large corpora of natural language have very quickly enveloped the fields of natural language generation and understanding. In this paper we apply a zero-shot approach independently developed by a number of researchers now gaining recognition as a significant alternative to fine-tuning for evaluation on common sense tasks. A language model with relatively few parameters and training steps compared to a more recent language model (T5) can outperform it on a recent large data set (TimeDial), while displaying robustness in its performance across a similar class of language tasks. Surprisingly, this result is achieved by using a hyperparameter-free zero-shot method with the smaller model, compared to fine-tuning to the larger model. We argue that robustness of the smaller model ought to be understood in terms of compositionality, in a sense that we draw from recent literature on a class of similar models. We identify a practical cost for our method and model: high GPU-time for natural language evaluation. The zero-shot measurement technique that produces remarkable stability, both for ALBERT and other BERT variants, is an application of pseudo-log-likelihoods to masked language models for the relative measurement of probability for substitution alternatives in forced choice language tasks such as the Winograd Schema Challenge, Winogrande, and others. One contribution of this paper is to bring together a number of similar, but independent strands of research. We produce some absolute state-of-the-art results for common sense reasoning in binary choice tasks, performing better than any published result in the literature, including fine-tuned efforts. We show a remarkable consistency of the model's performance under adversarial settings, which we argue is best explained by the model's compositionality of representations.

[126]  arXiv:2201.09390 (cross-list from cs.CV) [pdf, other]
Title: AttentionHTR: Handwritten Text Recognition Based on Attention Encoder-Decoder Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This work proposes an attention-based sequence-to-sequence model for handwritten word recognition and explores transfer learning for data-efficient training of HTR systems. To overcome training data scarcity, this work leverages models pre-trained on scene text images as a starting point towards tailoring the handwriting recognition models. ResNet feature extraction and bidirectional LSTM-based sequence modeling stages together form an encoder. The prediction stage consists of a decoder and a content-based attention mechanism. The effectiveness of the proposed end-to-end HTR system has been empirically evaluated on a novel multi-writer dataset Imgur5K and the IAM dataset. The experimental results evaluate the performance of the HTR framework, further supported by an in-depth analysis of the error cases. Source code and pre-trained models are available at https://github.com/dmitrijsk/AttentionHTR.

[127]  arXiv:2201.09395 (cross-list from cs.CV) [pdf]
Title: MISeval: a Metric Library for Medical Image Segmentation Evaluation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Correct performance assessment is crucial for evaluating modern artificial intelligence algorithms in medicine like deep-learning based medical image segmentation models. However, there is no universal metric library in Python for standardized and reproducible evaluation. Thus, we propose our open-source publicly available Python package MISeval: a metric library for Medical Image Segmentation Evaluation. The implemented metrics can be intuitively used and easily integrated into any performance assessment pipeline. The package utilizes modern CI/CD strategies to ensure functionality and stability. MISeval is available from PyPI (miseval) and GitHub: https://github.com/frankkramer-lab/miseval.

[128]  arXiv:2201.09400 (cross-list from eess.IV) [pdf, other]
Title: Fast MRI Reconstruction: How Powerful Transformers Are?
Comments: 5 pages, 5 figures, EMBC 2022
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Magnetic resonance imaging (MRI) is a widely used non-radiative and non-invasive method for clinical interrogation of organ structures and metabolism, with an inherently long scanning time. Methods by k-space undersampling and deep learning based reconstruction have been popularised to accelerate the scanning process. This work focuses on investigating how powerful transformers are for fast MRI by exploiting and comparing different novel network architectures. In particular, a generative adversarial network (GAN) based Swin transformer (ST-GAN) was introduced for the fast MRI reconstruction. To further preserve the edge and texture information, edge enhanced GAN based Swin transformer (EESGAN) and texture enhanced GAN based Swin transformer (TES-GAN) were also developed, where a dual-discriminator GAN structure was applied. We compared our proposed GAN based transformers, standalone Swin transformer and other convolutional neural networks based based GAN model in terms of the evaluation metrics PSNR, SSIM and FID. We showed that transformers work well for the MRI reconstruction from different undersampling conditions. The utilisation of GAN's adversarial structure improves the quality of images reconstructed when undersampled for 30% or higher.

[129]  arXiv:2201.09419 (cross-list from quant-ph) [pdf, ps, other]
Title: Automated machine learning for secure key rate in discrete-modulated continuous-variable quantum key distribution
Comments: 9 pages, 5 figures
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Continuous-variable quantum key distribution (CV QKD) with discrete modulation has attracted increasing attention due to its experimental simplicity, lower-cost implementation and compatibility with classical optical communication. Correspondingly, some novel numerical methods have been proposed to analyze the security of these protocols against collective attacks, which promotes key rates over one hundred kilometers of fiber distance. However, numerical methods are limited by their calculation time and resource consumption, for which they cannot play more roles on mobile platforms in quantum networks. To improve this issue, a neural network model predicting key rates in nearly real time has been proposed previously. Here, we go further and show a neural network model combined with Bayesian optimization. This model automatically designs the best architecture of neural network computing key rates in real time. We demonstrate our model with two variants of CV QKD protocols with quaternary modulation. The results show high reliability with secure probability as high as $99.15\%-99.59\%$, considerable tightness and high efficiency with speedup of approximately $10^7$ in both cases. This inspiring model enables the real-time computation of unstructured quantum key distribution protocols' key rate more automatically and efficiently, which has met the growing needs of implementing QKD protocols on moving platforms.

[130]  arXiv:2201.09429 (cross-list from cs.SD) [pdf, other]
Title: End-to-End Neural Audio Coding for Real-Time Communications
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Deep-learning based methods have shown their advantages inaudio coding over traditional ones but limited attention hasbeen paid on real-time communications (RTC). This paperproposes the TFNet, an end-to-end neural audio codec withlow latency for RTC. It takes an encoder-temporal filtering-decoder paradigm that seldom being investigated in audiocoding. An interleaved structure is proposed for temporalfiltering to capture both short-term and long-term temporaldependencies. Furthermore, with end-to-end optimization,the TFNet is jointly optimized with speech enhancement andpacket loss concealment, yielding a one-for-all network forthree tasks. Both subjective and objective results demonstratethe efficiency of the proposed TFNet.

[131]  arXiv:2201.09467 (cross-list from cs.MA) [pdf, other]
Title: CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces
Comments: To appear in the International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022)
Subjects: Multiagent Systems (cs.MA); Machine Learning (cs.LG); Robotics (cs.RO)

Multi-agent path planning (MAPP) in continuous spaces is a challenging problem with significant practical importance. One promising approach is to first construct graphs approximating the spaces, called roadmaps, and then apply multi-agent pathfinding (MAPF) algorithms to derive a set of conflict-free paths. While conventional studies have utilized roadmap construction methods developed for single-agent planning, it remains largely unexplored how we can construct roadmaps that work effectively for multiple agents. To this end, we propose a novel concept of roadmaps called cooperative timed roadmaps (CTRMs). CTRMs enable each agent to focus on its important locations around potential solution paths in a way that considers the behavior of other agents to avoid inter-agent collisions (i.e., "cooperative"), while being augmented in the time direction to make it easy to derive a "timed" solution path. To construct CTRMs, we developed a machine-learning approach that learns a generative model from a collection of relevant problem instances and plausible solutions and then uses the learned model to sample the vertices of CTRMs for new, previously unseen problem instances. Our empirical evaluation revealed that the use of CTRMs significantly reduced the planning effort with acceptable overheads while maintaining a success rate and solution quality comparable to conventional roadmap construction approaches.

[132]  arXiv:2201.09486 (cross-list from cs.SD) [pdf, other]
Title: Bias in Automated Speaker Recognition
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Automated speaker recognition uses data processing to identify speakers by their voice. Today, automated speaker recognition technologies are deployed on billions of smart devices and in services such as call centres. Despite their wide-scale deployment and known sources of bias in face recognition and natural language processing, bias in automated speaker recognition has not been studied systematically. We present an in-depth empirical and analytical study of bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition. Drawing on an established framework for understanding sources of harm in machine learning, we show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge, including model building, implementation, and data generation. Most affected are female speakers and non-US nationalities, who experience significant performance degradation. Leveraging the insights from our findings, we make practical recommendations for mitigating bias in automated speaker recognition, and outline future research directions.

[133]  arXiv:2201.09508 (cross-list from q-bio.QM) [pdf, other]
Title: Multiple Similarity Drug-Target Interaction Prediction with Random Walks and Matrix Factorization
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG)

The discovery of drug-target interactions (DTIs) is a very promising area of research with great potential. In general, the identification of reliable interactions among drugs and proteins can boost the development of effective pharmaceuticals. In this work, we leverage random walks and matrix factorization techniques towards DTI prediction. In particular, we take a multi-layered network perspective, where different layers correspond to different similarity metrics between drugs and targets. To fully take advantage of topology information captured in multiple views, we develop an optimization framework, called MDMF, for DTI prediction. The framework learns vector representations of drugs and targets that not only retain higher-order proximity across all hyper-layers and layer-specific local invariance, but also approximates the interactions with their inner product. Furthermore, we propose an ensemble method, called MDMF2A, which integrates two instantiations of the MDMF model that optimize surrogate losses of the area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUC), respectively. The empirical study on real-world DTI datasets shows that our method achieves significant improvement over current state-of-the-art approaches in four different settings. Moreover, the validation of highly ranked non-interacting pairs also demonstrates the potential of MDMF2A to discover novel DTIs.

[134]  arXiv:2201.09522 (cross-list from eess.SP) [pdf, other]
Title: Accelerated Intravascular Ultrasound Imaging using Deep Reinforcement Learning
Comments: 5 pages, 3 figures, conference
Journal-ref: ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Intravascular ultrasound (IVUS) offers a unique perspective in the treatment of vascular diseases by creating a sequence of ultrasound-slices acquired from within the vessel. However, unlike conventional hand-held ultrasound, the thin catheter only provides room for a small number of physical channels for signal transfer from a transducer-array at the tip. For continued improvement of image quality and frame rate, we present the use of deep reinforcement learning to deal with the current physical information bottleneck. Valuable inspiration has come from the field of magnetic resonance imaging (MRI), where learned acquisition schemes have brought significant acceleration in image acquisition at competing image quality. To efficiently accelerate IVUS imaging, we propose a framework that utilizes deep reinforcement learning for an optimal adaptive acquisition policy on a per-frame basis enabled by actor-critic methods and Gumbel top-$K$ sampling.

[135]  arXiv:2201.09538 (cross-list from cs.CR) [pdf, other]
Title: Backdoor Defense with Machine Unlearning
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Backdoor injection attack is an emerging threat to the security of neural networks, however, there still exist limited effective defense methods against the attack. In this paper, we propose BAERASE, a novel method that can erase the backdoor injected into the victim model through machine unlearning. Specifically, BAERASE mainly implements backdoor defense in two key steps. First, trigger pattern recovery is conducted to extract the trigger patterns infected by the victim model. Here, the trigger pattern recovery problem is equivalent to the one of extracting an unknown noise distribution from the victim model, which can be easily resolved by the entropy maximization based generative model. Subsequently, BAERASE leverages these recovered trigger patterns to reverse the backdoor injection procedure and induce the victim model to erase the polluted memories through a newly designed gradient ascent based machine unlearning method. Compared with the previous machine unlearning solutions, the proposed approach gets rid of the reliance on the full access to training data for retraining and shows higher effectiveness on backdoor erasing than existing fine-tuning or pruning methods. Moreover, experiments show that BAERASE can averagely lower the attack success rates of three kinds of state-of-the-art backdoor attacks by 99\% on four benchmark datasets.

[136]  arXiv:2201.09541 (cross-list from physics.flu-dyn) [pdf, other]
Title: Image features of a splashing drop on a solid surface extracted using a feedforward neural network
Comments: 19 pages, 17 figures. Source code is available on GitHub: this https URL
Journal-ref: Phys. Fluids 34, 013317 (2022)
Subjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG)

This article reports nonintuitive characteristic of a splashing drop on a solid surface discovered through extracting image features using a feedforward neural network (FNN). Ethanol of area-equivalent radius about 1.29 mm was dropped from impact heights ranging from 4 cm to 60 cm (splashing threshold 20 cm) and impacted on a hydrophilic surface. The images captured when half of the drop impacted the surface were labeled according to their outcome, splashing or nonsplashing, and were used to train an FNN. A classification accuracy higher than 96% was achieved. To extract the image features identified by the FNN for classification, the weight matrix of the trained FNN for identifying splashing drops was visualized. Remarkably, the visualization showed that the trained FNN identified the contour height of the main body of the impacting drop as an important characteristic differentiating between splashing and nonsplashing drops, which has not been reported in previous studies. This feature was found throughout the impact, even when one and three-quarters of the drop impacted the surface. To confirm the importance of this image feature, the FNN was retrained to classify using only the main body without checking for the presence of ejected secondary droplets. The accuracy was still higher than 82%, confirming that the contour height is an important feature distinguishing splashing from nonsplashing drops. Several aspects of drop impact are analyzed and discussed with the aim of identifying the possible mechanism underlying the difference in contour height between splashing and nonsplashing drops.

[137]  arXiv:2201.09592 (cross-list from cs.SD) [pdf, other]
Title: Unsupervised Audio Source Separation Using Differentiable Parametric Source Models
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Supervised deep learning approaches to underdetermined audio source separation achieve state-of-the-art performance but require a dataset of mixtures along with their corresponding isolated source signals. Such datasets can be extremely costly to obtain for musical mixtures. This raises a need for unsupervised methods. We propose a novel unsupervised model-based deep learning approach to musical source separation. Each source is modelled with a differentiable parametric source-filter model. A neural network is trained to reconstruct the observed mixture as a sum of the sources by estimating the source models' parameters given their fundamental frequencies. At test time, soft masks are obtained from the synthesized source signals. The experimental evaluation on a vocal ensemble separation task shows that the proposed method outperforms learning-free methods based on nonnegative matrix factorization and a supervised deep learning baseline. Integrating domain knowledge in the form of source models into a data-driven method leads to high data efficiency: the proposed approach achieves good separation quality even when trained on less than three minutes of audio. This work makes powerful deep learning based separation usable in scenarios where training data with ground truth is expensive or nonexistent.

[138]  arXiv:2201.09633 (cross-list from cs.CV) [pdf, other]
Title: Paired Image to Image Translation for Strikethrough Removal From Handwritten Words
Comments: under review at DAS2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Transcribing struck-through, handwritten words, for example for the purpose of genetic criticism, can pose a challenge to both humans and machines, due to the obstructive properties of the superimposed strokes. This paper investigates the use of paired image to image translation approaches to remove strikethrough strokes from handwritten words. Four different neural network architectures are examined, ranging from a few simple convolutional layers to deeper ones, employing Dense blocks. Experimental results, obtained from one synthetic and one genuine paired strikethrough dataset, confirm that the proposed paired models outperform the CycleGAN-based state of the art, while using less than a sixth of the trainable parameters.

[139]  arXiv:2201.09647 (cross-list from q-bio.BM) [pdf, other]
Title: AlphaFold Accelerates Artificial Intelligence Powered Drug Discovery: Efficient Discovery of a Novel Cyclin-dependent Kinase 20 (CDK20) Small Molecule Inhibitor
Comments: 9 pages, 5 figures
Subjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Molecular Networks (q-bio.MN)

The AlphaFold computer program predicted protein structures for the whole human genome, which has been considered as a remarkable breakthrough both in artificial intelligence (AI) application and structural biology. Despite the varying confidence level, these predicted structures still could significantly contribute to the structure-based drug design of novel targets, especially the ones with no or limited structural information. In this work, we successfully applied AlphaFold in our end-to-end AI-powered drug discovery engines constituted of a biocomputational platform PandaOmics and a generative chemistry platform Chemistry42, to identify a first-in-class hit molecule of a novel target without an experimental structure starting from target selection towards hit identification in a cost- and time-efficient manner. PandaOmics provided the targets of interest and Chemistry42 generated the molecules based on the AlphaFold predicted structure, and the selected molecules were synthesized and tested in biological assays. Through this approach, we identified a small molecule hit compound for CDK20 with a Kd value of 8.9 +/- 1.6 uM (n = 4) within 30 days from target selection and after only synthesizing 7 compounds. To the best of our knowledge, this is the first reported small molecule targeting CDK20 and more importantly, this work is the first demonstration of AlphaFold application in the hit identification process in early drug discovery.

[140]  arXiv:2201.09689 (cross-list from cs.CV) [pdf, other]
Title: Which Style Makes Me Attractive? Interpretable Control Discovery and Counterfactual Explanation on StyleGAN
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)

The semantically disentangled latent subspace in GAN provides rich interpretable controls in image generation. This paper includes two contributions on semantic latent subspace analysis in the scenario of face generation using StyleGAN2. First, we propose a novel approach to disentangle latent subspace semantics by exploiting existing face analysis models, e.g., face parsers and face landmark detectors. These models provide the flexibility to construct various criterions with very concrete and interpretable semantic meanings (e.g., change face shape or change skin color) to restrict latent subspace disentanglement. Rich latent space controls unknown previously can be discovered using the constructed criterions. Second, we propose a new perspective to explain the behavior of a CNN classifier by generating counterfactuals in the interpretable latent subspaces we discovered. This explanation helps reveal whether the classifier learns semantics as intended. Experiments on various disentanglement criterions demonstrate the effectiveness of our approach. We believe this approach contributes to both areas of image manipulation and counterfactual explainability of CNNs. The code is available at \url{https://github.com/prclibo/ice}.

[141]  arXiv:2201.09693 (cross-list from eess.IV) [pdf, other]
Title: Shape-consistent Generative Adversarial Networks for multi-modal Medical segmentation maps
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Image translation across domains for unpaired datasets has gained interest and great improvement lately. In medical imaging, there are multiple imaging modalities, with very different characteristics. Our goal is to use cross-modality adaptation between CT and MRI whole cardiac scans for semantic segmentation. We present a segmentation network using synthesised cardiac volumes for extremely limited datasets. Our solution is based on a 3D cross-modality generative adversarial network to share information between modalities and generate synthesized data using unpaired datasets. Our network utilizes semantic segmentation to improve generator shape consistency, thus creating more realistic synthesised volumes to be used when re-training the segmentation network. We show that improved segmentation can be achieved on small datasets when using spatial augmentations to improve a generative adversarial network. These augmentations improve the generator capabilities, thus enhancing the performance of the Segmentor. Using only 16 CT and 16 MRI cardiovascular volumes, improved results are shown over other segmentation methods while using the suggested architecture.

[142]  arXiv:2201.09709 (cross-list from cs.SD) [pdf, other]
Title: Optimizing Tandem Speaker Verification and Anti-Spoofing Systems
Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing. Published version available at: this https URL
Journal-ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 477-488, 2022
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security. For example, the CM can first determine whether the input is human speech, then the ASV can determine whether this speech matches the speaker's identity. The performance of such a tandem system can be measured with a tandem detection cost function (t-DCF). However, ASV and CM systems are usually trained separately, using different metrics and data, which does not optimize their combined performance. In this work, we propose to optimize the tandem system directly by creating a differentiable version of t-DCF and employing techniques from reinforcement learning. The results indicate that these approaches offer better outcomes than finetuning, with our method providing a 20% relative improvement in the t-DCF in the ASVSpoof19 dataset in a constrained setting.

[143]  arXiv:2201.09754 (cross-list from cs.NE) [pdf, other]
Title: Deep Reinforcement Learning with Spiking Q-learning
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

With the help of special neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence with less energy consumption. It provides a promising energy-efficient way for realistic control tasks by combing SNNs and deep reinforcement learning (RL). There are only a few existing SNN-based RL methods at present. Most of them either lack generalization ability or employ Artificial Neural Networks (ANNs) to estimate value function in training. The former needs to tune numerous hyper-parameters for each scenario, and the latter limits the application of different types of RL algorithm and ignores the large energy consumption in training. To develop a robust spike-based RL method, we draw inspiration from non-spiking interneurons found in insects and propose the deep spiking Q-network (DSQN), using the membrane voltage of non-spiking neurons as the representation of Q-value, which can directly learn robust policies from high-dimensional sensory inputs using end-to-end RL. Experiments conducted on 17 Atari games demonstrate the effectiveness of DSQN by outperforming the ANN-based deep Q-network (DQN) in most games. Moreover, the experimental results show superior learning stability and robustness to adversarial attacks of DSQN.

[144]  arXiv:2201.09759 (cross-list from cs.NE) [pdf, other]
Title: Exploration of Hyperdimensional Computing Strategies for Enhanced Learning on Epileptic Seizure Detection
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Signal Processing (eess.SP)

Wearable and unobtrusive monitoring and prediction of epileptic seizures has the potential to significantly increase the life quality of patients, but is still an unreached goal due to challenges of real-time detection and wearable devices design. Hyperdimensional (HD) computing has evolved in recent years as a new promising machine learning approach, especially when talking about wearable applications. But in the case of epilepsy detection, standard HD computing is not performing at the level of other state-of-the-art algorithms. This could be due to the inherent complexity of the seizures and their signatures in different biosignals, such as the electroencephalogram (EEG), the highly personalized nature, and the disbalance of seizure and non-seizure instances. In the literature, different strategies for improved learning of HD computing have been proposed, such as iterative (multi-pass) learning, multi-centroid learning and learning with sample weight ("OnlineHD"). Yet, most of them have not been tested on the challenging task of epileptic seizure detection, and it stays unclear whether they can increase the HD computing performance to the level of the current state-of-the-art algorithms, such as random forests. Thus, in this paper, we implement different learning strategies and assess their performance on an individual basis, or in combination, regarding detection performance and memory and computational requirements. Results show that the best-performing algorithm, which is a combination of multi-centroid and multi-pass, can indeed reach the performance of the random forest model on a highly unbalanced dataset imitating a real-life epileptic seizure detection application.

[145]  arXiv:2201.09787 (cross-list from cs.HC) [pdf]
Title: Using Computational Grounded Theory to Understand Tutors' Experiences in the Gig Economy
Comments: 10 pages, Workshop on Natural Language Processing for Digital Humanities, 18th International Conference on Natural Language Processing
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Machine Learning (cs.LG)

The introduction of online marketplace platforms has led to the advent of new forms of flexible, on-demand (or 'gig') work. Yet, most prior research concerning the experience of gig workers examines delivery or crowdsourcing platforms, while the experience of the large numbers of workers who undertake educational labour in the form of tutoring gigs remains understudied. To address this, we use a computational grounded theory approach to analyse tutors' discussions on Reddit. This approach consists of three phases including data exploration, modelling and human-centred interpretation. We use both validation and human evaluation to increase the trustworthiness and reliability of the computational methods. This paper is a work in progress and reports on the first of the three phases of this approach.

[146]  arXiv:2201.09790 (cross-list from q-fin.ST) [pdf, other]
Title: Linear Laws of Markov Chains with an Application for Anomaly Detection in Bitcoin Prices
Subjects: Statistical Finance (q-fin.ST); Machine Learning (cs.LG)

The goals of this paper are twofold: (1) to present a new method that is able to find linear laws governing the time evolution of Markov chains and (2) to apply this method for anomaly detection in Bitcoin prices. To accomplish these goals, first, the linear laws of Markov chains are derived by using the time embedding of their (categorical) autocorrelation function. Then, a binary series is generated from the first difference of Bitcoin exchange rate (against the United States Dollar). Finally, the minimum number of parameters describing the linear laws of this series is identified through stepped time windows. Based on the results, linear laws typically became more complex (containing an additional third parameter that indicates hidden Markov property) in two periods: before the crash of cryptocurrency markets inducted by the COVID-19 pandemic (12 March 2020), and before the record-breaking surge in the price of Bitcoin (Q4 2020 - Q1 2021). In addition, the locally high values of this third parameter are often related to short-term price peaks, which suggests price manipulation.

[147]  arXiv:2201.09792 (cross-list from cs.CV) [pdf, other]
Title: Patches Are All You Need?
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Although convolutional networks have been the dominant architecture for vision tasks for many years, recent experiments have shown that Transformer-based models, most notably the Vision Transformer (ViT), may exceed their performance in some settings. However, due to the quadratic runtime of the self-attention layers in Transformers, ViTs require the use of patch embeddings, which group together small regions of the image into single input features, in order to be applied to larger image sizes. This raises a question: Is the performance of ViTs due to the inherently-more-powerful Transformer architecture, or is it at least partly due to using patches as the input representation? In this paper, we present some evidence for the latter: specifically, we propose the ConvMixer, an extremely simple model that is similar in spirit to the ViT and the even-more-basic MLP-Mixer in that it operates directly on patches as input, separates the mixing of spatial and channel dimensions, and maintains equal size and resolution throughout the network. In contrast, however, the ConvMixer uses only standard convolutions to achieve the mixing steps. Despite its simplicity, we show that the ConvMixer outperforms the ViT, MLP-Mixer, and some of their variants for similar parameter counts and data set sizes, in addition to outperforming classical vision models such as the ResNet. Our code is available at https://github.com/locuslab/convmixer.

[148]  arXiv:2201.09815 (cross-list from cs.IT) [pdf, other]
Title: Analytic Mutual Information in Bayesian Neural Networks
Authors: Jae Oh Woo
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)

Bayesian neural networks have successfully designed and optimized a robust neural network model in many application problems, including uncertainty quantification. However, with its recent success, information-theoretic understanding about the Bayesian neural network is still at an early stage. Mutual information is an example of an uncertainty measure in a Bayesian neural network to quantify epistemic uncertainty. Still, no analytic formula is known to describe it, one of the fundamental information measures to understand the Bayesian deep learning framework. In this paper, with the Dirichlet distribution assumption in its intermediate encoded message, we derive the analytical formula of the mutual information between model parameters and the predictive output by leveraging the notion of the point process entropy. Then, as an application, we discuss the estimation of the Dirichlet parameters and show its practical application in the active learning uncertainty measures.

[149]  arXiv:2201.09863 (cross-list from cs.RO) [pdf, other]
Title: Evolution Gym: A Large-Scale Benchmark for Evolving Soft Robots
Comments: Accepted to NeurIPS 2021, Website with documentation is available at this https URL
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Both the design and control of a robot play equally important roles in its task performance. However, while optimal control is well studied in the machine learning and robotics community, less attention is placed on finding the optimal robot design. This is mainly because co-optimizing design and control in robotics is characterized as a challenging problem, and more importantly, a comprehensive evaluation benchmark for co-optimization does not exist. In this paper, we propose Evolution Gym, the first large-scale benchmark for co-optimizing the design and control of soft robots. In our benchmark, each robot is composed of different types of voxels (e.g., soft, rigid, actuators), resulting in a modular and expressive robot design space. Our benchmark environments span a wide range of tasks, including locomotion on various types of terrains and manipulation. Furthermore, we develop several robot co-evolution algorithms by combining state-of-the-art design optimization methods and deep reinforcement learning techniques. Evaluating the algorithms on our benchmark platform, we observe robots exhibiting increasingly complex behaviors as evolution progresses, with the best evolved designs solving many of our proposed tasks. Additionally, even though robot designs are evolved autonomously from scratch without prior knowledge, they often grow to resemble existing natural creatures while outperforming hand-designed robots. Nevertheless, all tested algorithms fail to find robots that succeed in our hardest environments. This suggests that more advanced algorithms are required to explore the high-dimensional design space and evolve increasingly intelligent robots -- an area of research in which we hope Evolution Gym will accelerate progress. Our website with code, environments, documentation, and tutorials is available at this http URL

Replacements for Tue, 25 Jan 22

[150]  arXiv:1912.08917 (replaced) [pdf, ps, other]
Title: Logarithmic Regret in Multisecretary and Online Linear Programming Problems with Continuous Valuations
Authors: Robert L. Bray
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
[151]  arXiv:2004.02131 (replaced) [pdf, other]
Title: Learning Deep Graph Representations via Convolutional Neural Networks
Comments: arXiv admin note: text overlap with arXiv:2002.09846
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[152]  arXiv:2005.09030 (replaced) [pdf]
Title: Effective Learning of a GMRF Mixture Model
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[153]  arXiv:2006.03179 (replaced) [pdf, other]
Title: Discovering Parametric Activation Functions
Comments: Published in Neural Networks. 34 pages, 10 figures, 11 tables
Journal-ref: Neural Networks, Volume 148, 2022, Pages 48-65, ISSN 0893-6080
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[154]  arXiv:2008.05823 (replaced) [pdf, other]
Title: Step-Ahead Error Feedback for Distributed Training with Compressed Gradient
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
[155]  arXiv:2009.05805 (replaced) [pdf, other]
Title: Multi-way Spectral Clustering of Augmented Multi-view Data through Deep Collective Matrix Tri-factorization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[156]  arXiv:2010.02709 (replaced) [pdf, other]
Title: An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their Asymptotic Overconfidence
Comments: NeurIPS 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[157]  arXiv:2012.04407 (replaced) [pdf, other]
Title: Enhanced spatio-temporal electric load forecasts using less data with active deep learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[158]  arXiv:2101.11560 (replaced) [pdf, other]
Title: Wisdom of the Contexts: Active Ensemble Learning for Contextual Anomaly Detection
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[159]  arXiv:2102.04282 (replaced) [pdf, other]
Title: Communication-efficient k-Means for Edge-based Machine Learning
Subjects: Machine Learning (cs.LG)
[160]  arXiv:2102.06648 (replaced) [pdf, other]
Title: A Critical Look at the Consistency of Causal Estimation With Deep Latent Variable Models
Comments: 10 pages for main text + 19 pages for references and supplementary. 18 Figures
Journal-ref: Advances in Neural Information Processing Systems 34 (2021)
Subjects: Machine Learning (cs.LG)
[161]  arXiv:2102.07650 (replaced) [pdf, other]
Title: Learning Student-Friendly Teacher Networks for Knowledge Distillation
Comments: Accepted by NeurIPS 2021
Subjects: Machine Learning (cs.LG)
[162]  arXiv:2103.02687 (replaced) [pdf, other]
Title: Lazy FSCA for Unsupervised Variable Selection
Comments: Submitted to Engineering Applications of Artificial Intelligence. This second version is more focused on the lazy implementation of FSCA as visible from the new title; further experiments are also added
Subjects: Machine Learning (cs.LG)
[163]  arXiv:2103.04730 (replaced) [pdf, other]
Title: Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[164]  arXiv:2103.12827 (replaced) [pdf, other]
Title: Fisher Task Distance and Its Applications in Neural Architecture Search and Transfer Learning
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
[165]  arXiv:2103.16629 (replaced) [pdf, other]
Title: Learning Lipschitz Feedback Policies from Expert Demonstrations: Closed-Loop Guarantees, Generalization and Robustness
Comments: Submitted to the IEEE Open Journal of Control Systems
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)
[166]  arXiv:2104.01231 (replaced) [pdf, other]
Title: Diverse Gaussian Noise Consistency Regularization for Robustness and Uncertainty Calibration under Noise Domain Shifts
Comments: Accepted to ICML 2021 Uncertainty & Robustness in Deep Learning Workshop; under review
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[167]  arXiv:2104.01750 (replaced) [pdf, other]
Title: Optimal Sampling Gaps for Adaptive Submodular Maximization
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[168]  arXiv:2104.03413 (replaced) [pdf, other]
Title: Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[169]  arXiv:2105.11730 (replaced) [pdf, other]
Title: Exploring Autoencoder-based Error-bounded Compression for Scientific Data
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
[170]  arXiv:2105.14203 (replaced) [pdf, other]
Title: Understanding Instance-based Interpretability of Variational Auto-Encoders
Comments: NeurIPS 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[171]  arXiv:2106.00221 (replaced) [pdf, other]
Title: Concurrent Adversarial Learning for Large-Batch Training
Comments: Accepted to ICLR 2022
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[172]  arXiv:2106.01805 (replaced) [pdf, other]
Title: Partial Graph Reasoning for Neural Network Regularization
Comments: Technical report
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[173]  arXiv:2106.02104 (replaced) [pdf, other]
Title: Semi-Empirical Objective Functions for MCMC Proposal Optimization
Comments: 39 pages, 19 tables, 22 figures
Subjects: Machine Learning (cs.LG)
[174]  arXiv:2106.03787 (replaced) [pdf, other]
Title: Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint
Comments: AAMAS 2022
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[175]  arXiv:2106.05165 (replaced) [pdf, other]
Title: A Lyapunov-Based Methodology for Constrained Optimization with Bandit Feedback
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[176]  arXiv:2106.06935 (replaced) [pdf, other]
Title: Neural Bellman-Ford Networks: A General Graph Neural Network Framework for Link Prediction
Comments: NeurIPS 2021
Subjects: Machine Learning (cs.LG)
[177]  arXiv:2106.08229 (replaced) [pdf, other]
Title: MICo: Improved representations via sampling-based state similarity for Markov decision processes
Comments: Published at NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[178]  arXiv:2106.08961 (replaced) [pdf, other]
Title: A Direct Slip Ratio Estimation Method based on an Intelligent Tire and Machine Learning
Comments: 12 pages, 25 figures, 2 tables
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[179]  arXiv:2106.10158 (replaced) [pdf, other]
Title: Learning to Complete Code with Sketches
Comments: Published in ICLR 2022
Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
[180]  arXiv:2106.10360 (replaced) [pdf, other]
Title: Prediction-Free, Real-Time Flexible Control of Tidal Lagoons through Proximal Policy Optimisation: A Case Study for the Swansea Lagoon
Authors: Túlio Marcondes Moreira (1), Jackson Geraldo de Faria Jr (1), Pedro O.S. Vaz de Melo (1), Luiz Chaimowicz (1), Gilberto Medeiros-Ribeiro (1) ((1) Universidade Federal de Minas Gerais, Belo Horizonte, Brazil)
Comments: 35 pages, 13 figures and 11 tables
Subjects: Machine Learning (cs.LG)
[181]  arXiv:2106.13948 (replaced) [pdf, other]
Title: Core Challenges in Embodied Vision-Language Planning
Comments: 42 pages, plus references
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[182]  arXiv:2107.12183 (replaced) [pdf, ps, other]
Title: EGGS: Eigen-Gap Guided Search For Automated Spectral Clustering
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[183]  arXiv:2108.02416 (replaced) [pdf, other]
Title: Aspis: Robust Detection for Distributed Learning
Comments: 17 pages, 23 figures
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)
[184]  arXiv:2108.03236 (replaced) [pdf, other]
Title: Efficient Representation for Electric Vehicle Charging Station Operations using Reinforcement Learning
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[185]  arXiv:2108.04585 (replaced) [pdf, other]
Title: Recurrent Neural Network-based Internal Model Control design for stable nonlinear systems
Comments: This work has been submitted to Elsevier European Journal of Control for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[186]  arXiv:2108.07406 (replaced) [pdf, other]
Title: From the Greene--Wu Convolution to Gradient Estimation over Riemannian Manifolds
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
[187]  arXiv:2108.08218 (replaced) [pdf, other]
Title: Out-of-Distribution Detection Using Outlier Detection Methods
Subjects: Machine Learning (cs.LG)
[188]  arXiv:2108.09896 (replaced) [pdf, other]
Title: Generative and Contrastive Self-Supervised Learning for Graph Anomaly Detection
Comments: 14 pages, 5 figures, 4 tables. Published in IEEE Transactions on Knowledge and Data Engineering (TKDE)
Journal-ref: IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021
Subjects: Machine Learning (cs.LG)
[189]  arXiv:2108.10155 (replaced) [pdf, other]
Title: Construction Cost Index Forecasting: A Multi-feature Fusion Approach
Subjects: Machine Learning (cs.LG)
[190]  arXiv:2109.10469 (replaced) [pdf, other]
Title: Differentiable Scaffolding Tree for Molecular Optimization
Subjects: Machine Learning (cs.LG)
[191]  arXiv:2109.11851 (replaced) [pdf, other]
Title: Approximate Latent Force Model Inference
Comments: Accepted with oral presentation at the Science-Guided AI Symposium at AAAI 2021
Subjects: Machine Learning (cs.LG)
[192]  arXiv:2109.13419 (replaced) [pdf, other]
Title: The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation
Comments: 19 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
[193]  arXiv:2109.13542 (replaced) [pdf, ps, other]
Title: Convergence of Deep Convolutional Neural Networks
Comments: arXiv admin note: text overlap with arXiv:2107.12530
Subjects: Machine Learning (cs.LG)
[194]  arXiv:2110.00199 (replaced) [pdf]
Title: Perturbated Gradients Updating within Unit Space for Deep Learning
Subjects: Machine Learning (cs.LG)
[195]  arXiv:2110.02399 (replaced) [pdf, other]
Title: Task Affinity with Maximum Bipartite Matching in Few-Shot Learning
Comments: Accepted as a conference paper at ICLR 2022
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[196]  arXiv:2110.02880 (replaced) [pdf, other]
Title: Space-Time Graph Neural Networks
Subjects: Machine Learning (cs.LG)
[197]  arXiv:2110.03735 (replaced) [pdf, other]
Title: Adversarial Unlearning of Backdoors via Implicit Hypergradient
Comments: 9 pages main text, 3 pages references, 12 pages appendix, 5 figures
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
[198]  arXiv:2110.03825 (replaced) [pdf, other]
Title: Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks
Comments: NeurIPS 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[199]  arXiv:2110.08483 (replaced) [pdf, other]
Title: Simplest Streaming Trees
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)
[200]  arXiv:2110.09506 (replaced) [pdf, other]
Title: MEMO: Test Time Robustness via Adaptation and Augmentation
Comments: code: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[201]  arXiv:2110.11088 (replaced) [pdf, other]
Title: RoMA: a Method for Neural Network Robustness Measurement and Assessment
Authors: Natan Levy, Guy Katz
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[202]  arXiv:2110.11773 (replaced) [pdf, other]
Title: Sinkformers: Transformers with Doubly Stochastic Attention
Comments: Accepted at AISTATS
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[203]  arXiv:2110.14001 (replaced) [pdf, other]
Title: SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data
Comments: Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[204]  arXiv:2110.14416 (replaced) [pdf, other]
Title: Transformers Generalize DeepSets and Can be Extended to Graphs and Hypergraphs
Comments: 21 pages, 5 figures
Subjects: Machine Learning (cs.LG)
[205]  arXiv:2110.14578 (replaced) [pdf, other]
Title: Spatio-Temporal Federated Learning for Massive Wireless Edge Networks
Comments: 3 figures, conference
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
[206]  arXiv:2110.15288 (replaced) [pdf, other]
Title: Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction
Comments: Published at 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia. 31 Pages, 14 figures
Subjects: Machine Learning (cs.LG)
[207]  arXiv:2111.03941 (replaced) [pdf, other]
Title: Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods
Comments: Accepted to NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[208]  arXiv:2111.08550 (replaced) [pdf, other]
Title: On Effective Scheduling of Model-based Reinforcement Learning
Comments: Accepted at NeurIPS2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[209]  arXiv:2111.08900 (replaced) [pdf, other]
Title: A GNN-RNN Approach for Harnessing Geospatial and Temporal Information: Application to Crop Yield Prediction
Comments: Fixed typo. 14 pages, 9 figures, accepted at AAAI-22 Social Impact Track
Subjects: Machine Learning (cs.LG)
[210]  arXiv:2111.09679 (replaced) [pdf, other]
Title: Enhanced Membership Inference Attacks against Machine Learning Models
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[211]  arXiv:2112.02393 (replaced) [pdf, ps, other]
Title: Optimization-Based Separations for Neural Networks
Subjects: Machine Learning (cs.LG)
[212]  arXiv:2112.05683 (replaced) [pdf, other]
Title: Boosting Active Learning via Improving Test Performance
Comments: 13 pages
Journal-ref: AAAI 2022
Subjects: Machine Learning (cs.LG)
[213]  arXiv:2112.07823 (replaced) [pdf, other]
Title: Bayesian Graph Contrastive Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[214]  arXiv:2112.10389 (replaced) [pdf, other]
Title: Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks
Comments: 16 pages, 14 figures
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Optimization and Control (math.OC)
[215]  arXiv:2112.10638 (replaced) [pdf, ps, other]
Title: Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models
Comments: To appear in Software Impacts
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Mathematical Software (cs.MS)
[216]  arXiv:2201.04929 (replaced) [pdf]
Title: Improving VAE based molecular representations for compound property prediction
Subjects: Machine Learning (cs.LG)
[217]  arXiv:2201.06239 (replaced) [pdf, other]
Title: MT-GBM: A Multi-Task Gradient Boosting Machine with Shared Decision Trees
Subjects: Machine Learning (cs.LG)
[218]  arXiv:2201.07677 (replaced) [pdf, other]
Title: Tiny, always-on and fragile: Bias propagation through design choices in on-device machine learning workflows
Subjects: Machine Learning (cs.LG)
[219]  arXiv:2201.07986 (replaced) [pdf, other]
Title: Unsupervised Graph Poisoning Attack via Contrastive Loss Back-propagation
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
[220]  arXiv:2201.08528 (replaced) [pdf, other]
Title: To SMOTE, or not to SMOTE?
Subjects: Machine Learning (cs.LG)
[221]  arXiv:1904.08484 (replaced) [pdf, other]
Title: Deep Localization of Mixed Image Tampering Techniques
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[222]  arXiv:1905.12155 (replaced) [pdf, other]
Title: The Supermarket Model with Known and Predicted Service Times
Comments: Revision, 16 pages, 22 figures. Accepted at IEEE TPDS
Subjects: Performance (cs.PF); Machine Learning (cs.LG)
[223]  arXiv:2004.08924 (replaced) [pdf, other]
Title: VCG Mechanism Design with Unknown Agent Values under Stochastic Bandit Feedback
Subjects: Machine Learning (stat.ML); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
[224]  arXiv:2006.16409 (replaced) [pdf, other]
Title: A Bayesian regularization-backpropagation neural network model for peeling computations
Comments: 18 pages, 9 figures
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
[225]  arXiv:2008.02159 (replaced) [pdf, other]
Title: Learning from Sparse Demonstrations
Comments: A real world drone navigation experiment using the proposed method is at the link: this https URL
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
[226]  arXiv:2010.04389 (replaced) [pdf, other]
Title: A Survey of Knowledge-Enhanced Text Generation
Comments: Accepted by ACM Computing Survey (CSUR) in Jan 2022
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[227]  arXiv:2012.08784 (replaced) [pdf, other]
Title: Tensor Completion by Multi-Rank via Unitary Transformation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[228]  arXiv:2012.15477 (replaced) [pdf, other]
Title: Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis
Comments: NeurIPS 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[229]  arXiv:2102.02828 (replaced) [pdf, other]
Title: Scattering Networks on the Sphere for Scalable and Rotationally Equivariant Spherical CNNs
Comments: 18 pages, 6 figures, accepted by ICLR, code at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[230]  arXiv:2103.05102 (replaced) [pdf, other]
Title: Self-Supervised Multisensor Change Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[231]  arXiv:2103.11509 (replaced) [pdf, other]
Title: Scatter Correction in X-ray CT by Physics-Inspired Deep Learning
Subjects: Medical Physics (physics.med-ph); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[232]  arXiv:2105.00177 (replaced) [pdf, other]
Title: Deep Spectrum Cartography: Completing Radio Map Tensors Using Learned Neural Models
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[233]  arXiv:2105.10239 (replaced) [pdf, other]
Title: AC-CovidNet: Attention Guided Contrastive CNN for Recognition of Covid-19 in Chest X-Ray Images
Comments: Accepted in Sixth IAPR International Conference on Computer Vision & Image Processing (CVIP2021)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[234]  arXiv:2106.05229 (replaced) [pdf, other]
Title: Speech Recovery for Real-World Self-powered Intermittent Devices
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[235]  arXiv:2106.10544 (replaced) [pdf, other]
Title: Learning Space Partitions for Path Planning
Journal-ref: NeurIPS 2021
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
[236]  arXiv:2107.04642 (replaced) [pdf]
Title: Escaping the Impossibility of Fairness: From Formal to Substantive Algorithmic Fairness
Authors: Ben Green
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG)
[237]  arXiv:2107.06629 (replaced) [pdf, other]
Title: Model-free Reinforcement Learning for Robust Locomotion using Demonstrations from Trajectory Optimization
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[238]  arXiv:2107.10043 (replaced) [pdf, other]
Title: KalmanNet: Neural Network Aided Kalman Filtering for Partially Known Dynamics
Comments: TSP Rev1
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Machine Learning (stat.ML)
[239]  arXiv:2108.02191 (replaced) [pdf, other]
Title: Random Offset Block Embedding Array (ROBE) for CriteoTB Benchmark MLPerf DLRM Model : 1000$\times$ Compression and 3.1$\times$ Faster Inference
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[240]  arXiv:2108.02574 (replaced) [pdf, other]
Title: Optimal Transport for Unsupervised Denoising Learning
Comments: Preprint, under review (40 pages, 33 figures)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[241]  arXiv:2108.11845 (replaced) [pdf, ps, other]
Title: Consistent Relative Confidence and Label-Free Model Selection for Convolutional Neural Networks
Authors: Bin Liu
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[242]  arXiv:2109.06379 (replaced) [pdf, other]
Title: Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
Comments: EMNLP 2021, Code available at this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[243]  arXiv:2109.09738 (replaced) [pdf, other]
Title: An Optimal Control Framework for Joint-channel Parallel MRI Reconstruction without Coil Sensitivities
Comments: 13 pages
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Optimization and Control (math.OC)
[244]  arXiv:2109.11196 (replaced) [pdf, other]
Title: Fast and Efficient MMD-based Fair PCA via Optimization over Stiefel Manifold
Comments: 24 pages, 18 figures. Accepted to the 36th AAAI Conference on Artificial Intelligence (AAAI 2022)
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG)
[245]  arXiv:2109.11544 (replaced) [pdf, other]
Title: Lifelong 3D Object Recognition and Grasp Synthesis Using Dual Memory Recurrent Self-Organization Networks
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[246]  arXiv:2109.12825 (replaced) [pdf, other]
Title: Probability Distribution on Full Rooted Trees
Subjects: Machine Learning (stat.ML); Discrete Mathematics (cs.DM); Machine Learning (cs.LG)
[247]  arXiv:2109.13841 (replaced) [pdf, other]
Title: Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation
Comments: Accepted to IEEE RA-Letter, 2022
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[248]  arXiv:2110.00745 (replaced) [pdf, other]
Title: End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression
Comments: To be presented at the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[249]  arXiv:2110.01901 (replaced) [pdf, ps, other]
Title: Inferring Hidden Structures in Random Graphs
Authors: Wasim Huleihel
Comments: 43 pages
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Machine Learning (cs.LG); Statistics Theory (math.ST)
[250]  arXiv:2110.03183 (replaced) [pdf, other]
Title: Attention is All You Need? Good Embeddings with Statistics are enough:Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or ....
Authors: Prateek Verma
Comments: more grammar corrections; better figure captions; more content edits; SPL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[251]  arXiv:2110.09658 (replaced) [pdf, other]
Title: System Norm Regularization Methods for Koopman Operator Approximation
Comments: 26 pages, 14 figures, submitted to SIAM Journal on Applied Dynamical Systems (SIADS)
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Dynamical Systems (math.DS)
[252]  arXiv:2111.05790 (replaced) [pdf, other]
Title: Early Myocardial Infarction Detection over Multi-view Echocardiography
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Medical Physics (physics.med-ph)
[253]  arXiv:2111.11306 (replaced) [pdf, other]
Title: Learning PSD-valued functions using kernel sums-of-squares
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[254]  arXiv:2111.14000 (replaced) [pdf, other]
Title: Factor-augmented tree ensembles
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Econometrics (econ.EM)
[255]  arXiv:2111.15124 (replaced) [pdf, other]
Title: In-Bed Human Pose Estimation from Unseen and Privacy-Preserving Image Domains
Comments: In the IEEE International Symposium on Biomedical Imaging (ISBI)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[256]  arXiv:2112.14582 (replaced) [pdf, other]
Title: Polyak-Ruppert Averaged Q-Leaning is Statistically Efficient
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[257]  arXiv:2112.14738 (replaced) [pdf, other]
Title: Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems
Comments: Minor typographical updates
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
[258]  arXiv:2201.00318 (replaced) [pdf, other]
Title: On Sensitivity of Deep Learning Based Text Classification Algorithms to Practical Input Perturbations
Comments: Accepted at Computing Conference 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[259]  arXiv:2201.01203 (replaced) [pdf, other]
Title: Quantifying Uncertainty in Deep Learning Approaches to Radio Galaxy Classification
Comments: accepted by MNRAS
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Astrophysics of Galaxies (astro-ph.GA); Machine Learning (cs.LG)
[260]  arXiv:2201.01669 (replaced) [pdf, other]
Title: Using Deep Learning with Large Aggregated Datasets for COVID-19 Classification from Cough
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[261]  arXiv:2201.04965 (replaced) [pdf, other]
Title: Stock Movement Prediction Based on Bi-typed Hybrid-relational Market Knowledge Graph via Dual Attention Networks
Comments: 22 pages, 5 figures
Subjects: Statistical Finance (q-fin.ST); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[262]  arXiv:2201.05213 (replaced) [pdf, other]
Title: Parallel Neural Local Lossless Compression
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[263]  arXiv:2201.06169 (replaced) [pdf, ps, other]
Title: On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)
[264]  arXiv:2201.07972 (replaced) [pdf, other]
Title: Corrigendum and addendum to: How Populist are Parties? Measuring Degrees of Populism in Party Manifestos Using Supervised Machine Learning
Comments: 6 pages, 4 figures
Subjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG)
[265]  arXiv:2201.08111 (replaced) [pdf, other]
Title: Safety-Aware Multi-Agent Apprenticeship Learning
Authors: Junchen Zhao
Comments: 58 pages, 4 figures. Master's report. arXiv admin note: text overlap with arXiv:1710.07983 by other authors
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[ total of 265 entries: 1-265 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2201, contact, help  (Access key information)