We gratefully acknowledge support from
the Simons Foundation and member institutions.

Machine Learning

New submissions

[ total of 372 entries: 1-372 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Tue, 19 Oct 21

[1]  arXiv:2110.08252 [pdf, other]
Title: A Rate-Distortion Framework for Explaining Black-box Model Decisions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)

We present the Rate-Distortion Explanation (RDE) framework, a mathematically well-founded method for explaining black-box model decisions. The framework is based on perturbations of the target input signal and applies to any differentiable pre-trained model such as neural networks. Our experiments demonstrate the framework's adaptability to diverse data modalities, particularly images, audio, and physical simulations of urban environments.

[2]  arXiv:2110.08253 [pdf, other]
Title: A Field Guide to Scientific XAI: Transparent and Interpretable Deep Learning for Bioinformatics Research
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Genomics (q-bio.GN)

Deep learning has become popular because of its potential to achieve high accuracy in prediction tasks. However, accuracy is not always the only goal of statistical modelling, especially for models developed as part of scientific research. Rather, many scientific models are developed to facilitate scientific discovery, by which we mean to abstract a human-understandable representation of the natural world. Unfortunately, the opacity of deep neural networks limit their role in scientific discovery, creating a new demand for models that are transparently interpretable. This article is a field guide to transparent model design. It provides a taxonomy of transparent model design concepts, a practical workflow for putting design concepts into practice, and a general template for reporting design choices. We hope this field guide will help researchers more effectively design transparently interpretable models, and thus enable them to use deep learning for scientific discovery.

[3]  arXiv:2110.08254 [pdf, other]
Title: Inconsistent Few-Shot Relation Classification via Cross-Attentional Prototype Networks with Contrastive Learning
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Standard few-shot relation classification (RC) is designed to learn a robust classifier with only few labeled data for each class. However, previous works rarely investigate the effects of a different number of classes (i.e., $N$-way) and number of labeled data per class (i.e., $K$-shot) during training vs. testing. In this work, we define a new task, \textit{inconsistent few-shot RC}, where the model needs to handle the inconsistency of $N$ and $K$ between training and testing. To address this new task, we propose Prototype Network-based cross-attention contrastive learning (ProtoCACL) to capture the rich mutual interactions between the support set and query set. Experimental results demonstrate that our ProtoCACL can outperform the state-of-the-art baseline model under both inconsistent $K$ and inconsistent $N$ settings, owing to its more robust and discriminate representations. Moreover, we identify that in the inconsistent few-shot learning setting, models can achieve better performance with \textit{less data} than the standard few-shot setting with carefully-selected $N$ and $K$. In the end of the paper, we provide further analyses and suggestions to systematically guide the selection of $N$ and $K$ under different scenarios.

[4]  arXiv:2110.08255 [pdf, other]
Title: Yformer: U-Net Inspired Transformer Architecture for Far Horizon Time Series Forecasting
Authors: Kiran Madhusudhanan (1), Johannes Burchert (1), Nghia Duong-Trung (2), Stefan Born (2), Lars Schmidt-Thieme (1) ((1) University of Hildesheim, (2) Technische Universität Berlin)
Subjects: Machine Learning (cs.LG)

Time series data is ubiquitous in research as well as in a wide variety of industrial applications. Effectively analyzing the available historical data and providing insights into the far future allows us to make effective decisions. Recent research has witnessed the superior performance of transformer-based architectures, especially in the regime of far horizon time series forecasting. However, the current state of the art sparse Transformer architectures fail to couple down- and upsampling procedures to produce outputs in a similar resolution as the input. We propose the Yformer model, based on a novel Y-shaped encoder-decoder architecture that (1) uses direct connection from the downscaled encoder layer to the corresponding upsampled decoder layer in a U-Net inspired architecture, (2) Combines the downscaling/upsampling with sparse attention to capture long-range effects, and (3) stabilizes the encoder-decoder stacks with the addition of an auxiliary reconstruction loss. Extensive experiments have been conducted with relevant baselines on four benchmark datasets, demonstrating an average improvement of 19.82, 18.41 percentage MSE and 13.62, 11.85 percentage MAE in comparison to the current state of the art for the univariate and the multivariate settings respectively.

[5]  arXiv:2110.08256 [pdf, other]
Title: Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

The vulnerability of deep neural networks to adversarial examples has motivated an increasing number of defense strategies for promoting model robustness. However, the progress is usually hampered by insufficient robustness evaluations. As the de facto standard to evaluate adversarial robustness, adversarial attacks typically solve an optimization problem of crafting adversarial examples with an iterative process. In this work, we propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically. Our method learns the optimizer in adversarial attacks parameterized by a recurrent neural network, which is trained over a class of data samples and defenses to produce effective update directions during adversarial example generation. Furthermore, we develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses. Our approach can be flexibly incorporated with various attacks and consistently improves the performance with little extra computational cost. Extensive experiments demonstrate the effectiveness of the learned attacks by MAMA compared to the state-of-the-art attacks on different defenses, leading to a more reliable evaluation of adversarial robustness.

[6]  arXiv:2110.08257 [pdf, other]
Title: C-AllOut: Catching & Calling Outliers by Type
Comments: 9+4 pages, 3 figures, 11 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Given an unlabeled dataset, wherein we have access only to pairwise similarities (or distances), how can we effectively (1) detect outliers, and (2) annotate/tag the outliers by type? Outlier detection has a large literature, yet we find a key gap in the field: to our knowledge, no existing work addresses the outlier annotation problem. Outliers are broadly classified into 3 types, representing distinct patterns that could be valuable to analysts: (a) global outliers are severe yet isolate cases that do not repeat, e.g., a data collection error; (b) local outliers diverge from their peers within a context, e.g., a particularly short basketball player; and (c) collective outliers are isolated micro-clusters that may indicate coalition or repetitions, e.g., frauds that exploit the same loophole. This paper presents C-AllOut: a novel and effective outlier detector that annotates outliers by type. It is parameter-free and scalable, besides working only with pairwise similarities (or distances) when it is needed. We show that C-AllOut achieves on par or significantly better performance than state-of-the-art detectors when spotting outliers regardless of their type. It is also highly effective in annotating outliers of particular types, a task that none of the baselines can perform.

[7]  arXiv:2110.08258 [pdf, other]
Title: Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework
Comments: 15 pages, 3 figures, 4 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO)

Reliable AI agents should be mindful of the limits of their knowledge and consult humans when sensing that they do not have sufficient knowledge to make sound decisions. We formulate a hierarchical reinforcement learning framework for learning to decide when to request additional information from humans and what type of information would be helpful to request. Our framework extends partially-observed Markov decision processes (POMDPs) by allowing an agent to interact with an assistant to leverage their knowledge in accomplishing tasks. Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework: aided with an interaction policy learned by our method, a navigation policy achieves up to a 7x improvement in task success rate compared to performing tasks only by itself. The interaction policy is also efficient: on average, only a quarter of all actions taken during a task execution are requests for information. We analyze benefits and challenges of learning with a hierarchical policy structure and suggest directions for future work.

[8]  arXiv:2110.08259 [pdf, other]
Title: Training Neural Networks for Solving 1-D Optimal Piecewise Linear Approximation
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Recently, the interpretability of deep learning has attracted a lot of attention. A plethora of methods have attempted to explain neural networks by feature visualization, saliency maps, model distillation, and so on. However, it is hard for these methods to reveal the intrinsic properties of neural networks. In this work, we studied the 1-D optimal piecewise linear approximation (PWLA) problem, and associated it with a designed neural network, named lattice neural network (LNN). We asked four essential questions as following: (1) What are the characters of the optimal solution of the PWLA problem? (2) Can an LNN converge to the global optimum? (3) Can an LNN converge to the local optimum? (4) Can an LNN solve the PWLA problem? Our main contributions are that we propose the theorems to characterize the optimal solution of the PWLA problem and present the LNN method for solving it. We evaluated the proposed LNNs on approximation tasks, forged an empirical method to improve the performance of LNNs. The experiments verified that our LNN method is competitive with the start-of-the-art method.

[9]  arXiv:2110.08260 [pdf, other]
Title: Effective Certification of Monotone Deep Equilibrium Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Monotone Operator Equilibrium Models (monDEQs) represent a class of models combining the powerful deep equilibrium paradigm with convergence guarantees. Further, their inherent robustness to adversarial perturbations makes investigating their certifiability a promising research direction. Unfortunately, existing approaches are either imprecise or severely limited in scalability. In this work, we propose the first scalable and precise monDEQ verifier, based on two key ideas: (i) a novel convex relaxation enabling efficient inclusion checks, and (ii) non-trivial mathematical insights characterizing the fixpoint operations at the heart of monDEQs on sets rather than concrete inputs. An extensive evaluation of our verifier on the challenging $\ell_\infty$ perturbations demonstrates that it exceeds state-of-the-art performance in terms of speed (two orders of magnitude) and scalability (an order of magnitude) while yielding 25% higher certified accuracies on the same networks.

[10]  arXiv:2110.08263 [pdf, other]
Title: FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling
Comments: Accepted by NeurIPS 2021; 16 pages with appendix; code: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

The recently proposed FixMatch achieved state-of-the-art results on most semi-supervised learning (SSL) benchmarks. However, like other modern SSL algorithms, FixMatch uses a pre-defined constant threshold for all classes to select unlabeled data that contribute to the training, thus failing to consider different learning status and learning difficulties of different classes. To address this issue, we propose Curriculum Pseudo Labeling (CPL), a curriculum learning approach to leverage unlabeled data according to the model's learning status. The core of CPL is to flexibly adjust thresholds for different classes at each time step to let pass informative unlabeled data and their pseudo labels. CPL does not introduce additional parameters or computations (forward or backward propagation). We apply CPL to FixMatch and call our improved algorithm FlexMatch. FlexMatch achieves state-of-the-art performance on a variety of SSL benchmarks, with especially strong performances when the labeled data are extremely limited or when the task is challenging. For example, FlexMatch outperforms FixMatch by 14.32% and 24.55% on CIFAR-100 and STL-10 datasets respectively, when there are only 4 labels per class. CPL also significantly boosts the convergence speed, e.g., FlexMatch can use only 1/5 training time of FixMatch to achieve even better performance. Furthermore, we show that CPL can be easily adapted to other SSL algorithms and remarkably improve their performances. We open source our code at https://github.com/TorchSSL/TorchSSL.

[11]  arXiv:2110.08264 [pdf, other]
Title: Self-supervised Contrastive Attributed Graph Clustering
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Attributed graph clustering, which learns node representation from node attribute and topological graph for clustering, is a fundamental but challenging task for graph analysis. Recently, methods based on graph contrastive learning (GCL) have obtained impressive clustering performance on this task. Yet, we observe that existing GCL-based methods 1) fail to benefit from imprecise clustering labels; 2) require a post-processing operation to get clustering labels; 3) cannot solve out-of-sample (OOS) problem. To address these issues, we propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC). In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss, which aims to maximize the similarities of intra-cluster nodes while minimizing the similarities of inter-cluster nodes, are designed for node representation learning. Meanwhile, a clustering module is built to directly output clustering labels by contrasting the representation of different clusters. Thus, for the OOS nodes, SCAGC can directly calculate their clustering labels. Extensive experimental results on four benchmark datasets have shown that SCAGC consistently outperforms 11 competitive clustering methods.

[12]  arXiv:2110.08265 [pdf, other]
Title: Knowledge-driven Active Learning
Comments: Submitted to the ICLR 2022 conference
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In the last few years, Deep Learning models have become increasingly popular. However, their deployment is still precluded in those contexts where the amount of supervised data is limited and manual labelling expensive. Active learning strategies aim at solving this problem by requiring supervision only on few unlabelled samples, which improve the most model performances after adding them to the training set. Most strategies are based on uncertain sample selection, and even often restricted to samples lying close to the decision boundary. Here we propose a very different approach, taking into consideration domain knowledge. Indeed, in the case of multi-label classification, the relationships among classes offer a way to spot incoherent predictions, i.e., predictions where the model may most likely need supervision. We have developed a framework where first-order-logic knowledge is converted into constraints and their violation is checked as a natural guide for sample selection. We empirically demonstrate that knowledge-driven strategy outperforms standard strategies, particularly on those datasets where domain knowledge is complete. Furthermore, we show how the proposed approach enables discovering data distributions lying far from training data. Finally, the proposed knowledge-driven strategy can be also easily used in object-detection problems where standard uncertainty-based techniques are difficult to apply.

[13]  arXiv:2110.08266 [pdf, other]
Title: PG$^2$Net: Personalized and Group Preferences Guided Network for Next Place Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Predicting the next place to visit is a key in human mobility behavior modeling, which plays a significant role in various fields, such as epidemic control, urban planning, traffic management, and travel recommendation. To achieve this, one typical solution is designing modules based on RNN to capture their preferences to various locations. Although these RNN-based methods can effectively learn individual's hidden personalized preferences to her visited places, the interactions among users can only be weakly learned through the representations of locations. Targeting this, we propose an end-to-end framework named personalized and group preference guided network (PG$^2$Net), considering the users' preferences to various places at both individual and collective levels. Specifically, PG$^2$Net concatenates Bi-LSTM and attention mechanism to capture each user's long-term mobility tendency. To learn population's group preferences, we utilize spatial and temporal information of the visitations to construct a spatio-temporal dependency module. We adopt a graph embedding method to map users' trajectory into a hidden space, capturing their sequential relation. In addition, we devise an auxiliary loss to learn the vectorial representation of her next location. Experiment results on two Foursquare check-in datasets and one mobile phone dataset indicate the advantages of our model compared to the state-of-the-art baselines. Source codes are available at https://github.com/urbanmobility/PG2Net.

[14]  arXiv:2110.08270 [pdf, other]
Title: From Multimodal to Unimodal Attention in Transformers using Knowledge Distillation
Comments: Preprint. Final paper accepted at the 17th IEEE International Conference on Advanced Video and Signal-based Surveillance, AVSS 2021, Virtual, November 16-19, 2021. 8 pages
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Multimodal Deep Learning has garnered much interest, and transformers have triggered novel approaches, thanks to the cross-attention mechanism. Here we propose an approach to deal with two key existing challenges: the high computational resource demanded and the issue of missing modalities. We introduce for the first time the concept of knowledge distillation in transformers to use only one modality at inference time. We report a full study analyzing multiple student-teacher configurations, levels at which distillation is applied, and different methodologies. With the best configuration, we improved the state-of-the-art accuracy by 3%, we reduced the number of parameters by 2.5 times and the inference time by 22%. Such performance-computation tradeoff can be exploited in many applications and we aim at opening a new research area where the deployment of complex models with limited resources is demanded.

[15]  arXiv:2110.08271 [pdf, other]
Title: Training Deep Neural Networks with Joint Quantization and Pruning of Weights and Activations
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Quantization and pruning are core techniques used to reduce the inference costs of deep neural networks. State-of-the-art quantization techniques are currently applied to both the weights and activations; however, pruning is most often applied to only the weights of the network. In this work, we jointly apply novel uniform quantization and unstructured pruning methods to both the weights and activations of deep neural networks during training. Using our methods, we empirically evaluate the currently accepted prune-then-quantize paradigm across a wide range of computer vision tasks and observe a non-commutative nature when applied to both the weights and activations of deep neural networks. Informed by these observations, we articulate the non-commutativity hypothesis: for a given deep neural network being trained for a specific task, there exists an exact training schedule in which quantization and pruning can be introduced to optimize network performance. We identify that this optimal ordering not only exists, but also varies across discriminative and generative tasks. Using the optimal training schedule within our training framework, we demonstrate increased performance per memory footprint over existing solutions.

[16]  arXiv:2110.08272 [pdf]
Title: Tree-based local explanations of machine learning model predictions, AraucanaXAI
Comments: XAI Healthcare workshop 2021, AIME 2021
Subjects: Machine Learning (cs.LG)

Increasingly complex learning methods such as boosting, bagging and deep learning have made ML models more accurate, but harder to understand and interpret. A tradeoff between performance and intelligibility is often to be faced, especially in high-stakes applications like medicine. In the present article we propose a novel methodological approach for generating explanations of the predictions of a generic ML model, given a specific instance for which the prediction has been made, that can tackle both classification and regression tasks. Advantages of the proposed XAI approach include improved fidelity to the original model, the ability to deal with non-linear decision boundaries, and native support to both classification and regression problems

[17]  arXiv:2110.08306 [pdf, other]
Title: Memory-augmented Adversarial Autoencoders for Multivariate Time-series Anomaly Detection with Deep Reconstruction and Prediction
Subjects: Machine Learning (cs.LG)

Detecting anomalies for multivariate time-series without manual supervision continues a challenging problem due to the increased scale of dimensions and complexity of today's IT monitoring systems. Recent progress of unsupervised time-series anomaly detection mainly use deep autoencoders to solve this problem, i.e. training on normal samples and producing significant reconstruction error on abnormal inputs. However, in practice, autoencoders can reconstruct anomalies so well, due to powerful capabilites of neural networks. Besides, these approaches can be ineffective for identifying non-point anomalies, e.g. contextual anomalies and collective anomalies, since they solely utilze a point-wise reconstruction objective. To tackle the above issues, we propose MemAAE (\textit{Memory-augmented Adversarial Autoencoders with Deep Reconstruction and Prediction}), a novel unsupervised anomaly detection method for time-series. By jointly training two complementary proxy tasks, reconstruction and prediction, with a shared network architecture, we show that detecting anomalies via multiple tasks obtains superior performance rather than single-task training. Additionally, a compressive memory module is introduced to preserve normal patterns, avoiding unexpected generalization on abnormal inputs. Through extensive experiments, MemAAE achieves an overall F1 score of 0.90 on four public datasets, significantly outperforming the best baseline by 0.02.

[18]  arXiv:2110.08307 [pdf, other]
Title: GrowSpace: Learning How to Shape Plants
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Plants are dynamic systems that are integral to our existence and survival. Plants face environment changes and adapt over time to their surrounding conditions. We argue that plant responses to an environmental stimulus are a good example of a real-world problem that can be approached within a reinforcement learning (RL)framework. With the objective of controlling a plant by moving the light source, we propose GrowSpace, as a new RL benchmark. The back-end of the simulator is implemented using the Space Colonisation Algorithm, a plant growing model based on competition for space. Compared to video game RL environments, this simulator addresses a real-world problem and serves as a test bed to visualize plant growth and movement in a faster way than physical experiments. GrowSpace is composed of a suite of challenges that tackle several problems such as control, multi-stage learning,fairness and multi-objective learning. We provide agent baselines alongside case studies to demonstrate the difficulty of the proposed benchmark.

[19]  arXiv:2110.08321 [pdf, other]
Title: Efficient Representations for Privacy-Preserving Inference
Comments: 8 pages, 2 figures
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Deep neural networks have a wide range of applications across multiple domains such as computer vision and medicine. In many cases, the input of a model at inference time can consist of sensitive user data, which raises questions concerning the levels of privacy and trust guaranteed by such services. Much existing work has leveraged homomorphic encryption (HE) schemes that enable computation on encrypted data to achieve private inference for multi-layer perceptrons and CNNs. An early work along this direction was CryptoNets, which takes 250 seconds for one MNIST inference. The main limitation of such approaches is that of compute, which is due to the costly nature of the NTT (number theoretic transform)operations that constitute HE operations. Others have proposed the use of model pruning and efficient data representations to reduce the number of HE operations required. In this paper, we focus on improving upon existing work by proposing changes to the representations of intermediate tensors during CNN inference. We construct and evaluate private CNNs on the MNIST and CIFAR-10 datasets, and achieve over a two-fold reduction in the number of operations used for inferences of the CryptoNets architecture.

[20]  arXiv:2110.08322 [pdf]
Title: Robustness of different loss functions and their impact on networks learning capability
Authors: Vishal Rajput
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Recent developments in AI have made it ubiquitous, every industry is trying to adopt some form of intelligent processing of their data. Despite so many advances in the field, AIs full capability is yet to be exploited by the industry. Industries that involve some risk factors still remain cautious about the usage of AI due to the lack of trust in such autonomous systems. Present-day AI might be very good in a lot of things but it is very bad in reasoning and this behavior of AI can lead to catastrophic results. Autonomous cars crashing into a person or a drone getting stuck in a tree are a few examples where AI decisions lead to catastrophic results. To develop insight and generate an explanation about the learning capability of AI, we will try to analyze the working of loss functions. For our case, we will use two sets of loss functions, generalized loss functions like Binary cross-entropy or BCE and specialized loss functions like Dice loss or focal loss. Through a series of experiments, we will establish whether combining different loss functions is better than using a single loss function and if yes, then what is the reason behind it. In order to establish the difference between generalized loss and specialized losses, we will train several models using the above-mentioned losses and then compare their robustness on adversarial examples. In particular, we will look at how fast the accuracy of different models decreases when we change the pixels corresponding to the most salient gradients.

[21]  arXiv:2110.08323 [pdf, other]
Title: On Learning the Transformer Kernel
Comments: 26 pages, of which 11 form the appendix. 6 figures of which 2 are part of appendix
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

In this work we introduce KERNELIZED TRANSFORMER, a generic, scalable, data driven framework for learning the kernel function in Transformers. Our framework approximates the Transformer kernel as a dot product between spectral feature maps and learns the kernel by learning the spectral distribution. This not only helps in learning a generic kernel end-to-end, but also reduces the time and space complexity of Transformers from quadratic to linear. We show that KERNELIZED TRANSFORMERS achieve performance comparable to existing efficient Transformer architectures, both in terms of accuracy as well as computational efficiency. Our study also demonstrates that the choice of the kernel has a substantial impact on performance, and kernel learning variants are competitive alternatives to fixed kernel Transformers, both in long as well as short sequence tasks.

[22]  arXiv:2110.08330 [pdf, other]
Title: Nothing Wasted: Full Contribution Enforcement in Federated Edge Learning
Subjects: Machine Learning (cs.LG)

The explosive amount of data generated at the network edge makes mobile edge computing an essential technology to support real-time applications, calling for powerful data processing and analysis provided by machine learning (ML) techniques. In particular, federated edge learning (FEL) becomes prominent in securing the privacy of data owners by keeping the data locally used to train ML models. Existing studies on FEL either utilize in-process optimization or remove unqualified participants in advance. In this paper, we enhance the collaboration from all edge devices in FEL to guarantee that the ML model is trained using all available local data to accelerate the learning process. To that aim, we propose a collective extortion (CE) strategy under the imperfect-information multi-player FEL game, which is proved to be effective in helping the server efficiently elicit the full contribution of all devices without worrying about suffering from any economic loss. Technically, our proposed CE strategy extends the classical extortion strategy in controlling the proportionate share of expected utilities for a single opponent to the swiftly homogeneous control over a group of players, which further presents an attractive trait of being impartial for all participants. Moreover, the CE strategy enriches the game theory hierarchy, facilitating a wider application scope of the extortion strategy. Both theoretical analysis and experimental evaluations validate the effectiveness and fairness of our proposed scheme.

[23]  arXiv:2110.08331 [pdf, other]
Title: A New Approach for Interpretability and Reliability in Clinical Risk Prediction: Acute Coronary Syndrome Scenario
Comments: Accepted for publication in the Artificial Intelligence in Medicine journal. Abstract abridged to respect the arXiv's characters limit
Journal-ref: Artificial Intelligence in Medicine, Volume 117, 2021
Subjects: Machine Learning (cs.LG); Applications (stat.AP); Methodology (stat.ME)

We intend to create a new risk assessment methodology that combines the best characteristics of both risk score and machine learning models. More specifically, we aim to develop a method that, besides having a good performance, offers a personalized model and outcome for each patient, presents high interpretability, and incorporates an estimation of the prediction reliability which is not usually available. By combining these features in the same approach we expect that it can boost the confidence of physicians to use such a tool in their daily activity. In order to achieve the mentioned goals, a three-step methodology was developed: several rules were created by dichotomizing risk factors; such rules were trained with a machine learning classifier to predict the acceptance degree of each rule (the probability that the rule is correct) for each patient; that information was combined and used to compute the risk of mortality and the reliability of such prediction. The methodology was applied to a dataset of patients admitted with any type of acute coronary syndromes (ACS), to assess the 30-days all-cause mortality risk. The performance was compared with state-of-the-art approaches: logistic regression (LR), artificial neural network (ANN), and clinical risk score model (Global Registry of Acute Coronary Events - GRACE). The proposed approach achieved testing results identical to the standard LR, but offers superior interpretability and personalization; it also significantly outperforms the GRACE risk model and the standard ANN model. The calibration curve also suggests a very good generalization ability of the obtained model as it approaches the ideal curve. Finally, the reliability estimation of individual predictions presented a great correlation with the misclassifications rate. Those properties may have a beneficial application in other clinical scenarios as well. [abridged]

[24]  arXiv:2110.08338 [pdf, other]
Title: Exploratory Lagrangian-Based Particle Tracing Using Deep Learning
Subjects: Machine Learning (cs.LG); Fluid Dynamics (physics.flu-dyn)

Time-varying vector fields produced by computational fluid dynamics simulations are often prohibitively large and pose challenges for accurate interactive analysis and exploration. To address these challenges, reduced Lagrangian representations have been increasingly researched as a means to improve scientific time-varying vector field exploration capabilities. This paper presents a novel deep neural network-based particle tracing method to explore time-varying vector fields represented by Lagrangian flow maps. In our workflow, in situ processing is first utilized to extract Lagrangian flow maps, and deep neural networks then use the extracted data to learn flow field behavior. Using a trained model to predict new particle trajectories offers a fixed small memory footprint and fast inference. To demonstrate and evaluate the proposed method, we perform an in-depth study of performance using a well-known analytical data set, the Double Gyre. Our study considers two flow map extraction strategies as well as the impact of the number of training samples and integration durations on efficacy, evaluates multiple sampling options for training and testing and informs hyperparameter settings. Overall, we find our method requires a fixed memory footprint of 10.5 MB to encode a Lagrangian representation of a time-varying vector field while maintaining accuracy. For post hoc analysis, loading the trained model costs only two seconds, significantly reducing the burden of I/O when reading data for visualization. Moreover, our parallel implementation can infer one hundred locations for each of two thousand new pathlines across the entire temporal resolution in 1.3 seconds using one NVIDIA Titan RTX GPU.

[25]  arXiv:2110.08350 [pdf, other]
Title: Differentiable Network Pruning for Microcontrollers
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Embedded and personal IoT devices are powered by microcontroller units (MCUs), whose extreme resource scarcity is a major obstacle for applications relying on on-device deep learning inference. Orders of magnitude less storage, memory and computational capacity, compared to what is typically required to execute neural networks, impose strict structural constraints on the network architecture and call for specialist model compression methodology. In this work, we present a differentiable structured network pruning method for convolutional neural networks, which integrates a model's MCU-specific resource usage and parameter importance feedback to obtain highly compressed yet accurate classification models. Our methodology (a) improves key resource usage of models up to 80x; (b) prunes iteratively while a model is trained, resulting in little to no overhead or even improved training time; (c) produces compressed models with matching or improved resource usage up to 1.7x in less time compared to prior MCU-specific methods. Compressed models are available for download.

[26]  arXiv:2110.08378 [pdf, other]
Title: FedSLD: Federated Learning with Shared Label Distribution for Medical Image Classification
Authors: Jun Luo, Shandong Wu
Comments: 10 pages
Subjects: Machine Learning (cs.LG)

Machine learning in medical research, by nature, needs careful attention on obeying the regulations of data privacy, making it difficult to train a machine learning model over gathered data from different medical centers. Failure of leveraging data of the same kind may result in poor generalizability for the trained model. Federated learning (FL) enables collaboratively training a joint model while keeping the data decentralized for multiple medical centers. However, federated optimizations often suffer from the heterogeneity of the data distribution across medical centers. In this work, we propose Federated Learning with Shared Label Distribution (FedSLD) for classification tasks, a method that assumes knowledge of the label distributions for all the participating clients in the federation. FedSLD adjusts the contribution of each data sample to the local objective during optimization given knowledge of the distribution, mitigating the instability brought by data heterogeneity across all clients. We conduct extensive experiments on four publicly available image datasets with different types of non-IID data distributions. Our results show that FedSLD achieves better convergence performance than the compared leading FL optimization algorithms, increasing the test accuracy by up to 5.50 percentage points.

[27]  arXiv:2110.08382 [pdf, other]
Title: A Neural Network Ensemble Approach to System Identification
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Dynamical Systems (math.DS); Numerical Analysis (math.NA)

We present a new algorithm for learning unknown governing equations from trajectory data, using and ensemble of neural networks. Given samples of solutions $x(t)$ to an unknown dynamical system $\dot{x}(t)=f(t,x(t))$, we approximate the function $f$ using an ensemble of neural networks. We express the equation in integral form and use Euler method to predict the solution at every successive time step using at each iteration a different neural network as a prior for $f$. This procedure yields M-1 time-independent networks, where M is the number of time steps at which $x(t)$ is observed. Finally, we obtain a single function $f(t,x(t))$ by neural network interpolation. Unlike our earlier work, where we numerically computed the derivatives of data, and used them as target in a Lipschitz regularized neural network to approximate $f$, our new method avoids numerical differentiations, which are unstable in presence of noise. We test the new algorithm on multiple examples both with and without noise in the data. We empirically show that generalization and recovery of the governing equation improve by adding a Lipschitz regularization term in our loss function and that this method improves our previous one especially in presence of noise, when numerical differentiation provides low quality target data. Finally, we compare our results with the method proposed by Raissi, et al. arXiv:1801.01236 (2018) and with SINDy.

[28]  arXiv:2110.08394 [pdf, other]
Title: Adapt to Adaptation: Learning Personalization for Cross-Silo Federated Learning
Authors: Jun Luo, Shandong Wu
Comments: 15 pages
Subjects: Machine Learning (cs.LG)

The goal of conventional federated learning (FL) is to train a global model for a federation of clients with decentralized data, reducing the systemic privacy risk of centralized training. The distribution shift across non-IID datasets, also known as the data heterogeneity, often poses a challenge for this one-global-model-fits-all solution. In this work, we propose APPLE, a personalized cross-silo FL framework that adaptively learns how much each client can benefit from other clients' models. We also introduce a method to flexibly control the focus of training APPLE between global and local objectives. We empirically evaluate our method's convergence and generalization behavior and performed extensive experiments on two benchmark datasets and two medical imaging datasets under two non-IID settings. The results show that the proposed personalized FL framework, APPLE, achieves state-of-the-art performance compared to several other personalized FL approaches in the literature.

[29]  arXiv:2110.08406 [pdf, other]
Title: Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science
Comments: 21 pages, 10 figures
Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci); Applied Physics (physics.app-ph); Optics (physics.optics)

Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labelled data needed to train the model; this poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Here, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which incorporates three ``inexpensive'' and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: 1)~abundant unlabeled data, 2)~prior knowledge of symmetries or invariances and 3)~surrogate data obtained at near-zero cost. We demonstrate SIB-CL's effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrodinger equation. SIB-CL consistently results in orders of magnitude reduction in the number of labels needed to achieve the same network accuracies.

[30]  arXiv:2110.08432 [pdf, other]
Title: Meta-Learning with Adjoint Methods
Subjects: Machine Learning (cs.LG)

Model Agnostic Meta-Learning (MAML) is widely used to find a good initialization for a family of tasks. Despite its success, a critical challenge in MAML is to calculate the gradient w.r.t the initialization of a long training trajectory for the sampled tasks, because the computation graph can rapidly explode and the computational cost is very expensive. To address this problem, we propose Adjoint MAML (A-MAML). We view gradient descent in the inner optimization as the evolution of an Ordinary Differential Equation (ODE). To efficiently compute the gradient of the validation loss w.r.t the initialization, we use the adjoint method to construct a companion, backward ODE. To obtain the gradient w.r.t the initialization, we only need to run the standard ODE solver twice -- one is forward in time that evolves a long trajectory of gradient flow for the sampled task; the other is backward and solves the adjoint ODE. We need not create or expand any intermediate computational graphs, adopt aggressive approximations, or impose proximal regularizers in the training loss. Our approach is cheap, accurate, and adaptable to different trajectory lengths. We demonstrate the advantage of our approach in both synthetic and real-world meta-learning tasks.

[31]  arXiv:2110.08440 [pdf, other]
Title: Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs
Comments: Under Review
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Q-learning is a popular Reinforcement Learning (RL) algorithm which is widely used in practice with function approximation \citep{mnih2015human}. In contrast, existing theoretical results are pessimistic about Q-learning. For example, \citep{baird1995residual} shows that Q-learning does not converge even with linear function approximation for linear MDPs. Furthermore, even for tabular MDPs with synchronous updates, Q-learning was shown to have sub-optimal sample complexity \citep{li2021q,azar2013minimax}. The goal of this work is to bridge the gap between practical success of Q-learning and the relatively pessimistic theoretical results. The starting point of our work is the observation that in practice, Q-learning is used with two important modifications: (i) training with two networks, called online network and target network simultaneously (online target learning, or OTL) , and (ii) experience replay (ER) \citep{mnih2015human}. While they have been observed to play a significant role in the practical success of Q-learning, a thorough theoretical understanding of how these two modifications improve the convergence behavior of Q-learning has been missing in literature. By carefully combining Q-learning with OTL and \emph{reverse} experience replay (RER) (a form of experience replay), we present novel methods Q-Rex and Q-RexDaRe (Q-Rex + data reuse). We show that Q-Rex efficiently finds the optimal policy for linear MDPs (or more generally for MDPs with zero inherent Bellman error with linear approximation (ZIBEL)) and provide non-asymptotic bounds on sample complexity -- the first such result for a Q-learning method for this class of MDPs under standard assumptions. Furthermore, we demonstrate that Q-RexDaRe in fact achieves near optimal sample complexity in the tabular setting, improving upon the existing results for vanilla Q-learning.

[32]  arXiv:2110.08450 [pdf, other]
Title: Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Performance (cs.PF)

Improving the training and inference performance of graph neural networks (GNNs) is faced with a challenge uncommon in general neural networks: creating mini-batches requires a lot of computation and data movement due to the exponential growth of multi-hop graph neighborhoods along network layers. Such a unique challenge gives rise to a diverse set of system design choices. We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment, under which we identify major performance bottlenecks hitherto under-explored by developers: mini-batch preparation and transfer. We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler, a shared-memory parallelization strategy, and the pipelining of batch transfer with GPU computation. We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised. Such an observation unifies training and inference, simplifying model implementation. We report comprehensive experimental results with several benchmark data sets and GNN architectures, including a demonstration that, for the ogbn-papers100M data set, our system SALIENT achieves a speedup of 3x over a standard PyTorch-Geometric implementation with a single GPU and a further 8x parallel speedup with 16 GPUs. Therein, training a 3-layer GraphSAGE model with sampling fanout (15, 10, 5) takes 2.0 seconds per epoch and inference with fanout (20, 20, 20) takes 2.4 seconds, attaining test accuracy 64.58%.

[33]  arXiv:2110.08465 [pdf, other]
Title: A Heterogeneous Graph Based Framework for Multimodal Neuroimaging Fusion Learning
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)

Here, we present a Heterogeneous Graph neural network for Multimodal neuroimaging fusion learning (HGM). Traditional GNN-based models usually assume the brain network is a homogeneous graph with single type of nodes and edges. However, vast literatures have shown the heterogeneity of the human brain especially between the two hemispheres. Homogeneous brain network is insufficient to model the complicated brain state. Therefore, in this work we firstly model the brain network as heterogeneous graph with multi-type nodes (i.e., left and right hemispheric nodes) and multi-type edges (i.e., intra- and inter-hemispheric edges). Besides, we also propose a self-supervised pre-training strategy based on heterogeneou brain network to address the overfitting problem due to the complex model and small sample size. Our results on two datasets show the superiority of proposed model over other multimodal methods for disease prediction task. Besides, ablation experiments show that our model with pre-training strategy can alleviate the problem of limited training sample size.

[34]  arXiv:2110.08477 [pdf, other]
Title: FedMM: Saddle Point Optimization for Federated Adversarial Domain Adaptation
Subjects: Machine Learning (cs.LG)

Federated adversary domain adaptation is a unique distributed minimax training task due to the prevalence of label imbalance among clients, with each client only seeing a subset of the classes of labels required to train a global model. To tackle this problem, we propose a distributed minimax optimizer referred to as FedMM, designed specifically for the federated adversary domain adaptation problem. It works well even in the extreme case where each client has different label classes and some clients only have unsupervised tasks. We prove that FedMM ensures convergence to a stationary point with domain-shifted unsupervised data. On a variety of benchmark datasets, extensive experiments show that FedMM consistently achieves either significant communication savings or significant accuracy improvements over federated optimizers based on the gradient descent ascent (GDA) algorithm. When training from scratch, for example, it outperforms other GDA based federated average methods by around $20\%$ in accuracy over the same communication rounds; and it consistently outperforms when training from pre-trained models with an accuracy improvement from $5.4\%$ to $9\%$ for different networks.

[35]  arXiv:2110.08483 [pdf, other]
Title: Streaming Decision Trees and Forests
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)

Machine learning has successfully leveraged modern data and provided computational solutions to innumerable real-world problems, including physical and biomedical discoveries. Currently, estimators could handle both scenarios with all samples available and situations requiring continuous updates. However, there is still room for improvement on streaming algorithms based on batch decision trees and random forests, which are the leading methods in batch data tasks. In this paper, we explore the simplest partial fitting algorithm to extend batch trees and test our models: stream decision tree (SDT) and stream decision forest (SDF) on three classification tasks of varying complexities. For reference, both existing streaming trees (Hoeffding trees and Mondrian forests) and batch estimators are included in the experiments. In all three tasks, SDF consistently produces high accuracy, whereas existing estimators encounter space restraints and accuracy fluctuations. Thus, our streaming trees and forests show great potential for further improvements, which are good candidates for solving problems like distribution drift and transfer learning.

[36]  arXiv:2110.08510 [pdf, ps, other]
Title: DFW-PP: Dynamic Feature Weighting based Popularity Prediction for Social Media Content
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)

The increasing popularity of social media platforms makes it important to study user engagement, which is a crucial aspect of any marketing strategy or business model. The over-saturation of content on social media platforms has persuaded us to identify the important factors that affect content popularity. This comes from the fact that only an iota of the humongous content available online receives the attention of the target audience. Comprehensive research has been done in the area of popularity prediction using several Machine Learning techniques. However, we observe that there is still significant scope for improvement in analyzing the social importance of media content. We propose the DFW-PP framework, to learn the importance of different features that vary over time. Further, the proposed method controls the skewness of the distribution of the features by applying a log-log normalization. The proposed method is experimented with a benchmark dataset, to show promising results. The code will be made publicly available at https://github.com/chaitnayabasava/DFW-PP.

[37]  arXiv:2110.08557 [pdf, other]
Title: DPNAS: Neural Architecture Search for Deep Learningwith Differential Privacy
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Training deep neural networks (DNNs) for meaningful differential privacy (DP) guarantees severely degrades model utility. In this paper, we demonstrate that the architecture of DNNs has a significant impact on model utility in the context of private deep learning, whereas its effect is largely unexplored in previous studies. In light of this missing, we propose the very first framework that employs neural architecture search to automatic model design for private deep learning, dubbed as DPNAS. To integrate private learning with architecture search, we delicately design a novel search space and propose a DP-aware method for training candidate models. We empirically certify the effectiveness of the proposed framework. The searched model DPNASNet achieves state-of-the-art privacy/utility trade-offs, e.g., for the privacy budget of $(\epsilon, \delta)=(3, 1\times10^{-5})$, our model obtains test accuracy of $98.57\%$ on MNIST, $88.09\%$ on FashionMNIST, and $68.33\%$ on CIFAR-10. Furthermore, by studying the generated architectures, we provide several intriguing findings of designing private-learning-friendly DNNs, which can shed new light on model design for deep learning with differential privacy.

[38]  arXiv:2110.08565 [pdf, ps, other]
Title: Dynamic Graph Echo State Networks
Comments: Accepted for oral presentation at ESANN 2021
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Dynamic temporal graphs represent evolving relations between entities, e.g. interactions between social network users or infection spreading. We propose an extension of graph echo state networks for the efficient processing of dynamic temporal graphs, with a sufficient condition for their echo state property, and an experimental analysis of reservoir layout impact. Compared to temporal graph kernels that need to hold the entire history of vertex interactions, our model provides a vector encoding for the dynamic graph that is updated at each time-step without requiring training. Experiments show accuracy comparable to approximate temporal graph kernels on twelve dissemination process classification tasks.

[39]  arXiv:2110.08599 [pdf, other]
Title: Mapping illegal waste dumping sites with neural-network classification of satellite imagery
Comments: 5 pages, 3 figures, KDD Workshop on Data-driven Humanitarian Mapping held with the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 14, 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)

Public health and habitat quality are crucial goals of urban planning. In recent years, the severe social and environmental impact of illegal waste dumping sites has made them one of the most serious problems faced by cities in the Global South, in a context of scarce information available for decision making. To help identify the location of dumping sites and track their evolution over time we adopt a data-driven model from the machine learning domain, analyzing satellite images. This allows us to take advantage of the increasing availability of geo-spatial open-data, high-resolution satellite imagery, and open source tools to train machine learning algorithms with a small set of known waste dumping sites in Buenos Aires, and then predict the location of other sites over vast areas at high speed and low cost. This case study shows the results of a collaboration between Dymaxion Labs and Fundaci\'on Bunge y Born to harness this technique in order to create a comprehensive map of potential locations of illegal waste dumping sites in the region.

[40]  arXiv:2110.08607 [pdf, other]
Title: Physics-guided Deep Markov Models for Learning Nonlinear Dynamical Systems with Uncertainty
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Chaotic Dynamics (nlin.CD); Machine Learning (stat.ML)

In this paper, we propose a probabilistic physics-guided framework, termed Physics-guided Deep Markov Model (PgDMM). The framework is especially targeted to the inference of the characteristics and latent structure of nonlinear dynamical systems from measurement data, where it is typically intractable to perform exact inference of latent variables. A recently surfaced option pertains to leveraging variational inference to perform approximate inference. In such a scheme, transition and emission functions of the system are parameterized via feed-forward neural networks (deep generative models). However, due to the generalized and highly versatile formulation of neural network functions, the learned latent space is often prone to lack physical interpretation and structured representation. To address this, we bridge physics-based state space models with Deep Markov Models, thus delivering a hybrid modeling framework for unsupervised learning and identification for nonlinear dynamical systems. Specifically, the transition process can be modeled as a physics-based model enhanced with an additive neural network component, which aims to learn the discrepancy between the physics-based model and the actual dynamical system being monitored. The proposed framework takes advantage of the expressive power of deep learning, while retaining the driving physics of the dynamical system by imposing physics-driven restrictions on the side of the latent space. We demonstrate the benefits of such a fusion in terms of achieving improved performance on illustrative simulation examples and experimental case studies of nonlinear systems. Our results indicate that the physics-based models involved in the employed transition and emission functions essentially enforce a more structured and physically interpretable latent space, which is essential to generalization and prediction capabilities.

[41]  arXiv:2110.08611 [pdf, ps, other]
Title: Deep Active Learning by Leveraging Training Dynamics
Subjects: Machine Learning (cs.LG)

Active learning theories and methods have been extensively studied in classical statistical learning settings. However, deep active learning, i.e., active learning with deep learning models, is usually based on empirical criteria without solid theoretical justification, thus suffering from heavy doubts when some of those fail to provide benefits in applications. In this paper, by exploring the connection between the generalization performance and the training dynamics, we propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics. In particular, we prove that convergence speed of training and the generalization performance is positively correlated under the ultra-wide condition and show that maximizing the training dynamics leads to a better generalization performance. Further on, to scale up to large deep neural networks and data sets, we introduce two relaxations for the subset selection problem and reduce the time complexity from polynomial to constant. Empirical results show that dynamicAL not only outperforms the other baselines consistently but also scales well on large deep learning models. We hope our work inspires more attempts in bridging the theoretical findings of deep networks and practical impacts in deep active learning applications.

[42]  arXiv:2110.08614 [pdf, other]
Title: Deep Learning and Spectral Embedding for Graph Partitioning
Subjects: Machine Learning (cs.LG)

We present a graph bisection and partitioning algorithm based on graph neural networks. For each node in the graph, the network outputs probabilities for each of the partitions. The graph neural network consists of two modules: an embedding phase and a partitioning phase. The embedding phase is trained first by minimizing a loss function inspired by spectral graph theory. The partitioning module is trained through a loss function that corresponds to the expected value of the normalized cut. Both parts of the neural network rely on SAGE convolutional layers and graph coarsening using heavy edge matching. The multilevel structure of the neural network is inspired by the multigrid algorithm. Our approach generalizes very well to bigger graphs and has partition quality comparable to METIS, Scotch and spectral partitioning, with shorter runtime compared to METIS and spectral partitioning.

[43]  arXiv:2110.08616 [pdf, other]
Title: GradSign: Model Performance Inference with Theoretical Insights
Comments: Preprint. Under review
Subjects: Machine Learning (cs.LG)

A key challenge in neural architecture search (NAS) is quickly inferring the predictive performance of a broad spectrum of networks to discover statistically accurate and computationally efficient ones. We refer to this task as model performance inference (MPI). The current practice for efficient MPI is gradient-based methods that leverage the gradients of a network at initialization to infer its performance. However, existing gradient-based methods rely only on heuristic metrics and lack the necessary theoretical foundations to consolidate their designs. We propose GradSign, an accurate, simple, and flexible metric for model performance inference with theoretical insights. The key idea behind GradSign is a quantity {\Psi} to analyze the optimization landscape of different networks at the granularity of individual training samples. Theoretically, we show that both the network's training and true population losses are proportionally upper-bounded by {\Psi} under reasonable assumptions. In addition, we design GradSign, an accurate and simple approximation of {\Psi} using the gradients of a network evaluated at a random initialization state. Evaluation on seven NAS benchmarks across three training datasets shows that GradSign generalizes well to real-world networks and consistently outperforms state-of-the-art gradient-based methods for MPI evaluated by Spearman's {\rho} and Kendall's Tau. Additionally, we integrate GradSign into four existing NAS algorithms and show that the GradSign-assisted NAS algorithms outperform their vanilla counterparts by improving the accuracies of best-discovered networks by up to 0.3%, 1.1%, and 1.0% on three real-world tasks.

[44]  arXiv:2110.08626 [pdf, other]
Title: Learning velocity model for complex media with deep convolutional neural networks
Comments: 14 pages, 6 figures, 6 tables
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

The paper considers the problem of velocity model acquisition for a complex media based on boundary measurements. The acoustic model is used to describe the media. We used an open-source dataset of velocity distributions to compare the presented results with the previous works directly. Forward modeling is performed using the grid-characteristic numerical method. The inverse problem is solved using deep convolutional neural networks. Modifications for a baseline UNet architecture are proposed to improve both structural similarity index measure quantitative correspondence of the velocity profiles with the ground truth. We evaluate our enhancements and demonstrate the statistical significance of the results.

[45]  arXiv:2110.08627 [pdf, other]
Title: On the Pareto Frontier of Regret Minimization and Best Arm Identification in Stochastic Bandits
Comments: 27 pages, 8 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Machine Learning (stat.ML)

We study the Pareto frontier of two archetypal objectives in stochastic bandits, namely, regret minimization (RM) and best arm identification (BAI) with a fixed horizon. It is folklore that the balance between exploitation and exploration is crucial for both RM and BAI, but exploration is more critical in achieving the optimal performance for the latter objective. To make this precise, we first design and analyze the BoBW-lil'UCB$({\gamma})$ algorithm, which achieves order-wise optimal performance for RM or BAI under different values of ${\gamma}$. Complementarily, we show that no algorithm can simultaneously perform optimally for both the RM and BAI objectives. More precisely, we establish non-trivial lower bounds on the regret achievable by any algorithm with a given BAI failure probability. This analysis shows that in some regimes BoBW-lil'UCB$({\gamma})$ achieves Pareto-optimality up to constant or small terms. Numerical experiments further demonstrate that when applied to difficult instances, BoBW-lil'UCB outperforms a close competitor UCB$_{\alpha}$ (Degenne et al., 2019), which is designed for RM and BAI with a fixed confidence.

[46]  arXiv:2110.08642 [pdf, other]
Title: Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning
Journal-ref: IEEE The 3rd International Symposium on Multi-Robot and Multi-Agent Systems (MRS), 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Policy gradient methods have become popular in multi-agent reinforcement learning, but they suffer from high variance due to the presence of environmental stochasticity and exploring agents (i.e., non-stationarity), which is potentially worsened by the difficulty in credit assignment. As a result, there is a need for a method that is not only capable of efficiently solving the above two problems but also robust enough to solve a variety of tasks. To this end, we propose a new multi-agent policy gradient method, called Robust Local Advantage (ROLA) Actor-Critic. ROLA allows each agent to learn an individual action-value function as a local critic as well as ameliorating environment non-stationarity via a novel centralized training approach based on a centralized critic. By using this local critic, each agent calculates a baseline to reduce variance on its policy gradient estimation, which results in an expected advantage action-value over other agents' choices that implicitly improves credit assignment. We evaluate ROLA across diverse benchmarks and show its robustness and effectiveness over a number of state-of-the-art multi-agent policy gradient algorithms.

[47]  arXiv:2110.08649 [pdf, other]
Title: Equivariant Discrete Normalizing Flows
Comments: Preprint
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

At its core, generative modeling seeks to uncover the underlying factors that give rise to observed data that can often be modelled as the natural symmetries that manifest themselves through invariances and equivariances to certain transformations laws. However, current approaches are couched in the formalism of continuous normalizing flows that require the construction of equivariant vector fields -- inhibiting their simple application to conventional higher dimensional generative modelling domains like natural images. In this paper we focus on building equivariant normalizing flows using discrete layers. We first theoretically prove the existence of an equivariant map for compact groups whose actions are on compact spaces. We further introduce two new equivariant flows: $G$-coupling Flows and $G$-Residual Flows that elevate classical Coupling and Residual Flows with equivariant maps to a prescribed group $G$. Our construction of $G$-Residual Flows are also universal, in the sense that we prove an $G$-equivariant diffeomorphism can be exactly mapped by a $G$-residual flow. Finally, we complement our theoretical insights with experiments -- for the first time -- on image datasets like CIFAR-10 and show $G$-Equivariant Discrete Normalizing flows lead to increased data efficiency, faster convergence, and improved likelihood estimates.

[48]  arXiv:2110.08678 [pdf, other]
Title: Transformer with a Mixture of Gaussian Keys
Comments: 21 pages, 8 figures, 4 tables
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

Multi-head attention is a driving force behind state-of-the-art transformers which achieve remarkable performance across a variety of natural language processing (NLP) and computer vision tasks. It has been observed that for many applications, those attention heads learn redundant embedding, and most of them can be removed without degrading the performance of the model. Inspired by this observation, we propose Transformer with a Mixture of Gaussian Keys (Transformer-MGK), a novel transformer architecture that replaces redundant heads in transformers with a mixture of keys at each head. These mixtures of keys follow a Gaussian mixture model and allow each attention head to focus on different parts of the input sequence efficiently. Compared to its conventional transformer counterpart, Transformer-MGK accelerates training and inference, has fewer parameters, and requires less FLOPs to compute while achieving comparable or better accuracy across tasks. Transformer-MGK can also be easily extended to use with linear attentions. We empirically demonstrate the advantage of Transformer-MGK in a range of practical applications including language modeling and tasks that involve very long sequences. On the Wikitext-103 and Long Range Arena benchmark, Transformer-MGKs with 4 heads attain comparable or better performance to the baseline transformers with 8 heads.

[49]  arXiv:2110.08688 [pdf, other]
Title: MG-GCN: Scalable Multi-GPU GCN Training Framework
Comments: 12 pages, 13 figures, Under Review
Subjects: Machine Learning (cs.LG)

Full batch training of Graph Convolutional Network (GCN) models is not feasible on a single GPU for large graphs containing tens of millions of vertices or more. Recent work has shown that, for the graphs used in the machine learning community, communication becomes a bottleneck and scaling is blocked outside of the single machine regime. Thus, we propose MG-GCN, a multi-GPU GCN training framework taking advantage of the high-speed communication links between the GPUs present in multi-GPU systems. MG-GCN employs multiple High-Performance Computing optimizations, including efficient re-use of memory buffers to reduce the memory footprint of training GNN models, as well as communication and computation overlap. These optimizations enable execution on larger datasets, that generally do not fit into memory of a single GPU in state-of-the-art implementations. Furthermore, they contribute to achieve superior speedup compared to the state-of-the-art. For example, MG-GCN achieves super-linear speedup with respect to DGL, on the Reddit graph on both DGX-1 (V100) and DGX-A100.

[50]  arXiv:2110.08689 [pdf, other]
Title: Classical-to-Quantum Transfer Learning for Spoken Command Recognition Based on Quantum Neural Networks
Comments: submitted to ICASSP'22
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)

This work investigates an extension of transfer learning applied in machine learning algorithms to the emerging hybrid end-to-end quantum neural network (QNN) for spoken command recognition (SCR). Our QNN-based SCR system is composed of classical and quantum components: (1) the classical part mainly relies on a 1D convolutional neural network (CNN) to extract speech features; (2) the quantum part is built upon the variational quantum circuit with a few learnable parameters. Since it is inefficient to train the hybrid end-to-end QNN from scratch on a noisy intermediate-scale quantum (NISQ) device, we put forth a hybrid transfer learning algorithm that allows a pre-trained classical network to be transferred to the classical part of the hybrid QNN model. The pre-trained classical network is further modified and augmented through jointly fine-tuning with a variational quantum circuit (VQC). The hybrid transfer learning methodology is particularly attractive for the task of QNN-based SCR because low-dimensional classical features are expected to be encoded into quantum states. We assess the hybrid transfer learning algorithm applied to the hybrid classical-quantum QNN for SCR on the Google speech command dataset, and our classical simulation results suggest that the hybrid transfer learning can boost our baseline performance on the SCR task.

[51]  arXiv:2110.08690 [pdf, other]
Title: Tackling the Imbalance for GNNs
Subjects: Machine Learning (cs.LG)

Different from deep neural networks for non-graph data classification, graph neural networks (GNNs) leverage the information exchange between nodes (or samples) when representing nodes. The category distribution shows an imbalance or even a highly-skewed trend on nearly all existing benchmark GNN data sets. The imbalanced distribution will cause misclassification of nodes in the minority classes, and even cause the classification performance on the entire data set to decrease. This study explores the effects of the imbalance problem on the performances of GNNs and proposes new methodologies to solve it. First, a node-level index, namely, the label difference index ($LDI$), is defined to quantitatively analyze the relationship between imbalance and misclassification. The less samples in a class, the higher the value of its average $LDI$; the higher the $LDI$ of a sample, the more likely the sample will be misclassified. We define a new loss and propose four new methods based on $LDI$. Experimental results indicate that the classification accuracies of the three among our proposed four new methods are better in both transductive and inductive settings. The $LDI$ can be applied to other GNNs.

[52]  arXiv:2110.08693 [pdf, other]
Title: On the Statistical Analysis of Complex Tree-shaped 3D Objects
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG); Graphics (cs.GR); Machine Learning (stat.ML)

How can one analyze detailed 3D biological objects, such as neurons and botanical trees, that exhibit complex geometrical and topological variation? In this paper, we develop a novel mathematical framework for representing, comparing, and computing geodesic deformations between the shapes of such tree-like 3D objects. A hierarchical organization of subtrees characterizes these objects -- each subtree has the main branch with some side branches attached -- and one needs to match these structures across objects for meaningful comparisons. We propose a novel representation that extends the Square-Root Velocity Function (SRVF), initially developed for Euclidean curves, to tree-shaped 3D objects. We then define a new metric that quantifies the bending, stretching, and branch sliding needed to deform one tree-shaped object into the other. Compared to the current metrics, such as the Quotient Euclidean Distance (QED) and the Tree Edit Distance (TED), the proposed representation and metric capture the full elasticity of the branches (i.e., bending and stretching) as well as the topological variations (i.e., branch death/birth and sliding). It completely avoids the shrinkage that results from the edge collapse and node split operations of the QED and TED metrics. We demonstrate the utility of this framework in comparing, matching, and computing geodesics between biological objects such as neurons and botanical trees. The framework is also applied to various shape analysis tasks: (i) symmetry analysis and symmetrization of tree-shaped 3D objects, (ii) computing summary statistics (means and modes of variations) of populations of tree-shaped 3D objects, (iii) fitting parametric probability distributions to such populations, and (iv) finally synthesizing novel tree-shaped 3D objects through random sampling from estimated probability distributions.

[53]  arXiv:2110.08695 [pdf, other]
Title: Towards Instance-Optimal Offline Reinforcement Learning with Pessimism
Comments: NeurIPS, 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We study the offline reinforcement learning (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) using the data coming from a policy $\mu$. In particular, we consider the sample complexity problems of offline RL for finite-horizon MDPs. Prior works study this problem based on different data-coverage assumptions, and their learning guarantees are expressed by the covering coefficients which lack the explicit characterization of system quantities. In this work, we analyze the Adaptive Pessimistic Value Iteration (APVI) algorithm and derive the suboptimality upper bound that nearly matches \[ O\left(\sum_{h=1}^H\sum_{s_h,a_h}d^{\pi^\star}_h(s_h,a_h)\sqrt{\frac{\mathrm{Var}_{P_{s_h,a_h}}{(V^\star_{h+1}+r_h)}}{d^\mu_h(s_h,a_h)}}\sqrt{\frac{1}{n}}\right). \] In complementary, we also prove a per-instance information-theoretical lower bound under the weak assumption that $d^\mu_h(s_h,a_h)>0$ if $d^{\pi^\star}_h(s_h,a_h)>0$. Different from the previous minimax lower bounds, the per-instance lower bound (via local minimaxity) is a much stronger criterion as it applies to individual instances separately. Here $\pi^\star$ is a optimal policy, $\mu$ is the behavior policy and $d_h^\mu$ is the marginal state-action probability. We call the above equation the intrinsic offline reinforcement learning bound since it directly implies all the existing optimal results: minimax rate under uniform data-coverage assumption, horizon-free setting, single policy concentrability, and the tight problem-dependent results. Later, we extend the result to the assumption-free regime (where we make no assumption on $ \mu$) and obtain the assumption-free intrinsic bound. Due to its generic form, we believe the intrinsic bound could help illuminate what makes a specific problem hard and reveal the fundamental challenges in offline RL.

[54]  arXiv:2110.08710 [pdf, ps, other]
Title: NeuralArTS: Structuring Neural Architecture Search with Type Theory
Subjects: Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Programming Languages (cs.PL); Machine Learning (stat.ML)

Neural Architecture Search (NAS) algorithms automate the task of finding optimal deep learning architectures given an initial search space of possible operations. Developing these search spaces is usually a manual affair with pre-optimized search spaces being more efficient, rather than searching from scratch. In this paper we present a new framework called Neural Architecture Type System (NeuralArTS) that categorizes the infinite set of network operations in a structured type system. We further demonstrate how NeuralArTS can be applied to convolutional layers and propose several future directions.

[55]  arXiv:2110.08712 [pdf, other]
Title: Black-box Adversarial Attacks on Network-wide Multi-step Traffic State Prediction Models
Comments: Accepted to IEEE International Conference on Intelligent Transportation Systems (ITSC), 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Traffic state prediction is necessary for many Intelligent Transportation Systems applications. Recent developments of the topic have focused on network-wide, multi-step prediction, where state of the art performance is achieved via deep learning models, in particular, graph neural network-based models. While the prediction accuracy of deep learning models is high, these models' robustness has raised many safety concerns, given that imperceptible perturbations added to input can substantially degrade the model performance. In this work, we propose an adversarial attack framework by treating the prediction model as a black-box, i.e., assuming no knowledge of the model architecture, training data, and (hyper)parameters. However, we assume that the adversary can oracle the prediction model with any input and obtain corresponding output. Next, the adversary can train a substitute model using input-output pairs and generate adversarial signals based on the substitute model. To test the attack effectiveness, two state of the art, graph neural network-based models (GCGRNN and DCRNN) are examined. As a result, the adversary can degrade the target model's prediction accuracy up to $54\%$. In comparison, two conventional statistical models (linear regression and historical average) are also examined. While these two models do not produce high prediction accuracy, they are either influenced negligibly (less than $3\%$) or are immune to the adversary's attack.

[56]  arXiv:2110.08717 [pdf, other]
Title: Hand Gesture Recognition Using Temporal Convolutions and Attention Mechanism
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Advances in biosignal signal processing and machine learning, in particular Deep Neural Networks (DNNs), have paved the way for the development of innovative Human-Machine Interfaces for decoding the human intent and controlling artificial limbs. DNN models have shown promising results with respect to other algorithms for decoding muscle electrical activity, especially for recognition of hand gestures. Such data-driven models, however, have been challenged by their need for a large number of trainable parameters and their structural complexity. Here we propose the novel Temporal Convolutions-based Hand Gesture Recognition architecture (TC-HGR) to reduce this computational burden. With this approach, we classified 17 hand gestures via surface Electromyogram (sEMG) signals by the adoption of attention mechanisms and temporal convolutions. The proposed method led to 81.65% and 80.72% classification accuracy for window sizes of 300ms and 200ms, respectively. The number of parameters to train the proposed TC-HGR architecture is 11.9 times less than that of its state-of-the-art counterpart.

[57]  arXiv:2110.08720 [pdf, other]
Title: Centroid Approximation for Bootstrap
Authors: Mao Ye, Qiang Liu
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Bootstrap is a principled and powerful frequentist statistical tool for uncertainty quantification. Unfortunately, standard bootstrap methods are computationally intensive due to the need of drawing a large i.i.d. bootstrap sample to approximate the ideal bootstrap distribution; this largely hinders their application in large-scale machine learning, especially deep learning problems. In this work, we propose an efficient method to explicitly \emph{optimize} a small set of high quality "centroid" points to better approximate the ideal bootstrap distribution. We achieve this by minimizing a simple objective function that is asymptotically equivalent to the Wasserstein distance to the ideal bootstrap distribution. This allows us to provide an accurate estimation of uncertainty with a small number of bootstrap centroids, outperforming the naive i.i.d. sampling approach. Empirically, we show that our method can boost the performance of bootstrap in a variety of applications.

[58]  arXiv:2110.08725 [pdf, other]
Title: A Riemannian Mean Field Formulation for Two-layer Neural Networks with Batch Normalization
Authors: Chao Ma, Lexing Ying
Subjects: Machine Learning (cs.LG)

The training dynamics of two-layer neural networks with batch normalization (BN) is studied. It is written as the training dynamics of a neural network without BN on a Riemannian manifold. Therefore, we identify BN's effect of changing the metric in the parameter space. Later, the infinite-width limit of the two-layer neural networks with BN is considered, and a mean-field formulation is derived for the training dynamics. The training dynamics of the mean-field formulation is shown to be the Wasserstein gradient flow on the manifold. Theoretical analysis are provided on the well-posedness and convergence of the Wasserstein gradient flow.

[59]  arXiv:2110.08727 [pdf, other]
Title: Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Graph Neural Networks (GNNs) have recently become popular for graph machine learning and have shown great results on wide node classification tasks. Yet, GNNs are less popular for practical deployments in the industry owing to their scalability challenges incurred by data dependency. Namely, GNN inference depends on neighbor nodes multiple hops away from the target, and fetching these nodes burdens latency-constrained applications. Existing inference acceleration methods like pruning and quantization can speed up GNNs to some extent by reducing Multiplication-and-ACcumulation (MAC) operations. However, their improvements are limited given the data dependency is not resolved. Conversely, multi-layer perceptrons (MLPs) have no dependency on graph data and infer much faster than GNNs, even though they are less accurate than GNNs for node classification in general. Motivated by these complementary strengths and weaknesses, we bring GNNs and MLPs together via knowledge distillation (KD). Our work shows that the performance of MLPs can be improved by large margins with GNN KD. We call the distilled MLPs Graph-less Neural Networks (GLNNs) as they have no inference graph dependency. We show that GLNN with competitive performance infer faster than GNNs by 146X-273X and faster than other acceleration methods by 14X-27X. Meanwhile, under a production setting involving both transductive and inductive predictions across 7 datasets, GLNN accuracies improve over stand alone MLPs by 12.36% on average and match GNNs on 6/7 datasets. A comprehensive analysis of GLNN shows when and why GLNN can achieve competitive results to GNNs and suggests GLNN as a handy choice for latency-constrained applications.

[60]  arXiv:2110.08760 [pdf, other]
Title: Adapting Membership Inference Attacks to GNN for Graph Classification: Approaches and Implications
Comments: The short version of this paper has been published in the IEEE International Conference on Data Mining (ICDM) 2021
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Graph Neural Networks (GNNs) are widely adopted to analyse non-Euclidean data, such as chemical networks, brain networks, and social networks, modelling complex relationships and interdependency between objects. Recently, Membership Inference Attack (MIA) against GNNs raises severe privacy concerns, where training data can be leaked from trained GNN models. However, prior studies focus on inferring the membership of only the components in a graph, e.g., an individual node or edge. How to infer the membership of an entire graph record is yet to be explored.
In this paper, we take the first step in MIA against GNNs for graph-level classification. Our objective is to infer whether a graph sample has been used for training a GNN model. We present and implement two types of attacks, i.e., training-based attacks and threshold-based attacks from different adversarial capabilities. We perform comprehensive experiments to evaluate our attacks in seven real-world datasets using five representative GNN models. Both our attacks are shown effective and can achieve high performance, i.e., reaching over 0.7 attack F1 scores in most cases. Furthermore, we analyse the implications behind the MIA against GNNs. Our findings confirm that GNNs can be even more vulnerable to MIA than the models with non-graph structures. And unlike the node-level classifier, MIAs on graph-level classification tasks are more co-related with the overfitting level of GNNs rather than the statistic property of their training graphs.

[61]  arXiv:2110.08765 [pdf, other]
Title: Temporal Knowledge Graph Reasoning Triggered by Memories
Subjects: Machine Learning (cs.LG); Multimedia (cs.MM)

Inferring missing facts in temporal knowledge graphs is a critical task and has been widely explored. Extrapolation in temporal reasoning tasks is more challenging and gradually attracts the attention of researchers since no direct history facts for prediction. Previous works attempted to apply evolutionary representation learning to solve the extrapolation problem. However, these techniques do not explicitly leverage various time-aware attribute representations, i.e. the reasoning performance is significantly affected by the history length. To alleviate the time dependence when reasoning future missing facts, we propose a memory-triggered decision-making (MTDM) network, which incorporates transient memories, long-short-term memories, and deep memories. Specifically, the transient learning network considers transient memories as a static knowledge graph, and the time-aware recurrent evolution network learns representations through a sequence of recurrent evolution units from long-short-term memories. Each evolution unit consists of a structural encoder to aggregate edge information, a time encoder with a gating unit to update attribute representations of entities. MTDM utilizes the crafted residual multi-relational aggregator as the structural encoder to solve the multi-hop coverage problem. We also introduce the dissolution learning constraint for better understanding the event dissolution process. Extensive experiments demonstrate the MTDM alleviates the history dependence and achieves state-of-the-art prediction performance. Moreover, compared with the most advanced baseline, MTDM shows a faster convergence speed and training speed.

[62]  arXiv:2110.08770 [pdf, other]
Title: Towards Better Long-range Time Series Forecasting using Generative Adversarial Networks
Comments: 7 pages main paper with 4 pages appendix
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Accurate long-range forecasting of time series data is an important problem in many sectors, such as energy, healthcare, and finance. In recent years, Generative Adversarial Networks (GAN) have provided a revolutionary approach to many problems. However, the use of GAN to improve long-range time series forecasting remains relatively unexplored. In this paper, we utilize a Conditional Wasserstein GAN (CWGAN) and augment it with an error penalty term, leading to a new generative model which aims to generate high-quality synthetic time series data, called CWGAN-TS. By using such synthetic data, we develop a long-range forecasting approach, called Generative Forecasting (GenF), consisting of three components: (i) CWGAN-TS to generate synthetic data for the next few time steps. (ii) a predictor which makes long-range predictions based on generated and observed data. (iii) an information theoretic clustering (ITC) algorithm to better train the CWGAN-TS and the predictor. Our experimental results on three public datasets demonstrate that GenF significantly outperforms a diverse range of state-of-the-art benchmarks and classical approaches. In most cases, we find a 6% - 12% improvement in predictive performance (mean absolute error) and a 37% reduction in parameters compared to the best performing benchmark. Lastly, we conduct an ablation study to demonstrate the effectiveness of the CWGAN-TS and the ITC algorithm.

[63]  arXiv:2110.08771 [pdf]
Title: An LSTM-based Plagiarism Detection via Attention Mechanism and a Population-based Approach for Pre-Training Parameters with imbalanced Classes
Comments: 12 pages, The 28th International Conference on Neural Information Processing (ICONIP2021), BALI, Indonesia
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Plagiarism is one of the leading problems in academic and industrial environments, which its goal is to find the similar items in a typical document or source code. This paper proposes an architecture based on a Long Short-Term Memory (LSTM) and attention mechanism called LSTM-AM-ABC boosted by a population-based approach for parameter initialization. Gradient-based optimization algorithms such as back-propagation (BP) are widely used in the literature for learning process in LSTM, attention mechanism, and feed-forward neural network, while they suffer from some problems such as getting stuck in local optima. To tackle this problem, population-based metaheuristic (PBMH) algorithms can be used. To this end, this paper employs a PBMH algorithm, artificial bee colony (ABC), to moderate the problem. Our proposed algorithm can find the initial values for model learning in all LSTM, attention mechanism, and feed-forward neural network, simultaneously. In other words, ABC algorithm finds a promising point for starting BP algorithm. For evaluation, we compare our proposed algorithm with both conventional and population-based methods. The results clearly show that the proposed method can provide competitive performance.

[64]  arXiv:2110.08820 [pdf, other]
Title: On-board Fault Diagnosis of a Laboratory Mini SR-30 Gas Turbine Engine
Authors: Richa Singh
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Inspired by recent progress in machine learning, a data-driven fault diagnosis and isolation (FDI) scheme is explicitly developed for failure in the fuel supply system and sensor measurements of the laboratory gas turbine system. A passive approach of fault diagnosis is implemented where a model is trained using machine learning classifiers to detect a given set of fault scenarios in real-time on which it is trained. Towards the end, a comparative study is presented for well-known classification techniques, namely Support vector classifier, linear discriminant analysis, K-neighbor, and decision trees. Several simulation studies were carried out to demonstrate and illustrate the proposed fault diagnosis scheme's advantages, capabilities, and performance.

[65]  arXiv:2110.08826 [pdf, other]
Title: Exploring Deep Neural Networks on Edge TPU
Comments: 12 pages, 16 figures
Subjects: Machine Learning (cs.LG)

This paper explores the performance of Google's Edge TPU on feed forward neural networks. We consider Edge TPU as a hardware platform and explore different architectures of deep neural network classifiers, which traditionally has been a challenge to run on resource constrained edge devices. Based on the use of a joint-time-frequency data representation, also known as spectrogram, we explore the trade-off between classification performance and the energy consumed for inference. The energy efficiency of Edge TPU is compared with that of widely-used embedded CPU ARM Cortex-A53. Our results quantify the impact of neural network architectural specifications on the Edge TPU's performance, guiding decisions on the TPU's optimal operating point, where it can provide high classification accuracy with minimal energy consumption. Also, our evaluations highlight the crossover in performance between the Edge TPU and Cortex-A53, depending on the neural network specifications. Based on our analysis, we provide a decision chart to guide decisions on platform selection based on the model parameters and context.

[66]  arXiv:2110.08847 [pdf, other]
Title: Provable RL with Exogenous Distractors via Multistep Inverse Dynamics
Subjects: Machine Learning (cs.LG)

Many real-world applications of reinforcement learning (RL) require the agent to deal with high-dimensional observations such as those generated from a megapixel camera. Prior work has addressed such problems with representation learning, through which the agent can provably extract endogenous, latent state information from raw observations and subsequently plan efficiently. However, such approaches can fail in the presence of temporally correlated noise in the observations, a phenomenon that is common in practice. We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL. We start by establishing several negative results, by highlighting failure cases of prior representation learning based approaches. Then, we introduce the Predictive Path Elimination (PPE) algorithm, that learns a generalization of inverse dynamics and is provably sample and computationally efficient in EX-BMDPs when the endogenous state dynamics are near deterministic. The sample complexity of PPE depends polynomially on the size of the latent endogenous state space while not directly depending on the size of the observation space, nor the exogenous state space. We provide experiments on challenging exploration problems which show that our approach works empirically.

[67]  arXiv:2110.08851 [pdf, other]
Title: Self-Supervised Learning for Binary Networks by Joint Classifier Training
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Despite the great success of self-supervised learning with large floating point networks, such networks are not readily deployable to edge devices. To accelerate deployment of models to edge devices for various downstream tasks by unsupervised representation learning, we propose a self-supervised learning method for binary networks. In particular, we propose to use a randomly initialized classifier attached to a pretrained floating point feature extractor as targets and jointly train it with a binary network. For better training of the binary network, we propose a feature similarity loss, a dynamic balancing scheme of loss terms, and modified multi-stage training. We call our method as BSSL. Our empirical validations show that BSSL outperforms self-supervised learning baselines for binary networks in various downstream tasks and outperforms supervised pretraining in certain tasks.

[68]  arXiv:2110.08857 [pdf, other]
Title: Growing Representation Learning
Comments: 8 pages, 5 figures
Subjects: Machine Learning (cs.LG)

Machine learning continues to grow in popularity due to its ability to learn increasingly complex tasks. However, for many supervised models, the shift in a data distribution or the appearance of a new event can result in a severe decrease in model performance. Retraining a model from scratch with updated data can be resource intensive or impossible depending on the constraints placed on an organization or system. Continual learning methods attempt to adapt models to new classes instead of retraining. However, many of these methods do not have a detection method for new classes or make assumptions about the distribution of classes. In this paper, we develop an attention based Gaussian Mixture, called GMAT, that learns interpretable representations of data with or without labels. We incorporate this method with existing Neural Architecture Search techniques to develop an algorithm for detection new events for an optimal number of representations through an iterative process of training a growing. We show that our method is capable learning new representations of data without labels or assumptions about the distributions of labels. We additionally develop a method that allows our model to utilize labels to more accurately develop representations. Lastly, we show that our method can avoid catastrophic forgetting by replaying samples from learned representations.

[69]  arXiv:2110.08871 [pdf, ps, other]
Title: Noise-robust Clustering
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

This paper presents noise-robust clustering techniques in unsupervised machine learning. The uncertainty about the noise, consistency, and other ambiguities can become severe obstacles in data analytics. As a result, data quality, cleansing, management, and governance remain critical disciplines when working with Big Data. With this complexity, it is no longer sufficient to treat data deterministically as in a classical setting, and it becomes meaningful to account for noise distribution and its impact on data sample values. Classical clustering methods group data into "similarity classes" depending on their relative distances or similarities in the underlying space. This paper addressed this problem via the extension of classical $K$-means and $K$-medoids clustering over data distributions (rather than the raw data). This involves measuring distances among distributions using two types of measures: the optimal mass transport (also called Wasserstein distance, denoted $W_2$) and a novel distance measure proposed in this paper, the expected value of random variable distance (denoted ED). The presented distribution-based $K$-means and $K$-medoids algorithms cluster the data distributions first and then assign each raw data to the cluster of data's distribution.

[70]  arXiv:2110.08896 [pdf, other]
Title: Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Anderson mixing has been heuristically applied to reinforcement learning (RL) algorithms for accelerating convergence and improving the sampling efficiency of deep RL. Despite its heuristic improvement of convergence, a rigorous mathematical justification for the benefits of Anderson mixing in RL has not yet been put forward. In this paper, we provide deeper insights into a class of acceleration schemes built on Anderson mixing that improve the convergence of deep RL algorithms. Our main results establish a connection between Anderson mixing and quasi-Newton methods and prove that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor. The key focus of the analysis roots in the fixed-point iteration nature of RL. We further propose a stabilization strategy by introducing a stable regularization term in Anderson mixing and a differentiable, non-expansive MellowMax operator that can allow both faster convergence and more stable behavior. Extensive experiments demonstrate that our proposed method enhances the convergence, stability, and performance of RL algorithms.

[71]  arXiv:2110.08918 [pdf, other]
Title: Using Clinical Drug Representations for Improving Mortality and Length of Stay Predictions
Comments: Published in IEEE CIBCB 2021
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

Drug representations have played an important role in cheminformatics. However, in the healthcare domain, drug representations have been underused relative to the rest of Electronic Health Record (EHR) data, due to the complexity of high dimensional drug representations and the lack of proper pipeline that will allow to convert clinical drugs to their representations. Time-varying vital signs, laboratory measurements, and related time-series signals are commonly used to predict clinical outcomes. In this work, we demonstrated that using clinical drug representations in addition to other clinical features has significant potential to increase the performance of mortality and length of stay (LOS) models. We evaluate the two different drug representation methods (Extended-Connectivity Fingerprint-ECFP and SMILES-Transformer embedding) on clinical outcome predictions. The results have shown that the proposed multimodal approach achieves substantial enhancement on clinical tasks over baseline models. Using clinical drug representations as additional features improve the LOS prediction for Area Under the Receiver Operating Characteristics (AUROC) around %6 and for Area Under Precision-Recall Curve (AUPRC) by around %5. Furthermore, for the mortality prediction task, there is an improvement of around %2 over the time series baseline in terms of AUROC and %3.5 in terms of AUPRC. The code for the proposed method is available at https://github.com/tanlab/MIMIC-III-Clinical-Drug-Representations.

[72]  arXiv:2110.08922 [pdf, other]
Title: Explaining generalization in deep learning: progress and fundamental limits
Comments: arXiv admin note: text overlap with arXiv:1902.04742
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data to zero error?
In the first part of the thesis, we will empirically study how training deep networks via stochastic gradient descent implicitly controls the networks' capacity. Subsequently, to show how this leads to better generalization, we will derive {\em data-dependent} {\em uniform-convergence-based} generalization bounds with improved dependencies on the parameter count.
Uniform convergence has in fact been the most widely used tool in deep learning literature, thanks to its simplicity and generality. Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization. In particular, we will show that in some example overparameterized settings, {\em any} uniform convergence bound will provide only a vacuous generalization bound.
With this realization in mind, in the last part of the thesis, we will change course and introduce an {\em empirical} technique to estimate generalization using unlabeled data. Our technique does not rely on any notion of uniform-convergece-based complexity and is remarkably precise. We will theoretically show why our technique enjoys such precision.
We will conclude by discussing how future work could explore novel ways to incorporate distributional assumptions in generalization bounds (such as in the form of unlabeled data) and explore other tools to derive bounds, perhaps by modifying uniform convergence or by developing completely new tools altogether.

[73]  arXiv:2110.08923 [pdf, ps, other]
Title: A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization
Comments: 24 pages
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total utility. By leveraging the entropy regularization, our theoretical analysis shows that its Lagrangian dual function is smooth and the Lagrangian duality gap can be decomposed into the primal optimality gap and the constraint violation. Furthermore, we propose an accelerated dual-descent method for entropy-regularized CMDPs. We prove that our method achieves the global convergence rate $\widetilde{\mathcal{O}}(1/T)$ for both the optimality gap and the constraint violation for entropy-regularized CMDPs. A discussion about a linear convergence rate for CMDPs with a single constraint is also provided.

[74]  arXiv:2110.08932 [pdf, other]
Title: Poisoning Attacks on Fair Machine Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computers and Society (cs.CY)

Both fair machine learning and adversarial learning have been extensively studied. However, attacking fair machine learning models has received less attention. In this paper, we present a framework that seeks to effectively generate poisoning samples to attack both model accuracy and algorithmic fairness. Our attacking framework can target fair machine learning models trained with a variety of group based fairness notions such as demographic parity and equalized odds. We develop three online attacks, adversarial sampling , adversarial labeling, and adversarial feature modification. All three attacks effectively and efficiently produce poisoning samples via sampling, labeling, or modifying a fraction of training data in order to reduce the test accuracy. Our framework enables attackers to flexibly adjust the attack's focus on prediction accuracy or fairness and accurately quantify the impact of each candidate point to both accuracy loss and fairness violation, thus producing effective poisoning samples. Experiments on two real datasets demonstrate the effectiveness and efficiency of our framework.

[75]  arXiv:2110.08941 [pdf, other]
Title: Distributed Optimization using Heterogeneous Compute Systems
Authors: Vineeth S
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Hardware compute power has been growing at an unprecedented rate in recent years. The utilization of such advancements plays a key role in producing better results in less time -- both in academia and industry. However, merging the existing hardware with the latest hardware within the same ecosystem poses a challenging task. One of the key challenges, in this case, is varying compute power. In this paper, we consider the training of deep neural networks on a distributed system of workers with varying compute power. A naive implementation of synchronous distributed training will result in the faster workers waiting for the slowest worker to complete processing. To mitigate this issue, we propose to dynamically adjust the data assigned for each worker during the training. We assign each worker a partition of total data proportional to its computing power. Our experiments show that dynamically adjusting the data partition helps to improve the utilization of the system and significantly reduces the time taken for training. Code is available at the repository: \url{https://github.com/vineeths96/Heterogeneous-Systems}.

[76]  arXiv:2110.08944 [pdf, other]
Title: Developing a novel fair-loan-predictor through a multi-sensitive debiasing pipeline: DualFair
Comments: 10 pages, 2 figures, 3 tables, 1 pseudocode
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Machine learning (ML) models are increasingly used for high-stake applications that can greatly impact people's lives. Despite their use, these models have the potential to be biased towards certain social groups on the basis of race, gender, or ethnicity. Many prior works have attempted to mitigate this "model discrimination" by updating the training data (pre-processing), altering the model learning process (in-processing), or manipulating model output (post-processing). However, these works have not yet been extended to the realm of multi-sensitive parameters and sensitive options (MSPSO), where sensitive parameters are attributes that can be discriminated against (e.g race) and sensitive options are options within sensitive parameters (e.g black or white), thus giving them limited real-world usability. Prior work in fairness has also suffered from an accuracy-fairness tradeoff that prevents both the accuracy and fairness from being high. Moreover, previous literature has failed to provide holistic fairness metrics that work with MSPSO. In this paper, we solve all three of these problems by (a) creating a novel bias mitigation technique called DualFair and (b) developing a new fairness metric (i.e. AWI) that can handle MSPSO. Lastly, we test our novel mitigation method using a comprehensive U.S mortgage lending dataset and show that our classifier, or fair loan predictor, obtains better fairness and accuracy metrics than current state-of-the-art models.

[77]  arXiv:2110.08949 [pdf, other]
Title: Real-time Mortality Prediction Using MIMIC-IV ICU Data Via Boosted Nonparametric Hazards
Subjects: Machine Learning (cs.LG)

Electronic Health Record (EHR) systems provide critical, rich and valuable information at high frequency. One of the most exciting applications of EHR data is in developing a real-time mortality warning system with tools from survival analysis. However, most of the survival analysis methods used recently are based on (semi)parametric models using static covariates. These models do not take advantage of the information conveyed by the time-varying EHR data. In this work, we present an application of a highly scalable survival analysis method, BoXHED 2.0 to develop a real-time in-ICU mortality warning indicator based on the MIMIC IV data set. Importantly, BoXHED can incorporate time-dependent covariates in a fully nonparametric manner and is backed by theory. Our in-ICU mortality model achieves an AUC-PRC of 0.41 and AUC-ROC of 0.83 out of sample, demonstrating the benefit of real-time monitoring.

[78]  arXiv:2110.08984 [pdf, ps, other]
Title: Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs). In this setting, both the reward function and the transition kernel are linear with respect to the given feature maps and are allowed to vary over time, as long as their respective parameter variations do not exceed certain variation budgets. We propose the $\underline{\text{p}}$eriodically $\underline{\text{r}}$estarted $\underline{\text{o}}$ptimistic $\underline{\text{p}}$olicy $\underline{\text{o}}$ptimization algorithm (PROPO), which is an optimistic policy optimization algorithm with linear function approximation. PROPO features two mechanisms: sliding-window-based policy evaluation and periodic-restart-based policy improvement, which are tailored for policy optimization in a non-stationary environment. In addition, only utilizing the technique of sliding window, we propose a value-iteration algorithm. We establish dynamic upper bounds for the proposed methods and a matching minimax lower bound which shows the (near-) optimality of the proposed methods. To our best knowledge, PROPO is the first provably efficient policy optimization algorithm that handles non-stationarity.

[79]  arXiv:2110.08996 [pdf, other]
Title: Finding Everything within Random Binary Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

A recent work by Ramanujan et al. (2020) provides significant empirical evidence that sufficiently overparameterized, random neural networks contain untrained subnetworks that achieve state-of-the-art accuracy on several predictive tasks. A follow-up line of theoretical work provides justification of these findings by proving that slightly overparameterized neural networks, with commonly used continuous-valued random initializations can indeed be pruned to approximate any target network. In this work, we show that the amplitude of those random weights does not even matter. We prove that any target network can be approximated up to arbitrary accuracy by simply pruning a random network of binary $\{\pm1\}$ weights that is only a polylogarithmic factor wider and deeper than the target network.

[80]  arXiv:2110.09008 [pdf, other]
Title: When Are Linear Stochastic Bandits Attackable?
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

We study adversarial attacks on linear stochastic bandits, a sequential decision making problem with many important applications in recommender systems, online advertising, medical treatment, and etc. By manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm. Perhaps surprisingly, we first show that some attack goals can never be achieved. This is in sharp contrast to context-free stochastic bandits, and is intrinsically due to the correlation among arms in linear stochastic bandits. Motivated by this observation, this paper studies the attackability of a $k$-armed linear bandit environment. We first provide a full necessity and sufficiency characterization of attackability based on the geometry of the context vectors. We then propose a two-stage attack method against LinUCB and Robust Phase Elimination. The method first asserts whether the current environment is attackable, and if Yes, modifies the rewards to force the algorithm to pull a target arm linear times using only a sublinear cost. Numerical experiments further validate the effectiveness and cost-efficiency of the proposed method.

[81]  arXiv:2110.09022 [pdf, other]
Title: Demystifying How Self-Supervised Features Improve Training from Noisy Labels
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

The advancement of self-supervised learning (SSL) motivates researchers to apply SSL on other tasks such as learning with noisy labels. Recent literature indicates that methods built on SSL features can substantially improve the performance of learning with noisy labels. Nonetheless, the deeper reasons why (and how) SSL features benefit the training from noisy labels are less understood. In this paper, we study why and how self-supervised features help networks resist label noise using both theoretical analyses and numerical experiments. Our result shows that, given a quality encoder pre-trained from SSL, a simple linear layer trained by the cross-entropy loss is theoretically robust to symmetric label noise. Further, we provide insights for how knowledge distilled from SSL features can alleviate the over-fitting problem. We hope our work provides a better understanding for learning with noisy labels from the perspective of self-supervised learning and can potentially serve as a guideline for further research. Code is available at github.com/UCSC-REAL/SelfSup_NoisyLabel.

[82]  arXiv:2110.09023 [pdf, other]
Title: Utilizing Active Machine Learning for Quality Assurance: A Case Study of Virtual Car Renderings in the Automotive Industry
Comments: Hawaii International Conference on System Sciences 2022 (HICSS-55)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Computer-generated imagery of car models has become an indispensable part of car manufacturers' advertisement concepts. They are for instance used in car configurators to offer customers the possibility to configure their car online according to their personal preferences. However, human-led quality assurance faces the challenge to keep up with high-volume visual inspections due to the car models' increasing complexity. Even though the application of machine learning to many visual inspection tasks has demonstrated great success, its need for large labeled data sets remains a central barrier to using such systems in practice. In this paper, we propose an active machine learning-based quality assurance system that requires significantly fewer labeled instances to identify defective virtual car renderings without compromising performance. By employing our system at a German automotive manufacturer, start-up difficulties can be overcome, the inspection process efficiency can be increased, and thus economic advantages can be realized.

[83]  arXiv:2110.09035 [pdf, other]
Title: Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)

Improving the resilience of a network protects the system from natural disasters and malicious attacks. This is typically achieved by introducing new edges, which however may reach beyond the maximum number of connections a node could sustain. Many studies then resort to the degree-preserving operation of rewiring, which swaps existing edges $AC, BD$ to new edges $AB, CD$. A significant line of studies focuses on this technique for theoretical and practical results while leaving three limitations: network utility loss, local optimality, and transductivity. In this paper, we propose ResiNet, a reinforcement learning (RL)-based framework to discover resilient network topologies against various disasters and attacks. ResiNet is objective agnostic which allows the utility to be balanced by incorporating it into the objective function. The local optimality, typically seen in greedy algorithms, is addressed by casting the cumulative resilience gain into a sequential decision process of step-wise rewiring. The transductivity, which refers to the necessity to run a computationally intensive optimization for each input graph, is lifted by our variant of RL with auto-regressive permutation-invariant variable action space. ResiNet is armed by our technical innovation, Filtration enhanced GNN (FireGNN), which distinguishes graphs with minor differences. It is thus possible for ResiNet to capture local structure changes and adapt its decision among consecutive graphs, which is known to be infeasible for GNN. Extensive experiments demonstrate that with a small number of rewiring operations, ResiNet achieves a near-optimal resilience gain on multiple graphs while balancing the utility, with a large margin compared to existing approaches.

[84]  arXiv:2110.09039 [pdf, other]
Title: Capsule Graph Neural Networks with EM Routing
Authors: Yu Lei, Jing Zhang
Subjects: Machine Learning (cs.LG)

To effectively classify graph instances, graph neural networks need to have the capability to capture the part-whole relationship existing in a graph. A capsule is a group of neurons representing complicated properties of entities, which has shown its advantages in traditional convolutional neural networks. This paper proposed novel Capsule Graph Neural Networks that use the EM routing mechanism (CapsGNNEM) to generate high-quality graph embeddings. Experimental results on a number of real-world graph datasets demonstrate that the proposed CapsGNNEM outperforms nine state-of-the-art models in graph classification tasks.

[85]  arXiv:2110.09050 [pdf, other]
Title: Data Driven and Visualization based Strategization for University Rank Improvement using Decision Trees
Comments: 29 pages
Subjects: Machine Learning (cs.LG)

Annual ranking of higher educational institutes (HEIs) is a global phenomena and past research shows that they have significant impact on higher education landscape. In spite of criticisms regarding the goals, methodologies and outcomes of such ranking systems, previous studies reveal that most of the universities pay close attention to ranking results and look forward to improving their ranks. Generally, each ranking framework uses its own set of parameters and the data for individual metrics are condensed into a single final score for determining the rank thereby making it a complex multivariate problem. Maintaining a good rank and ascending in the rankings is a difficult task because it requires considerable resources, efforts and accurate planning. In this work, we show how exploratory data analysis (EDA) using correlation heatmaps and box plots can aid in understanding the broad trends in the ranking data, however it is challenging to make institutional decisions for rank improvements completely based on EDA. We present a novel idea of classifying the rankings data using Decision Tree (DT) based algorithms and retrieve decision paths for rank improvement using data visualization techniques. Using Laplace correction to the probability estimate, we quantify the amount of certainty attached with different decision paths obtained from interpretable DT models . The proposed methodology can aid HEIs to quantitatively asses the scope of improvement, adumbrate a fine-grained long-term action plan and prepare a suitable road-map.

[86]  arXiv:2110.09057 [pdf, other]
Title: Training Deep Neural Networks with Adaptive Momentum Inspired by the Quadratic Optimization
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Heavy ball momentum is crucial in accelerating (stochastic) gradient-based optimization algorithms for machine learning. Existing heavy ball momentum is usually weighted by a uniform hyperparameter, which relies on excessive tuning. Moreover, the calibrated fixed hyperparameter may not lead to optimal performance. In this paper, to eliminate the effort for tuning the momentum-related hyperparameter, we propose a new adaptive momentum inspired by the optimal choice of the heavy ball momentum for quadratic optimization. Our proposed adaptive heavy ball momentum can improve stochastic gradient descent (SGD) and Adam. SGD and Adam with the newly designed adaptive momentum are more robust to large learning rates, converge faster, and generalize better than the baselines. We verify the efficiency of SGD and Adam with the new adaptive momentum on extensive machine learning benchmarks, including image classification, language modeling, and machine translation. Finally, we provide convergence guarantees for SGD and Adam with the proposed adaptive momentum.

[87]  arXiv:2110.09073 [pdf, other]
Title: Semi-asynchronous Hierarchical Federated Learning for Cooperative Intelligent Transportation Systems
Subjects: Machine Learning (cs.LG)

Cooperative Intelligent Transport System (C-ITS) is a promising network to provide safety, efficiency, sustainability, and comfortable services for automated vehicles and road infrastructures by taking advantages from participants. However, the components of C-ITS usually generate large amounts of data, which makes it difficult to explore data science. Currently, federated learning has been proposed as an appealing approach to allow users to cooperatively reap the benefits from trained participants. Therefore, in this paper, we propose a novel Semi-asynchronous Hierarchical Federated Learning (SHFL) framework for C-ITS that enables elastic edge to cloud model aggregation from data sensing. We further formulate a joint edge node association and resource allocation problem under the proposed SHFL framework to prevent personalities of heterogeneous road vehicles and achieve communication-efficiency. To deal with our proposed Mixed integer nonlinear programming (MINLP) problem, we introduce a distributed Alternating Direction Method of Multipliers (ADMM)-Block Coordinate Update (BCU) algorithm. With this algorithm, a tradeoff between training accuracy and transmission latency has been derived. Numerical results demonstrate the advantages of the proposed algorithm in terms of training overhead and model performance.

[88]  arXiv:2110.09132 [pdf, other]
Title: EmbRace: Accelerating Sparse Communication for Distributed Training of NLP Neural Networks
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Distributed data-parallel training has been widely used for natural language processing (NLP) neural network models. However, the embedding tables in NLP models, holding a large portion of parameters and bringing dramatic sparsity in communication, make it a big challenge to efficiently scale the distributed training. Current distributed training frameworks mainly concentrate on dense models but neglect the sparsity of NLP models, resulting in significant communication overhead and relatively poor scalability.
In this paper, we propose EmbRace, an efficient communication framework designed to accelerate sparse communication of distributed NLP model training. EmbRace introduces Sparsity-aware Hybrid Communication, which combines AlltoAll and AllReduce to optimize the communication overhead for sparse and dense data in NLP models. EmbRace further introduces a 2D Communication Scheduling approach to thoroughly overlap communication with computation by optimizing model computation procedure, relaxing the dependency of embeddings, and scheduling communication with a priority queue.
We implement EmbRace based on PyTorch and Horovod, and conduct comprehensive evaluations with four representative NLP models on two high-performance GPU clusters. Experimental results show that EmbRace achieves up to 30.66X speedup on 16 GPUs clusters among four popular distributed training baselines.

[89]  arXiv:2110.09133 [pdf, other]
Title: Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits
Comments: 10+15 pages. To be published in the proceedings of NeurIPS 2021
Subjects: Machine Learning (cs.LG)

In the fixed budget thresholding bandit problem, an algorithm sequentially allocates a budgeted number of samples to different distributions. It then predicts whether the mean of each distribution is larger or lower than a given threshold. We introduce a large family of algorithms (containing most existing relevant ones), inspired by the Frank-Wolfe algorithm, and provide a thorough yet generic analysis of their performance. This allowed us to construct new explicit algorithms, for a broad class of problems, whose losses are within a small constant factor of the non-adaptive oracle ones. Quite interestingly, we observed that adaptive methods empirically greatly out-perform non-adaptive oracles, an uncommon behavior in standard online learning settings, such as regret minimization. We explain this surprising phenomenon on an insightful toy problem.

[90]  arXiv:2110.09138 [pdf, other]
Title: State-Space Constraints Improve the Generalization of the Differentiable Neural Computer in some Algorithmic Tasks
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Memory-augmented neural networks (MANNs) can solve algorithmic tasks like sorting. However, they often do not generalize to lengths of input sequences not seen in the training phase. Therefore, we introduce two approaches constraining the state-space of the network controller to improve the generalization to out-of-distribution-sized input sequences: state compression and state regularization. We show that both approaches can improve the generalization capability of a particular type of MANN, the differentiable neural computer (DNC), and compare our approaches to a stateful and a stateless controller on a set of algorithmic tasks. Furthermore, we show that especially the combination of both approaches can enable a pre-trained DNC to be extended post hoc with a larger memory. Thus, our introduced approaches allow to train a DNC using shorter input sequences and thus save computational resources. Moreover, we observed that the capability for generalization is often accompanied by loop structures in the state-space, which could correspond to looping constructs in algorithms.

[91]  arXiv:2110.09140 [pdf, other]
Title: Learning Prototype-oriented Set Representations for Meta-Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Learning from set-structured data is a fundamental problem that has recently attracted increasing attention, where a series of summary networks are introduced to deal with the set input. In fact, many meta-learning problems can be treated as set-input tasks. Most existing summary networks aim to design different architectures for the input set in order to enforce permutation invariance. However, scant attention has been paid to the common cases where different sets in a meta-distribution are closely related and share certain statistical properties. Viewing each set as a distribution over a set of global prototypes, this paper provides a novel optimal transport (OT) based way to improve existing summary networks. To learn the distribution over the global prototypes, we minimize its OT distance to the set empirical distribution over data points, providing a natural unsupervised way to improve the summary network. Since our plug-and-play framework can be applied to many meta-learning problems, we further instantiate it to the cases of few-shot classification and implicit meta generative modeling. Extensive experiments demonstrate that our framework significantly improves the existing summary networks on learning more powerful summary statistics from sets and can be successfully integrated into metric-based few-shot classification and generative modeling applications, providing a promising tool for addressing set-input and meta-learning problems.

[92]  arXiv:2110.09163 [pdf, other]
Title: A Dimensionality Reduction Approach for Convolutional Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)

The focus of this paper is the application of classical model order reduction techniques, such as Active Subspaces and Proper Orthogonal Decomposition, to Deep Neural Networks. We propose a generic methodology to reduce the number of layers of a pre-trained network by combining the aforementioned techniques for dimensionality reduction with input-output mappings, such as Polynomial Chaos Expansion and Feedforward Neural Networks. The necessity of compressing the architecture of an existing Convolutional Neural Network is motivated by its application in embedded systems with specific storage constraints. Our experiment shows that the reduced nets obtained can achieve a level of accuracy similar to the original Convolutional Neural Network under examination, while saving in memory allocation.

[93]  arXiv:2110.09164 [pdf, ps, other]
Title: Speeding-Up Back-Propagation in DNN: Approximate Outer Product with Memory
Comments: 5 pages, 3 figures
Subjects: Machine Learning (cs.LG)

In this paper, an algorithm for approximate evaluation of back-propagation in DNN training is considered, which we term Approximate Outer Product Gradient Descent with Memory (Mem-AOP-GD). The Mem-AOP-GD algorithm implements an approximation of the stochastic gradient descent by considering only a subset of the outer products involved in the matrix multiplications that encompass backpropagation. In order to correct for the inherent bias in this approximation, the algorithm retains in memory an accumulation of the outer products that are not used in the approximation. We investigate the performance of the proposed algorithm in terms of DNN training loss under two design parameters: (i) the number of outer products used for the approximation, and (ii) the policy used to select such outer products. We experimentally show that significant improvements in computational complexity as well as accuracy can indeed be obtained through Mem-AOPGD.

[94]  arXiv:2110.09182 [pdf, other]
Title: Graph Partner Neural Networks for Semi-Supervised Learning on Graphs
Subjects: Machine Learning (cs.LG)

Graph Convolutional Networks (GCNs) are powerful for processing graph-structured data and have achieved state-of-the-art performance in several tasks such as node classification, link prediction, and graph classification. However, it is inevitable for deep GCNs to suffer from an over-smoothing issue that the representations of nodes will tend to be indistinguishable after repeated graph convolution operations. To address this problem, we propose the Graph Partner Neural Network (GPNN) which incorporates a de-parameterized GCN and a parameter-sharing MLP. We provide empirical and theoretical evidence to demonstrate the effectiveness of the proposed MLP partner on tackling over-smoothing while benefiting from appropriate smoothness. To further tackle over-smoothing and regulate the learning process, we introduce a well-designed consistency contrastive loss and KL divergence loss. Besides, we present a graph enhancement technique to improve the overall quality of edges in graphs. While most GCNs can work with shallow architecture only, GPNN can obtain better results through increasing model depth. Experiments on various node classification tasks have demonstrated the state-of-the-art performance of GPNN. Meanwhile, extensive ablation studies are conducted to investigate the contributions of each component in tackling over-smoothing and improving performance.

[95]  arXiv:2110.09192 [pdf, other]
Title: Learning Optimal Conformal Classifiers
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Methodology (stat.ME); Machine Learning (stat.ML)

Modern deep learning based classifiers show very high accuracy on test data but this does not provide sufficient guarantees for safe deployment, especially in high-stake AI applications such as medical diagnosis. Usually, predictions are obtained without a reliable uncertainty estimate or a formal guarantee. Conformal prediction (CP) addresses these issues by using the classifier's probability estimates to predict confidence sets containing the true class with a user-specified probability. However, using CP as a separate processing step after training prevents the underlying model from adapting to the prediction of confidence sets. Thus, this paper explores strategies to differentiate through CP during training with the goal of training model with the conformal wrapper end-to-end. In our approach, conformal training (ConfTr), we specifically "simulate" conformalization on mini-batches during training. We show that CT outperforms state-of-the-art CP methods for classification by reducing the average confidence set size (inefficiency). Moreover, it allows to "shape" the confidence sets predicted at test time, which is difficult for standard CP. On experiments with several datasets, we show ConfTr can influence how inefficiency is distributed across classes, or guide the composition of confidence sets in terms of the included classes, while retaining the guarantees offered by CP.

[96]  arXiv:2110.09193 [pdf, other]
Title: Topologically Regularized Data Embeddings
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Unsupervised feature learning often finds low-dimensional embeddings that capture the structure of complex data. For tasks for which expert prior topological knowledge is available, incorporating this into the learned representation may lead to higher quality embeddings. For example, this may help one to embed the data into a given number of clusters, or to accommodate for noise that prevents one from deriving the distribution of the data over the model directly, which can then be learned more effectively. However, a general tool for integrating different prior topological knowledge into embeddings is lacking. Although differentiable topology layers have been recently developed that can (re)shape embeddings into prespecified topological models, they have two important limitations for representation learning, which we address in this paper. First, the currently suggested topological losses fail to represent simple models such as clusters and flares in a natural manner. Second, these losses neglect all original structural (such as neighborhood) information in the data that is useful for learning. We overcome these limitations by introducing a new set of topological losses, and proposing their usage as a way for topologically regularizing data embeddings to naturally represent a prespecified model. We include thorough experiments on synthetic and real data that highlight the usefulness and versatility of this approach, with applications ranging from modeling high-dimensional single cell data, to graph embedding.

[97]  arXiv:2110.09196 [pdf, other]
Title: MDP Abstraction with Successor Features
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Abstraction plays an important role for generalisation of knowledge and skills, and is key to sample efficient learning and planning. For many complex problems an abstract plan can be formed first, which is then instantiated by filling in the necessary low-level details. Often, such abstract plans generalize well to related new problems. We study abstraction in the context of reinforcement learning, in which agents may perform state or temporal abstractions. Temporal abstractions aka options represent temporally-extended actions in the form of option policies. However, typically acquired option policies cannot be directly transferred to new environments due to changes in the state space or transition dynamics. Furthermore, many existing state abstraction schemes ignore the correlation between state and temporal abstraction. In this work, we propose successor abstraction, a novel abstraction scheme building on successor features. This includes an algorithm for encoding and instantiation of abstract options across different environments, and a state abstraction mechanism based on the abstract options. Our successor abstraction allows us to learn abstract environment models with semantics that are transferable across different environments through encoding and instantiation of abstract options. Empirically, we achieve better transfer and improved performance on a set of benchmark tasks as compared to relevant state of the art baselines.

[98]  arXiv:2110.09208 [pdf, other]
Title: Correlation-based Discovery of Disease Patterns for Syndromic Surveillance
Subjects: Machine Learning (cs.LG)

Early outbreak detection is a key aspect in the containment of infectious diseases, as it enables the identification and isolation of infected individuals before the disease can spread to a larger population. Instead of detecting unexpected increases of infections by monitoring confirmed cases, syndromic surveillance aims at the detection of cases with early symptoms, which allows a more timely disclosure of outbreaks. However, the definition of these disease patterns is often challenging, as early symptoms are usually shared among many diseases and a particular disease can have several clinical pictures in the early phase of an infection. To support epidemiologists in the process of defining reliable disease patterns, we present a novel, data-driven approach to discover such patterns in historic data. The key idea is to take into account the correlation between indicators in a health-related data source and the reported number of infections in the respective geographic region. In an experimental evaluation, we use data from several emergency departments to discover disease patterns for three infectious diseases. Our results suggest that the proposed approach is able to find patterns that correlate with the reported infections and often identifies indicators that are related to the respective diseases.

[99]  arXiv:2110.09212 [pdf, other]
Title: Noise-Resilient Ensemble Learning using Evidence Accumulation Clustering
Comments: 12 pages, submitted and accepted to ANTIC-2021 (International Conference on Advanced Network Technologies and Intelligent Computing)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Ensemble Learning methods combine multiple algorithms performing the same task to build a group with superior quality. These systems are well adapted to the distributed setup, where each peer or machine of the network hosts one algorithm and communicate its results to its peers. Ensemble learning methods are naturally resilient to the absence of several peers thanks to the ensemble redundancy. However, the network can be corrupted, altering the prediction accuracy of a peer, which has a deleterious effect on the ensemble quality. In this paper, we propose a noise-resilient ensemble classification method, which helps to improve accuracy and correct random errors. The approach is inspired by Evidence Accumulation Clustering , adapted to classification ensembles. We compared it to the naive voter model over four multi-class datasets. Our model showed a greater resilience, allowing us to recover prediction under a very high noise level. In addition as the method is based on the evidence accumulation clustering, our method is highly flexible as it can combines classifiers with different label definitions.

[100]  arXiv:2110.09246 [pdf, other]
Title: Single Layer Predictive Normalized Maximum Likelihood for Out-of-Distribution Detection
Comments: NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Detecting out-of-distribution (OOD) samples is vital for developing machine learning based models for critical safety systems. Common approaches for OOD detection assume access to some OOD samples during training which may not be available in a real-life scenario. Instead, we utilize the {\em predictive normalized maximum likelihood} (pNML) learner, in which no assumptions are made on the tested input. We derive an explicit expression of the pNML and its generalization error, denoted as the {\em regret}, for a single layer neural network (NN). We show that this learner generalizes well when (i) the test vector resides in a subspace spanned by the eigenvectors associated with the large eigenvalues of the empirical correlation matrix of the training data, or (ii) the test sample is far from the decision boundary. Furthermore, we describe how to efficiently apply the derived pNML regret to any pretrained deep NN, by employing the explicit pNML for the last layer, followed by the softmax function. Applying the derived regret to deep NN requires neither additional tunable parameters nor extra data. We extensively evaluate our approach on 74 OOD detection benchmarks using DenseNet-100, ResNet-34, and WideResNet-40 models trained with CIFAR-100, CIFAR-10, SVHN, and ImageNet-30 showing a significant improvement of up to 15.6\% over recent leading methods.

[101]  arXiv:2110.09274 [pdf, other]
Title: pygrank: A Python Package for Graph Node Ranking
Comments: 6 pages, 1 figure, 2 tables, 3 code snippets
Subjects: Machine Learning (cs.LG)

We introduce pygrank, an open source Python package to define, run and evaluate node ranking algorithms. We provide object-oriented and extensively unit-tested algorithm components, such as graph filters, post-processors, measures, benchmarks and online tuning. Computations can be delegated to numpy, tensorflow or pytorch backends and fit in back-propagation pipelines. Classes can be combined to define interoperable complex algorithms. Within the context of this paper we compare the package with related alternatives and demonstrate its flexibility and ease of use with code examples.

[102]  arXiv:2110.09276 [pdf, other]
Title: Natural Attribute-based Shift Detection
Subjects: Machine Learning (cs.LG)

Despite the impressive performance of deep networks in vision, language, and healthcare, unpredictable behaviors on samples from the distribution different than the training distribution cause severe problems in deployment. For better reliability of neural-network-based classifiers, we define a new task, natural attribute-based shift (NAS) detection, to detect the samples shifted from the training distribution by some natural attribute such as age of subjects or brightness of images. Using the natural attributes present in existing datasets, we introduce benchmark datasets in vision, language, and medical for NAS detection. Further, we conduct an extensive evaluation of prior representative out-of-distribution (OOD) detection methods on NAS datasets and observe an inconsistency in their performance. To understand this, we provide an analysis on the relationship between the location of NAS samples in the feature space and the performance of distance- and confidence-based OOD detection methods. Based on the analysis, we split NAS samples into three categories and further suggest a simple modification to the training objective to obtain an improved OOD detection method that is capable of detecting samples from all NAS categories.

[103]  arXiv:2110.09295 [pdf, other]
Title: Fair Tree Learning
Subjects: Machine Learning (cs.LG)

When dealing with sensitive data in automated data-driven decision-making, an important concern is to learn predictors with high performance towards a class label, whilst minimising for the discrimination towards some sensitive attribute, like gender or race, induced from biased data. Various hybrid optimisation criteria exist which combine classification performance with a fairness metric. However, while the threshold-free ROC-AUC is the standard for measuring traditional classification model performance, current fair decision tree methods only optimise for a fixed threshold on both the classification task as well as the fairness metric. Moreover, current tree learning frameworks do not allow for fair treatment with respect to multiple categories or multiple sensitive attributes. Lastly, the end-users of a fair model should be able to balance fairness and classification performance according to their specific ethical, legal, and societal needs. In this paper we address these shortcomings by proposing a threshold-independent fairness metric termed uniform demographic parity, and a derived splitting criterion entitled SCAFF -- Splitting Criterion AUC for Fairness -- towards fair decision tree learning, which extends to bagged and boosted frameworks. Compared to the state-of-the-art, our method provides three main advantages: (1) classifier performance and fairness are defined continuously instead of relying upon an, often arbitrary, decision threshold; (2) it leverages multiple sensitive attributes simultaneously, of which the values may be multicategorical; and (3) the unavoidable performance-fairness trade-off is tunable during learning. In our experiments, we demonstrate how SCAFF attains high predictive performance towards the class label and low discrimination with respect to binary, multicategorical, and multiple sensitive attributes, further substantiating our claims.

[104]  arXiv:2110.09302 [pdf, other]
Title: A Prior Guided Adversarial Representation Learning and Hypergraph Perceptual Network for Predicting Abnormal Connections of Alzheimer's Disease
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Alzheimer's disease is characterized by alterations of the brain's structural and functional connectivity during its progressive degenerative processes. Existing auxiliary diagnostic methods have accomplished the classification task, but few of them can accurately evaluate the changing characteristics of brain connectivity. In this work, a prior guided adversarial representation learning and hypergraph perceptual network (PGARL-HPN) is proposed to predict abnormal brain connections using triple-modality medical images. Concretely, a prior distribution from the anatomical knowledge is estimated to guide multimodal representation learning using an adversarial strategy. Also, the pairwise collaborative discriminator structure is further utilized to narrow the difference of representation distribution. Moreover, the hypergraph perceptual network is developed to effectively fuse the learned representations while establishing high-order relations within and between multimodal images. Experimental results demonstrate that the proposed model outperforms other related methods in analyzing and predicting Alzheimer's disease progression. More importantly, the identified abnormal connections are partly consistent with the previous neuroscience discoveries. The proposed model can evaluate characteristics of abnormal brain connections at different stages of Alzheimer's disease, which is helpful for cognitive disease study and early treatment.

[105]  arXiv:2110.09304 [pdf, other]
Title: Prediction of Occurrence of Extreme Events using Machine Learning
Subjects: Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)

Machine learning models play a vital role in the prediction task in several fields of study. In this work, we utilize the ability of machine learning algorithms for the prediction of occurrence of extreme events in a nonlinear mechanical system. Extreme events are rare events which occur ubiquitously in nature. We consider four machine learning models, namely Logistic Regression, Support Vector Machine, Random Forest and Multi-Layer Perceptron in our prediction task. We train these four machine learning models using training set data and compute the performance of each model using the test set data. We show that Multi-Layer Perceptron model performs better among the four models in the prediction of extreme events in the considered system. The persistent behaviour of the considered machine learning models are cross-checked with randomly shuffled training set and test set data.

[106]  arXiv:2110.09327 [pdf, other]
Title: Self-Supervised Representation Learning: Introduction, Advances and Challenges
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Self-supervised representation learning methods aim to provide powerful deep feature learning without the requirement of large annotated datasets, thus alleviating the annotation bottleneck that is one of the main barriers to practical deployment of deep learning today. These methods have advanced rapidly in recent years, with their efficacy approaching and sometimes surpassing fully supervised pre-training alternatives across a variety of data modalities including image, video, sound, text and graphs. This article introduces this vibrant area including key concepts, the four main families of approach and associated state of the art, and how self-supervised methods are applied to diverse modalities of data. We further discuss practical considerations including workflows, representation transferability, and compute cost. Finally, we survey the major open challenges in the field that provide fertile ground for future work.

[107]  arXiv:2110.09344 [pdf, other]
Title: Intrusion-Free Graph Mixup
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We present a simple and yet effective interpolation-based regularization technique to improve the generalization of Graph Neural Networks (GNNs). We leverage the recent advances in Mixup regularizer for vision and text, where random sample pairs and their labels are interpolated to create synthetic samples for training. Unlike images or natural sentences, which embrace a grid or linear sequence format, graphs have arbitrary structure and topology, which play a vital role on the semantic information of a graph. Consequently, even simply deleting or adding one edge from a graph can dramatically change its semantic meanings. This makes interpolating graph inputs very challenging because mixing random graph pairs may naturally create graphs with identical structure but with different labels, causing the manifold intrusion issue. To cope with this obstacle, we propose the first input mixing schema for Mixup on graph. We theoretically prove that our mixing strategy can recover the source graphs from the mixed graph, and guarantees that the mixed graphs are manifold intrusion free. We also empirically show that our method can effectively regularize the graph classification learning, resulting in superior predictive accuracy over popular graph augmentation baselines.

[108]  arXiv:2110.09356 [pdf, other]
Title: Towards Federated Bayesian Network Structure Learning with Continuous Optimization
Comments: 16 pages; 5 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Traditionally, Bayesian network structure learning is often carried out at a central site, in which all data is gathered. However, in practice, data may be distributed across different parties (e.g., companies, devices) who intend to collectively learn a Bayesian network, but are not willing to disclose information related to their data owing to privacy or security concerns. In this work, we present a cross-silo federated learning approach to estimate the structure of Bayesian network from data that is horizontally partitioned across different parties. We develop a distributed structure learning method based on continuous optimization, using the alternating direction method of multipliers (ADMM), such that only the model parameters have to be exchanged during the optimization process. We demonstrate the flexibility of our approach by adopting it for both linear and nonlinear cases. Experimental results on synthetic and real datasets show that it achieves an improved performance over the other methods, especially when there is a relatively large number of clients and each has a limited sample size.

[109]  arXiv:2110.09410 [pdf, other]
Title: Exploiting Domain-Specific Features to Enhance Domain Generalization
Comments: 25 pages, 6 tables, 11 figures, published at Advances in Neural Information Processing Systems (NeurIPS), 2021
Subjects: Machine Learning (cs.LG)

Domain Generalization (DG) aims to train a model, from multiple observed source domains, in order to perform well on unseen target domains. To obtain the generalization capability, prior DG approaches have focused on extracting domain-invariant information across sources to generalize on target domains, while useful domain-specific information which strongly correlates with labels in individual domains and the generalization to target domains is usually ignored. In this paper, we propose meta-Domain Specific-Domain Invariant (mDSDI) - a novel theoretically sound framework that extends beyond the invariance view to further capture the usefulness of domain-specific information. Our key insight is to disentangle features in the latent space while jointly learning both domain-invariant and domain-specific features in a unified framework. The domain-specific representation is optimized through the meta-learning framework to adapt from source domains, targeting a robust generalization on unseen domains. We empirically show that mDSDI provides competitive results with state-of-the-art techniques in DG. A further ablation study with our generated dataset, Background-Colored-MNIST, confirms the hypothesis that domain-specific is essential, leading to better results when compared with only using domain-invariant.

[110]  arXiv:2110.09419 [pdf, other]
Title: Compositional Attention: Disentangling Search and Retrieval
Subjects: Machine Learning (cs.LG)

Multi-head, key-value attention is the backbone of the widely successful Transformer model and its variants. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interactions, and (2) retrieval - extraction of relevant features from the selected entity via a value matrix. Importantly, standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how this static nature of the pairing can potentially: (a) lead to learning of redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible and context-dependent manner through an additional soft competition stage between the query-key combination and value pairing. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval, and can easily be implemented in lieu of standard attention heads in any network architecture.

[111]  arXiv:2110.09436 [pdf]
Title: Early Diagnostic Prediction of Covid-19 using Gradient-Boosting Machine Model
Authors: Satvik Tripathi
Comments: Presented at the Drexel Society of Artificial Intelligence Research Conference, 2021 (arXiv:2110.05263)
Subjects: Machine Learning (cs.LG)

With the huge spike in the COVID-19 cases across the globe and reverse transcriptase-polymerase chain reaction (RT-PCR) test remains a key component for rapid and accurate detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In recent months there has been an acute shortage of medical supplies in developing countries, especially a lack of RT-PCR testing resulting in delayed patient care and high infection rates. We present a gradient-boosting machine model that predicts the diagnostics result of SARS-CoV- 2 in an RT-PCR test by utilizing eight binary features. We used the publicly available nationwide dataset released by the Israeli Ministry of Health.

[112]  arXiv:2110.09442 [pdf, other]
Title: Goal Agnostic Planning using Maximum Likelihood Paths in Hypergraph World Models
Comments: 58 pages, 27 figures, comments
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In this paper, we present a hypergraph--based machine learning algorithm, a datastructure--driven maintenance method, and a planning algorithm based on a probabilistic application of Dijkstra's algorithm. Together, these form a goal agnostic automated planning engine for an autonomous learning agent which incorporates beneficial properties of both classical Machine Learning and traditional Artificial Intelligence. We prove that the algorithm determines optimal solutions within the problem space, mathematically bound learning performance, and supply a mathematical model analyzing system state progression through time yielding explicit predictions for learning curves, goal achievement rates, and response to abstractions and uncertainty. To validate performance, we exhibit results from applying the agent to three archetypal planning problems, including composite hierarchical domains, and highlight empirical findings which illustrate properties elucidated in the analysis.

[113]  arXiv:2110.09443 [pdf, other]
Title: Beltrami Flow and Neural Diffusion on Graphs
Comments: 21 pages, 5 figures. Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS) 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We propose a novel class of graph neural networks based on the discretised Beltrami flow, a non-Euclidean diffusion PDE. In our model, node features are supplemented with positional encodings derived from the graph topology and jointly evolved by the Beltrami flow, producing simultaneously continuous feature learning and topology evolution. The resulting model generalises many popular graph neural networks and achieves state-of-the-art results on several benchmarks.

[114]  arXiv:2110.09446 [pdf, other]
Title: Squeezing Backbone Feature Distributions to the Max for Efficient Few-Shot Learning
Comments: Init commit. arXiv admin note: text overlap with arXiv:2006.03806
Subjects: Machine Learning (cs.LG)

Few-shot classification is a challenging problem due to the uncertainty caused by using few labelled samples. In the past few years, many methods have been proposed with the common aim of transferring knowledge acquired on a previously solved task, what is often achieved by using a pretrained feature extractor. Following this vein, in this paper we propose a novel transfer-based method which aims at processing the feature vectors so that they become closer to Gaussian-like distributions, resulting in increased accuracy. In the case of transductive few-shot learning where unlabelled test samples are available during training, we also introduce an optimal-transport inspired algorithm to boost even further the achieved performance. Using standardized vision benchmarks, we show the ability of the proposed methodology to achieve state-of-the-art accuracy with various datasets, backbone architectures and few-shot settings.

[115]  arXiv:2110.09452 [pdf, other]
Title: SPAP: Simultaneous Demand Prediction and Planning for Electric Vehicle Chargers in a New City
Subjects: Machine Learning (cs.LG)

For a new city that is committed to promoting Electric Vehicles (EVs), it is significant to plan the public charging infrastructure where charging demands are high. However, it is difficult to predict charging demands before the actual deployment of EV chargers for lack of operational data, resulting in a deadlock. A direct idea is to leverage the urban transfer learning paradigm to learn the knowledge from a source city, then exploit it to predict charging demands, and meanwhile determine locations and amounts of slow/fast chargers for charging stations in the target city. However, the demand prediction and charger planning depend on each other, and it is required to re-train the prediction model to eliminate the negative transfer between cities for each varied charger plan, leading to the unacceptable time complexity. To this end, we propose the concept and an effective solution of Simultaneous Demand Prediction And Planning (SPAP): discriminative features are extracted from multi-source data, and fed into an Attention-based Spatial-Temporal City Domain Adaptation Network (AST-CDAN) for cross-city demand prediction; a novel Transfer Iterative Optimization (TIO) algorithm is designed for charger planning by iteratively utilizing AST-CDAN and a charger plan fine-tuning algorithm. Extensive experiments on real-world datasets collected from three cities in China validate the effectiveness and efficiency of SPAP. Specially, SPAP improves at most 72.5% revenue compared with the real-world charger deployment.

[116]  arXiv:2110.09467 [pdf, other]
Title: On Predictive Explanation of Data Anomalies
Comments: 12 pages
Subjects: Machine Learning (cs.LG)

Numerous algorithms have been proposed for detecting anomalies (outliers, novelties) in an unsupervised manner. Unfortunately, it is not trivial, in general, to understand why a given sample (record) is labelled as an anomaly and thus diagnose its root causes. We propose the following reduced-dimensionality, surrogate model approach to explain detector decisions: approximate the detection model with another one that employs only a small subset of features. Subsequently, samples can be visualized in this low-dimensionality space for human understanding. To this end, we develop PROTEUS, an AutoML pipeline to produce the surrogate model, specifically designed for feature selection on imbalanced datasets. The PROTEUS surrogate model can not only explain the training data, but also the out-of-sample (unseen) data. In other words, PROTEUS produces predictive explanations by approximating the decision surface of an unsupervised detector. PROTEUS is designed to return an accurate estimate of out-of-sample predictive performance to serve as a metric of the quality of the approximation. Computational experiments confirm the efficacy of PROTEUS to produce predictive explanations for different families of detectors and to reliably estimate their predictive performance in unseen data. Unlike several ad-hoc feature importance methods, PROTEUS is robust to high-dimensional data.

[117]  arXiv:2110.09468 [pdf, other]
Title: Improving Robustness using Generated Data
Comments: Accepted at NeurIPS 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Recent work argues that robust training requires substantially larger datasets than those required for standard classification. On CIFAR-10 and CIFAR-100, this translates into a sizable robust-accuracy gap between models trained solely on data from the original training set and those trained with additional data extracted from the "80 Million Tiny Images" dataset (TI-80M). In this paper, we explore how generative models trained solely on the original training set can be leveraged to artificially increase the size of the original training set and improve adversarial robustness to $\ell_p$ norm-bounded perturbations. We identify the sufficient conditions under which incorporating additional generated data can improve robustness, and demonstrate that it is possible to significantly reduce the robust-accuracy gap to models trained with additional real data. Surprisingly, we even show that even the addition of non-realistic random data (generated by Gaussian sampling) can improve robustness. We evaluate our approach on CIFAR-10, CIFAR-100, SVHN and TinyImageNet against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $\epsilon = 8/255$ and $\epsilon = 128/255$, respectively. We show large absolute improvements in robust accuracy compared to previous state-of-the-art methods. Against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$, our models achieve 66.10% and 33.49% robust accuracy on CIFAR-10 and CIFAR-100, respectively (improving upon the state-of-the-art by +8.96% and +3.29%). Against $\ell_2$ norm-bounded perturbations of size $\epsilon = 128/255$, our model achieves 78.31% on CIFAR-10 (+3.81%). These results beat most prior works that use external data.

[118]  arXiv:2110.09476 [pdf, other]
Title: Recovery Guarantees for Kernel-based Clustering under Non-parametric Mixture Models
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Despite the ubiquity of kernel-based clustering, surprisingly few statistical guarantees exist beyond settings that consider strong structural assumptions on the data generation process. In this work, we take a step towards bridging this gap by studying the statistical performance of kernel-based clustering algorithms under non-parametric mixture models. We provide necessary and sufficient separability conditions under which these algorithms can consistently recover the underlying true clustering. Our analysis provides guarantees for kernel clustering approaches without structural assumptions on the form of the component distributions. Additionally, we establish a key equivalence between kernel-based data-clustering and kernel density-based clustering. This enables us to provide consistency guarantees for kernel-based estimators of non-parametric mixture models. Along with theoretical implications, this connection could have practical implications, including in the systematic choice of the bandwidth of the Gaussian kernel in the context of clustering.

[119]  arXiv:2110.09485 [pdf, other]
Title: Learning in High Dimension Always Amounts to Extrapolation
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

The notion of interpolation and extrapolation is fundamental in various fields from deep learning to function approximation. Interpolation occurs for a sample $x$ whenever this sample falls inside or on the boundary of the given dataset's convex hull. Extrapolation occurs when $x$ falls outside of that convex hull. One fundamental (mis)conception is that state-of-the-art algorithms work so well because of their ability to correctly interpolate training data. A second (mis)conception is that interpolation happens throughout tasks and datasets, in fact, many intuitions and theories rely on that assumption. We empirically and theoretically argue against those two points and demonstrate that on any high-dimensional ($>$100) dataset, interpolation almost surely never happens. Those results challenge the validity of our current interpolation/extrapolation definition as an indicator of generalization performances.

[120]  arXiv:2110.09495 [pdf, other]
Title: Protecting Anonymous Speech: A Generative Adversarial Network Methodology for Removing Stylistic Indicators in Text
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)

With Internet users constantly leaving a trail of text, whether through blogs, emails, or social media posts, the ability to write and protest anonymously is being eroded because artificial intelligence, when given a sample of previous work, can match text with its author out of hundreds of possible candidates. Existing approaches to authorship anonymization, also known as authorship obfuscation, often focus on protecting binary demographic attributes rather than identity as a whole. Even those that do focus on obfuscating identity require manual feedback, lose the coherence of the original sentence, or only perform well given a limited subset of authors. In this paper, we develop a new approach to authorship anonymization by constructing a generative adversarial network that protects identity and optimizes for three different losses corresponding to anonymity, fluency, and content preservation. Our fully automatic method achieves comparable results to other methods in terms of content preservation and fluency, but greatly outperforms baselines in regards to anonymization. Moreover, our approach is able to generalize well to an open-set context and anonymize sentences from authors it has not encountered before.

[121]  arXiv:2110.09506 [pdf, other]
Title: MEMO: Test Time Robustness via Adaptation and Augmentation
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

While deep neural networks can attain good accuracy on in-distribution test points, many applications require robustness even in the face of unexpected perturbations in the input, changes in the domain, or other sources of distribution shift. We study the problem of test time robustification, i.e., using the test input to improve model robustness. Recent prior works have proposed methods for test time adaptation, however, they each introduce additional assumptions, such as access to multiple test points, that prevent widespread adoption. In this work, we aim to study and devise methods that make no assumptions about the model training process and are broadly applicable at test time. We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable: when presented with a test example, perform different data augmentations on the data point, and then adapt (all of) the model parameters by minimizing the entropy of the model's average, or marginal, output distribution across the augmentations. Intuitively, this objective encourages the model to make the same prediction across different augmentations, thus enforcing the invariances encoded in these augmentations, while also maintaining confidence in its predictions. In our experiments, we demonstrate that this approach consistently improves robust ResNet and vision transformer models, achieving accuracy gains of 1-8% over standard model evaluation and also generally outperforming prior augmentation and adaptation strategies. We achieve state-of-the-art results for test shifts caused by image corruptions (ImageNet-C), renditions of common objects (ImageNet-R), and, among ResNet-50 models, adversarially chosen natural examples (ImageNet-A).

[122]  arXiv:2110.09507 [pdf, other]
Title: Provable Hierarchy-Based Meta-Reinforcement Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Hierarchical reinforcement learning (HRL) has seen widespread interest as an approach to tractable learning of complex modular behaviors. However, existing work either assume access to expert-constructed hierarchies, or use hierarchy-learning heuristics with no provable guarantees. To address this gap, we analyze HRL in the meta-RL setting, where a learner learns latent hierarchical structure during meta-training for use in a downstream task. We consider a tabular setting where natural hierarchical structure is embedded in the transition dynamics. Analogous to supervised meta-learning theory, we provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy. Furthermore, we provide regret bounds on a learner using the recovered hierarchy to solve a meta-test task. Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.

[123]  arXiv:2110.09514 [pdf, other]
Title: Discovering and Achieving Goals via World Models
Comments: NeurIPS 2021. First two authors contributed equally. Website at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)

How can artificial agents learn to solve many diverse tasks in complex visual environments in the absence of any supervision? We decompose this question into two problems: discovering new goals and learning to reliably achieve them. We introduce Latent Explorer Achiever (LEXA), a unified solution to these that learns a world model from image inputs and uses it to train an explorer and an achiever policy from imagined rollouts. Unlike prior methods that explore by reaching previously visited states, the explorer plans to discover unseen surprising states through foresight, which are then used as diverse targets for the achiever to practice. After the unsupervised phase, LEXA solves tasks specified as goal images zero-shot without any additional learning. LEXA substantially outperforms previous approaches to unsupervised goal-reaching, both on prior benchmarks and on a new challenging benchmark with a total of 40 test tasks spanning across four standard robotic manipulation and locomotion domains. LEXA further achieves goals that require interacting with multiple objects in sequence. Finally, to demonstrate the scalability and generality of LEXA, we train a single general agent across four distinct environments. Code and videos at https://orybkin.github.io/lexa/

Cross-lists for Tue, 19 Oct 21

[124]  arXiv:2110.08268 (cross-list from cs.CY) [pdf, other]
Title: Explainable Student Performance Prediction With Personalized Attention for Explaining Why A Student Fails
Comments: AAAI 2021 Workshop on AI Education/TIPCE 2021
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

As student failure rates continue to increase in higher education, predicting student performance in the following semester has become a significant demand. Personalized student performance prediction helps educators gain a comprehensive view of student status and effectively intervene in advance. However, existing works scarcely consider the explainability of student performance prediction, which educators are most concerned about. In this paper, we propose a novel Explainable Student performance prediction method with Personalized Attention (ESPA) by utilizing relationships in student profiles and prior knowledge of related courses. The designed Bidirectional Long Short-Term Memory (BiLSTM) architecture extracts the semantic information in the paths with specific patterns. As for leveraging similar paths' internal relations, a local and global-level attention mechanism is proposed to distinguish the influence of different students or courses for making predictions. Hence, valid reasoning on paths can be applied to predict the performance of students. The ESPA consistently outperforms the other state-of-the-art models for student performance prediction, and the results are intuitively explainable. This work can help educators better understand the different impacts of behavior on students' studies.

[125]  arXiv:2110.08295 (cross-list from physics.flu-dyn) [pdf, other]
Title: Nonlinear proper orthogonal decomposition for convection-dominated flows
Subjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG); Dynamical Systems (math.DS)

Autoencoder techniques find increasingly common use in reduced order modeling as a means to create a latent space. This reduced order representation offers a modular data-driven modeling approach for nonlinear dynamical systems when integrated with a time series predictive model. In this letter, we put forth a nonlinear proper orthogonal decomposition (POD) framework, which is an end-to-end Galerkin-free model combining autoencoders with long short-term memory networks for dynamics. By eliminating the projection error due to the truncation of Galerkin models, a key enabler of the proposed nonintrusive approach is the kinematic construction of a nonlinear mapping between the full-rank expansion of the POD coefficients and the latent space where the dynamics evolve. We test our framework for model reduction of a convection-dominated system, which is generally challenging for reduced order models. Our approach not only improves the accuracy, but also significantly reduces the computational cost of training and testing.

[126]  arXiv:2110.08313 (cross-list from eess.SY) [pdf, other]
Title: Reduced Order Dynamical Models For Complex Dynamics in Manufacturing and Natural Systems Using Machine Learning
Comments: 16 pages, 11 figures
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

Dynamical analysis of manufacturing and natural systems provides critical information about production of manufactured and natural resources respectively, thus playing an important role in assessing sustainability of these systems. However, current dynamic models for these systems exist as mechanistic models, simulation of which is computationally intensive and does not provide a simplified understanding of the mechanisms driving the overall dynamics. For such systems, lower-order models can prove useful to enable sustainability analysis through coupled dynamical analysis. There have been few attempts at finding low-order models of manufacturing and natural systems, with existing work focused on model development of individual mechanism level. This work seeks to fill this current gap in the literature of developing simplified dynamical models for these systems by developing reduced-order models using a machine learning (ML) approach. The approach is demonstrated on an entire soybean-oil to soybean-diesel process plant and a lake system. We use a grey-box ML method with a standard nonlinear optimization approach to identify relevant models of governing dynamics as ODEs using the data simulated from mechanistic models. Results show that the method identifies a high accuracy linear ODE models for the process plant, reflective of underlying linear stoichiometric mechanisms and mass balance driving the dynamics. For the natural systems, we modify the ML approach to include the effect of past dynamics, which gives non-linear ODE. While the modified approach provides a better match to dynamics of stream flow, it falls short of completely recreating the dynamics. We conclude that the proposed ML approach work well for systems where dynamics is smooth, such as in manufacturing plant whereas does not work perfectly well in case of chaotic dynamics such as water stream flow.

[127]  arXiv:2110.08324 (cross-list from cs.CR) [pdf, other]
Title: Mitigating Membership Inference Attacks by Self-Distillation Through a Novel Ensemble Architecture
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Membership inference attacks are a key measure to evaluate privacy leakage in machine learning (ML) models. These attacks aim to distinguish training members from non-members by exploiting differential behavior of the models on member and non-member inputs. The goal of this work is to train ML models that have high membership privacy while largely preserving their utility; we therefore aim for an empirical membership privacy guarantee as opposed to the provable privacy guarantees provided by techniques like differential privacy, as such techniques are shown to deteriorate model utility. Specifically, we propose a new framework to train privacy-preserving models that induces similar behavior on member and non-member inputs to mitigate membership inference attacks. Our framework, called SELENA, has two major components. The first component and the core of our defense is a novel ensemble architecture for training. This architecture, which we call Split-AI, splits the training data into random subsets, and trains a model on each subset of the data. We use an adaptive inference strategy at test time: our ensemble architecture aggregates the outputs of only those models that did not contain the input sample in their training data. We prove that our Split-AI architecture defends against a large family of membership inference attacks, however, it is susceptible to new adaptive attacks. Therefore, we use a second component in our framework called Self-Distillation to protect against such stronger attacks. The Self-Distillation component (self-)distills the training dataset through our Split-AI ensemble, without using any external public datasets. Through extensive experiments on major benchmark datasets we show that SELENA presents a superior trade-off between membership privacy and utility compared to the state of the art.

[128]  arXiv:2110.08327 (cross-list from cs.CV) [pdf, other]
Title: Solving Image PDEs with a Shallow Network
Comments: 21 pages, 22 figures, references arXiv:1802.06130, arXiv:1711.10700, arXiv:1606.01299
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Dynamical Systems (math.DS)

Partial differential equations (PDEs) are typically used as models of physical processes but are also of great interest in PDE-based image processing. However, when it comes to their use in imaging, conventional numerical methods for solving PDEs tend to require very fine grid resolution for stability, and as a result have impractically high computational cost. This work applies BLADE (Best Linear Adaptive Enhancement), a shallow learnable filtering framework, to PDE solving, and shows that the resulting approach is efficient and accurate, operating more reliably at coarse grid resolutions than classical methods. As such, the model can be flexibly used for a wide variety of problems in imaging.

[129]  arXiv:2110.08329 (cross-list from cs.CL) [pdf, other]
Title: Control Prefixes for Text Generation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Prompt learning methods adapt pre-trained language models to downstream applications by using a task-specific prompt together with the input. Most of the current work on prompt learning in text generation relies on a shared dataset-level prompt for all examples in the dataset. We extend this approach and propose a dynamic method, Control Prefixes, which allows for the inclusion of conditional input-dependent information in each prompt. Control Prefixes is at the intersection of prompt learning and controlled generation, empowering the model to have finer-grained control during text generation. The method incorporates attribute-level learnable representations into different layers of a pre-trained transformer, allowing for the generated text to be guided in a particular direction. We provide a systematic evaluation of the technique and apply it to five datasets from the GEM benchmark for natural language generation (NLG). We present state-of-the-art results on several data-to-text datasets, including WebNLG.

[130]  arXiv:2110.08340 (cross-list from cs.DL) [pdf, other]
Title: Return migration of German-affiliated researchers: Analyzing departure and return by gender, cohort, and discipline using Scopus bibliometric data 1996-2020
Comments: 21 pages, 6 figures
Subjects: Digital Libraries (cs.DL); Computers and Society (cs.CY); Machine Learning (cs.LG)

The international migration of researchers is a highly prized dimension of scientific mobility and motivates considerable policy debate. However, tracking migration life courses of researchers is challenging due to data limitations. In this study, we use Scopus bibliometric data on 8 million publications from 1.1 million researchers who have published at least once with an affiliation address from Germany in 1996-2020. We describe several key steps and algorithms we develop that enable us to construct the partial life histories of published researchers in this period. These tools allow us to explore both the out-migration of researchers with German affiliations as well as the subsequent return of a share of this group - the returnees. Our analyses shed light on important career stages and gender disparities between researchers who remain in Germany and those who both migrate out and those who eventually return. Return migration streams are even more gender imbalanced and point to the importance of additional efforts to attract female researchers back to Germany. We document a slightly declining trend in return migration with cohorts which, for most disciplines, is associated with decreasing German collaboration ties among cohorts of researchers who leave Germany. Also, gender disparities for the most gender imbalanced disciplines are unlikely to be mitigated by return migration given the gender compositions in cohorts of researchers who leave Germany and those who return. This analysis reveals new dimensions of scholarly migration by investigating the return migration of published researchers which is critical for science policy development.

[131]  arXiv:2110.08353 (cross-list from cs.IR) [pdf, other]
Title: Revisiting Popularity and Demographic Biases in Recommender Evaluation and Effectiveness
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recommendation algorithms are susceptible to popularity bias: a tendency to recommend popular items even when they fail to meet user needs. A related issue is that the recommendation quality can vary by demographic groups. Marginalized groups or groups that are under-represented in the training data may receive less relevant recommendations from these algorithms compared to others. In a recent study, Ekstrand et al. investigate how recommender performance varies according to popularity and demographics, and find statistically significant differences in recommendation utility between binary genders on two datasets, and significant effects based on age on one dataset. Here we reproduce those results and extend them with additional analyses. We find statistically significant differences in recommender performance by both age and gender. We observe that recommendation utility steadily degrades for older users, and is lower for women than men. We also find that the utility is higher for users from countries with more representation in the dataset. In addition, we find that total usage and the popularity of consumed content are strong predictors of recommender performance and also vary significantly across demographic groups.

[132]  arXiv:2110.08367 (cross-list from q-fin.ST) [pdf, other]
Title: Dropping diversity of products of large US firms: Models and measures
Subjects: Statistical Finance (q-fin.ST); Machine Learning (cs.LG)

It is widely assumed that in our lifetimes the products available in the global economy have become more diverse. This assumption is difficult to investigate directly, however, because it is difficult to collect the necessary data about every product in an economy each year. We solve this problem by mining publicly available textual descriptions of the products of every large US firms each year from 1997 to 2017. Although many aspects of economic productivity have been steadily rising during this period, our text-based measurements show that the diversity of the products of at least large US firms has steadily declined. This downward trend is visible using a variety of product diversity metrics, including some that depend on a measurement of the similarity of the products of every single pair of firms. The current state of the art in comprehensive and detailed firm-similarity measurements is a Boolean word vector model due to Hoberg and Phillips. We measure diversity using firm-similarities from this Boolean model and two more sophisticated variants, and we consistently observe a significant dropping trend in product diversity. These results make it possible to frame and start to test specific hypotheses for explaining the dropping product diversity trend.

[133]  arXiv:2110.08385 (cross-list from cs.SI) [pdf, ps, other]
Title: Robust Correlation Clustering with Asymmetric Noise
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Optimization and Control (math.OC)

Graph clustering problems typically aim to partition the graph nodes such that two nodes belong to the same partition set if and only if they are similar. Correlation Clustering is a graph clustering formulation which: (1) takes as input a signed graph with edge weights representing a similarity/dissimilarity measure between the nodes, and (2) requires no prior estimate of the number of clusters in the input graph. However, the combinatorial optimization problem underlying Correlation Clustering is NP-hard. In this work, we propose a novel graph generative model, called the Node Factors Model (NFM), which is based on generating feature vectors/embeddings for the graph nodes. The graphs generated by the NFM contain asymmetric noise in the sense that there may exist pairs of nodes in the same cluster which are negatively correlated. We propose a novel Correlation Clustering algorithm, called \anormd, using techniques from semidefinite programming. Using a combination of theoretical and computational results, we demonstrate that $\texttt{$\ell_2$-norm-diag}$ recovers nodes with sufficiently strong cluster membership in graph instances generated by the NFM, thereby making progress towards establishing the provable robustness of our proposed algorithm.

[134]  arXiv:2110.08393 (cross-list from cs.AI) [pdf, other]
Title: A Bayesian Approach for Medical Inquiry and Disease Inference in Automated Differential Diagnosis
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We propose a Bayesian approach for both medical inquiry and disease inference, the two major phases in differential diagnosis. Unlike previous work that simulates data from given probabilities and uses ML algorithms on them, we directly use the Quick Medical Reference (QMR) belief network, and apply Bayesian inference in the inference phase and Bayesian experimental design in the inquiry phase. Moreover, we improve the inquiry phase by extending the Bayesian experimental design framework from one-step search to multi-step search. Our approach has some practical advantages as it is interpretable, free of costly training, and able to adapt to new changes without any additional effort. Our experiments show that our approach achieves new state-of-the-art results on two simulated datasets, SymCAT and HPO, and competitive results on two diagnosis dialogue datasets, Muzhi and Dxy.

[135]  arXiv:2110.08396 (cross-list from cs.CV) [pdf, other]
Title: Comparing Human and Machine Bias in Face Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)

Much recent research has uncovered and discussed serious concerns of bias in facial analysis technologies, finding performance disparities between groups of people based on perceived gender, skin type, lighting condition, etc. These audits are immensely important and successful at measuring algorithmic bias but have two major challenges: the audits (1) use facial recognition datasets which lack quality metadata, like LFW and CelebA, and (2) do not compare their observed algorithmic bias to the biases of their human alternatives. In this paper, we release improvements to the LFW and CelebA datasets which will enable future researchers to obtain measurements of algorithmic bias that are not tainted by major flaws in the dataset (e.g. identical images appearing in both the gallery and test set). We also use these new data to develop a series of challenging facial identification and verification questions that we administered to various algorithms and a large, balanced sample of human reviewers. We find that both computer models and human survey participants perform significantly better at the verification task, generally obtain lower accuracy rates on dark-skinned or female subjects for both tasks, and obtain higher accuracy rates when their demographics match that of the question. Computer models are observed to achieve a higher level of accuracy than the survey participants on both tasks and exhibit bias to similar degrees as the human survey participants.

[136]  arXiv:2110.08412 (cross-list from cs.CL) [pdf, other]
Title: Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

To explain NLP models, many methods inform which inputs tokens are important for a prediction. However, an open question is if these methods accurately reflect the model's logic, a property often called faithfulness. In this work, we adapt and improve a recently proposed faithfulness benchmark from computer vision called ROAR (RemOve And Retrain), by Hooker et al. (2019).
We improve ROAR by recursively removing dataset redundancies, which otherwise interfere with ROAR. We adapt and apply ROAR, to popular NLP importance measures, namely attention, gradient, and integrated gradients. Additionally, we use mutual information as an additional baseline. Evaluation is done on a suite of classification tasks often used in the faithfulness of attention literature. Finally, we propose a scalar faithfulness metric, which makes it easy to compare results across papers.
We find that, importance measures considered to be unfaithful for computer vision tasks perform favorably for NLP tasks, the faithfulness of an importance measure is task-dependent, and the computational overhead of integrated gradient is rarely justified.

[137]  arXiv:2110.08413 (cross-list from cs.CL) [pdf, other]
Title: Invariant Language Modeling
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Modern pretrained language models are critical components of NLP pipelines. Yet, they suffer from spurious correlations, poor out-of-domain generalization, and biases. Inspired by recent progress in causal machine learning, in particular the invariant risk minimization (IRM) paradigm, we propose invariant language modeling, a framework for learning invariant representations that generalize better across multiple environments. In particular, we adapt a game-theoretic implementation of IRM (IRM-games) to language models, where the invariance emerges from a specific training schedule in which all the environments compete to optimize their own environment-specific loss by updating subsets of the model in a round-robin fashion. In a series of controlled experiments, we demonstrate the ability of our method to (i) remove structured noise, (ii) ignore specific spurious correlations without affecting global performance, and (iii) achieve better out-of-domain generalization. These benefits come with a negligible computational overhead compared to standard training, do not require changing the local loss, and can be applied to any language model architecture. We believe this framework is promising to help mitigate spurious correlations and biases in language models.

[138]  arXiv:2110.08418 (cross-list from stat.ML) [pdf, ps, other]
Title: Nuances in Margin Conditions Determine Gains in Active Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We consider nonparametric classification with smooth regression functions, where it is well known that notions of margin in $E[Y|X]$ determine fast or slow rates in both active and passive learning. Here we elucidate a striking distinction between the two settings. Namely, we show that some seemingly benign nuances in notions of margin -- involving the uniqueness of the Bayes classifier, and which have no apparent effect on rates in passive learning -- determine whether or not any active learner can outperform passive learning rates. In particular, for Audibert-Tsybakov's margin condition (allowing general situations with non-unique Bayes classifiers), no active learner can gain over passive learning in commonly studied settings where the marginal on $X$ is near uniform. Our results thus negate the usual intuition from past literature that active rates should improve over passive rates in nonparametric settings.

[139]  arXiv:2110.08419 (cross-list from cs.CL) [pdf, other]
Title: What do Compressed Large Language Models Forget? Robustness Challenges in Model Compression
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Recent works have focused on compressing pre-trained language models (PLMs) like BERT where the major focus has been to improve the compressed model performance for downstream tasks. However, there has been no study in analyzing the impact of compression on the generalizability and robustness of these models. Towards this end, we study two popular model compression techniques including knowledge distillation and pruning and show that compressed models are significantly less robust than their PLM counterparts on adversarial test sets although they obtain similar performance on in-distribution development sets for a task. Further analysis indicates that the compressed models overfit on the easy samples and generalize poorly on the hard ones. We further leverage this observation to develop a regularization strategy for model compression based on sample uncertainty. Experimental results on several natural language understanding tasks demonstrate our mitigation framework to improve both the adversarial generalization as well as in-distribution task performance of the compressed models.

[140]  arXiv:2110.08420 (cross-list from cs.CL) [pdf, other]
Title: Information-Theoretic Measures of Dataset Difficulty
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to humans; the bigger the performance gap, the harder the dataset is said to be. Not only is this framework informal, but it also provides little understanding of how difficult each instance is, or what attributes make it difficult for a given model. To address these problems, we propose an information-theoretic perspective, framing dataset difficulty as the absence of $\textit{usable information}$. Measuring usable information is as easy as measuring performance, but has certain theoretical advantages. While the latter only allows us to compare different models w.r.t the same dataset, the former also allows us to compare different datasets w.r.t the same model. We then introduce $\textit{pointwise}$ $\mathcal{V}-$$\textit{information}$ (PVI) for measuring the difficulty of individual instances, where instances with higher PVI are easier for model $\mathcal{V}$. By manipulating the input before measuring usable information, we can understand $\textit{why}$ a dataset is easy or difficult for a given model, which we use to discover annotation artefacts in widely-used benchmarks.

[141]  arXiv:2110.08421 (cross-list from cs.CV) [pdf, other]
Title: Dataset Knowledge Transfer for Class-Incremental Learning without Memory
Comments: Accepted to WACV 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Incremental learning enables artificial agents to learn from sequential data. While important progress was made by exploiting deep neural networks, incremental learning remains very challenging. This is particularly the case when no memory of past data is allowed and catastrophic forgetting has a strong negative effect. We tackle class-incremental learning without memory by adapting prediction bias correction, a method which makes predictions of past and new classes more comparable. It was proposed when a memory is allowed and cannot be directly used without memory, since samples of past classes are required. We introduce a two-step learning process which allows the transfer of bias correction parameters between reference and target datasets. Bias correction is first optimized offline on reference datasets which have an associated validation memory. The obtained correction parameters are then transferred to target datasets, for which no memory is available. The second contribution is to introduce a finer modeling of bias correction by learning its parameters per incremental state instead of the usual past vs. new class modeling. The proposed dataset knowledge transfer is applicable to any incremental method which works without memory. We test its effectiveness by applying it to four existing methods. Evaluation with four target datasets and different configurations shows consistent improvement, with practically no computational and memory overhead.

[142]  arXiv:2110.08424 (cross-list from eess.IV) [pdf]
Title: Deep learning-based detection of intravenous contrast in computed tomography scans
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Purpose: Identifying intravenous (IV) contrast use within CT scans is a key component of data curation for model development and testing. Currently, IV contrast is poorly documented in imaging metadata and necessitates manual correction and annotation by clinician experts, presenting a major barrier to imaging analyses and algorithm deployment. We sought to develop and validate a convolutional neural network (CNN)-based deep learning (DL) platform to identify IV contrast within CT scans. Methods: For model development and evaluation, we used independent datasets of CT scans of head, neck (HN) and lung cancer patients, totaling 133,480 axial 2D scan slices from 1,979 CT scans manually annotated for contrast presence by clinical experts. Five different DL models were adopted and trained in HN training datasets for slice-level contrast detection. Model performances were evaluated on a hold-out set and on an independent validation set from another institution. DL models was then fine-tuned on chest CT data and externally validated on a separate chest CT dataset. Results: Initial DICOM metadata tags for IV contrast were missing or erroneous in 1,496 scans (75.6%). The EfficientNetB4-based model showed the best overall detection performance. For HN scans, AUC was 0.996 in the internal validation set (n = 216) and 1.0 in the external validation set (n = 595). The fine-tuned model on chest CTs yielded an AUC: 1.0 for the internal validation set (n = 53), and AUC: 0.980 for the external validation set (n = 402). Conclusion: The DL model could accurately detect IV contrast in both HN and chest CT scans with near-perfect performance.

[143]  arXiv:2110.08430 (cross-list from cs.CL) [pdf, other]
Title: Metadata Shaping: Natural Language Annotations for the Tail
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Language models (LMs) have made remarkable progress, but still struggle to generalize beyond the training data to rare linguistic patterns. Since rare entities and facts are prevalent in the queries users submit to popular applications such as search and personal assistant systems, improving the ability of LMs to reliably capture knowledge over rare entities is a pressing challenge studied in significant prior work. Noticing that existing approaches primarily modify the LM architecture or introduce auxiliary objectives to inject useful entity knowledge, we ask to what extent we could match the quality of these architectures using a base LM architecture, and only changing the data? We propose metadata shaping, a method in which readily available metadata, such as entity descriptions and categorical tags, are appended to examples based on information theoretic metrics. Intuitively, if metadata corresponding to popular entities overlap with metadata for rare entities, the LM may be able to better reason about the rare entities using patterns learned from similar popular entities. On standard entity-rich tasks (TACRED, FewRel, OpenEntity), with no changes to the LM whatsoever, metadata shaping exceeds the BERT-baseline by up to 5.3 F1 points, and achieves or competes with state-of-the-art results. We further show the improvements are up to 10x larger on examples containing tail versus popular entities.

[144]  arXiv:2110.08438 (cross-list from cs.CL) [pdf, other]
Title: Unsupervised Natural Language Inference Using PHL Triplet Generation
Comments: 9 pages, 2 figures, 8 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Transformer-based models have achieved impressive performance on various Natural Language Inference (NLI) benchmarks, when trained on respective training datasets. However, in certain cases, training samples may not be available or collecting them could be time-consuming and resource-intensive. In this work, we address this challenge and present an explorative study on unsupervised NLI, a paradigm in which no human-annotated training samples are available. We investigate NLI under three challenging settings: PH, P, and NPH that differ in the extent of unlabeled data available for learning. As a solution, we propose a procedural data generation approach that leverages a set of sentence transformations to collect PHL (Premise, Hypothesis, Label) triplets for training NLI models, bypassing the need for human-annotated training datasets. Comprehensive experiments show that this approach results in accuracies of 66.75%, 65.9%, 65.39% in PH, P, NPH settings respectively, outperforming all existing baselines. Furthermore, fine-tuning our models with as little as ~0.1% of the training dataset (500 samples) leads to 12.2% higher accuracy than the model trained from scratch on the same 500 instances.

[145]  arXiv:2110.08447 (cross-list from cs.CR) [pdf, other]
Title: TESDA: Transform Enabled Statistical Detection of Attacks in Deep Neural Networks
Authors: Chandramouli Amarnath (Georgia Tech), Aishwarya H. Balwani (Georgia Tech), Kwondo Ma (Georgia Tech), Abhijit Chatterjee (Georgia Tech)
Comments: 10 pages, 2 reference pages, 2 appendix pages, 14 figures, 2 tables
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Deep neural networks (DNNs) are now the de facto choice for computer vision tasks such as image classification. However, their complexity and "black box" nature often renders the systems they're deployed in vulnerable to a range of security threats. Successfully identifying such threats, especially in safety-critical real-world applications is thus of utmost importance, but still very much an open problem. We present TESDA, a low-overhead, flexible, and statistically grounded method for {online detection} of attacks by exploiting the discrepancies they cause in the distributions of intermediate layer features of DNNs. Unlike most prior work, we require neither dedicated hardware to run in real-time, nor the presence of a Trojan trigger to detect discrepancies in behavior. We empirically establish our method's usefulness and practicality across multiple architectures, datasets and diverse attacks, consistently achieving detection coverages of above 95% with operation count overheads as low as 1-2%.

[146]  arXiv:2110.08449 (cross-list from stat.ML) [pdf, other]
Title: Adversarial Attacks on Gaussian Process Bandits
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Gaussian processes (GP) are a widely-adopted tool used to sequentially optimize black-box functions, where evaluations are costly and potentially noisy. Recent works on GP bandits have proposed to move beyond random noise and devise algorithms robust to adversarial attacks. In this paper, we study this problem from the attacker's perspective, proposing various adversarial attack methods with differing assumptions on the attacker's strength and prior information. Our goal is to understand adversarial attacks on GP bandits from both a theoretical and practical perspective. We focus primarily on targeted attacks on the popular GP-UCB algorithm and a related elimination-based algorithm, based on adversarially perturbing the function $f$ to produce another function $\tilde{f}$ whose optima are in some region $\mathcal{R}_{\rm target}$. Based on our theoretical analysis, we devise both white-box attacks (known $f$) and black-box attacks (unknown $f$), with the former including a Subtraction attack and Clipping attack, and the latter including an Aggressive subtraction attack. We demonstrate that adversarial attacks on GP bandits can succeed in forcing the algorithm towards $\mathcal{R}_{\rm target}$ even with a low attack budget, and we compare our attacks' performance and efficiency on several real and synthetic functions.

[147]  arXiv:2110.08470 (cross-list from cs.CL) [pdf, other]
Title: Case-based Reasoning for Better Generalization in Text-Adventure Games
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Text-based games (TBG) have emerged as promising environments for driving research in grounded language understanding and studying problems like generalization and sample efficiency. Several deep reinforcement learning (RL) methods with varying architectures and learning schemes have been proposed for TBGs. However, these methods fail to generalize efficiently, especially under distributional shifts. In a departure from deep RL approaches, in this paper, we propose a general method inspired by case-based reasoning to train agents and generalize out of the training distribution. The case-based reasoner collects instances of positive experiences from the agent's interaction with the world in the past and later reuses the collected experiences to act efficiently. The method can be applied in conjunction with any existing on-policy neural agent in the literature for TBGs. Our experiments show that the proposed approach consistently improves existing methods, obtains good out-of-distribution generalization, and achieves new state-of-the-art results on widely used environments.

[148]  arXiv:2110.08471 (cross-list from math.OC) [pdf, other]
Title: Fast Projection onto the Capped Simplex withApplications to Sparse Regression in Bioinformatics
Comments: 12 pages, 5 figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Genomics (q-bio.GN)

We consider the problem of projecting a vector onto the so-called k-capped simplex, which is a hyper-cube cut by a hyperplane. For an n-dimensional input vector with bounded elements, we found that a simple algorithm based on Newton's method is able to solve the projection problem to high precision with a complexity roughly about O(n), which has a much lower computational cost compared with the existing sorting-based methods proposed in the literature. We provide a theory for partial explanation and justification of the method.
We demonstrate that the proposed algorithm can produce a solution of the projection problem with high precision on large scale datasets, and the algorithm is able to significantly outperform the state-of-the-art methods in terms of runtime (about 6-8 times faster than a commercial software with respect to CPU time for input vector with 1 million variables or more).
We further illustrate the effectiveness of the proposed algorithm on solving sparse regression in a bioinformatics problem. Empirical results on the GWAS dataset (with 1,500,000 single-nucleotide polymorphisms) show that, when using the proposed method to accelerate the Projected Quasi-Newton (PQN) method, the accelerated PQN algorithm is able to handle huge-scale regression problem and it is more efficient (about 3-6 times faster) than the current state-of-the-art methods.

[149]  arXiv:2110.08488 (cross-list from cs.RO) [pdf, other]
Title: Lifelong Topological Visual Navigation
Comments: Project page: this https URL
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The ability for a robot to navigate with only the use of vision is appealing due to its simplicity. Traditional vision-based navigation approaches required a prior map-building step that was arduous and prone to failure, or could only exactly follow previously executed trajectories. Newer learning-based visual navigation techniques reduce the reliance on a map and instead directly learn policies from image inputs for navigation. There are currently two prevalent paradigms: end-to-end approaches forego the explicit map representation entirely, and topological approaches which still preserve some loose connectivity of the space. However, while end-to-end methods tend to struggle in long-distance navigation tasks, topological map-based solutions are prone to failure due to spurious edges in the graph. In this work, we propose a learning-based topological visual navigation method with graph update strategies that improve lifelong navigation performance over time. We take inspiration from sampling-based planning algorithms to build image-based topological graphs, resulting in sparser graphs yet with higher navigation performance compared to baseline methods. Also, unlike controllers that learn from fixed training environments, we show that our model can be finetuned using a relatively small dataset from the real-world environment where the robot is deployed. We further assess performance of our system in real-world deployments.

[150]  arXiv:2110.08500 (cross-list from stat.ML) [pdf, other]
Title: On Model Selection Consistency of Lasso for High-Dimensional Ising Models on Tree-like Graphs
Comments: 30 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

We consider the problem of high-dimensional Ising model selection using neighborhood-based least absolute shrinkage and selection operator (Lasso). It is rigorously proved that under some mild coherence conditions on the population covariance matrix of the Ising model, consistent model selection can be achieved with sample sizes $n=\Omega{(d^3\log{p})}$ for any tree-like graph in the paramagnetic phase, where $p$ is the number of variables and $d$ is the maximum node degree. When the same conditions are imposed directly on the sample covariance matrices, it is shown that a reduced sample size $n=\Omega{(d^2\log{p})}$ suffices. The obtained sufficient conditions for consistent model selection with Lasso are the same in the scaling of the sample complexity as that of $\ell_1$-regularized logistic regression. Given the popularity and efficiency of Lasso, our rigorous analysis provides a theoretical backing for its practical use in Ising model selection.

[151]  arXiv:2110.08505 (cross-list from stat.ML) [pdf, other]
Title: Mode and Ridge Estimation in Euclidean and Directional Product Spaces: A Mean Shift Approach
Comments: 51 pages, 10 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)

The set of local modes and the ridge lines estimated from a dataset are important summary characteristics of the data-generating distribution. In this work, we consider estimating the local modes and ridges from point cloud data in a product space with two or more Euclidean/directional metric spaces. Specifically, we generalize the well-known (subspace constrained) mean shift algorithm to the product space setting and illuminate some pitfalls in such generalization. We derive the algorithmic convergence of the proposed method, provide practical guidelines on the implementation, and demonstrate its effectiveness on both simulated and real datasets.

[152]  arXiv:2110.08509 (cross-list from eess.IV) [pdf, other]
Title: BAPGAN: GAN-based Bone Age Progression of Femur and Phalange X-ray Images
Comments: 6 pages, 5 figures, accepted to SPIE Medical Imaging 2022
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Convolutional Neural Networks play a key role in bone age assessment for investigating endocrinology, genetic, and growth disorders under various modalities and body regions. However, no researcher has tackled bone age progression/regression despite its valuable potential applications: bone-related disease diagnosis, clinical knowledge acquisition, and museum education. Therefore, we propose Bone Age Progression Generative Adversarial Network (BAPGAN) to progress/regress both femur/phalange X-ray images while preserving identity and realism. We exhaustively confirm the BAPGAN's clinical potential via Frechet Inception Distance, Visual Turing Test by two expert orthopedists, and t-Distributed Stochastic Neighbor Embedding.

[153]  arXiv:2110.08514 (cross-list from cs.CL) [pdf, other]
Title: Analyzing Dynamic Adversarial Training Data in the Limit
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

To create models that are robust across a wide range of test inputs, training datasets should include diverse examples that span numerous phenomena. Dynamic adversarial data collection (DADC), where annotators craft examples that challenge continually improving models, holds promise as an approach for generating such diverse training sets. Prior work has shown that running DADC over 1-3 rounds can help models fix some error types, but it does not necessarily lead to better generalization beyond adversarial test data. We argue that running DADC over many rounds maximizes its training-time benefits, as the different rounds can together cover many of the task-relevant phenomena. We present the first study of longer-term DADC, where we collect 20 rounds of NLI examples for a small set of premise paragraphs, with both adversarial and non-adversarial approaches. Models trained on DADC examples make 26% fewer errors on our expert-curated test set compared to models trained on non-adversarial data. Our analysis shows that DADC yields examples that are more difficult, more lexically and syntactically diverse, and contain fewer annotation artifacts compared to non-adversarial examples.

[154]  arXiv:2110.08515 (cross-list from cs.CL) [pdf, other]
Title: Multimodal Dialogue Response Generation
Comments: This paper has been submitted before 15th October @ 11:59pm AOE(UTC -12)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)

Responsing with image has been recognized as an important capability for an intelligent conversational agent. Yet existing works only focus on exploring the multimodal dialogue models which depend on retrieval-based methods, but neglecting generation methods. To fill in the gaps, we first present a multimodal dialogue generation model, which takes the dialogue history as input, then generates a textual sequence or an image as response. Learning such a model often requires multimodal dialogues containing both texts and images which are difficult to obtain. Motivated by the challenge in practice, we consider multimodal dialogue generation under a natural assumption that only limited training examples are available. In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate parameters that depend on multimodal dialogues from the entire generation model. By this means, the major part of the model can be learned from a large number of text-only dialogues and text-image pairs respectively, then the whole parameters can be well fitted using the limited training examples. Extensive experiments demonstrate our method achieves state-of-the-art results in both automatic and human evaluation, and can generate informative text and high-resolution image responses.

[155]  arXiv:2110.08527 (cross-list from cs.CL) [pdf, other]
Title: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-Trained Language Models
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Recent work has shown that pre-trained language models capture social biases from the text corpora they are trained on. This has attracted attention to developing techniques that mitigate such biases. In this work, we perform a empirical survey of five recently proposed debiasing techniques: Counterfactual Data Augmentation (CDA), Dropout, Iterative Nullspace Projection, Self-Debias, and SentenceDebias. We quantify the effectiveness of each technique using three different bias benchmarks while also measuring the impact of these techniques on a model's language modeling ability, as well as its performance on downstream NLU tasks. We experimentally find that: (1) CDA and Self-Debias are the strongest of the debiasing techniques, obtaining improved scores on most of the bias benchmarks (2) Current debiasing techniques do not generalize well beyond gender bias; And (3) improvements on bias benchmarks such as StereoSet and CrowS-Pairs by using debiasing strategies are usually accompanied by a decrease in language modeling ability, making it difficult to determine whether the bias mitigation is effective.

[156]  arXiv:2110.08529 (cross-list from cs.CL) [pdf, other]
Title: Sharpness-Aware Minimization Improves Language Model Generalization
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

The allure of superhuman-level capabilities has led to considerable interest in language models like GPT-3 and T5, wherein the research has, by and large, revolved around new model architectures, training tasks, and loss objectives, along with substantial engineering efforts to scale up model capacity and dataset size. Comparatively little work has been done to improve the generalization of these models through better optimization. In this work, we show that Sharpness-Aware Minimization (SAM), a recently proposed optimization procedure that encourages convergence to flatter minima, can substantially improve the generalization of language models without much computational overhead. We show that SAM is able to boost performance on SuperGLUE, GLUE, Web Questions, Natural Questions, Trivia QA, and TyDiQA, with particularly large gains when training data for these tasks is limited.

[157]  arXiv:2110.08531 (cross-list from math.OC) [pdf, other]
Title: A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization
Comments: 36 pages, 5 figures, 6 tables, 35 references
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

In the following paper we introduce new adaptive algorithms endowed with momentum terms for stochastic non-convex optimization problems. We investigate the almost sure convergence to stationary points, along with a finite-time horizon analysis with respect to a chosen final iteration, and we also inspect the worst-case iteration complexity. An estimate for the expectation of the squared Euclidean norm of the gradient is given and the theoretical analysis that we perform is assisted by various computational simulations for neural network training.

[158]  arXiv:2110.08536 (cross-list from cs.CL) [pdf, other]
Title: Sparse Distillation: Speeding Up Text Classification by Using Bigger Models
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Distilling state-of-the-art transformer models into lightweight student models is an effective way to reduce computation cost at inference time. However, the improved inference speed may be still unsatisfactory for certain time-sensitive applications. In this paper, we aim to further push the limit of inference speed by exploring a new area in the design space of the student model. More specifically, we consider distilling a transformer-based text classifier into a billion-parameter, sparsely-activated student model with a embedding-averaging architecture. Our experiments show that the student models retain 97% of the RoBERTa-Large teacher performance on a collection of six text classification tasks. Meanwhile, the student model achieves up to 600x speed-up on both GPUs and CPUs, compared to the teacher models. Further investigation shows that our pipeline is also effective in privacy-preserving and domain generalization settings.

[159]  arXiv:2110.08545 (cross-list from eess.AS) [pdf, other]
Title: A Unified Speaker Adaptation Approach for ASR
Comments: Accepted by EMNLP 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

Transformer models have been used in automatic speech recognition (ASR) successfully and yields state-of-the-art results. However, its performance is still affected by speaker mismatch between training and test data. Further finetuning a trained model with target speaker data is the most natural approach for adaptation, but it takes a lot of compute and may cause catastrophic forgetting to the existing speakers. In this work, we propose a unified speaker adaptation approach consisting of feature adaptation and model adaptation. For feature adaptation, we employ a speaker-aware persistent memory model which generalizes better to unseen test speakers by making use of speaker i-vectors to form a persistent memory. For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture, which to the best of our knowledge, has never been explored in ASR. Specifically, we gradually prune less contributing parameters on model encoder to a certain sparsity level, and use the pruned parameters for adaptation, while freezing the unpruned parameters to keep the original model performance. We conduct experiments on the Librispeech dataset. Our proposed approach brings relative 2.74-6.52% word error rate (WER) reduction on general speaker adaptation. On target speaker adaptation, our method outperforms the baseline with up to 20.58% relative WER reduction, and surpasses the finetuning method by up to relative 2.54%. Besides, with extremely low-resource adaptation data (e.g., 1 utterance), our method could improve the WER by relative 6.53% with only a few epochs of training.

[160]  arXiv:2110.08551 (cross-list from cs.CL) [pdf, other]
Title: HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression
Comments: EMNLP 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

On many natural language processing tasks, large pre-trained language models (PLMs) have shown overwhelming performances compared with traditional neural network methods. Nevertheless, their huge model size and low inference speed have hindered the deployment on resource-limited devices in practice. In this paper, we target to compress PLMs with knowledge distillation, and propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information. Specifically, to enhance the model capability and transferability, we leverage the idea of meta-learning and set up domain-relational graphs to capture the relational information across different domains. And to dynamically select the most representative prototypes for each domain, we propose a hierarchical compare-aggregate mechanism to capture hierarchical relationships. Extensive experiments on public multi-domain datasets demonstrate the superior performance of our HRKD method as well as its strong few-shot learning ability. For reproducibility, we release the code at https://github.com/cheneydon/hrkd.

[161]  arXiv:2110.08552 (cross-list from cs.CL) [pdf, other]
Title: Virtual Augmentation Supported Contrastive Learning of Sentence Representations
Comments: 8 pages, 3 figures, 3 tables
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Despite profound successes, contrastive representation learning relies on carefully designed data augmentations using domain specific knowledge. This challenge is magnified in natural language processing where no general rules exist for data augmentation due to the discrete nature of natural language. We tackle this challenge by presenting a Virtual augmentation Supported Contrastive Learning of sentence representations (VaSCL). Originating from the interpretation that data augmentation essentially constructs the neighborhoods of each training instance, we in turn utilize the neighborhood to generate effective data augmentations. Leveraging the large training batch size of contrastive learning, we approximate the neighborhood of an instance via its K-nearest in-batch neighbors in the representation space. We then define an instance discrimination task within this neighborhood, and generate the virtual augmentation in an adversarial training manner. We access the performance of VaSCL on a wide range of downstream tasks, and set a new state-of-the-art for unsupervised sentence representation learning.

[162]  arXiv:2110.08562 (cross-list from cs.CV) [pdf, other]
Title: BNAS v2: Learning Architectures for Binary Networks with Empirical Improvements
Comments: arXiv admin note: text overlap with arXiv:2002.06963
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Backbone architectures of most binary networks are well-known floating point (FP) architectures such as the ResNet family. Questioning that the architectures designed for FP networks might not be the best for binary networks, we propose to search architectures for binary networks (BNAS) by defining a new search space for binary architectures and a novel search objective. Specifically, based on the cell based search method, we define the new search space of binary layer types, design a new cell template, and rediscover the utility of and propose to use the Zeroise layer instead of using it as a placeholder. The novel search objective diversifies early search to learn better performing binary architectures. We show that our method searches architectures with stable training curves despite the quantization error inherent in binary networks. Quantitative analyses demonstrate that our searched architectures outperform the architectures used in state-of-the-art binary networks and outperform or perform on par with state-of-the-art binary networks that employ various techniques other than architectural changes. In addition, we further propose improvements to the training scheme of our searched architectures. With the new training scheme for our searched architectures, we achieve the state-of-the-art performance by binary networks by outperforming all previous methods by non-trivial margins.

[163]  arXiv:2110.08577 (cross-list from math.OC) [pdf, other]
Title: Nys-Curve: Nyström-Approximated Curvature for Stochastic Optimization
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)

The quasi-Newton methods generally provide curvature information by approximating the Hessian using the secant equation. However, the secant equation becomes insipid in approximating the Newton step owing to its use of the first-order derivatives. In this study, we propose an approximate Newton step-based stochastic optimization algorithm for large-scale empirical risk minimization of convex functions with linear convergence rates. Specifically, we compute a partial column Hessian of size ($d\times k$) with $k\ll d$ randomly selected variables, then use the \textit{Nystr\"om method} to better approximate the full Hessian matrix. To further reduce the computational complexity per iteration, we directly compute the update step ($\Delta\boldsymbol{w}$) without computing and storing the full Hessian or its inverse. Furthermore, to address large-scale scenarios in which even computing a partial Hessian may require significant time, we used distribution-preserving (DP) sub-sampling to compute a partial Hessian. The DP sub-sampling generates $p$ sub-samples with similar first and second-order distribution statistics and selects a single sub-sample at each epoch in a round-robin manner to compute the partial Hessian. We integrate our approximated Hessian with stochastic gradient descent and stochastic variance-reduced gradients to solve the logistic regression problem. The numerical experiments show that the proposed approach was able to obtain a better approximation of Newton\textquotesingle s method with performance competitive with the state-of-the-art first-order and the stochastic quasi-Newton methods.

[164]  arXiv:2110.08583 (cross-list from eess.AS) [pdf, ps, other]
Title: ASR4REAL: An extended benchmark for speech models
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

Popular ASR benchmarks such as Librispeech and Switchboard are limited in the diversity of settings and speakers they represent. We introduce a set of benchmarks matching real-life conditions, aimed at spotting possible biases and weaknesses in models. We have found out that even though recent models do not seem to exhibit a gender bias, they usually show important performance discrepancies by accent, and even more important ones depending on the socio-economic status of the speakers. Finally, all tested models show a strong performance drop when tested on conversational speech, and in this precise context even a language model trained on a dataset as big as Common Crawl does not seem to have significant positive effect which reiterates the importance of developing conversational language models

[165]  arXiv:2110.08586 (cross-list from cs.RO) [pdf, other]
Title: Generative Adversarial Imitation Learning for End-to-End Autonomous Driving on Urban Environments
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)

Autonomous driving is a complex task, which has been tackled since the first self-driving car ALVINN in 1989, with a supervised learning approach, or behavioral cloning (BC). In BC, a neural network is trained with state-action pairs that constitute the training set made by an expert, i.e., a human driver. However, this type of imitation learning does not take into account the temporal dependencies that might exist between actions taken in different moments of a navigation trajectory. These type of tasks are better handled by reinforcement learning (RL) algorithms, which need to define a reward function. On the other hand, more recent approaches to imitation learning, such as Generative Adversarial Imitation Learning (GAIL), can train policies without explicitly requiring to define a reward function, allowing an agent to learn by trial and error directly on a training set of expert trajectories. In this work, we propose two variations of GAIL for autonomous navigation of a vehicle in the realistic CARLA simulation environment for urban scenarios. Both of them use the same network architecture, which process high dimensional image input from three frontal cameras, and other nine continuous inputs representing the velocity, the next point from the sparse trajectory and a high-level driving command. We show that both of them are capable of imitating the expert trajectory from start to end after training ends, but the GAIL loss function that is augmented with BC outperforms the former in terms of convergence time and training stability.

[166]  arXiv:2110.08598 (cross-list from eess.AS) [pdf, other]
Title: A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)

We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions. Instead of carrying out point estimation in conventional maximum a posteriori estimation with a risk of having a curse of dimensionality in estimating a huge number of model parameters, we focus our attention on estimating a manageable number of latent variables of DNNs via a VB inference framework. To accomplish model transfer, knowledge learnt from a source domain is encoded in prior distributions of latent variables and optimally combined, in a Bayesian sense, with a small set of adaptation data from a target domain to approximate the corresponding posterior distributions. Experimental results on device adaptation in acoustic scene classification show that our proposed VB approach can obtain good improvements on target devices, and consistently outperforms 13 state-of-the-art knowledge transfer algorithms.

[167]  arXiv:2110.08610 (cross-list from cs.HC) [pdf, other]
Title: MAAD: A Model and Dataset for "Attended Awareness" in Driving
Comments: 25 pages, 13 figures, 14 tables, Accepted at EPIC@ICCV 2021 Workshop. Main paper + Supplementary Material
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)

We propose a computational model to estimate a person's attended awareness of their environment. We define attended awareness to be those parts of a potentially dynamic scene which a person has attended to in recent history and which they are still likely to be physically aware of. Our model takes as input scene information in the form of a video and noisy gaze estimates, and outputs visual saliency, a refined gaze estimate, and an estimate of the person's attended awareness. In order to test our model, we capture a new dataset with a high-precision gaze tracker including 24.5 hours of gaze sequences from 23 subjects attending to videos of driving scenes. The dataset also contains third-party annotations of the subjects' attended awareness based on observations of their scan path. Our results show that our model is able to reasonably estimate attended awareness in a controlled setting, and in the future could potentially be extended to real egocentric driving data to help enable more effective ahead-of-time warnings in safety systems and thereby augment driver performance. We also demonstrate our model's effectiveness on the tasks of saliency, gaze calibration, and denoising, using both our dataset and an existing saliency dataset. We make our model and dataset available at https://github.com/ToyotaResearchInstitute/att-aware/.

[168]  arXiv:2110.08618 (cross-list from astro-ph.IM) [pdf, other]
Title: Convolutional Deep Denoising Autoencoders for Radio Astronomical Images
Comments: 21 pages, 14 figures, Accepted for publication by MNRAS
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)

We apply a Machine Learning technique known as Convolutional Denoising Autoencoder to denoise synthetic images of state-of-the-art radio telescopes, with the goal of detecting the faint, diffused radio sources predicted to characterise the radio cosmic web. In our application, denoising is intended to address both the reduction of random instrumental noise and the minimisation of additional spurious artefacts like the sidelobes, resulting from the aperture synthesis technique. The effectiveness and the accuracy of the method are analysed for different kinds of corrupted input images, together with its computational performance. Specific attention has been devoted to create realistic mock observations for the training, exploiting the outcomes of cosmological numerical simulations, to generate images corresponding to LOFAR HBA 8 hours observations at 150 MHz. Our autoencoder can effectively denoise complex images identifying and extracting faint objects at the limits of the instrumental sensitivity. The method can efficiently scale on large datasets, exploiting high performance computing solutions, in a fully automated way (i.e. no human supervision is required after training). It can accurately perform image segmentation, identifying low brightness outskirts of diffused sources, proving to be a viable solution for detecting challenging extended objects hidden in noisy radio observations.

[169]  arXiv:2110.08633 (cross-list from cs.DC) [pdf, other]
Title: Hydra: A System for Large Multi-Model Deep Learning
Comments: 12 pages including references. Under submission at Conference on Systems and Machine Learning Foundation
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Machine Learning (cs.LG)

Training deep learning (DL) models that do not fit into the memory of a single GPU is a vexed process, forcing users to procure multiple GPUs to adopt model-parallel execution. Unfortunately, sequential dependencies in neural architectures often block efficient multi-device training, leading to suboptimal performance. We present 'model spilling', a technique aimed at models such as Transformers and CNNs to move groups of layers, or shards, between DRAM and GPU memory, thus enabling arbitrarily large models to be trained even on just one GPU. We then present a set of novel techniques leveraging spilling to raise efficiency for multi-model training workloads such as model selection: a new hybrid of task- and model-parallelism, a new shard scheduling heuristic, and 'double buffering' to hide latency. We prototype our ideas into a system we call HYDRA to support seamless single-model and multi-model training of large DL models. Experiments with real benchmark workloads show that HYDRA is over 7x faster than regular model parallelism and over 50% faster than state-of-the-art industrial tools for pipeline parallelism.

[170]  arXiv:2110.08634 (cross-list from cs.SD) [pdf, other]
Title: Towards Robust Waveform-Based Acoustic Models
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)

We propose an approach for learning robust acoustic models in adverse environments, characterized by a significant mismatch between training and test conditions. This problem is of paramount importance for the deployment of speech recognition systems that need to perform well in unseen environments. Our approach is an instance of vicinal risk minimization, which aims to improve risk estimates during training by replacing the delta functions that define the empirical density over the input space with an approximation of the marginal population density in the vicinity of the training samples. More specifically, we assume that local neighborhoods centered at training samples can be approximated using a mixture of Gaussians, and demonstrate theoretically that this can incorporate robust inductive bias into the learning process. We characterize the individual mixture components implicitly via data augmentation schemes, designed to address common sources of spurious correlations in acoustic models. To avoid potential confounding effects on robustness due to information loss, which has been associated with standard feature extraction techniques (e.g., FBANK and MFCC features), we focus our evaluation on the waveform-based setting. Our empirical results show that the proposed approach can generalize to unseen noise conditions, with 150% relative improvement in out-of-distribution generalization compared to training using the standard risk minimization principle. Moreover, the results demonstrate competitive performance relative to models learned using a training sample designed to match the acoustic conditions characteristic of test utterances (i.e., optimal vicinal densities).

[171]  arXiv:2110.08668 (cross-list from eess.IV) [pdf, other]
Title: Fast Strain Estimation and Frame Selection in Ultrasound Elastography using Machine Learning
Journal-ref: journal:IEEE transactions on ultrasonics, ferroelectrics, and frequency Control, volume:68, number:3, pages:406-415, year:2020
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Medical Physics (physics.med-ph)

Ultrasound Elastography aims to determine the mechanical properties of the tissue by monitoring tissue deformation due to internal or external forces. Tissue deformations are estimated from ultrasound radio frequency (RF) signals and are often referred to as time delay estimation (TDE). Given two RF frames I1 and I2, we can compute a displacement image which shows the change in the position of each sample in I1 to a new position in I2. Two important challenges in TDE include high computational complexity and the difficulty in choosing suitable RF frames. Selecting suitable frames is of high importance because many pairs of RF frames either do not have acceptable deformation for extracting informative strain images or are decorrelated and deformation cannot be reliably estimated. Herein, we introduce a method that learns 12 displacement modes in quasi-static elastography by performing Principal Component Analysis (PCA) on displacement fields of a large training database. In the inference stage, we use dynamic programming (DP) to compute an initial displacement estimate of around 1% of the samples, and then decompose this sparse displacement into a linear combination of the 12 displacement modes. Our method assumes that the displacement of the whole image could also be described by this linear combination of principal components. We then use the GLobal Ultrasound Elastography (GLUE) method to fine-tune the result yielding the exact displacement image. Our method, which we call PCA-GLUE, is more than 10 times faster than DP in calculating the initial displacement map while giving the same result. Our second contribution in this paper is determining the suitability of the frame pair I1 and I2 for strain estimation, which we achieve by using the weight vector that we calculated for PCA-GLUE as an input to a multi-layer perceptron (MLP) classifier.

[172]  arXiv:2110.08676 (cross-list from stat.ML) [pdf, other]
Title: Noise-Augmented Privacy-Preserving Empirical Risk Minimization with Dual-purpose Regularizer and Privacy Budget Retrieval and Recycling
Authors: Yinan Li, Fang Liu
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

We propose Noise-Augmented Privacy-Preserving Empirical Risk Minimization (NAPP-ERM) that solves ERM with differential privacy guarantees. Existing privacy-preserving ERM approaches may be subject to over-regularization with the employment of an l2 term to achieve strong convexity on top of the target regularization. NAPP-ERM improves over the current approaches and mitigates over-regularization by iteratively realizing target regularization through appropriately designed augmented data and delivering strong convexity via a single adaptively weighted dual-purpose l2 regularizer. When the target regularization is for variable selection, we propose a new regularizer that achieves both privacy and sparsity guarantees simultaneously. Finally, we propose a strategy to retrieve privacy budget when the strong convexity requirement is met, which can be returned to users such that the DP of ERM is guaranteed at a lower privacy cost than originally planned, or be recycled to the ERM optimization procedure to reduce the injected DP noise and improve the utility of DP-ERM. From an implementation perspective, NAPP-ERM can be achieved by optimizing a non-perturbed object function given noise-augmented data and can thus leverage existing tools for non-private ERM optimization. We illustrate through extensive experiments the mitigation effect of the over-regularization and private budget retrieval by NAPP-ERM on variable selection and prediction.

[173]  arXiv:2110.08685 (cross-list from cs.AR) [pdf, other]
Title: A Learning-based Approach Towards Automated Tuning of SSD Configurations
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG); Operating Systems (cs.OS)

Thanks to the mature manufacturing techniques, solid-state drives (SSDs) are highly customizable for applications today, which brings opportunities to further improve their storage performance and resource utilization. However, the SSD efficiency is usually determined by many hardware parameters, making it hard for developers to manually tune them and determine the optimal SSD configurations.
In this paper, we present an automated learning-based framework, named LearnedSSD, that utilizes both supervised and unsupervised machine learning (ML) techniques to drive the tuning of hardware configurations for SSDs. LearnedSSD automatically extracts the unique access patterns of a new workload using its block I/O traces, maps the workload to previously workloads for utilizing the learned experiences, and recommends an optimal SSD configuration based on the validated storage performance. LearnedSSD accelerates the development of new SSD devices by automating the hard-ware parameter configurations and reducing the manual efforts. We develop LearnedSSD with simple yet effective learning algorithms that can run efficiently on multi-core CPUs. Given a target storage workload, our evaluation shows that LearnedSSD can always deliver an optimal SSD configuration for the target workload, and this configuration will not hurt the performance of non-target workloads.

[174]  arXiv:2110.08691 (cross-list from cs.DS) [pdf, ps, other]
Title: Terminal Embeddings in Sublinear Time
Comments: Accepted to FOCS 2021
Subjects: Data Structures and Algorithms (cs.DS); Computational Geometry (cs.CG); Machine Learning (cs.LG); Machine Learning (stat.ML)

Recently (Elkin, Filtser, Neiman 2017) introduced the concept of a {\it terminal embedding} from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with a set of designated terminals $T\subset X$. Such an embedding $f$ is said to have distortion $\rho\ge 1$ if $\rho$ is the smallest value such that there exists a constant $C>0$ satisfying
\begin{equation*}
\forall x\in T\ \forall q\in X,\ C d_X(x, q) \le d_Y(f(x), f(q)) \le C \rho d_X(x, q) .
\end{equation*}
In the case that $X,Y$ are both Euclidean metrics with $Y$ being $m$-dimensional, recently (Narayanan, Nelson 2019), following work of (Mahabadi, Makarychev, Makarychev, Razenshteyn 2018), showed that distortion $1+\epsilon$ is achievable via such a terminal embedding with $m = O(\epsilon^{-2}\log n)$ for $n := |T|$. This generalizes the Johnson-Lindenstrauss lemma, which only preserves distances within $T$ and not to $T$ from the rest of space. The downside is that evaluating the embedding on some $q\in \mathbb{R}^d$ required solving a semidefinite program with $\Theta(n)$ constraints in $m$ variables and thus required some superlinear $\mathrm{poly}(n)$ runtime. Our main contribution in this work is to give a new data structure for computing terminal embeddings. We show how to pre-process $T$ to obtain an almost linear-space data structure that supports computing the terminal embedding image of any $q\in\mathbb{R}^d$ in sublinear time $n^{1-\Theta(\epsilon^2)+o(1)} + dn^{o(1)}$. To accomplish this, we leverage tools developed in the context of approximate nearest neighbor search.

[175]  arXiv:2110.08704 (cross-list from cs.IT) [pdf, other]
Title: A Q-Learning-based Approach for Distributed Beam Scheduling in mmWave Networks
Comments: 10 pages
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We consider the problem of distributed downlink beam scheduling and power allocation for millimeter-Wave (mmWave) cellular networks where multiple base stations (BSs) belonging to different service operators share the same unlicensed spectrum with no central coordination or cooperation among them. Our goal is to design efficient distributed beam scheduling and power allocation algorithms such that the network-level payoff, defined as the weighted sum of the total throughput and a power penalization term, can be maximized. To this end, we propose a distributed scheduling approach to power allocation and adaptation for efficient interference management over the shared spectrum by modeling each BS as an independent Q-learning agent. As a baseline, we compare the proposed approach to the state-of-the-art non-cooperative game-based approach which was previously developed for the same problem. We conduct extensive experiments under various scenarios to verify the effect of multiple factors on the performance of both approaches. Experiment results show that the proposed approach adapts well to different interference situations by learning from experience and can achieve higher payoff than the game-based approach. The proposed approach can also be integrated into our previously developed Lyapunov stochastic optimization framework for the purpose of network utility maximization with optimality guarantee. As a result, the weights in the payoff function can be automatically and optimally determined by the virtual queue values from the sub-problems derived from the Lyapunov optimization framework.

[176]  arXiv:2110.08713 (cross-list from math.OC) [pdf, other]
Title: Pareto Navigation Gradient Descent: a First-Order Algorithm for Optimization in Pareto Set
Authors: Mao Ye, Qiang Liu
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Many modern machine learning applications, such as multi-task learning, require finding optimal model parameters to trade-off multiple objective functions that may conflict with each other. The notion of the Pareto set allows us to focus on the set of (often infinite number of) models that cannot be strictly improved. But it does not provide an actionable procedure for picking one or a few special models to return to practical users. In this paper, we consider \emph{optimization in Pareto set (OPT-in-Pareto)}, the problem of finding Pareto models that optimize an extra reference criterion function within the Pareto set. This function can either encode a specific preference from the users, or represent a generic diversity measure for obtaining a set of diversified Pareto models that are representative of the whole Pareto set. Unfortunately, despite being a highly useful framework, efficient algorithms for OPT-in-Pareto have been largely missing, especially for large-scale, non-convex, and non-linear objectives in deep learning. A naive approach is to apply Riemannian manifold gradient descent on the Pareto set, which yields a high computational cost due to the need for eigen-calculation of Hessian matrices. We propose a first-order algorithm that approximately solves OPT-in-Pareto using only gradient information, with both high practical efficiency and theoretically guaranteed convergence property. Empirically, we demonstrate that our method works efficiently for a variety of challenging multi-task-related problems.

[177]  arXiv:2110.08721 (cross-list from eess.IV) [pdf, other]
Title: CAE-Transformer: Transformer-based Model to Predict Invasiveness of Lung Adenocarcinoma Subsolid Nodules from Non-thin Section 3D CT Scans
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Lung cancer is the leading cause of mortality from cancer worldwide and has various histologic types, among which Lung Adenocarcinoma (LAUC) has recently been the most prevalent. Lung adenocarcinomas are classified as pre-invasive, minimally invasive, and invasive adenocarcinomas. Timely and accurate knowledge of the invasiveness of lung nodules leads to a proper treatment plan and reduces the risk of unnecessary or late surgeries. Currently, the primary imaging modality to assess and predict the invasiveness of LAUCs is the chest CT. The results based on CT images, however, are subjective and suffer from a low accuracy compared to the ground truth pathological reviews provided after surgical resections. In this paper, a predictive transformer-based framework, referred to as the "CAE-Transformer", is developed to classify LAUCs. The CAE-Transformer utilizes a Convolutional Auto-Encoder (CAE) to automatically extract informative features from CT slices, which are then fed to a modified transformer model to capture global inter-slice relations. Experimental results on our in-house dataset of 114 pathologically proven Sub-Solid Nodules (SSNs) demonstrate the superiority of the CAE-Transformer over the histogram/radiomics-based models and its deep learning-based counterparts, achieving an accuracy of 87.73%, sensitivity of 88.67%, specificity of 86.33%, and AUC of 0.913, using a 10-fold cross-validation.

[178]  arXiv:2110.08745 (cross-list from cs.CL) [pdf, other]
Title: Reminding the Incremental Language Model via Data-Free Self-Distillation
Comments: 8 pages, 5 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Incremental language learning with pseudo-data can alleviate catastrophic forgetting in neural networks. However, to obtain better performance, former methods have higher demands for pseudo-data of the previous tasks. The performance dramatically decreases when fewer pseudo-data are employed. In addition, the distribution of pseudo-data gradually deviates from the real data with the sequential learning of different tasks. The deviation will be greater with more tasks learned, which results in more serious catastrophic forgetting. To address these issues, we propose reminding incremental language model via data-free self-distillation (DFSD), which includes self-distillation based on the Earth Mover's Distance and hidden data augmentation. By estimating the knowledge distribution in all layers of GPT-2 and transforming it from teacher model to student model, the Self-distillation based on the Earth Mover's Distance can significantly reduce the demand for pseudo-data. Hidden data augmentation can greatly alleviate the catastrophic forgetting caused by deviations via modeling the generation of pseudo-data as a hidden data augmentation process, where each sample is a mixture of all trained task data. The experimental results demonstrate that our DFSD can exceed the previous state-of-the-art methods even if the maximum decrease in pseudo-data is 90%.

[179]  arXiv:2110.08750 (cross-list from cs.RO) [pdf, other]
Title: TIP: Task-Informed Motion Prediction for Intelligent Systems
Comments: 8 pages, 6 figures, 2 tables
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Motion prediction is important for intelligent driving systems, providing the future distributions of road agent behaviors and supporting various decision making tasks. Existing motion predictors are often optimized and evaluated via task-agnostic measures based on prediction accuracy. Such measures fail to account for the use of prediction in downstream tasks, and could result in sub-optimal task performance. We propose a task-informed motion prediction framework that jointly reasons about prediction accuracy and task utility, to better support downstream tasks through its predictions. The task utility function does not require the full task information, but rather a specification of the utility of the task, resulting in predictors that serve a wide range of downstream tasks. We demonstrate our framework on two use cases of task utilities, in the context of autonomous driving and parallel autonomy, and show the advantage of task-informed predictors over task-agnostic ones on the Waymo Open Motion dataset.

[180]  arXiv:2110.08776 (cross-list from eess.IV) [pdf, other]
Title: Self-Supervised U-Net for Segmenting Flat and Sessile Polyps
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Colorectal Cancer(CRC) poses a great risk to public health. It is the third most common cause of cancer in the US. Development of colorectal polyps is one of the earliest signs of cancer. Early detection and resection of polyps can greatly increase survival rate to 90%. Manual inspection can cause misdetections because polyps vary in color, shape, size and appearance. To this end, Computer-Aided Diagnosis systems(CADx) has been proposed that detect polyps by processing the colonoscopic videos. The system acts a secondary check to help clinicians reduce misdetections so that polyps may be resected before they transform to cancer. Polyps vary in color, shape, size, texture and appearance. As a result, the miss rate of polyps is between 6% and 27% despite the prominence of CADx solutions. Furthermore, sessile and flat polyps which have diameter less than 10 mm are more likely to be undetected. Convolutional Neural Networks(CNN) have shown promising results in polyp segmentation. However, all of these works have a supervised approach and are limited by the size of the dataset. It was observed that smaller datasets reduce the segmentation accuracy of ResUNet++. We train a U-Net to inpaint randomly dropped out pixels in the image as a proxy task. The dataset we use for pre-training is Kvasir-SEG dataset. This is followed by a supervised training on the limited Kvasir-Sessile dataset. Our experimental results demonstrate that with limited annotated dataset and a larger unlabeled dataset, self-supervised approach is a better alternative than fully supervised approach. Specifically, our self-supervised U-Net performs better than five segmentation models which were trained in supervised manner on the Kvasir-Sessile dataset.

[181]  arXiv:2110.08787 (cross-list from cs.CV) [pdf, other]
Title: PixelPyramids: Exact Inference Models from Lossless Image Pyramids
Comments: To appear at ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Autoregressive models are a class of exact inference approaches with highly flexible functional forms, yielding state-of-the-art density estimates for natural images. Yet, the sequential ordering on the dimensions makes these models computationally expensive and limits their applicability to low-resolution imagery. In this work, we propose Pixel-Pyramids, a block-autoregressive approach employing a lossless pyramid decomposition with scale-specific representations to encode the joint distribution of image pixels. Crucially, it affords a sparser dependency structure compared to fully autoregressive approaches. Our PixelPyramids yield state-of-the-art results for density estimation on various image datasets, especially for high-resolution data. For CelebA-HQ 1024 x 1024, we observe that the density estimates (in terms of bits/dim) are improved to ~44% of the baseline despite sampling speeds superior even to easily parallelizable flow-based models.

[182]  arXiv:2110.08791 (cross-list from cs.CV) [pdf, other]
Title: Taming Visually Guided Sound Generation
Comments: Accepted as an oral presentation for the BMVC 2021. Code: this https URL Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Recent advances in visually-induced audio generation are based on sampling short, low-fidelity, and one-class sounds. Moreover, sampling 1 second of audio from the state-of-the-art model takes minutes on a high-end GPU. In this work, we propose a single model capable of generating visually relevant, high-fidelity sounds prompted with a set of frames from open-domain videos in less time than it takes to play it on a single GPU.
We train a transformer to sample a new spectrogram from the pre-trained spectrogram codebook given the set of video features. The codebook is obtained using a variant of VQGAN trained to produce a compact sampling space with a novel spectrogram-based perceptual loss. The generated spectrogram is transformed into a waveform using a window-based GAN that significantly speeds up generation. Considering the lack of metrics for automatic evaluation of generated spectrograms, we also build a family of metrics called FID and MKL. These metrics are based on a novel sound classifier, called Melception, and designed to evaluate the fidelity and relevance of open-domain samples.
Both qualitative and quantitative studies are conducted on small- and large-scale datasets to evaluate the fidelity and relevance of generated samples. We also compare our model to the state-of-the-art and observe a substantial improvement in quality, size, and computation time. Code, demo, and samples: v-iashin.github.io/SpecVQGAN

[183]  arXiv:2110.08812 (cross-list from eess.IV) [pdf]
Title: Rheumatoid Arthritis: Automated Scoring of Radiographic Joint Damage
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Rheumatoid arthritis is an autoimmune disease that causes joint damage due to inflammation in the soft tissue lining the joints known as the synovium. It is vital to identify joint damage as soon as possible to provide necessary treatment early and prevent further damage to the bone structures. Radiographs are often used to assess the extent of the joint damage. Currently, the scoring of joint damage from the radiograph takes expertise, effort, and time. Joint damage associated with rheumatoid arthritis is also not quantitated in clinical practice and subjective descriptors are used. In this work, we describe a pipeline of deep learning models to automatically identify and score rheumatoid arthritic joint damage from a radiographic image. Our automatic tool was shown to produce scores with extremely high balanced accuracy within a couple of minutes and utilizing this would remove the subjectivity of the scores between human reviewers.

[184]  arXiv:2110.08825 (cross-list from cs.CV) [pdf, other]
Title: Localization with Sampling-Argmax
Comments: NeurIPS 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Soft-argmax operation is commonly adopted in detection-based methods to localize the target position in a differentiable manner. However, training the neural network with soft-argmax makes the shape of the probability map unconstrained. Consequently, the model lacks pixel-wise supervision through the map during training, leading to performance degradation. In this work, we propose sampling-argmax, a differentiable training method that imposes implicit constraints to the shape of the probability map by minimizing the expectation of the localization error. To approximate the expectation, we introduce a continuous formulation of the output distribution and develop a differentiable sampling process. The expectation can be approximated by calculating the average error of all samples drawn from the output distribution. We show that sampling-argmax can seamlessly replace the conventional soft-argmax operation on various localization tasks. Comprehensive experiments demonstrate the effectiveness and flexibility of the proposed method. Code is available at https://github.com/Jeff-sjtu/sampling-argmax

[185]  arXiv:2110.08828 (cross-list from cs.CV) [pdf]
Title: Compression-aware Projection with Greedy Dimension Reduction for Convolutional Neural Network Activations
Comments: 5 pages, 5 figures, submitted to 2022 ICASSP
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP)

Convolutional neural networks (CNNs) achieve remarkable performance in a wide range of fields. However, intensive memory access of activations introduces considerable energy consumption, impeding deployment of CNNs on resourceconstrained edge devices. Existing works in activation compression propose to transform feature maps for higher compressibility, thus enabling dimension reduction. Nevertheless, in the case of aggressive dimension reduction, these methods lead to severe accuracy drop. To improve the trade-off between classification accuracy and compression ratio, we propose a compression-aware projection system, which employs a learnable projection to compensate for the reconstruction loss. In addition, a greedy selection metric is introduced to optimize the layer-wise compression ratio allocation by considering both accuracy and #bits reduction simultaneously. Our test results show that the proposed methods effectively reduce 2.91x~5.97x memory access with negligible accuracy drop on MobileNetV2/ResNet18/VGG16.

[186]  arXiv:2110.08850 (cross-list from physics.soc-ph) [pdf]
Title: Understanding the network formation pattern for better link prediction
Comments: 21 pages, 3 figures, 18 tables, and 29 references
Subjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Molecular Networks (q-bio.MN); Machine Learning (stat.ML)

As a classical problem in the field of complex networks, link prediction has attracted much attention from researchers, which is of great significance to help us understand the evolution and dynamic development mechanisms of networks. Although various network type-specific algorithms have been proposed to tackle the link prediction problem, most of them suppose that the network structure is dominated by the Triadic Closure Principle. We still lack an adaptive and comprehensive understanding of network formation patterns for predicting potential links. In addition, it is valuable to investigate how network local information can be better utilized. To this end, we proposed a novel method named Link prediction using Multiple Order Local Information (MOLI) that exploits the local information from the neighbors of different distances, with parameters that can be a prior-driven based on prior knowledge, or data-driven by solving an optimization problem on observed networks. MOLI defined a local network diffusion process via random walks on the graph, resulting in better use of network information. We show that MOLI outperforms the other 11 widely used link prediction algorithms on 11 different types of simulated and real-world networks. We also conclude that there are different patterns of local information utilization for different networks, including social networks, communication networks, biological networks, etc. In particular, the classical common neighbor-based algorithm is not as adaptable to all social networks as it is perceived to be; instead, some of the social networks obey the Quadrilateral Closure Principle which preferentially connects paths of length three.

[187]  arXiv:2110.08855 (cross-list from cs.CV) [pdf, other]
Title: Online Continual Learning Via Candidates Voting
Comments: Accepted paper at Winter Conference on Applications of Computer Vision (WACV 2022)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Continual learning in online scenario aims to learn a sequence of new tasks from data stream using each data only once for training, which is more realistic than in offline mode assuming data from new task are all available. However, this problem is still under-explored for the challenging class-incremental setting in which the model classifies all classes seen so far during inference. Particularly, performance struggles with increased number of tasks or additional classes to learn for each task. In addition, most existing methods require storing original data as exemplars for knowledge replay, which may not be feasible for certain applications with limited memory budget or privacy concerns. In this work, we introduce an effective and memory-efficient method for online continual learning under class-incremental setting through candidates selection from each learned task together with prior incorporation using stored feature embeddings instead of original data as exemplars. Our proposed method implemented for image classification task achieves the best results under different benchmark datasets for online continual learning including CIFAR-10, CIFAR-100 and CORE-50 while requiring much less memory resource compared with existing works.

[188]  arXiv:2110.08862 (cross-list from eess.AS) [pdf, other]
Title: Deep Learning Based EDM Subgenre Classification using Mel-Spectrogram and Tempogram Features
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

Along with the evolution of music technology, a large number of styles, or "subgenres," of Electronic Dance Music(EDM) have emerged in recent years. While the classification task of distinguishing between EDM and non-EDM has been often studied in the context of music genre classification, little work has been done on the more challenging EDM subgenre classification. The state-of-art model is based on extremely randomized trees and could be improved by deep learning methods. In this paper, we extend the state-of-art music auto-tagging model "short-chunkCNN+Resnet" to EDM subgenre classification, with the addition of two mid-level tempo-related feature representations, called the Fourier tempogram and autocorrelation tempogram. And, we explore two fusion strategies, early fusion and late fusion, to aggregate the two types of tempograms. We evaluate the proposed models using a large dataset consisting of 75,000 songs for 30 different EDM subgenres, and show that the adoption of deep learning models and tempo features indeed leads to higher classification accuracy.

[189]  arXiv:2110.08872 (cross-list from cs.CV) [pdf, other]
Title: Contrastive Learning of Visual-Semantic Embeddings
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM)

Contrastive learning is a powerful technique to learn representations that are semantically distinctive and geometrically invariant. While most of the earlier approaches have demonstrated its effectiveness on single-modality learning tasks such as image classification, recently there have been a few attempts towards extending this idea to multi-modal data. In this paper, we propose two loss functions based on normalized cross-entropy to perform the task of learning joint visual-semantic embedding using batch contrastive training. In a batch, for a given anchor point from one modality, we consider its negatives only from another modality, and define our first contrastive loss based on expected violations incurred by all the negatives. Next, we update this loss and define the second contrastive loss based on the violation incurred only by the hardest negative. We compare our results with existing visual-semantic embedding methods on cross-modal image-to-text and text-to-image retrieval tasks using the MS-COCO and Flickr30K datasets, where we outperform the state-of-the-art on the MS-COCO dataset and achieve comparable results on the Flickr30K dataset.

[190]  arXiv:2110.08875 (cross-list from cs.CL) [pdf, other]
Title: Predicting the Performance of Multilingual NLP Models
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Recent advancements in NLP have given us models like mBERT and XLMR that can serve over 100 languages. The languages that these models are evaluated on, however, are very few in number, and it is unlikely that evaluation datasets will cover all the languages that these models support. Potential solutions to the costly problem of dataset creation are to translate datasets to new languages or use template-filling based techniques for creation. This paper proposes an alternate solution for evaluating a model across languages which make use of the existing performance scores of the model on languages that a particular task has test sets for. We train a predictor on these performance scores and use this predictor to predict the model's performance in different evaluation settings. Our results show that our method is effective in filling the gaps in the evaluation for an existing set of languages, but might require additional improvements if we want it to generalize to unseen languages.

[191]  arXiv:2110.08884 (cross-list from stat.ML) [pdf, other]
Title: Persuasion by Dimension Reduction
Comments: arXiv admin note: text overlap with arXiv:2102.10909
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); General Economics (econ.GN); Statistics Theory (math.ST); Methodology (stat.ME)

How should an agent (the sender) observing multi-dimensional data (the state vector) persuade another agent to take the desired action? We show that it is always optimal for the sender to perform a (non-linear) dimension reduction by projecting the state vector onto a lower-dimensional object that we call the "optimal information manifold." We characterize geometric properties of this manifold and link them to the sender's preferences. Optimal policy splits information into "good" and "bad" components. When the sender's marginal utility is linear, revealing the full magnitude of good information is always optimal. In contrast, with concave marginal utility, optimal information design conceals the extreme realizations of good information and only reveals its direction (sign). We illustrate these effects by explicitly solving several multi-dimensional Bayesian persuasion problems.

[192]  arXiv:2110.08890 (cross-list from cs.CV) [pdf, other]
Title: Network Augmentation for Tiny Deep Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks. Existing regularization techniques (e.g., data augmentation, dropout) have shown much success on large neural networks (e.g., ResNet50) by adding noise to overcome over-fitting. However, we found these techniques hurt the performance of tiny neural networks. We argue that training tiny models are different from large models: rather than augmenting the data, we should augment the model, since tiny models tend to suffer from under-fitting rather than over-fitting due to limited capacity. To alleviate this issue, NetAug augments the network (reverse dropout) instead of inserting noise into the dataset or the network. It puts the tiny model into larger models and encourages it to work as a sub-model of larger models to get extra supervision, in addition to functioning as an independent model. At test time, only the tiny model is used for inference, incurring zero inference overhead. We demonstrate the effectiveness of NetAug on image classification and object detection. NetAug consistently improves the performance of tiny models, achieving up to 2.1% accuracy improvement on ImageNet, and 4.3% on Cars. On Pascal VOC, NetAug provides 2.96% mAP improvement with the same computational cost.

[193]  arXiv:2110.08901 (cross-list from gr-qc) [pdf, other]
Title: Gravitational wave surrogates through automated machine learning
Comments: 15 pages, 7 figures
Subjects: General Relativity and Quantum Cosmology (gr-qc); Machine Learning (cs.LG)

We analyze a prospect for predicting gravitational waveforms from compact binaries based on automated machine learning (AutoML) from around a hundred different possible regression models, without having to resort to tedious and manual case-by-case analyses and fine-tuning. The particular study of this article is within the context of the gravitational waves emitted by the collision of two spinless black holes in initial quasi-circular orbit. We find, for example, that approaches such as Gaussian process regression with radial bases as kernels do provide a sufficiently accurate solution, an approach which is generalizable to multiple dimensions with low computational evaluation cost. The results here presented suggest that AutoML might provide a framework for regression in the field of surrogates for gravitational waveforms. Our study is within the context of surrogates of numerical relativity simulations based on Reduced Basis and the Empirical Interpolation Method, where we find that for the particular case analyzed AutoML can produce surrogates which are essentially indistinguishable from the NR simulations themselves.

[194]  arXiv:2110.08904 (cross-list from cs.DL) [pdf]
Title: Deep forecasting of translational impact in medical research
Comments: 28 pages, 6 figures
Subjects: Digital Libraries (cs.DL); Machine Learning (cs.LG)

The value of biomedical research--a $1.7 trillion annual investment--is ultimately determined by its downstream, real-world impact. Current objective predictors of impact rest on proxy, reductive metrics of dissemination, such as paper citation rates, whose relation to real-world translation remains unquantified. Here we sought to determine the comparative predictability of future real-world translation--as indexed by inclusion in patents, guidelines or policy documents--from complex models of the abstract-level content of biomedical publications versus citations and publication meta-data alone. We develop a suite of representational and discriminative mathematical models of multi-scale publication data, quantifying predictive performance out-of-sample, ahead-of-time, across major biomedical domains, using the entire corpus of biomedical research captured by Microsoft Academic Graph from 1990 to 2019, encompassing 43.3 million papers across all domains. We show that citations are only moderately predictive of translational impact as judged by inclusion in patents, guidelines, or policy documents. By contrast, high-dimensional models of publication titles, abstracts and metadata exhibit high fidelity (AUROC > 0.9), generalise across time and thematic domain, and transfer to the task of recognising papers of Nobel Laureates. The translational impact of a paper indexed by inclusion in patents, guidelines, or policy documents can be predicted--out-of-sample and ahead-of-time--with substantially higher fidelity from complex models of its abstract-level content than from models of publication meta-data or citation metrics. We argue that content-based models of impact are superior in performance to conventional, citation-based measures, and sustain a stronger evidence-based claim to the objective measurement of translational potential.

[195]  arXiv:2110.08927 (cross-list from eess.SY) [pdf, other]
Title: MARTINI: Smart Meter Driven Estimation of HVAC Schedules and Energy Savings Based on WiFi Sensing and Clustering
Comments: submitted
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

HVAC systems account for a significant portion of building energy use. Nighttime setback scheduling is an energy conservation measure where cooling and heating setpoints are increased and decreased respectively during unoccupied periods with the goal of obtaining energy savings. However, knowledge of a building's real occupancy is required to maximize the success of this measure. In addition, there is the need for a scalable way to estimate energy savings potential from energy conservation measures that is not limited by building specific parameters and experimental or simulation modeling investments. Here, we propose MARTINI, a sMARt meTer drIveN estImation of occupant-derived HVAC schedules and energy savings that leverages the ubiquity of energy smart meters and WiFi infrastructure in commercial buildings. We estimate the schedules by clustering WiFi-derived occupancy profiles and, energy savings by shifting ramp-up and setback times observed in typical/measured load profiles obtained by clustering smart meter energy profiles. Our case-study results with five buildings over seven months show an average of 8.1%-10.8% (summer) and 0.2%-5.9% (fall) chilled water energy savings when HVAC system operation is aligned with occupancy. We validate our method with results from building energy performance simulation (BEPS) and find that estimated average savings of MARTINI are within 0.9%-2.4% of the BEPS predictions. In the absence of occupancy information, we can still estimate potential savings from increasing ramp-up time and decreasing setback start time. In 51 academic buildings, we find savings potentials between 1%-5%.

[196]  arXiv:2110.08936 (cross-list from stat.ML) [pdf, ps, other]
Title: Rejoinder: Learning Optimal Distributionally Robust Individualized Treatment Rules
Journal-ref: Journal of the American Statistical Association, 116:534, 699-707 (2021)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We thank the opportunity offered by editors for this discussion and the discussants for their insightful comments and thoughtful contributions. We also want to congratulate Kallus (2020) for his inspiring work in improving the efficiency of policy learning by retargeting. Motivated from the discussion in Dukes and Vansteelandt (2020), we first point out interesting connections and distinctions between our work and Kallus (2020) in Section 1. In particular, the assumptions and sources of variation for consideration in these two papers lead to different research problems with different scopes and focuses. In Section 2, following the discussions in Li et al. (2020); Liang and Zhao (2020), we also consider the efficient policy evaluation problem when we have some data from the testing distribution available at the training stage. We show that under the assumption that the sample sizes from training and testing are growing in the same order, efficient value function estimates can deliver competitive performance. We further show some connections of these estimates with existing literature. However, when the growth of testing sample size available for training is in a slower order, efficient value function estimates may not perform well anymore. In contrast, the requirement of the testing sample size for DRITR is not as strong as that of efficient policy evaluation using the combined data. Finally, we highlight the general applicability and usefulness of DRITR in Section 3.

[197]  arXiv:2110.08956 (cross-list from eess.SY) [pdf, other]
Title: Improving Robustness of Reinforcement Learning for Power System Control with Adversarial Training
Comments: Published at 2021 ICML RL4RL Workshop Submitted to 2022 PSCC
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Due to the proliferation of renewable energy and its intrinsic intermittency and stochasticity, current power systems face severe operational challenges. Data-driven decision-making algorithms from reinforcement learning (RL) offer a solution towards efficiently operating a clean energy system. Although RL algorithms achieve promising performance compared to model-based control models, there has been limited investigation of RL robustness in safety-critical physical systems. In this work, we first show that several competition-winning, state-of-the-art RL agents proposed for power system control are vulnerable to adversarial attacks. Specifically, we use an adversary Markov Decision Process to learn an attack policy, and demonstrate the potency of our attack by successfully attacking multiple winning agents from the Learning To Run a Power Network (L2RPN) challenge, under both white-box and black-box attack settings. We then propose to use adversarial training to increase the robustness of RL agent against attacks and avoid infeasible operational decisions. To the best of our knowledge, our work is the first to highlight the fragility of grid control RL algorithms, and contribute an effective defense scheme towards improving their robustness and security.

[198]  arXiv:2110.08989 (cross-list from stat.ML) [pdf, other]
Title: Valid and Exact Statistical Inference for Multi-dimensional Multiple Change-Points by Selective Inference
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In this paper, we study statistical inference of change-points (CPs) in multi-dimensional sequence. In CP detection from a multi-dimensional sequence, it is often desirable not only to detect the location, but also to identify the subset of the components in which the change occurs. Several algorithms have been proposed for such problems, but no valid exact inference method has been established to evaluate the statistical reliability of the detected locations and components. In this study, we propose a method that can guarantee the statistical reliability of both the location and the components of the detected changes. We demonstrate the effectiveness of the proposed method by applying it to the problems of genomic abnormality identification and human behavior analysis.

[199]  arXiv:2110.08991 (cross-list from cs.DS) [pdf, other]
Title: Dimensionality Reduction for Wasserstein Barycenter
Comments: Published as a conference paper in NeurIPS 2021
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Probability (math.PR)

The Wasserstein barycenter is a geometric construct which captures the notion of centrality among probability distributions, and which has found many applications in machine learning. However, most algorithms for finding even an approximate barycenter suffer an exponential dependence on the dimension $d$ of the underlying space of the distributions. In order to cope with this "curse of dimensionality," we study dimensionality reduction techniques for the Wasserstein barycenter problem. When the barycenter is restricted to support of size $n$, we show that randomized dimensionality reduction can be used to map the problem to a space of dimension $O(\log n)$ independent of both $d$ and $k$, and that \emph{any} solution found in the reduced dimension will have its cost preserved up to arbitrary small error in the original space. We provide matching upper and lower bounds on the size of the reduced dimension, showing that our methods are optimal up to constant factors. We also provide a coreset construction for the Wasserstein barycenter problem that significantly decreases the number of input distributions. The coresets can be used in conjunction with random projections and thus further improve computation time. Lastly, our experimental results validate the speedup provided by dimensionality reduction while maintaining solution quality.

[200]  arXiv:2110.09001 (cross-list from cs.IT) [pdf, ps, other]
Title: Deep Learning-Based Power Control for Uplink Cell-Free Massive MIMO Systems
Comments: 6 pages, 6 figures, accepted by IEEE Globecom 2021
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

In this paper, a general framework for deep learning-based power control methods for max-min, max-product and max-sum-rate optimization in uplink cell-free massive multiple-input multiple-output (CF mMIMO) systems is proposed. Instead of using supervised learning, the proposed method relies on unsupervised learning, in which optimal power allocations are not required to be known, and thus has low training complexity. More specifically, a deep neural network (DNN) is trained to learn the map between fading coefficients and power coefficients within short time and with low computational complexity. It is interesting to note that the spectral efficiency of CF mMIMO systems with the proposed method outperforms previous optimization methods for max-min optimization and fits well for both max-sum-rate and max-product optimizations.

[201]  arXiv:2110.09005 (cross-list from eess.SP) [pdf, other]
Title: Unsupervised Learned Kalman Filtering
Comments: 5 Pages, 5 Figures, Submitted to ICASSP 2022
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

In this paper we adapt KalmanNet, which is a recently pro-posed deep neural network (DNN)-aided system whose architecture follows the operation of the model-based Kalman filter (KF), to learn its mapping in an unsupervised manner, i.e., without requiring ground-truth states. The unsupervised adaptation is achieved by exploiting the hybrid model-based/data-driven architecture of KalmanNet, which internally predicts the next observation as the KF does. These internal features are then used to compute the loss rather than the state estimate at the output of the system. With the capability of unsupervised learning, one can use KalmanNet not only to track the hidden state, but also to adapt to variations in the state space (SS) model. We numerically demonstrate that when the noise statistics are unknown, unsupervised KalmanNet achieves a similar performance to KalmanNet with supervised learning. We also show that we can adapt a pre-trained KalmanNet to changing SS models without providing additional data thanks to the unsupervised capabilities.

[202]  arXiv:2110.09018 (cross-list from cs.RO) [pdf, other]
Title: Reinforcement Learning-Based Coverage Path Planning with Implicit Cellular Decomposition
Comments: 20 pages
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Coverage path planning in a generic known environment is shown to be NP-hard. When the environment is unknown, it becomes more challenging as the robot is required to rely on its online map information built during coverage for planning its path. A significant research effort focuses on designing heuristic or approximate algorithms that achieve reasonable performance. Such algorithms have sub-optimal performance in terms of covering the area or the cost of coverage, e.g., coverage time or energy consumption. In this paper, we provide a systematic analysis of the coverage problem and formulate it as an optimal stopping time problem, where the trade-off between coverage performance and its cost is explicitly accounted for. Next, we demonstrate that reinforcement learning (RL) techniques can be leveraged to solve the problem computationally. To this end, we provide some technical and practical considerations to facilitate the application of the RL algorithms and improve the efficiency of the solutions. Finally, through experiments in grid world environments and Gazebo simulator, we show that reinforcement learning-based algorithms efficiently cover realistic unknown indoor environments, and outperform the current state of the art.

[203]  arXiv:2110.09040 (cross-list from stat.ME) [pdf, ps, other]
Title: A Bayesian approach to multi-task learning with network lasso
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)

Network lasso is a method for solving a multi-task learning problem through the regularized maximum likelihood method. A characteristic of network lasso is setting a different model for each sample. The relationships among the models are represented by relational coefficients. A crucial issue in network lasso is to provide appropriate values for these relational coefficients. In this paper, we propose a Bayesian approach to solve multi-task learning problems by network lasso. This approach allows us to objectively determine the relational coefficients by Bayesian estimation. The effectiveness of the proposed method is shown in a simulation study and a real data analysis.

[204]  arXiv:2110.09076 (cross-list from math.OC) [pdf, ps, other]
Title: An actor-critic algorithm with deep double recurrent agents to solve the job shop scheduling problem
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

There is a growing interest in integrating machine learning techniques and optimization to solve challenging optimization problems. In this work, we propose a deep reinforcement learning methodology for the job shop scheduling problem (JSSP). The aim is to build up a greedy-like heuristic able to learn on some distribution of JSSP instances, different in the number of jobs and machines. The need for fast scheduling methods is well known, and it arises in many areas, from transportation to healthcare. We model the JSSP as a Markov Decision Process and then we exploit the efficacy of reinforcement learning to solve the problem. We adopt an actor-critic scheme, where the action taken by the agent is influenced by policy considerations on the state-value function. The procedures are adapted to take into account the challenging nature of JSSP, where the state and the action space change not only for every instance but also after each decision. To tackle the variability in the number of jobs and operations in the input, we modeled the agent using two incident LSTM models, a special type of deep neural network. Experiments show the algorithm reaches good solutions in a short time, proving that is possible to generate new greedy heuristics just from learning-based methodologies. Benchmarks have been generated in comparison with the commercial solver CPLEX. As expected, the model can generalize, to some extent, to larger problems or instances originated by a different distribution from the one used in training.

[205]  arXiv:2110.09083 (cross-list from cs.IR) [pdf, other]
Title: Learning to Learn a Cold-start Sequential Recommender
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The cold-start recommendation is an urgent problem in contemporary online applications. It aims to provide users whose behaviors are literally sparse with as accurate recommendations as possible. Many data-driven algorithms, such as the widely used matrix factorization, underperform because of data sparseness. This work adopts the idea of meta-learning to solve the user's cold-start recommendation problem. We propose a meta-learning based cold-start sequential recommendation framework called metaCSR, including three main components: Diffusion Representer for learning better user/item embedding through information diffusion on the interaction graph; Sequential Recommender for capturing temporal dependencies of behavior sequences; Meta Learner for extracting and propagating transferable knowledge of prior users and learning a good initialization for new users. metaCSR holds the ability to learn the common patterns from regular users' behaviors and optimize the initialization so that the model can quickly adapt to new users after one or a few gradient updates to achieve optimal performance. The extensive quantitative experiments on three widely-used datasets show the remarkable performance of metaCSR in dealing with user cold-start problem. Meanwhile, a series of qualitative analysis demonstrates that the proposed metaCSR has good generalization.

[206]  arXiv:2110.09107 (cross-list from cs.CV) [pdf, other]
Title: Differentiable Rendering with Perturbed Optimizers
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Reasoning about 3D scenes from their 2D image projections is one of the core problems in computer vision. Solutions to this inverse and ill-posed problem typically involve a search for models that best explain observed image data. Notably, images depend both on the properties of observed scenes and on the process of image formation. Hence, if optimization techniques should be used to explain images, it is crucial to design differentiable functions for the projection of 3D scenes into images, also known as differentiable rendering. Previous approaches to differentiable rendering typically replace non-differentiable operations by smooth approximations, impacting the subsequent 3D estimation. In this paper, we take a more general approach and study differentiable renderers through the prism of randomized optimization and the related notion of perturbed optimizers. In particular, our work highlights the link between some well-known differentiable renderer formulations and randomly smoothed optimizers, and introduces differentiable perturbed renderers. We also propose a variance reduction mechanism to alleviate the computational burden inherent to perturbed optimizers and introduce an adaptive scheme to automatically adjust the smoothing parameters of the rendering process. We apply our method to 3D scene reconstruction and demonstrate its advantages on the tasks of 6D pose estimation and 3D mesh reconstruction. By providing informative gradients that can be used as a strong supervisory signal, we demonstrate the benefits of perturbed renderers to obtain more accurate solutions when compared to the state-of-the-art alternatives using smooth gradient approximations.

[207]  arXiv:2110.09111 (cross-list from cs.AI) [pdf, other]
Title: Analyzing Wikipedia Membership Dataset and PredictingUnconnected Nodes in the Signed Networks
Comments: The work was done in UCLA CS249 17Spring
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)

In the age of digital interaction, person-to-person relationships existing on social media may be different from the very same interactions that exist offline. Examining potential or spurious relationships between members in a social network is a fertile area of research for computer scientists -- here we examine how relationships can be predicted between two unconnected people in a social network by using area under Precison-Recall curve and ROC. Modeling the social network as a signed graph, we compare Triadic model,Latent Information model and Sentiment model and use them to predict peer to peer interactions, first using a plain signed network, and second using a signed network with comments as context. We see that our models are much better than random model and could complement each other in different cases.

[208]  arXiv:2110.09116 (cross-list from cs.SD) [pdf, ps, other]
Title: Real Additive Margin Softmax for Speaker Verification
Comments: Submitted to ICASSP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

The additive margin softmax (AM-Softmax) loss has delivered remarkable performance in speaker verification. A supposed behavior of AM-Softmax is that it can shrink within-class variation by putting emphasis on target logits, which in turn improves margin between target and non-target classes. In this paper, we conduct a careful analysis on the behavior of AM-Softmax loss, and show that this loss does not implement real max-margin training. Based on this observation, we present a Real AM-Softmax loss which involves a true margin function in the softmax training. Experiments conducted on VoxCeleb1, SITW and CNCeleb demonstrated that the corrected AM-Softmax loss consistently outperforms the original one. The code has been released at https://gitlab.com/csltstu/sunine.

[209]  arXiv:2110.09147 (cross-list from cs.CL) [pdf, other]
Title: BEAMetrics: A Benchmark for Language Generation Evaluation Evaluation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Natural language processing (NLP) systems are increasingly trained to generate open-ended text rather than classifying between responses. This makes research on evaluation metrics for generated language -- functions that score system output given the context and/or human reference responses -- of critical importance. However, different metrics have different strengths and biases, and reflect human intuitions better on some tasks than others. There is currently no simple, unified way to compare, analyse or evaluate metrics across a representative set of tasks. Here, we describe the Benchmark to Evaluate Automatic Metrics (BEAMetrics), a resource to make research into new metrics itself easier to evaluate. BEAMetrics users can quickly compare existing and new metrics with human judgements across a diverse set of tasks, quality dimensions (fluency vs. coherence vs. informativeness etc), and languages. As generation experts might predict, BEAMetrics reveals stark task-dependent differences between existing metrics, and consistently poor performance on tasks with complex answer spaces or high reliance on general knowledge. While this analysis highlights a critical issue facing current research practice, BEAMetrics also contribute to its resolution by facilitating research into better metrics -- particularly those that can account for the complex interaction between context and general knowledge inherent to many modern NLP applications. BEAMetrics is available under the MIT License: https://github.com/ThomasScialom/BEAMetrics

[210]  arXiv:2110.09167 (cross-list from stat.ML) [pdf, other]
Title: RKHS-SHAP: Shapley Values for Kernel Methods
Comments: 11 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Feature attribution for kernel methods is often heuristic and not individualised for each prediction. To address this, we turn to the concept of Shapley values, a coalition game theoretical framework that has previously been applied to different machine learning model interpretation tasks, such as linear models, tree ensembles and deep networks. By analysing Shapley values from a functional perspective, we propose \textsc{RKHS-SHAP}, an attribution method for kernel machines that can efficiently compute both \emph{Interventional} and \emph{Observational Shapley values} using kernel mean embeddings of distributions. We show theoretically that our method is robust with respect to local perturbations - a key yet often overlooked desideratum for interpretability. Further, we propose \emph{Shapley regulariser}, applicable to a general empirical risk minimisation framework, allowing learning while controlling the level of specific feature's contributions to the model. We demonstrate that the Shapley regulariser enables learning which is robust to covariate shift of a given feature and fair learning which controls the Shapley values of sensitive features.

[211]  arXiv:2110.09168 (cross-list from cs.CV) [pdf, other]
Title: Domain Generalisation for Apparent Emotional Facial Expression Recognition across Age-Groups
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Apparent emotional facial expression recognition has attracted a lot of research attention recently. However, the majority of approaches ignore age differences and train a generic model for all ages. In this work, we study the effect of using different age-groups for training apparent emotional facial expression recognition models. To this end, we study Domain Generalisation in the context of apparent emotional facial expression recognition from facial imagery across different age groups. We first compare several domain generalisation algorithms on the basis of out-of-domain-generalisation, and observe that the Class-Conditional Domain-Adversarial Neural Networks (CDANN) algorithm has the best performance. We then study the effect of variety and number of age-groups used during training on generalisation to unseen age-groups and observe that an increase in the number of training age-groups tends to increase the apparent emotional facial expression recognition performance on unseen age-groups. We also show that exclusion of an age-group during training tends to affect more the performance of the neighbouring age groups.

[212]  arXiv:2110.09209 (cross-list from physics.ao-ph) [pdf, other]
Title: Graph-based Local Climate Classification in Iran
Comments: Accepted in International Journal of Climatology, 2021
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG)

In this paper, we introduce a novel graph-based method to classify the regions with similar climate in a local area. We refer our proposed method as Graph Partition Based Method (GPBM). Our proposed method attempts to overcome the shortcomings of the current state-of-the-art methods in the literature. It has no limit on the number of variables that can be used and also preserves the nature of climate data. To illustrate the capability of our proposed algorithm, we benchmark its performance with other state-of-the-art climate classification techniques. The climate data is collected from 24 synoptic stations in Fars province in southern Iran. The data includes seven climate variables stored as time series from 1951 to 2017. Our results exhibit that our proposed method performs a more realistic climate classification with less computational time. It can save more information during the climate classification process and is therefore efficient in further data analysis. Furthermore, using our method, we can introduce seasonal graphs to better investigate seasonal climate changes. To the best of our knowledge, our proposed method is the first graph-based climate classification system.

[213]  arXiv:2110.09234 (cross-list from cs.CY) [pdf, other]
Title: Impact of COVID-19 Policies and Misinformation on Social Unrest
Authors: Martha Barnard (1), Radhika Iyer (1 and 2), Sara Y. Del Valle (1), Ashlynn R. Daughton (1) ((1) A-1 Information Systems and Modeling, Los Alamos National Lab, Los Alamos, NM, USA, (2) Department of Political Science and Department of Computing, Data Science, and Society, University of California, Berkeley, Berkeley, CA, USA)
Comments: 21 pages, 9 figures
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Applications (stat.AP)

The novel coronavirus disease (COVID-19) pandemic has impacted every corner of earth, disrupting governments and leading to socioeconomic instability. This crisis has prompted questions surrounding how different sectors of society interact and influence each other during times of change and stress. Given the unprecedented economic and societal impacts of this pandemic, many new data sources have become available, allowing us to quantitatively explore these associations. Understanding these relationships can help us better prepare for future disasters and mitigate the impacts. Here, we focus on the interplay between social unrest (protests), health outcomes, public health orders, and misinformation in eight countries of Western Europe and four regions of the United States. We created 1-3 week forecasts of both a binary protest metric for identifying times of high protest activity and the overall protest counts over time. We found that for all regions, except Belgium, at least one feature from our various data streams was predictive of protests. However, the accuracy of the protest forecasts varied by country, that is, for roughly half of the countries analyzed, our forecasts outperform a na\"ive model. These mixed results demonstrate the potential of diverse data streams to predict a topic as volatile as protests as well as the difficulties of predicting a situation that is as rapidly evolving as a pandemic.

[214]  arXiv:2110.09253 (cross-list from cs.CY) [pdf]
Title: A Sociotechnical View of Algorithmic Fairness
Comments: Accepted at Information Systems Journal
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Machine Learning (stat.ML)

Algorithmic fairness has been framed as a newly emerging technology that mitigates systemic discrimination in automated decision-making, providing opportunities to improve fairness in information systems (IS). However, based on a state-of-the-art literature review, we argue that fairness is an inherently social concept and that technologies for algorithmic fairness should therefore be approached through a sociotechnical lens. We advance the discourse on algorithmic fairness as a sociotechnical phenomenon. Our research objective is to embed AF in the sociotechnical view of IS. Specifically, we elaborate on why outcomes of a system that uses algorithmic means to assure fairness depends on mutual influences between technical and social structures. This perspective can generate new insights that integrate knowledge from both technical fields and social studies. Further, it spurs new directions for IS debates. We contribute as follows: First, we problematize fundamental assumptions in the current discourse on algorithmic fairness based on a systematic analysis of 310 articles. Second, we respond to these assumptions by theorizing algorithmic fairness as a sociotechnical construct. Third, we propose directions for IS researchers to enhance their impacts by pursuing a unique understanding of sociotechnical algorithmic fairness. We call for and undertake a holistic approach to AF. A sociotechnical perspective on algorithmic fairness can yield holistic solutions to systemic biases and discrimination.

[215]  arXiv:2110.09294 (cross-list from eess.IV) [pdf]
Title: Comparative Analysis of Deep Learning Algorithms for Classification of COVID-19 X-Ray Images
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The Coronavirus was first emerged in December, in the city of China named Wuhan in 2019 and spread quickly all over the world. It has very harmful effects all over the global economy, education, social, daily living and general health of humans. To restrict the quick expansion of the disease initially, main difficulty is to explore the positive corona patients as quickly as possible. As there are no automatic tool kits accessible the requirement for supplementary diagnostic tools has risen up. Previous studies have findings acquired from radiological techniques proposed that this kind of images have important details related to the coronavirus. The usage of modified Artificial Intelligence (AI) system in combination with radio-graphical images can be fruitful for the precise and exact solution of this virus and can also be helpful to conquer the issue of deficiency of professional physicians in distant villages. In our research, we analyze the different techniques for the detection of COVID-19 using X-Ray radiographic images of the chest, we examined the different pre-trained CNN models AlexNet, VGG-16, MobileNet-V2, SqeezeNet, ResNet-34, ResNet-50 and COVIDX-Net to correct analytics for classification system of COVID-19. Our study shows that the pre trained CNN Model with ResNet-34 technique gives the higher accuracy rate of 98.33, 96.77% precision, and 98.36 F1-score, which is better than other CNN techniques. Our model may be helpful for the researchers to fine train the CNN model for the the quick screening of COVID patients.

[216]  arXiv:2110.09310 (cross-list from cs.AR) [pdf, other]
Title: Energon: Towards Efficient Acceleration of Transformers Using Dynamic Sparse Attention
Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

In recent years, transformer models have revolutionized Natural Language Processing (NLP) and also show promising performance on Computer Vision (CV) tasks. Despite their effectiveness, transformers' attention operations are hard to accelerate due to complicated data movement and quadratic computational complexity, prohibiting the real-time inference on resource-constrained edge-computing platforms.
To tackle this challenge, we propose Energon, an algorithm-architecture co-design approach that accelerates various transformers using dynamic sparse attention. With the observation that attention results only depend on a few important query-key pairs, we propose a multi-round filtering algorithm to dynamically identify such pairs at runtime. We adopt low bitwidth in each filtering round and only use high-precision tensors in the attention stage to reduce overall complexity. By this means, we significantly mitigate the computational cost with negligible accuracy loss. To enable such an algorithm with lower latency and better energy-efficiency, we also propose an Energon co-processor architecture. Elaborated pipelines and specialized optimizations jointly boost the performance and reduce power consumption. Extensive experiments on both NLP and CV benchmarks demonstrate that Energon achieves $161\times$ and $8.4\times$ geo-mean speedup and up to $10^4\times$ and $10^3\times$ energy reduction compared with Intel Xeon 5220 CPU and NVIDIA V100 GPU. Compared to state-of-the-art attention accelerators SpAtten and $A^3$, Energon also achieves $1.7\times, 1.25\times$ speedup and $1.6 \times, 1.5\times $ higher energy efficiency.

[217]  arXiv:2110.09315 (cross-list from q-fin.GN) [pdf, other]
Title: Predicting Status of Pre and Post M&A Deals Using Machine Learning and Deep Learning Techniques
Comments: 21 pages
Subjects: General Finance (q-fin.GN); Machine Learning (cs.LG)

Risk arbitrage or merger arbitrage is a well-known investment strategy that speculates on the success of M&A deals. Prediction of the deal status in advance is of great importance for risk arbitrageurs. If a deal is mistakenly classified as a completed deal, then enormous cost can be incurred as a result of investing in target company shares. On the contrary, risk arbitrageurs may lose the opportunity of making profit. In this paper, we present an ML and DL based methodology for takeover success prediction problem. We initially apply various ML techniques for data preprocessing such as kNN for data imputation, PCA for lower dimensional representation of numerical variables, MCA for categorical variables, and LSTM autoencoder for sentiment scores. We experiment with different cost functions, different evaluation metrics, and oversampling techniques to address class imbalance in our dataset. We then implement feedforward neural networks to predict the success of the deal status. Our preliminary results indicate that our methodology outperforms the benchmark models such as logit and weighted logit models. We also integrate sentiment scores into our methodology using different model architectures, but our preliminary results show that the performance is not changing much compared to the simple FFNN framework. We will explore different architectures and employ a thorough hyperparameter tuning for sentiment scores as a future work.

[218]  arXiv:2110.09326 (cross-list from cond-mat.mtrl-sci) [pdf, other]
Title: Neural message passing for predicting abnormal grain growth in Monte Carlo simulations of microstructural evolution
Comments: 17 pages, 11 figures
Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)

Abnormal grain growth can significantly alter the properties of materials during processing. This can cause significant variation in the properties and performance of in-spec feedstock components subjected to identical processing paths. Understanding and controlling abnormal grain growth has proved to be elusive due to the stochastic nature of this phenomenon. However, recent advances in deep learning provide a promising alternative to traditional experimental and physics-based methods for understanding this phenomenon. Neural message passing allows deep learning to be applied to irregular inputs including graph representations of grain structures in a material. In this study we generate a large database of Monte Carlo simulations of abnormal grain growth in an idealized system. We apply message passing neural networks to predict the occurrence of abnormal grain growth in these simulations using only the initial state of the system as input. A computer vision model is also trained for the same task for comparison. The preliminary results indicate that the message passing approach outperforms the computer vision method and achieved 75% prediction accuracy, significantly better than random guessing. Analysis of the uncertainty in the Monte Carlo simulations provides a road map for ongoing work on this project.

[219]  arXiv:2110.09338 (cross-list from cs.CL) [pdf, other]
Title: Contextual Hate Speech Detection in Code Mixed Text using Transformer Based Approaches
Comments: Accepted at HASOC @Forum for Information Retrieval Evaluation(FIRE) 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

In the recent past, social media platforms have helped people in connecting and communicating to a wider audience. But this has also led to a drastic increase in cyberbullying. It is essential to detect and curb hate speech to keep the sanity of social media platforms. Also, code mixed text containing more than one language is frequently used on these platforms. We, therefore, propose automated techniques for hate speech detection in code mixed text from scraped Twitter. We specifically focus on code mixed English-Hindi text and transformer-based approaches. While regular approaches analyze the text independently, we also make use of content text in the form of parent tweets. We try to evaluate the performances of multilingual BERT and Indic-BERT in single-encoder and dual-encoder settings. The first approach is to concatenate the target text and context text using a separator token and get a single representation from the BERT model. The second approach encodes the two texts independently using a dual BERT encoder and the corresponding representations are averaged. We show that the dual-encoder approach using independent representations yields better performance. We also employ simple ensemble methods to further improve the performance. Using these methods we were able to achieve the best F1 score of 73.07% on the HASOC 2021 ICHCL code mixed data set.

[220]  arXiv:2110.09348 (cross-list from cs.CV) [pdf, other]
Title: Understanding Dimensional Collapse in Contrastive Self-supervised Learning
Comments: 15 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Self-supervised visual representation learning aims to learn useful representations without relying on human annotations. Joint embedding approach bases on maximizing the agreement between embedding vectors from different views of the same image. Various methods have been proposed to solve the collapsing problem where all embedding vectors collapse to a trivial constant solution. Among these methods, contrastive learning prevents collapse via negative sample pairs. It has been shown that non-contrastive methods suffer from a lesser collapse problem of a different nature: dimensional collapse, whereby the embedding vectors end up spanning a lower-dimensional subspace instead of the entire available embedding space. Here, we show that dimensional collapse also happens in contrastive learning. In this paper, we shed light on the dynamics at play in contrastive learning that leads to dimensional collapse. Inspired by our theory, we propose a novel contrastive learning method, called DirectCLR, which directly optimizes the representation space without relying on a trainable projector. Experiments show that DirectCLR outperforms SimCLR with a trainable linear projector on ImageNet.

[221]  arXiv:2110.09360 (cross-list from stat.ML) [pdf, other]
Title: Prediction of liquid fuel properties using machine learning models with Gaussian processes and probabilistic conditional generative learning
Comments: 22 pages, 13 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Accurate determination of fuel properties of complex mixtures over a wide range of pressure and temperature conditions is essential to utilizing alternative fuels. The present work aims to construct cheap-to-compute machine learning (ML) models to act as closure equations for predicting the physical properties of alternative fuels. Those models can be trained using the database from MD simulations and/or experimental measurements in a data-fusion-fidelity approach. Here, Gaussian Process (GP) and probabilistic generative models are adopted. GP is a popular non-parametric Bayesian approach to build surrogate models mainly due to its capacity to handle the aleatory and epistemic uncertainties. Generative models have shown the ability of deep neural networks employed with the same intent. In this work, ML analysis is focused on a particular property, the fuel density, but it can also be extended to other physicochemical properties. This study explores the versatility of the ML models to handle multi-fidelity data. The results show that ML models can predict accurately the fuel properties of a wide range of pressure and temperature conditions.

[222]  arXiv:2110.09361 (cross-list from stat.ML) [pdf, other]
Title: Efficient Exploration in Binary and Preferential Bayesian Optimization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

Bayesian optimization (BO) is an effective approach to optimize expensive black-box functions, that seeks to trade-off between exploitation (selecting parameters where the maximum is likely) and exploration (selecting parameters where we are uncertain about the objective function). In many real-world situations, direct measurements of the objective function are not possible, and only binary measurements such as success/failure or pairwise comparisons are available. To perform efficient exploration in this setting, we show that it is important for BO algorithms to distinguish between different types of uncertainty: epistemic uncertainty, about the unknown objective function, and aleatoric uncertainty, which comes from noisy observations and cannot be reduced. In effect, only the former is important for efficient exploration. Based on this, we propose several new acquisition functions that outperform state-of-the-art heuristics in binary and preferential BO, while being fast to compute and easy to implement. We then generalize these acquisition rules to batch learning, where multiple queries are performed simultaneously.

[223]  arXiv:2110.09374 (cross-list from cs.CV) [pdf, other]
Title: Ortho-Shot: Low Displacement Rank Regularization with Data Augmentation for Few-Shot Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

In few-shot classification, the primary goal is to learn representations from a few samples that generalize well for novel classes. In this paper, we propose an efficient low displacement rank (LDR) regularization strategy termed Ortho-Shot; a technique that imposes orthogonal regularization on the convolutional layers of a few-shot classifier, which is based on the doubly-block toeplitz (DBT) matrix structure. The regularized convolutional layers of the few-shot classifier enhances model generalization and intra-class feature embeddings that are crucial for few-shot learning. Overfitting is a typical issue for few-shot models, the lack of data diversity inhibits proper model inference which weakens the classification accuracy of few-shot learners to novel classes. In this regard, we broke down the pipeline of the few-shot classifier and established that the support, query and task data augmentation collectively alleviates overfitting in networks. With compelling results, we demonstrated that combining a DBT-based low-rank orthogonal regularizer with data augmentation strategies, significantly boosts the performance of a few-shot classifier. We perform our experiments on the miniImagenet, CIFAR-FS and Stanford datasets with performance values of about 5\% when compared to state-of-the-art

[224]  arXiv:2110.09383 (cross-list from cs.AI) [pdf, other]
Title: Neuro-Symbolic Forward Reasoning
Comments: Preprint
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Reasoning is an essential part of human intelligence and thus has been a long-standing goal in artificial intelligence research. With the recent success of deep learning, incorporating reasoning with deep learning systems, i.e., neuro-symbolic AI has become a major field of interest. We propose the Neuro-Symbolic Forward Reasoner (NSFR), a new approach for reasoning tasks taking advantage of differentiable forward-chaining using first-order logic. The key idea is to combine differentiable forward-chaining reasoning with object-centric (deep) learning. Differentiable forward-chaining reasoning computes logical entailments smoothly, i.e., it deduces new facts from given facts and rules in a differentiable manner. The object-centric learning approach factorizes raw inputs into representations in terms of objects. Thus, it allows us to provide a consistent framework to perform the forward-chaining inference from raw inputs. NSFR factorizes the raw inputs into the object-centric representations, converts them into probabilistic ground atoms, and finally performs differentiable forward-chaining inference using weighted rules for inference. Our comprehensive experimental evaluations on object-centric reasoning data sets, 2D Kandinsky patterns and 3D CLEVR-Hans, and a variety of tasks show the effectiveness and advantage of our approach.

[225]  arXiv:2110.09401 (cross-list from cs.CV) [pdf, other]
Title: Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The analysis of deforming 3D surface meshes is accelerated by autoencoders since the low-dimensional embeddings can be used to visualize underlying dynamics. But, state-of-the-art mesh convolutional autoencoders require a fixed connectivity of all input meshes handled by the autoencoder. This is due to either the use of spectral convolutional layers or mesh dependent pooling operations. Therefore, the types of datasets that one can study are limited and the learned knowledge cannot be transferred to other datasets that exhibit similar behavior. To address this, we transform the discretization of the surfaces to semi-regular meshes that have a locally regular connectivity and whose meshing is hierarchical. This allows us to apply the same spatial convolutional filters to the local neighborhoods and to define a pooling operator that can be applied to every semi-regular mesh. We apply the same mesh autoencoder to different datasets and our reconstruction error is more than 50% lower than the error from state-of-the-art models, which have to be trained for every mesh separately. Additionally, we visualize the underlying dynamics of unseen mesh sequences with an autoencoder trained on different classes of meshes.

[226]  arXiv:2110.09413 (cross-list from q-bio.GN) [pdf, other]
Title: SGEN: Single-cell Sequencing Graph Self-supervised Embedding Network
Comments: 6 pages body + 2 pages reference
Subjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Single-cell sequencing has a significant role to explore biological processes such as embryonic development, cancer evolution, and cell differentiation. These biological properties can be presented by a two-dimensional scatter plot. However, single-cell sequencing data generally has very high dimensionality. Therefore, dimensionality reduction should be used to process the high dimensional sequencing data for 2D visualization and subsequent biological analysis. The traditional dimensionality reduction methods, which do not consider the structure characteristics of single-cell sequencing data, are difficult to reveal the data structure in the 2D representation. In this paper, we develop a 2D feature representation method based on graph convolutional networks (GCN) for the visualization of single-cell data, termed single-cell sequencing graph embedding networks (SGEN). This method constructs the graph by the similarity relationship between cells and adopts GCN to analyze the neighbor embedding information of samples, which makes the similar cell closer to each other on the 2D scatter plot. The results show SGEN achieves obvious 2D distribution and preserves the high-dimensional relationship of different cells. Meanwhile, similar cell clusters have spatial continuity rather than relying heavily on random initialization, which can reflect the trajectory of cell development in this scatter plot.

[227]  arXiv:2110.09424 (cross-list from cs.CV) [pdf, other]
Title: Don't Judge Me by My Face : An Indirect Adversarial Approach to Remove Sensitive Information From Multimodal Neural Representation in Asynchronous Job Video Interviews
Comments: published in ACII 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

se of machine learning for automatic analysis of job interview videos has recently seen increased interest. Despite claims of fair output regarding sensitive information such as gender or ethnicity of the candidates, the current approaches rarely provide proof of unbiased decision-making, or that sensitive information is not used. Recently, adversarial methods have been proved to effectively remove sensitive information from the latent representation of neural networks. However, these methods rely on the use of explicitly labeled protected variables (e.g. gender), which cannot be collected in the context of recruiting in some countries (e.g. France). In this article, we propose a new adversarial approach to remove sensitive information from the latent representation of neural networks without the need to collect any sensitive variable. Using only a few frames of the interview, we train our model to not be able to find the face of the candidate related to the job interview in the inner layers of the model. This, in turn, allows us to remove relevant private information from these layers. Comparing our approach to a standard baseline on a public dataset with gender and ethnicity annotations, we show that it effectively removes sensitive information from the main network. Moreover, to the best of our knowledge, this is the first application of adversarial techniques for obtaining a multimodal fair representation in the context of video job interviews. In summary, our contributions aim at improving fairness of the upcoming automatic systems processing videos of job interviews for equality in job selection.

[228]  arXiv:2110.09428 (cross-list from cs.CV) [pdf, other]
Title: Distinguishing Natural and Computer-Generated Images using Multi-Colorspace fused EfficientNet
Comments: 13 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The problem of distinguishing natural images from photo-realistic computer-generated ones either addresses natural images versus computer graphics or natural images versus GAN images, at a time. But in a real-world image forensic scenario, it is highly essential to consider all categories of image generation, since in most cases image generation is unknown. We, for the first time, to our best knowledge, approach the problem of distinguishing natural images from photo-realistic computer-generated images as a three-class classification task classifying natural, computer graphics, and GAN images. For the task, we propose a Multi-Colorspace fused EfficientNet model by parallelly fusing three EfficientNet networks that follow transfer learning methodology where each network operates in different colorspaces, RGB, LCH, and HSV, chosen after analyzing the efficacy of various colorspace transformations in this image forensics problem. Our model outperforms the baselines in terms of accuracy, robustness towards post-processing, and generalizability towards other datasets. We conduct psychophysics experiments to understand how accurately humans can distinguish natural, computer graphics, and GAN images where we could observe that humans find difficulty in classifying these images, particularly the computer-generated images, indicating the necessity of computational algorithms for the task. We also analyze the behavior of our model through visual explanations to understand salient regions that contribute to the model's decision making and compare with manual explanations provided by human participants in the form of region markings, where we could observe similarities in both the explanations indicating the powerful nature of our model to take the decisions meaningfully.

[229]  arXiv:2110.09431 (cross-list from cs.HC) [pdf, other]
Title: Comparing Deep Neural Nets with UMAP Tour
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Neural networks should be interpretable to humans. In particular, there is a growing interest in concepts learned in a layer and similarity between layers. In this work, a tool, UMAP Tour, is built to visually inspect and compare internal behavior of real-world neural network models using well-aligned, instance-level representations. The method used in the visualization also implies a new similarity measure between neural network layers. Using the visual tool and the similarity measure, we find concepts learned in state-of-the-art models and dissimilarities between them, such as GoogLeNet and ResNet.

[230]  arXiv:2110.09454 (cross-list from cs.CL) [pdf, other]
Title: SentimentArcs: A Novel Method for Self-Supervised Sentiment Analysis of Time Series Shows SOTA Transformers Can Struggle Finding Narrative Arcs
Authors: Jon Chun
Comments: 87 pages, 97 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

SOTA Transformer and DNN short text sentiment classifiers report over 97% accuracy on narrow domains like IMDB movie reviews. Real-world performance is significantly lower because traditional models overfit benchmarks and generalize poorly to different or more open domain texts. This paper introduces SentimentArcs, a new self-supervised time series sentiment analysis methodology that addresses the two main limitations of traditional supervised sentiment analysis: limited labeled training datasets and poor generalization. A large ensemble of diverse models provides a synthetic ground truth for self-supervised learning. Novel metrics jointly optimize an exhaustive search across every possible corpus:model combination. The joint optimization over both the corpus and model solves the generalization problem. Simple visualizations exploit the temporal structure in narratives so domain experts can quickly spot trends, identify key features, and note anomalies over hundreds of arcs and millions of data points. To our knowledge, this is the first self-supervised method for time series sentiment analysis and the largest survey directly comparing real-world model performance on long-form narratives.

[231]  arXiv:2110.09455 (cross-list from cs.CV) [pdf, other]
Title: TLDR: Twin Learning for Dimensionality Reduction
Comments: Code available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Dimensionality reduction methods are unsupervised approaches which learn low-dimensional spaces where some properties of the initial space, typically the notion of "neighborhood", are preserved. They are a crucial component of diverse tasks like visualization, compression, indexing, and retrieval. Aiming for a totally different goal, self-supervised visual representation learning has been shown to produce transferable representation functions by learning models that encode invariance to artificially created distortions, e.g. a set of hand-crafted image transformations. Unlike manifold learning methods that usually require propagation on large k-NN graphs or complicated optimization solvers, self-supervised learning approaches rely on simpler and more scalable frameworks for learning. In this paper, we unify these two families of approaches from the angle of manifold learning and propose TLDR, a dimensionality reduction method for generic input spaces that is porting the simple self-supervised learning framework of Barlow Twins to a setting where it is hard or impossible to define an appropriate set of distortions by hand. We propose to use nearest neighbors to build pairs from a training set and a redundancy reduction loss borrowed from the self-supervised literature to learn an encoder that produces representations invariant across such pairs. TLDR is a method that is simple, easy to implement and train, and of broad applicability; it consists of an offline nearest neighbor computation step that can be highly approximated, and a straightforward learning process that does not require mining negative samples to contrast, eigendecompositions, or cumbersome optimization solvers. By replacing PCA with TLDR, we are able to increase the performance of GeM-AP by 4% mAP for 128 dimensions, and to retain its performance with 16x fewer dimensions.

[232]  arXiv:2110.09461 (cross-list from cs.AI) [pdf, other]
Title: In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We address the problem of building agents whose goal is to satisfy out-of distribution (OOD) multi-task instructions expressed in temporal logic (TL) by using deep reinforcement learning (DRL). Recent works provided evidence that the deep learning architecture is a key feature when teaching a DRL agent to solve OOD tasks in TL. Yet, the studies on their performance are still limited. In this work, we analyse various state-of-the-art (SOTA) architectures that include generalisation mechanisms such as relational layers, the soft-attention mechanism, or hierarchical configurations, when generalising safety-aware tasks expressed in TL. Most importantly, we present a novel deep learning architecture that induces agents to generate latent representations of their current goal given both the human instruction and the current observation from the environment. We find that applying our proposed configuration to SOTA architectures yields significantly stronger performance when executing new tasks in OOD environments.

[233]  arXiv:2110.09473 (cross-list from eess.IV) [pdf, other]
Title: DBSegment: Fast and robust segmentation of deep brain structures -- Evaluation of transportability across acquisition domains
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)

Segmenting deep brain structures from magnetic resonance images is important for patient diagnosis, surgical planning, and research. Most current state-of-the-art solutions follow a segmentation-by-registration approach, where subject MRIs are mapped to a template with well-defined segmentations. However, registration-based pipelines are time-consuming, thus, limiting their clinical use. This paper uses deep learning to provide a robust and efficient deep brain segmentation solution. The method consists of a pre-processing step to conform all MRI images to the same orientation, followed by a convolutional neural network using the nnU-Net framework. We use a total of 14 datasets from both research and clinical collections. Of these, seven were used for training and validation and seven were retained for independent testing. We trained the network to segment 30 deep brain structures, as well as a brain mask, using labels generated from a registration-based approach. We evaluated the generalizability of the network by performing a leave-one-dataset-out cross-validation, and extensive testing on external datasets. Furthermore, we assessed cross-domain transportability by evaluating the results separately on different domains. We achieved an average DSC of 0.89 $\pm$ 0.04 on the independent testing datasets when compared to the registration-based gold standard. On our test system, the computation time decreased from 42 minutes for a reference registration-based pipeline to 1 minute. Our proposed method is fast, robust, and generalizes with high reliability. It can be extended to the segmentation of other brain structures. The method is publicly available on GitHub, as well as a pip package for convenient usage.

[234]  arXiv:2110.09489 (cross-list from q-fin.CP) [pdf]
Title: Sector Volatility Prediction Performance Using GARCH Models and Artificial Neural Networks
Authors: Curtis Nybo
Comments: 26 pages
Subjects: Computational Finance (q-fin.CP); Machine Learning (cs.LG)

Recently artificial neural networks (ANNs) have seen success in volatility prediction, but the literature is divided on where an ANN should be used rather than the common GARCH model. The purpose of this study is to compare the volatility prediction performance of ANN and GARCH models when applied to stocks with low, medium, and high volatility profiles. This approach intends to identify which model should be used for each case. The volatility profiles comprise of five sectors that cover all stocks in the U.S stock market from 2005 to 2020. Three GARCH specifications and three ANN architectures are examined for each sector, where the most adequate model is chosen to move on to forecasting. The results indicate that the ANN model should be used for predicting volatility of assets with low volatility profiles, and GARCH models should be used when predicting volatility of medium and high volatility assets.

[235]  arXiv:2110.09502 (cross-list from math.ST) [pdf, other]
Title: Minimum $\ell_{1}$-norm interpolators: Precise asymptotics and multiple descent
Authors: Yue Li, Yuting Wei
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)

An evolving line of machine learning works observe empirical evidence that suggests interpolating estimators -- the ones that achieve zero training error -- may not necessarily be harmful. This paper pursues theoretical understanding for an important type of interpolators: the minimum $\ell_{1}$-norm interpolator, which is motivated by the observation that several learning algorithms favor low $\ell_1$-norm solutions in the over-parameterized regime. Concretely, we consider the noisy sparse regression model under Gaussian design, focusing on linear sparsity and high-dimensional asymptotics (so that both the number of features and the sparsity level scale proportionally with the sample size).
We observe, and provide rigorous theoretical justification for, a curious multi-descent phenomenon; that is, the generalization risk of the minimum $\ell_1$-norm interpolator undergoes multiple (and possibly more than two) phases of descent and ascent as one increases the model capacity. This phenomenon stems from the special structure of the minimum $\ell_1$-norm interpolator as well as the delicate interplay between the over-parameterized ratio and the sparsity, thus unveiling a fundamental distinction in geometry from the minimum $\ell_2$-norm interpolator. Our finding is built upon an exact characterization of the risk behavior, which is governed by a system of two non-linear equations with two unknowns.

[236]  arXiv:2110.09510 (cross-list from cs.CV) [pdf, other]
Title: Unsupervised Finetuning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This paper studies "unsupervised finetuning", the symmetrical problem of the well-known "supervised finetuning". Given a pretrained model and small-scale unlabeled target data, unsupervised finetuning is to adapt the representation pretrained from the source domain to the target domain so that better transfer performance can be obtained. This problem is more challenging than the supervised counterpart, as the low data density in the small-scale target data is not friendly for unsupervised learning, leading to the damage of the pretrained representation and poor representation in the target domain. In this paper, we find the source data is crucial when shifting the finetuning paradigm from supervise to unsupervise, and propose two simple and effective strategies to combine source and target data into unsupervised finetuning: "sparse source data replaying", and "data mixing". The motivation of the former strategy is to add a small portion of source data back to occupy their pretrained representation space and help push the target data to reside in a smaller compact space; and the motivation of the latter strategy is to increase the data density and help learn more compact representation. To demonstrate the effectiveness of our proposed ``unsupervised finetuning'' strategy, we conduct extensive experiments on multiple different target datasets, which show better transfer performance than the naive strategy.

Replacements for Tue, 19 Oct 21

[237]  arXiv:1902.04742 (replaced) [pdf, other]
Title: Uniform convergence may be unable to explain generalization in deep learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[238]  arXiv:1910.09714 (replaced) [pdf, other]
Title: Smoothness-Adaptive Contextual Bandits
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[239]  arXiv:1911.02319 (replaced) [pdf, other]
Title: Improving reinforcement learning algorithms: towards optimal learning rate policies
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[240]  arXiv:2003.00660 (replaced) [pdf, ps, other]
Title: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[241]  arXiv:2003.06566 (replaced) [pdf, other]
Title: On the benefits of defining vicinal distributions in latent space
Comments: Accepted at Elsevier Pattern Recognition Letters (2021), Best Paper Award at CVPR 2021 Workshop on Adversarial Machine Learning in Real-World Computer Vision (AML-CV), Also accepted at ICLR 2021 Workshops on Robust-Reliable Machine Learning (Oral) and Generalization beyond the training distribution (Abstract)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[242]  arXiv:2004.08646 (replaced) [pdf, other]
Title: Macro-Action-Based Deep Multi-Agent Reinforcement Learning
Journal-ref: 3rd Conference on Robot Learning (CoRL 2019)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[243]  arXiv:2005.03566 (replaced) [pdf, other]
Title: Noisy Differentiable Architecture Search
Comments: BMVC 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[244]  arXiv:2006.05842 (replaced) [pdf, other]
Title: The Emergence of Individuality
Comments: The extended version of ICML 2021 paper
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
[245]  arXiv:2007.00823 (replaced) [pdf, other]
Title: Dropout as a Regularizer of Interaction Effects
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[246]  arXiv:2009.06087 (replaced) [pdf, other]
Title: Neural Networks Enhancement with Logical Knowledge
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Machine Learning (stat.ML)
[247]  arXiv:2010.01777 (replaced) [pdf, other]
Title: A Unified View on Graph Neural Networks as Graph Signal Denoising
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[248]  arXiv:2010.09429 (replaced) [pdf, other]
Title: Neural Additive Vector Autoregression Models for Causal Discovery in Time Series
Comments: 11 pages, 5 figures
Journal-ref: Discovery Science. DS 2021. Lecture Notes in Computer Science, vol 12986. Springer, Cham
Subjects: Machine Learning (cs.LG)
[249]  arXiv:2011.03180 (replaced) [pdf, other]
Title: FedSL: Federated Split Learning on Distributed Sequential Data in Recurrent Neural Networks
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
[250]  arXiv:2011.05348 (replaced) [pdf, other]
Title: SALR: Sharpness-aware Learning Rate Scheduler for Improved Generalization
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[251]  arXiv:2011.13120 (replaced) [pdf, other]
Title: Evaluation of Out-of-Distribution Detection Performance of Self-Supervised Learning in a Controllable Environment
Comments: Accepted to NeurIPS 2020 Workshop: Self-Supervised Learning - Theory and Practice
Subjects: Machine Learning (cs.LG)
[252]  arXiv:2101.01076 (replaced) [pdf]
Title: Understanding Health Video Engagement: An Interpretable Deep Learning Approach
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[253]  arXiv:2101.12353 (replaced) [pdf, other]
Title: On the capacity of deep generative networks for approximating distributions
Subjects: Machine Learning (cs.LG); Probability (math.PR); Statistics Theory (math.ST); Machine Learning (stat.ML)
[254]  arXiv:2102.02611 (replaced) [pdf, other]
Title: CKConv: Continuous Kernel Convolution For Sequential Data
Subjects: Machine Learning (cs.LG)
[255]  arXiv:2102.02926 (replaced) [pdf, other]
Title: Alchemy: A structured task distribution for meta-reinforcement learning
Comments: Published in Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[256]  arXiv:2102.03988 (replaced) [pdf, other]
Title: Ising Model Selection Using $\ell_{1}$-Regularized Linear Regression: A Statistical Mechanics Analysis
Comments: Accepted by NeurIPS 2021
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI)
[257]  arXiv:2103.00075 (replaced) [pdf, other]
Title: Noisy Truncated SGD: Optimization and Generalization
Subjects: Machine Learning (cs.LG)
[258]  arXiv:2103.09052 (replaced) [pdf, other]
Title: Selective Intervention Planning using Restless Multi-Armed Bandits to Improve Maternal and Child Health Outcomes
Comments: 7 pages. Camera-ready version for AASG 2021 Workshop
Subjects: Machine Learning (cs.LG)
[259]  arXiv:2103.09743 (replaced) [pdf, ps, other]
Title: Deep Learning-based Extreme Heatwave Forecast
Comments: 20 pages, 5 figures
Subjects: Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph); Data Analysis, Statistics and Probability (physics.data-an)
[260]  arXiv:2103.10245 (replaced) [pdf, other]
Title: Building Safer Autonomous Agents by Leveraging Risky Driving Behavior Knowledge
Comments: Published in CCCI 2021, Best Paper Award in Informatics
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[261]  arXiv:2103.11933 (replaced) [pdf]
Title: PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT
Comments: 18 pages, 7 figures and 4 Tables
Subjects: Machine Learning (cs.LG); Econometrics (econ.EM)
[262]  arXiv:2104.05914 (replaced) [pdf, other]
Title: GSA-Forecaster: Forecasting Graph-Based Time-Dependent Data with Graph Sequence Attention
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[263]  arXiv:2104.11510 (replaced) [pdf, other]
Title: Time Series Forecasting via Learning Convolutionally Low-Rank Models
Authors: Guangcan Liu
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[264]  arXiv:2104.11734 (replaced) [pdf, other]
Title: Exact marginal prior distributions of finite Bayesian neural networks
Comments: 12+9 pages, 4 figures; v3: Accepted as NeurIPS 2021 Spotlight
Subjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (stat.ML)
[265]  arXiv:2105.01099 (replaced) [pdf, other]
Title: Reinforcement Learning for Ridesharing: A Survey
Comments: This survey has been significantly expanded and refined over the conference version presented at IEEE ITSC 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[266]  arXiv:2105.08024 (replaced) [pdf, other]
Title: Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
[267]  arXiv:2105.13967 (replaced) [pdf, other]
Title: Bridge Data Center AI Systems with Edge Computing for Actionable Information Retrieval
Subjects: Machine Learning (cs.LG)
[268]  arXiv:2106.00198 (replaced) [pdf, other]
Title: Gradient play in stochastic games: stationary points, convergence, and sample complexity
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Optimization and Control (math.OC)
[269]  arXiv:2106.01917 (replaced) [pdf, other]
Title: SpecAttack: Specification-Based Adversarial Training for Deep Neural Networks
Subjects: Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
[270]  arXiv:2106.04117 (replaced) [pdf, ps, other]
Title: The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
Comments: The camera-ready version for NeurIPS 2021
Subjects: Machine Learning (cs.LG)
[271]  arXiv:2106.05232 (replaced) [pdf, ps, other]
Title: Realizing GANs via a Tunable Loss Function
Comments: Extended version of a paper accepted to ITW 2021. 8 pages, 2 figures
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
[272]  arXiv:2106.06134 (replaced) [pdf, other]
Title: Is Homophily a Necessity for Graph Neural Networks?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[273]  arXiv:2106.07827 (replaced) [pdf, other]
Title: Improving the compromise between accuracy, interpretability and personalization of rule-based machine learning in medical problems
Comments: Accepted for the IEEE Engineering in Medicine and Biology Society Conference (EMBC 2021)
Subjects: Machine Learning (cs.LG)
[274]  arXiv:2106.07976 (replaced) [pdf, other]
Title: Federated Learning for Internet of Things: A Federated Learning Framework for On-device Anomaly Data Detection
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[275]  arXiv:2106.10065 (replaced) [pdf, other]
Title: Being a Bit Frequentist Improves Bayesian Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[276]  arXiv:2106.10404 (replaced) [pdf, other]
Title: Sparse Training via Boosting Pruning Plasticity with Neuroregeneration
Comments: Published on the thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021). Code can be found this https URL
Journal-ref: Conference on Neural Information Processing Systems (NeurIPS 2021)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[277]  arXiv:2106.11655 (replaced) [pdf, other]
Title: DARTS-PRIME: Regularization and Scheduling Improve Constrained Optimization in Differentiable NAS
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[278]  arXiv:2106.13423 (replaced) [pdf, other]
Title: Federated Graph Classification over Non-IID Graphs
Comments: Accepted to NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
[279]  arXiv:2106.13430 (replaced) [pdf, other]
Title: Subgraph Federated Learning with Missing Neighbor Generation
Comments: Accepted to NeurIPS 2021 (spotlight presentation)
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[280]  arXiv:2106.14229 (replaced) [pdf, other]
Title: Over-the-Air Federated Multi-Task Learning
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
[281]  arXiv:2107.00520 (replaced) [pdf, other]
Title: Predictive Modeling in the Presence of Nuisance-Induced Spurious Correlations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[282]  arXiv:2107.00758 (replaced) [pdf, other]
Title: The Spotlight: A General Method for Discovering Systematic Errors in Deep Learning Models
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[283]  arXiv:2107.05686 (replaced) [pdf, other]
Title: The Role of Pretrained Representations for the OOD Generalization of RL Agents
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[284]  arXiv:2107.10370 (replaced) [pdf, other]
Title: Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II
Comments: arXiv admin note: text overlap with arXiv:2008.01805
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS); Optimization and Control (math.OC)
[285]  arXiv:2107.11413 (replaced) [pdf, other]
Title: An Instance-Dependent Simulation Framework for Learning with Label Noise
Comments: Datasets released at this https URL
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
[286]  arXiv:2108.06249 (replaced) [pdf, other]
Title: MINT -- Mainstream and Independent News Text Corpus
Subjects: Machine Learning (cs.LG)
[287]  arXiv:2108.07406 (replaced) [pdf, other]
Title: From the Greene--Wu Convolution to Gradient Estimation over Riemannian Manifolds
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
[288]  arXiv:2108.08435 (replaced) [pdf, other]
Title: Addressing Algorithmic Disparity and Performance Inconsistency in Federated Learning
Comments: This work is accepted by NeurIPS2021
Subjects: Machine Learning (cs.LG)
[289]  arXiv:2108.09676 (replaced) [pdf, other]
Title: Efficient Gaussian Neural Processes for Regression
Comments: 6 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[290]  arXiv:2108.10566 (replaced) [pdf, other]
Title: sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[291]  arXiv:2109.02934 (replaced) [pdf, other]
Title: Fishr: Invariant Gradient Variances for Out-of-distribution Generalization
Comments: 32 pages, 13 tables, 6 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
[292]  arXiv:2109.05710 (replaced) [pdf, other]
Title: Robust Stability of Neural-Network Controlled Nonlinear Systems with Parametric Variability
Comments: 15 pages, 7 figures
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[293]  arXiv:2109.10817 (replaced) [pdf, other]
Title: Causal Inference in Non-linear Time-series using Deep Networks and Knockoff Counterfactuals
Journal-ref: IEEE International Conference on Machine Learning and Applications (ICMLA) 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[294]  arXiv:2110.00137 (replaced) [pdf, other]
Title: Iterative Teacher-Aware Learning
Journal-ref: Advances in Neural Information Processing Systems (2021)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
[295]  arXiv:2110.00804 (replaced) [pdf, other]
Title: ProTo: Program-Guided Transformer for Program-Guided Tasks
Comments: Accepted in NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[296]  arXiv:2110.01303 (replaced) [pdf, other]
Title: Incremental Class Learning using Variational Autoencoders with Similarity Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[297]  arXiv:2110.02628 (replaced) [pdf, ps, other]
Title: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks
Comments: IEEE/ICTAI2021 (full paper)
Journal-ref: IEEE-ICTAI2021
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[298]  arXiv:2110.03909 (replaced) [pdf, other]
Title: Meta-Learning with Task-Adaptive Loss Function for Few-Shot Learning
Comments: ICCV 2021 (Oral). Code at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[299]  arXiv:2110.04160 (replaced) [pdf, other]
Title: Federated Learning for Big Data: A Survey on Opportunities, Applications, and Future Directions
Comments: Submitted for peer review in a journal
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[300]  arXiv:2110.05430 (replaced) [pdf, other]
Title: Density-based interpretable hypercube region partitioning for mixed numeric and categorical data
Subjects: Machine Learning (cs.LG); Applications (stat.AP)
[301]  arXiv:2110.05454 (replaced) [pdf, other]
Title: Momentum Centering and Asynchronous Update for Adaptive Gradient Methods
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
[302]  arXiv:2110.06532 (replaced) [pdf, ps, other]
Title: SMS: An Efficient Source Model Selection Framework for Model Reuse
Subjects: Machine Learning (cs.LG); Databases (cs.DB)
[303]  arXiv:2110.06559 (replaced) [pdf, other]
Title: Infinitely Divisible Noise in the Low Privacy Regime
Comments: Updated figure 3, which was misleading in the previous version
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS)
[304]  arXiv:2110.06976 (replaced) [pdf, other]
Title: Rethinking the Representational Continuity: Towards Unsupervised Continual Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[305]  arXiv:2110.07749 (replaced) [pdf, other]
Title: Attention-Free Keyword Spotting
Comments: Submitted to ICASSP-2022 (5 pages)
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[306]  arXiv:2110.07959 (replaced) [pdf, other]
Title: Low-rank Matrix Recovery With Unknown Correspondence
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
[307]  arXiv:2110.08009 (replaced) [pdf, other]
Title: MaGNET: Uniform Sampling from Deep Generative Network Manifolds Without Retraining
Comments: 13 pages, 14 pages Appendix, 23 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[308]  arXiv:1712.01145 (replaced) [pdf, other]
Title: Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection
Comments: 12 pages, 4 figures. This paper has been accepted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[309]  arXiv:1901.08057 (replaced) [pdf, ps, other]
Title: Large dimensional analysis of general margin based classification methods
Comments: 33 pages, 5 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
[310]  arXiv:1908.09128 (replaced) [pdf, other]
Title: Position-Aware Self-Attention based Neural Sequence Labeling
Comments: 12 pages, 6 figures
Journal-ref: published by pattern recognition, 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[311]  arXiv:2001.04286 (replaced) [pdf, other]
Title: Nonparametric Continuous Sensor Registration
Comments: 50 pages. Accepted for Journal of Machine Learning Research. arXiv admin note: text overlap with arXiv:1904.02266
Subjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[312]  arXiv:2003.12739 (replaced) [pdf, other]
Title: Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters
Comments: 13 pages, 6 figures, currently under review for IEEE Transactions on Multimedia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
[313]  arXiv:2007.02794 (replaced) [pdf, other]
Title: Efficient Connected and Automated Driving Systemwith Multi-agent Graph Reinforcement Learning
Comments: the paper is not even ready
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[314]  arXiv:2007.03408 (replaced) [pdf, other]
Title: A Generative Model for Texture Synthesis based on Optimal Transport between Feature Distributions
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[315]  arXiv:2007.09248 (replaced) [pdf, other]
Title: Fine Timing and Frequency Synchronization for MIMO-OFDM: An Extreme Learning Approach
Comments: 13 pages, 12 figures, has been accepted for publication in IEEE Transactions on Cognitive Communications and Networking
Journal-ref: IEEE Transactions on Cognitive Communications and Networking, 2021
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[316]  arXiv:2007.14052 (replaced) [pdf, other]
Title: Multioutput Gaussian Processes with Functional Data: A Study on Coastal Flood Hazard Assessment
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)
[317]  arXiv:2007.14861 (replaced) [pdf, ps, other]
Title: Efficient Sparse Secure Aggregation for Federated Learning
Subjects: Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[318]  arXiv:2008.11348 (replaced) [pdf, other]
Title: Variance-Reduced Splitting Schemes for Monotone Stochastic Generalized Equations
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[319]  arXiv:2008.13443 (replaced) [pdf, other]
Title: On the Quality Requirements of Demand Prediction for Dynamic Public Transport
Comments: 26 pages, 9 tables, 6 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Signal Processing (eess.SP)
[320]  arXiv:2009.00664 (replaced) [pdf, other]
Title: VeRNAl: Mining RNA Structures for Fuzzy Base Pairing Network Motifs
Subjects: Molecular Networks (q-bio.MN); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
[321]  arXiv:2009.02327 (replaced) [pdf, other]
Title: OnsagerNet: Learning Stable and Interpretable Dynamics using a Generalized Onsager Principle
Comments: 29 pages, 19 figures
Subjects: Dynamical Systems (math.DS); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
[322]  arXiv:2009.14639 (replaced) [pdf, other]
Title: Dissected 3D CNNs: Temporal Skip Connections for Efficient Online Video Processing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[323]  arXiv:2010.00373 (replaced) [pdf, other]
Title: Task Agnostic Continual Learning Using Online Variational Bayes with Fixed-Point Updates
Comments: The arXiv paper "Task Agnostic Continual Learning Using Online Variational Bayes" is a preliminary pre-print of this paper. The main differences between the versions are: 1. We develop new algorithmic framework (FOO-VB). 2. We add multivariate Gaussian and matrix variate Gaussian versions of the algorithm. 3. We demonstrate the new algorithm performance in task agnostic scenarios
Journal-ref: Neural Comput 2021; 33 (11)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[324]  arXiv:2011.03525 (replaced) [pdf, other]
Title: SigNet: A Novel Deep Learning Framework for Radio Signal Classification
Comments: 13 pages, 8 figures
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
[325]  arXiv:2012.08418 (replaced) [pdf, other]
Title: Pedestrian Behavior Prediction for Automated Driving: Requirements, Metrics, and Relevant Features
Comments: This work has been submitted to the IEEE Transactions on Intelligent Transportation Systems for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Revision: Extended requirement analysis and evaluation. 16 pages
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
[326]  arXiv:2012.11772 (replaced) [pdf, other]
Title: Power-SLIC: Fast Superpixel Segmentations by Diagrams
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG); Machine Learning (cs.LG)
[327]  arXiv:2101.12677 (replaced) [pdf, other]
Title: Diminishing Domain Bias by Leveraging Domain Labels in Object Detection on UAVs
Comments: Accepted for publication at ICAR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[328]  arXiv:2102.10663 (replaced) [pdf, other]
Title: MedAug: Contrastive learning leveraging patient metadata improves representations for chest X-ray interpretation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[329]  arXiv:2102.12142 (replaced) [pdf, ps, other]
Title: Gaussian boson sampling and multi-particle event optimization by machine learning in the quantum phase space
Authors: Claudio Conti
Comments: Extended version, with correct figure 4, code available in github
Journal-ref: Quantum Machine Intelligence (2021) 3:26
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Optics (physics.optics)
[330]  arXiv:2103.01946 (replaced) [pdf, other]
Title: Fixing Data Augmentation to Improve Adversarial Robustness
Comments: Since its original publication (2 Mar 2021), this paper has been accepted to NeurIPS 2021 as two separate and updated papers (Rebuffi et al., 2021; Gowal et al., 2021). The new papers improve results and clarity
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[331]  arXiv:2103.13300 (replaced) [pdf, other]
Title: Automatic Cough Classification for Tuberculosis Screening in a Real-World Environment
Comments: This paper has been accepted in Physiological Measurement (2021)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[332]  arXiv:2104.07084 (replaced) [pdf, other]
Title: Grouped Variable Selection with Discrete Optimization: Computational and Statistical Perspectives
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Optimization and Control (math.OC); Computation (stat.CO); Machine Learning (stat.ML)
[333]  arXiv:2104.08773 (replaced) [pdf, other]
Title: Cross-Task Generalization via Natural Language Crowdsourcing Instructions
Comments: 20 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[334]  arXiv:2104.12761 (replaced) [pdf, other]
Title: Adaptive Learning in Continuous Games: Optimal Regret Bounds and Convergence to Nash Equilibrium
Comments: In the 34th Annual Conference on Learning Theory (COLT 2021); 35 pages, 2 figures
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Optimization and Control (math.OC)
[335]  arXiv:2105.03425 (replaced) [pdf, other]
Title: Kernel Two-Sample Tests for Manifold Data
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[336]  arXiv:2105.08532 (replaced) [pdf, other]
Title: Robust Learning in Heterogeneous Contexts
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[337]  arXiv:2105.10266 (replaced) [pdf, other]
Title: Ensemble Quantile Networks: Uncertainty-Aware Reinforcement Learning with Applications in Autonomous Driving
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[338]  arXiv:2105.13348 (replaced) [pdf, other]
Title: Optimization in Open Networks via Dual Averaging
Comments: In 60th IEEE Conference on Decision and Control (CDC 2021); 7 pages, 1 figure
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[339]  arXiv:2106.03215 (replaced) [pdf, other]
Title: PreferenceNet: Encoding Human Preferences in Auction Design with Deep Learning
Comments: This work has been accepted to Neural Information Processing Systems (NeurIPS) 2021. First two authors contributed equally
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[340]  arXiv:2106.03227 (replaced) [pdf, other]
Title: Neural Tangent Kernel Maximum Mean Discrepancy
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[341]  arXiv:2106.03762 (replaced) [pdf, other]
Title: Frustratingly Easy Uncertainty Estimation for Distribution Shift
Comments: 17 pages, 4 Tables, 9 Figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[342]  arXiv:2106.05565 (replaced) [pdf, other]
Title: Identifiability of interaction kernels in mean-field equations of interacting particles
Authors: Quanjun Lang, Fei Lu
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
[343]  arXiv:2106.09215 (replaced) [pdf, other]
Title: Optimum-statistical Collaboration Towards General and Efficient Black-box Optimization
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[344]  arXiv:2106.09685 (replaced) [pdf, other]
Title: LoRA: Low-Rank Adaptation of Large Language Models
Comments: Draft V2 includes better baselines, experiments on GLUE, and more on adapter latency
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[345]  arXiv:2106.12423 (replaced) [pdf, other]
Title: Alias-Free Generative Adversarial Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[346]  arXiv:2106.15358 (replaced) [pdf, other]
Title: Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors
Comments: Accepted to NeurIPS 2021
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
[347]  arXiv:2107.00753 (replaced) [pdf, other]
Title: An Investigation of the (In)effectiveness of Counterfactually Augmented Data
Authors: Nitish Joshi, He He
Comments: (v2) Toy theoretical example updated
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[348]  arXiv:2107.06501 (replaced) [pdf, other]
Title: AdvFilter: Predictive Perturbation-aware Filtering against Adversarial Attack via Multi-domain Learning
Comments: This work has been accepted to ACM-MM 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[349]  arXiv:2107.10884 (replaced) [pdf, other]
Title: Structured second-order methods via natural gradient descent
Comments: Fixed some typos. ICML workshop paper. A short version of arXiv:2102.07405 with a focus on optimization tasks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[350]  arXiv:2107.13921 (replaced) [pdf, other]
Title: Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts
Comments: 10 pages, 8 figures, 2 tables
Journal-ref: IEEE CLUSTER (2021) 261-270
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[351]  arXiv:2108.04228 (replaced) [pdf, other]
Title: Iterative Distillation for Better Uncertainty Estimates in Multitask Emotion Recognition
Comments: Accepted as a Workshop paper in ICCV2021 proceeding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[352]  arXiv:2108.07472 (replaced) [pdf, other]
Title: PAC Learnability of Approximate Nash Equilibrium in Bimatrix Games
Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
[353]  arXiv:2109.01135 (replaced) [pdf, ps, other]
Title: Sequence-to-Sequence Learning with Latent Neural Grammars
Authors: Yoon Kim
Comments: NeurIPS 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[354]  arXiv:2109.01745 (replaced) [pdf, other]
Title: A realistic approach to generate masked faces applied on two novel masked face recognition data sets
Comments: Accepted at NeurIPS 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[355]  arXiv:2109.05578 (replaced) [pdf, other]
Title: Kernel PCA with the Nyström method
Authors: Fredrik Hallgren
Comments: 44 pages, 6 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[356]  arXiv:2109.05583 (replaced) [pdf, ps, other]
Title: Automatic Componentwise Boosting: An Interpretable AutoML System
Comments: 6 pages, 4 figures, ECML-PKDD Workshop on Automating Data Science 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[357]  arXiv:2109.05675 (replaced) [pdf, other]
Title: Online Unsupervised Learning of Visual Representations and Categories
Comments: Technical report, 28 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[358]  arXiv:2109.05941 (replaced) [pdf, other]
Title: Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning
Comments: EMNLP 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[359]  arXiv:2109.07830 (replaced) [pdf, other]
Title: Reframing Instructional Prompts to GPTk's Language
Comments: 11 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[360]  arXiv:2109.14420 (replaced) [pdf, other]
Title: FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition
Comments: Findings of EMNLP 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[361]  arXiv:2110.00629 (replaced) [pdf, other]
Title: Factored couplings in multi-marginal optimal transport via difference of convex programming
Comments: Fix typo and correct the corollary 3.3
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
[362]  arXiv:2110.00987 (replaced) [pdf, other]
Title: Motif-based Graph Self-Supervised Learning for Molecular Property Prediction
Comments: Accepted by NeurIPS'21
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[363]  arXiv:2110.01593 (replaced) [pdf, other]
Title: Generalized Kernel Thinning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)
[364]  arXiv:2110.01773 (replaced) [pdf, other]
Title: Differentiable Equilibrium Computation with Decision Diagrams for Stackelberg Models of Combinatorial Congestion Games
Subjects: Computer Science and Game Theory (cs.GT); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Optimization and Control (math.OC)
[365]  arXiv:2110.02011 (replaced) [pdf, other]
Title: Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[366]  arXiv:2110.03267 (replaced) [pdf, other]
Title: Propagating State Uncertainty Through Trajectory Forecasting
Comments: 8 pages, 6 figures, 4 tables. Fixed section name typo
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[367]  arXiv:2110.04624 (replaced) [pdf, other]
Title: Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design
Subjects: Biomolecules (q-bio.BM); Machine Learning (cs.LG)
[368]  arXiv:2110.05668 (replaced) [pdf, other]
Title: NAS-Bench-360: Benchmarking Diverse Tasks for Neural Architecture Search
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[369]  arXiv:2110.06021 (replaced) [pdf, other]
Title: Embedded-model flows: Combining the inductive biases of model-free deep learning and explicit probabilistic modeling
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[370]  arXiv:2110.07604 (replaced) [pdf, other]
Title: NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild
Comments: In NeurIPS 2021. v2-3: Fixed minor typos
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[371]  arXiv:2110.07809 (replaced) [pdf, other]
Title: PTQ-SL: Exploring the Sub-layerwise Post-training Quantization
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[372]  arXiv:2110.08059 (replaced) [pdf, other]
Title: FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes
Comments: First two authors contributed equally to this work
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[ total of 372 entries: 1-372 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2110, contact, help  (Access key information)