Machine Learning

New submissions

New submissions for Thu, 22 Apr 21

[1]  arXiv:2104.10190 [pdf, other]
Title: Outcome-Driven Reinforcement Learning via Variational Inference
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Methodology (stat.ME); Machine Learning (stat.ML)

While reinforcement learning algorithms provide automated acquisition of optimal policies, practical application of such methods requires a number of design decisions, such as manually designing reward functions that not only define the task, but also provide sufficient shaping to accomplish it. In this paper, we discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards. To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function which can be learned directly from environment interactions. From the corresponding variational objective, we also derive a new probabilistic Bellman backup operator reminiscent of the standard Bellman backup operator and use it to develop an off-policy algorithm to solve goal-directed tasks. We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
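
For reference, the standard Bellman (optimality) backup that the abstract compares its operator against can be written as below. This is the textbook form only; the paper's probabilistic operator is not reproduced here, and the symbols ($r$ for reward, $\gamma$ for discount, $p$ for transition dynamics) are conventional notation rather than the authors' own.

```latex
(\mathcal{T} Q)(s, a) \;=\; r(s, a) \;+\; \gamma \, \mathbb{E}_{s' \sim p(\cdot \mid s, a)} \Big[ \max_{a'} Q(s', a') \Big]
```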

[2]  arXiv:2104.10201 [pdf, other]
Title: Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

This paper presents the results and insights from the black-box optimization (BBO) challenge at NeurIPS 2020 which ran from July-October, 2020. The challenge emphasized the importance of evaluating derivative-free optimizers for tuning the hyperparameters of machine learning models. This was the first black-box optimization challenge with a machine learning emphasis. It was based on tuning (validation set) performance of standard machine learning models on real datasets. This competition has widespread impact as black-box optimization (e.g., Bayesian optimization) is relevant for hyperparameter tuning in almost every machine learning project as well as many applications outside of machine learning. The final leaderboard was determined using the optimization performance on held-out (hidden) objective functions, where the optimizers ran without human intervention. Baselines were set using the default settings of several open-source black-box optimization packages as well as random search.
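
As a rough illustration of the random-search baseline mentioned in the abstract, the sketch below samples each hyperparameter uniformly and keeps the best trial. The names `random_search`, `toy_objective`, and `space` are hypothetical, and the toy objective merely stands in for a validation loss; none of this is the challenge's actual harness.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Minimize `objective` by sampling each hyperparameter uniformly."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        # Draw one independent uniform sample per hyperparameter range.
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical stand-in for a validation loss (lower is better).
    return (params["lr"] - 0.1) ** 2 + (params["reg"] - 0.5) ** 2


# Hypothetical search space: name -> (low, high).
space = {"lr": (0.0, 1.0), "reg": (0.0, 1.0)}
best_params, best_score = random_search(toy_objective, space, n_trials=200)
print(best_params, best_score)
```

With a fixed seed, a larger budget can only match or improve on a smaller one, because the first trials are identical. A model-based optimizer would replace the uniform sampling step with a proposal informed by earlier trials.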

[3]  arXiv:2104.10223 [pdf, other]
Title: More Than Meets The Eye: Semi-supervised Learning Under Non-IID Data
Comments: Presented as a RobustML workshop paper at ICLR 2021. Both authors contributed equally. This article extends arXiv:2006.07767
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

A common heuristic in semi-supervised deep learning (SSDL) is to select unlabelled data based on a notion of semantic similarity to the labelled data. For example, labelled images of numbers should be paired with unlabelled images of numbers instead of, say, unlabelled images of cars. We refer to this practice as semantic data set matching. In this work, we demonstrate the limits of semantic data set matching. We show that it can sometimes even degrade the performance for a state of the art SSDL algorithm. We present and make available a comprehensive simulation sandbox, called non-IID-SSDL, for stress testing an SSDL algorithm under different degrees of distribution mismatch between the labelled and unlabelled data sets. In addition, we demonstrate that simple density based dissimilarity measures in the feature space of a generic classifier offer a promising and more reliable quantitative matching criterion to select unlabelled data before SSDL training.

[4]  arXiv:2104.10228 [pdf, other]
Title: Concept Drift Detection from Multi-Class Imbalanced Data Streams
Comments: 37th IEEE International Conference on Data Engineering (ICDE), 2021. arXiv admin note: text overlap with arXiv:2009.09497
Subjects: Machine Learning (cs.LG)

Continual learning from data streams is among the most important topics in contemporary machine learning. One of the biggest challenges in this domain lies in creating algorithms that can continuously adapt to arriving data. However, previously learned knowledge may become outdated, as streams evolve over time. This phenomenon is known as concept drift and must be detected to facilitate efficient adaptation of the learning model. While there exists a plethora of drift detectors, all of them assume that we are dealing with roughly balanced classes. In the case of imbalanced data streams, those detectors will be biased towards the majority classes, ignoring changes happening in the minority ones. Furthermore, class imbalance may evolve over time and classes may change their roles (majority becoming minority and vice versa). This is especially challenging in the multi-class setting, where relationships among classes become complex. In this paper, we propose a detailed taxonomy of challenges posed by concept drift in multi-class imbalanced data streams, as well as a novel trainable concept drift detector based on a Restricted Boltzmann Machine. It is capable of monitoring multiple classes at once and using reconstruction error to detect changes in each of them independently. Our detector utilizes a skew-insensitive loss function that allows it to handle multiple imbalanced distributions. Due to its trainable nature, it is capable of following changes in a stream and evolving class roles, and it can also deal with local concept drift occurring in minority classes. An extensive experimental study on multi-class drifting data streams, enriched with a detailed analysis of the impact of local drifts and changing imbalance ratios, confirms the high efficacy of our approach.

[5]  arXiv:2104.10255 [pdf, other]
Title: Extraction of Hierarchical Functional Connectivity Components in human brain using Adversarial Learning
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)

The estimation of sparse hierarchical components reflecting patterns of the brain's functional connectivity from rsfMRI data can contribute to our understanding of the brain's functional organization, and can lead to biomarkers of diseases. However, inter-scanner variations and other confounding factors pose a challenge to the robust and reproducible estimation of functionally-interpretable brain networks, and especially to reproducible biomarkers. Moreover, the brain is believed to be organized hierarchically, and hence single-scale decompositions miss this hierarchy. The paper aims to use current advancements in adversarial learning to estimate interpretable hierarchical patterns in the human brain using rsfMRI data, which are robust to "adversarial effects" such as inter-scanner variations. We write the estimation problem as a minimization problem and solve it using alternating updates. Extensive experiments on simulation and a real-world dataset show high reproducibility of the components compared to other well-known methods.

[6]  arXiv:2104.10258 [pdf, other]
Title: Discovering an Aid Policy to Minimize Student Evasion Using Offline Reinforcement Learning
Comments: 8 pages, 6 figures, accepted for publication in 2021 International Joint Conference on Neural Networks (IJCNN 2021)
Subjects: Machine Learning (cs.LG)

High dropout rates in tertiary education expose a lack of efficiency that causes frustration of expectations and financial waste. Predicting students at risk is not enough to avoid student dropout. Usually, an appropriate aid action must be discovered and applied at the proper time for each student. To tackle this sequential decision-making problem, we propose a decision support method for the selection of aid actions for students, using offline reinforcement learning to help decision-makers effectively avoid student dropout. Additionally, a discretization of the student's state space applying two different clustering methods is evaluated. Our experiments using logged data of real students show, through off-policy evaluation, that the method should achieve roughly 1.0 to 1.5 times as much cumulative reward as the logged policy. So, it is feasible to help decision-makers apply appropriate aid actions and, possibly, reduce student dropout.

[7]  arXiv:2104.10289 [pdf, other]
Title: A windowed correlation based feature selection method to improve time series prediction of dengue fever cases
Comments: 13 pages, 13 figures
Subjects: Machine Learning (cs.LG)

The performance of data-driven prediction models depends on the availability of data samples for model training. A model that learns about dengue fever incidence in a population uses historical data from that corresponding location. Poor performance in prediction can result in places with inadequate data. This work aims to enhance temporally limited dengue case data by methodological addition of epidemically relevant data from nearby locations as predictors (features). A novel framework is presented for windowing incidence data and computing time-shifted correlation-based metrics to quantify feature relevance. The framework ranks incidence data of adjacent locations around a target location by combining the correlation metric with two other metrics: spatial distance and local prevalence. Recurrent neural network-based prediction models achieve up to 33.6% accuracy improvement on average using the proposed method compared to using training data from the target location only. These models achieved mean absolute error (MAE) values as low as 0.128 on [0,1] normalized incidence data for a municipality with the highest dengue prevalence in Brazil's Espirito Santo. When predicting cases aggregated over geographical ecoregions, the models achieved accuracy improvements up to 16.5%, using only 6.5% of incidence data from ranked feature sets. The paper also includes two techniques for windowing time series data: fixed-sized windows and outbreak detection windows. Both of these techniques perform comparably, while the window detection method uses less data for computations. The framework presented in this paper is application-independent, and it could improve the performances of prediction models where data from spatially adjacent locations are available.
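
The idea of time-shifted correlation over fixed-size windows can be sketched as follows. This is a minimal illustration using plain Pearson correlation; the function names, the lag search, and the window handling are assumptions, and the paper's actual relevance metric (which also combines spatial distance and local prevalence) is not reproduced.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_lagged_correlation(target, neighbour, max_lag):
    """Return (lag, r) maximizing the correlation of target[t] with neighbour[t - lag]."""
    best = (0, pearson(target, neighbour))
    for lag in range(1, max_lag + 1):
        r = pearson(target[lag:], neighbour[:-lag])
        if r > best[1]:
            best = (lag, r)
    return best


def windowed_scores(target, neighbour, window, max_lag):
    """Score consecutive non-overlapping fixed-size windows."""
    scores = []
    for start in range(0, len(target) - window + 1, window):
        t = target[start:start + window]
        n = neighbour[start:start + window]
        scores.append(best_lagged_correlation(t, n, max_lag))
    return scores


# Hypothetical incidence series: the target lags its neighbour by two steps.
neighbour = [0, 1, 4, 2, 7, 3, 0, 5, 1, 6, 2, 8]
target = [0, 0] + neighbour[:-2]
print(best_lagged_correlation(target, neighbour, max_lag=3))
print(windowed_scores(target, neighbour, window=6, max_lag=2))
```

A neighbouring location whose series leads the target with a high lagged correlation would rank as a more useful predictor under this simplified scoring.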

[8]  arXiv:2104.10314 [pdf, ps, other]
Title: Efficient Sparse Coding using Hierarchical Riemannian Pursuit
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP)

Sparse coding is a class of unsupervised methods for learning a sparse representation of the input data in the form of a linear combination of a dictionary and a sparse code. This learning framework has led to state-of-the-art results in various image and video processing tasks. However, classical methods learn the dictionary and the sparse code based on alternative optimizations, usually without theoretical guarantees for either optimality or convergence due to non-convexity of the problem. Recent works on sparse coding with a complete dictionary provide strong theoretical guarantees thanks to the development of the non-convex optimization. However, initial non-convex approaches learn the dictionary in the sparse coding problem sequentially in an atom-by-atom manner, which leads to a long execution time. More recent works seek to directly learn the entire dictionary at once, which substantially reduces the execution time. However, the associated recovery performance is degraded with a finite number of data samples. In this paper, we propose an efficient sparse coding scheme with a two-stage optimization. The proposed scheme leverages the global and local Riemannian geometry of the two-stage optimization problem and facilitates fast implementation for superb dictionary recovery performance by a finite number of samples without atom-by-atom calculation. We further prove that, with high probability, the proposed scheme can exactly recover any atom in the target dictionary with a finite number of samples if it is adopted to recover one atom of the dictionary. An application on wireless sensor data compression is also proposed. Experiments on both synthetic and real-world data verify the efficiency and effectiveness of the proposed scheme.

[9]  arXiv:2104.10322 [pdf, other]
Title: Gradient Masked Federated Optimization
Journal-ref: ICLR 2021 Distributed and Private Machine Learning(DPML) Workshop
Subjects: Machine Learning (cs.LG)

Federated Averaging (FedAVG) has become the most popular federated learning algorithm due to its simplicity and low communication overhead. We use simple examples to show that FedAVG has the tendency to sew together the optima across the participating clients. These sewed optima exhibit poor generalization when used on a new client with a new data distribution. Inspired by the invariance principles in (Arjovsky et al., 2019; Parascandolo et al., 2020), we focus on learning a model that is locally optimal across the different clients simultaneously. We propose a modification to the FedAVG algorithm that includes masked gradients (AND-mask from (Parascandolo et al., 2020)) across the clients and uses them to carry out an additional server model update. We show that this algorithm achieves better out-of-distribution accuracy than FedAVG, especially when the data is non-identically distributed across clients.
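
The AND-mask idea of keeping only gradient components whose sign agrees across clients can be sketched as below. This is a simplified illustration, not the paper's algorithm: the function name `and_mask_gradient`, the `agreement` threshold parameter, and the use of plain lists as gradients are all assumptions, and how the masked gradient enters the server update is left out.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def and_mask_gradient(client_grads, agreement=1.0):
    """Average per-client gradients, zeroing components with inconsistent signs.

    client_grads: list of equal-length gradient vectors, one per client.
    agreement: required |mean of signs| in [0, 1]; 1.0 means all clients agree.
    """
    n_clients = len(client_grads)
    dim = len(client_grads[0])
    masked = []
    for j in range(dim):
        column = [g[j] for g in client_grads]
        sign_agreement = abs(sum(sign(v) for v in column)) / n_clients
        mean = sum(column) / n_clients
        # Keep the averaged component only where clients agree on its direction.
        masked.append(mean if sign_agreement >= agreement else 0.0)
    return masked


# Hypothetical gradients from three clients.
grads = [
    [1.0, -2.0, 0.5],
    [3.0, 2.0, 0.5],
    [2.0, -1.0, 0.5],
]
print(and_mask_gradient(grads))  # [2.0, 0.0, 0.5]
```

The second component is zeroed because the clients disagree on its sign, which is the mechanism intended to discourage updates that only fit a subset of clients.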

[10]  arXiv:2104.10329 [pdf, ps, other]
Title: Deep Transform and Metric Learning Networks
Comments: Accepted by ICASSP 2021. arXiv admin note: substantial text overlap with arXiv:2002.07898
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Based on its great successes in inference and denoising tasks, Dictionary Learning (DL) and its related sparse optimization formulations have garnered a lot of research interest. While most solutions have focused on single layer dictionaries, the recently improved Deep DL methods have also fallen short on a number of issues. We hence propose a novel Deep DL approach where each DL layer can be formulated and solved as a combination of one linear layer and a Recurrent Neural Network, where the RNN is flexibly regarded as a layer-associated learned metric. Our proposed work unveils new insights between the Neural Networks and Deep DL, and provides a novel, efficient and competitive approach to jointly learn the deep transforms and metrics. Extensive experiments are carried out to demonstrate that the proposed method can outperform not only existing Deep DL, but also state-of-the-art generic Convolutional Neural Networks.

[11]  arXiv:2104.10340 [pdf, other]
Title: CVLight: Deep Reinforcement Learning for Adaptive Traffic Signal Control with Connected Vehicles
Comments: 27 pages, 13 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

This paper develops a reinforcement learning (RL) scheme for adaptive traffic signal control (ATSC), called "CVLight", that leverages data collected only from connected vehicles (CV). Seven types of RL models are proposed within this scheme that contain various state and reward representations, including incorporation of CV delay and green light duration into state and the usage of CV delay as reward. To further incorporate information of both CV and non-CV into CVLight, an algorithm based on actor-critic, A2C-Full, is proposed where both CV and non-CV information is used to train the critic network, while only CV information is used to update the policy network and execute optimal signal timing. These models are compared at an isolated intersection under various CV market penetration rates. A full model with the best performance (i.e., minimum average travel delay per vehicle) is then selected and applied to compare with state-of-the-art benchmarks under different levels of traffic demands, turning proportions, and dynamic traffic demands, respectively. Two case studies are performed on an isolated intersection and a corridor with three consecutive intersections located in Manhattan, New York, to further demonstrate the effectiveness of the proposed algorithm under real-world scenarios. Compared to other baseline models that use all vehicle information, the trained CVLight agent can efficiently control multiple intersections solely based on CV data and can achieve a similar or even greater performance when the CV penetration rate is no less than 20%.

[12]  arXiv:2104.10350 [pdf]
Title: Carbon Emissions and Large Neural Network Training
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

The computation demand for machine learning (ML) has grown rapidly recently, which comes with a number of costs. Estimating the energy cost helps measure its environmental impact and find greener strategies, yet it is challenging without detailed information. We calculate the energy use and carbon footprint of several recent large models (T5, Meena, GShard, Switch Transformer, and GPT-3) and refine earlier estimates for the neural architecture search that found Evolved Transformer. We highlight the following opportunities to improve energy efficiency and CO2 equivalent emissions (CO2e): Large but sparsely activated DNNs can consume <1/10th the energy of large, dense DNNs without sacrificing accuracy despite using as many or even more parameters. Geographic location matters for ML workload scheduling since the fraction of carbon-free energy and resulting CO2e vary ~5X-10X, even within the same country and the same organization. We are now optimizing where and when large models are trained. Specific datacenter infrastructure matters, as Cloud datacenters can be ~1.4-2X more energy efficient than typical datacenters, and the ML-oriented accelerators inside them can be ~2-5X more effective than off-the-shelf systems. Remarkably, the choice of DNN, datacenter, and processor can reduce the carbon footprint up to ~100-1000X. These large factors also make retroactive estimates of energy cost difficult. To avoid miscalculations, we believe ML papers requiring large computational resources should make energy consumption and CO2e explicit when practical. We are working to be more transparent about energy use and CO2e in our future research. To help reduce the carbon footprint of ML, we believe energy usage and CO2e should be a key metric in evaluating models, and we are collaborating with MLPerf developers to include energy usage during training and inference in this industry standard benchmark.
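
A back-of-the-envelope version of this kind of accounting is sketched below. It assumes that energy is training hours times processor count times average power per processor times the datacenter's power usage effectiveness (PUE), and that CO2e is energy times the grid's carbon intensity. The function names and all input numbers are hypothetical and are not figures from the paper.

```python
def training_energy_kwh(hours, num_processors, avg_power_watts, pue):
    """Energy in kWh, assuming constant average power per processor."""
    return hours * num_processors * avg_power_watts * pue / 1000.0


def co2e_tonnes(energy_kwh, kg_co2e_per_kwh):
    """Emissions in metric tonnes of CO2e for a given grid carbon intensity."""
    return energy_kwh * kg_co2e_per_kwh / 1000.0


# Hypothetical run: 100 hours on 64 accelerators drawing 300 W each, PUE 1.1,
# on a grid emitting 0.4 kg CO2e per kWh.
energy = training_energy_kwh(hours=100, num_processors=64, avg_power_watts=300, pue=1.1)
emissions = co2e_tonnes(energy, kg_co2e_per_kwh=0.4)
print(f"{energy:.1f} kWh, {emissions:.4f} t CO2e")
```

Because the terms multiply, improvements in model efficiency, processor efficiency, datacenter PUE, and grid carbon intensity compound, which is how large combined reduction factors can arise.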

[13]  arXiv:2104.10377 [pdf, other]
Title: Dual Head Adversarial Training
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

Deep neural networks (DNNs) are known to be vulnerable to adversarial examples/attacks, raising concerns about their reliability in safety-critical applications. A number of defense methods have been proposed to train robust DNNs resistant to adversarial attacks, among which adversarial training has so far demonstrated the most promising results. However, recent studies have shown that there exists an inherent tradeoff between accuracy and robustness in adversarially-trained DNNs. In this paper, we propose a novel technique Dual Head Adversarial Training (DH-AT) to further improve the robustness of existing adversarial training methods. Different from existing improved variants of adversarial training, DH-AT modifies both the architecture of the network and the training strategy to seek more robustness. Specifically, DH-AT first attaches a second network head (or branch) to one intermediate layer of the network, then uses a lightweight convolutional neural network (CNN) to aggregate the outputs of the two heads. The training strategy is also adapted to reflect the relative importance of the two heads. We empirically show, on multiple benchmark datasets, that DH-AT can bring notable robustness improvements to existing adversarial training methods. Compared with TRADES, one state-of-the-art adversarial training method, our DH-AT can improve the robustness by 3.4% against PGD40 and 2.3% against AutoAttack, and also improve the clean accuracy by 1.8%.

[14]  arXiv:2104.10398 [pdf, other]
Title: Learning future terrorist targets through temporal meta-graphs
Comments: 19 pages, 18 figures
Journal-ref: Sci Rep 11, 8533 (2021)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In the last 20 years, terrorism has led to hundreds of thousands of deaths and massive economic, political, and humanitarian crises in several regions of the world. Using real-world data on attacks that occurred in Afghanistan and Iraq from 2001 to 2018, we propose the use of temporal meta-graphs and deep learning to forecast future terrorist targets. Focusing on three event dimensions, i.e., employed weapons, deployed tactics and chosen targets, meta-graphs map the connections among temporally close attacks, capturing their operational similarities and dependencies. From these temporal meta-graphs, we derive 2-day-based time series that measure the centrality of each feature within each dimension over time. Formulating the problem in the context of the strategic behavior of terrorist actors, these multivariate temporal sequences are then utilized to learn what target types are at the highest risk of being chosen. The paper makes two contributions. First, it demonstrates that engineering the feature space via temporal meta-graphs produces richer knowledge than shallow time-series that only rely on frequency of feature occurrences. Second, the performed experiments reveal that bi-directional LSTM networks achieve superior forecasting performance compared to other algorithms, calling for future research aiming at fully discovering the potential of artificial intelligence to counter terrorist violence.

[15]  arXiv:2104.10400 [pdf, other]
Title: Federated Traffic Synthesizing and Classification Using Generative Adversarial Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

With the fast growing demand for new services and applications as well as the increasing awareness of data protection, traditional centralized traffic classification approaches are facing unprecedented challenges. This paper introduces a novel framework, Federated Generative Adversarial Networks and Automatic Classification (FGAN-AC), which integrates decentralized data synthesizing with traffic classification. FGAN-AC is able to synthesize and classify multiple types of service data traffic from decentralized local datasets without requiring a large volume of manually labeled data or causing any data leakage. Two types of data synthesizing approaches have been proposed and compared: computation-efficient FGAN (FGAN-I) and communication-efficient FGAN (FGAN-II). The former only implements a single CNN model for processing each local dataset and the latter only requires coordination of intermediate model training parameters. An automatic data classification and model updating framework has been proposed to automatically identify unknown traffic from the synthesized data samples and create new pseudo-labels for model training. Numerical results show that our proposed framework has the ability to synthesize highly mixed service data traffic and can significantly improve the traffic classification performance compared to existing solutions.

[16]  arXiv:2104.10410 [pdf, other]
Title: Principal Component Density Estimation for Scenario Generation Using Normalizing Flows
Comments: 15 pages, 9 figures
Subjects: Machine Learning (cs.LG)

Neural networks-based learning of the distribution of non-dispatchable renewable electricity generation from sources such as photovoltaics (PV) and wind as well as load demands has recently gained attention. Normalizing flow density models have performed particularly well in this task due to the training through direct log-likelihood maximization. However, research from the field of image generation has shown that standard normalizing flows can only learn smeared-out versions of manifold distributions and can result in the generation of noisy data. To avoid the generation of time series data with unrealistic noise, we propose a dimensionality-reducing flow layer based on the linear principal component analysis (PCA) that sets up the normalizing flow in a lower-dimensional space. We train the resulting principal component flow (PCF) on data of PV and wind power generation as well as load demand in Germany in the years 2013 to 2015. The results of this investigation show that the PCF preserves critical features of the original distributions, such as the probability density and frequency behavior of the time series. The application of the PCF is, however, not limited to renewable power generation but rather extends to any data set, time series, or otherwise, which can be efficiently reduced using PCA.

[17]  arXiv:2104.10424 [pdf, other]
Title: Link Prediction on N-ary Relational Data Based on Relatedness Evaluation
Comments: Accepted to TKDE 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

With the overwhelming popularity of Knowledge Graphs (KGs), researchers have long devoted attention to link prediction to fill in missing facts. However, they mainly focus on link prediction on binary relational data, where facts are usually represented as triples in the form of (head entity, relation, tail entity). In practice, n-ary relational facts are also ubiquitous. When encountering such facts, existing studies usually decompose them into triples by introducing a multitude of auxiliary virtual entities and additional triples. These conversions complicate link prediction on n-ary relational data. It has even been proven that they may cause loss of structure information. To overcome these problems, in this paper, we represent each n-ary relational fact as a set of its role and role-value pairs. We then propose a method called NaLP to conduct link prediction on n-ary relational data, which explicitly models the relatedness of all the role and role-value pairs in an n-ary relational fact. We further extend NaLP by introducing type constraints of roles and role-values without any external type-specific supervision, and proposing a more reasonable negative sampling mechanism. Experimental results validate the effectiveness and merits of the proposed methods.

[18]  arXiv:2104.10425 [pdf, other]
Title: Sparse-Shot Learning for Extremely Many Localisations
Comments: 14 pages, 7 figures, 5 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Object localisation is typically considered in the context of regular images, for instance depicting objects like people or cars. In these images there is typically a relatively small number of instances per image per class, which usually is manageable to annotate. However, outside the realm of regular images we are often confronted with a different situation. In computational pathology, digitised tissue sections are extremely large images, whose dimensions quickly exceed 250'000x250'000 pixels, where relevant objects, such as tumour cells or lymphocytes, can quickly number in the millions. Annotating them all is practically impossible and sparsely annotating a few, out of many more, is the only possibility. Unfortunately, learning from sparse annotations, or sparse-shot learning, clashes with standard supervised learning because what is not annotated is treated as a negative. However, assigning negative labels to what are true positives leads to confusion in the gradients and biased learning. To this end, we present exclusive cross entropy, which slows down the biased learning by examining the second-order loss derivatives in order to drop the loss terms corresponding to likely biased terms. Experiments on nine datasets and two different localisation tasks, detection with YOLLO and segmentation with Unet, show that we obtain considerable improvements compared to cross entropy or focal loss, while often reaching the best possible performance for the model with only 10-40% of annotations.

[19]  arXiv:2104.10450 [pdf, other]
Title: Making Differentiable Architecture Search less local
Comments: ICLR 2021 Workshop on Neural Architecture Search
Subjects: Machine Learning (cs.LG)

Neural architecture search (NAS) is a recent methodology for automating the design of neural network architectures. Differentiable neural architecture search (DARTS) is a promising NAS approach that dramatically increases search efficiency. However, it has been shown to suffer from performance collapse, where the search often leads to detrimental architectures. Many recent works try to address this issue of DARTS by identifying indicators for early stopping, regularising the search objective to reduce the dominance of some operations, or changing the parameterisation of the search problem. In this work, we hypothesise that performance collapses can arise from poor local optima around typical initial architectures and weights. We address this issue by developing a more global optimisation scheme that is able to better explore the space without changing the DARTS problem formulation. Our experiments show that our changes in the search algorithm allow the discovery of architectures with both better test performance and fewer parameters.

[20]  arXiv:2104.10453 [pdf, other]
Title: Brittle Features May Help Anomaly Detection
Comments: Accepted to Women in Computer Vision workshop at CVPR (2021)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

One-class anomaly detection is challenging. A representation that clearly distinguishes anomalies from normal data is ideal, but arriving at this representation is difficult since only normal data is available at training time. We examine the performance of representations, transferred from auxiliary tasks, for anomaly detection. Our results suggest that the choice of representation is more important than the anomaly detector used with these representations, although knowledge distillation can work better than using the representations directly. In addition, separability between anomalies and normal data is important but not the sole factor for a good representation, as anomaly detection performance is also correlated with more adversarially brittle features in the representation space. Finally, we show our configuration can detect 96.4% of anomalies in a genuine X-ray security dataset, outperforming previous results.

[21]  arXiv:2104.10455 [pdf, other]
Title: Reinforcement Learning for Traffic Signal Control: Comparison with Commercial Systems
Comments: 8 pages, 13 figures, 3 tables, conference paper
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)

Recently, Intelligent Transportation Systems have been leveraging increased sensory coverage and computing power to deliver data-intensive solutions achieving higher levels of performance than traditional systems. Within Traffic Signal Control (TSC), this has allowed the emergence of Machine Learning (ML) based systems. Among this group, Reinforcement Learning (RL) approaches have performed particularly well. Given the lack of industry standards in ML for TSC, literature exploring RL often lacks comparison against commercially available systems and straightforward formulations of how the agents operate. Here we attempt to bridge that gap. We propose three different architectures for TSC RL agents and compare them against the currently used commercial systems MOVA, SurTrac and Cyclic controllers and provide pseudo-code for them. The agents use variations of Deep Q-Learning and Actor Critic, using states and rewards based on queue lengths. Their performance is compared across different map scenarios with variable demand, assessing them in terms of the global delay and average queue length. We find that the RL-based systems can significantly and consistently achieve lower delays when compared with existing commercial systems.

[22]  arXiv:2104.10459 [pdf, ps, other]
Title: Jacobian Regularization for Mitigating Universal Adversarial Perturbations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)

Universal Adversarial Perturbations (UAPs) are input perturbations that can fool a neural network on large sets of data. They are a class of attacks that represents a significant threat as they facilitate realistic, practical, and low-cost attacks on neural networks. In this work, we derive upper bounds for the effectiveness of UAPs based on norms of data-dependent Jacobians. We empirically verify that Jacobian regularization greatly increases model robustness to UAPs by up to four times whilst maintaining clean performance. Our theoretical analysis also allows us to formulate a metric for the strength of shared adversarial perturbations between pairs of inputs. We apply this metric to benchmark datasets and show that it is highly correlated with the actual observed robustness. This suggests that realistic and practical universal attacks can be reliably mitigated without sacrificing clean accuracy, which shows promise for the robustness of machine learning systems.

[23]  arXiv:2104.10461 [pdf, other]
Title: Improving the Accuracy of Early Exits in Multi-Exit Architectures via Curriculum Learning
Comments: Accepted by the 2021 International Joint Conference on Neural Networks (IJCNN 2021)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Deploying deep learning services for time-sensitive and resource-constrained settings such as IoT using edge computing systems is a challenging task that requires dynamic adjustment of inference time. Multi-exit architectures allow deep neural networks to terminate their execution early in order to adhere to tight deadlines at the cost of accuracy. To mitigate this cost, in this paper we introduce a novel method called Multi-Exit Curriculum Learning that utilizes curriculum learning, a training strategy for neural networks that imitates human learning by sorting the training samples based on their difficulty and gradually introducing them to the network. Experiments on CIFAR-10 and CIFAR-100 datasets and various configurations of multi-exit architectures show that our method consistently improves the accuracy of early exits compared to the standard training approach.
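
The curriculum idea described above (sort samples by difficulty, then introduce them gradually) can be sketched as follows. This is a minimal illustration, not the paper's Multi-Exit Curriculum Learning method: the function name `curriculum_subsets`, the equal-sized stages, and the toy difficulty scores are all assumptions made for the example.

```python
import math


def curriculum_subsets(samples, difficulty, num_stages):
    """Yield growing training subsets, easiest samples first.

    `difficulty` is a hypothetical scoring function (lower = easier);
    how difficulty is actually measured is not specified here.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(samples, key=difficulty)
    for stage in range(1, num_stages + 1):
        # Each stage exposes a larger prefix of the sorted data.
        cutoff = math.ceil(len(ordered) * stage / num_stages)
        yield ordered[:cutoff]


# Toy data: (sample id, assumed difficulty score).
toy = [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.3)]
stages = list(curriculum_subsets(toy, difficulty=lambda s: s[1], num_stages=2))
```

In a real training loop, each yielded subset would be used for some number of epochs before moving to the next stage.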

[24]  arXiv:2104.10482 [pdf, other]
Title: GraphSVX: Shapley Value Explanations for Graph Neural Networks
Subjects: Machine Learning (cs.LG)

Graph Neural Networks (GNNs) achieve significant performance for various learning tasks on geometric data due to the incorporation of graph structure into the learning of node representations, which renders their comprehension challenging. In this paper, we first propose a unified framework satisfied by most existing GNN explainers. Then, we introduce GraphSVX, a post hoc local model-agnostic explanation method specifically designed for GNNs. GraphSVX is a decomposition technique that captures the "fair" contribution of each feature and node towards the explained prediction by constructing a surrogate model on a perturbed dataset. It extends to graphs and ultimately provides as explanation the Shapley Values from game theory. Experiments on real-world and synthetic datasets demonstrate that GraphSVX achieves state-of-the-art performance compared to baseline models while presenting core theoretical and human-centric properties.
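
For readers unfamiliar with Shapley values, the sketch below computes them exactly for a tiny cooperative game by averaging each player's marginal contribution over all orderings. It is not GraphSVX, which according to the abstract builds a surrogate model on a perturbed dataset; the function `shapley_values` and the toy game `toy_value` are hypothetical and only illustrate the "fair contribution" notion.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values via all orderings (feasible only for few players)."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical game: x is worth 1, y is worth 2, plus 1 if both are present."""
    v = 0.0
    if "x" in coalition:
        v += 1
    if "y" in coalition:
        v += 2
    if {"x", "y"} <= coalition:
        v += 1
    return v


phi = shapley_values(["x", "y", "z"], toy_value)
```

The values sum to the worth of the full coalition, and the player `z`, which never changes the outcome, receives zero. The factorial cost of this brute-force approach is why practical explainers rely on approximations.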

[25]  arXiv:2104.10483 [pdf, other]
Title: Adaptive learning for financial markets mixing model-based and model-free RL for volatility targetting
Comments: 8 pages, 10 figures
Subjects: Machine Learning (cs.LG); Mathematical Finance (q-fin.MF); Portfolio Management (q-fin.PM)

Model-Free Reinforcement Learning has achieved meaningful results in stable environments but, to this day, it remains problematic in regime changing environments like financial markets. In contrast, model-based RL is able to capture some fundamental and dynamical concepts of the environment but suffers from cognitive bias. In this work, we propose to combine the best of the two techniques by selecting various model-based approaches thanks to Model-Free Deep Reinforcement Learning. Using not only past performance and volatility, we include additional contextual information such as macro and risk appetite signals to account for implicit regime changes. We also adapt traditional RL methods to real-life situations by considering only past data for the training sets. Hence, we cannot use future information in our training data set as implied by K-fold cross validation. Building on traditional statistical methods, we use the traditional "walk-forward analysis", which is defined by successive training and testing based on expanding periods, to assert the robustness of the resulting agent.
Finally, we present the concept of statistical difference's significance based on a two-tailed T-test, to highlight the ways in which our models differ from more traditional ones. Our experimental results show that our approach outperforms traditional financial baseline portfolio models such as the Markowitz model in almost all evaluation metrics commonly used in financial mathematics, namely net performance, Sharpe and Sortino ratios, maximum drawdown, maximum drawdown over volatility.
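
Walk-forward analysis with expanding periods, as described above, can be sketched as an index generator in which every test block lies strictly after its training window. This is an illustrative sketch only: the function name `walk_forward_splits` and the parameters `initial_train` and `test_size` are assumptions, not the authors' setup.

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices) with an expanding training window.

    Training data always precedes test data in time, so no future
    information leaks into training (unlike shuffled K-fold splits).
    """
    if initial_train < 1 or test_size < 1:
        raise ValueError("initial_train and test_size must be positive")
    start = initial_train
    while start + test_size <= n_samples:
        yield range(0, start), range(start, start + test_size)
        start += test_size  # the next training window absorbs this test block


# Ten time-ordered observations, first window of 4, test blocks of 3.
splits = [(list(tr), list(te)) for tr, te in walk_forward_splits(10, 4, 3)]
```

Each successive split trains on everything seen so far and tests on the next unseen block.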

[26]  arXiv:2104.10496 [pdf, other]
Title: Comparing merging behaviors observed in naturalistic data with behaviors generated by a machine learned model
Comments: This paper has been submitted to 24th IEEE International Conference on Intelligent Transportation - ITSC2021, September 19-22, 2021 Indianapolis, IN, United States
Subjects: Machine Learning (cs.LG)

There is a quickly growing literature on machine-learned models that predict human driving trajectories in road traffic. These models focus their learning on low-dimensional error metrics, for example average distance between model-generated and observed trajectories. Such metrics permit relative comparison of models, but do not provide clearly interpretable information on how close to human behavior the models actually come, for example in terms of higher-level behavior phenomena that are known to be present in human driving. We study highway driving as an example scenario, and introduce metrics to quantitatively demonstrate the presence, in a naturalistic dataset, of two familiar behavioral phenomena: (1) The kinematics-dependent contest, between on-highway and on-ramp vehicles, of who passes the merging point first. (2) Courtesy lane changes away from the outermost lane, to leave space for a merging vehicle. Applying the exact same metrics to the output of a state-of-the-art machine-learned model, we show that the model is capable of reproducing the former phenomenon, but not the latter. We argue that this type of behavioral analysis provides information that is not available from conventional model-fitting metrics, and that it may be useful to analyze (and possibly fit) models also based on these types of behavioral criteria.

[27]  arXiv:2104.10505 [pdf, other]
Title: Interpretation of multi-label classification models using shapley values
Authors: Shikun Chen
Comments: 11 pages, 7 figures
Subjects: Machine Learning (cs.LG)

Multi-label classification is a type of classification task used when there are two or more classes and the data point we want to predict may belong to none of the classes or to all of them at the same time. In the real world, many applications involve multi-label classification, including information retrieval, multimedia content annotation, web mining, and so on. A game theory-based framework known as SHapley Additive exPlanations (SHAP) has been applied to explain various supervised learning models without being aware of the exact model. Herein, this work further extends the explanation of multi-label classification tasks by using the SHAP methodology. The experiment demonstrates a comprehensive comparison of different algorithms on well-known multi-label datasets and shows the usefulness of the interpretation.

[28]  arXiv:2104.10519 [pdf]
Title: Bearings Fault Detection Using Hidden Markov Models and Principal Component Analysis Enhanced Features
Subjects: Machine Learning (cs.LG); Logic in Computer Science (cs.LO)

Asset health monitoring continues to be of increasing importance for productivity, reliability, and cost reduction. Early fault detection is a keystone of health management as part of the emerging Prognostics and Health Management (PHM) philosophy. This paper proposes a Hidden Markov Model (HMM) to assess machine health degradation. Using Principal Component Analysis (PCA) to enhance features extracted from vibration signals is considered. The enhanced features capture the second order structure of the data. The experimental results based on a bearing test bed show the plausibility of the proposed method.

[29]  arXiv:2104.10527 [pdf, other]
Title: Stateless Neural Meta-Learning using Second-Order Gradients
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Deep learning typically requires large data sets and much compute power for each new problem that is learned. Meta-learning can be used to learn a good prior that facilitates quick learning, thereby relaxing these requirements so that new tasks can be learned quicker; two popular approaches are MAML and the meta-learner LSTM. In this work, we compare the two and formally show that the meta-learner LSTM subsumes MAML. Combining this insight with recent empirical findings, we construct a new algorithm (dubbed TURTLE) which is simpler than the meta-learner LSTM yet more expressive than MAML. TURTLE outperforms both techniques at few-shot sine wave regression and image classification on miniImageNet and CUB without any additional hyperparameter tuning, at a computational cost that is comparable with second-order MAML. The key to TURTLE's success lies in the use of second-order gradients, which also significantly increases the performance of the meta-learner LSTM by 1-6% accuracy.

[30]  arXiv:2104.10529 [pdf, other]
Title: A Lightweight Concept Drift Detection and Adaptation Framework for IoT Data Streams
Comments: Accepted and to appear in IEEE Internet of Things Magazine; Code is available at GitHub link: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In recent years, with the increasing popularity of "Smart Technology", the number of Internet of Things (IoT) devices and systems has surged significantly. Various IoT services and functionalities are based on the analytics of IoT streaming data. However, IoT data analytics faces concept drift challenges due to the dynamic nature of IoT systems and the ever-changing patterns of IoT data streams. In this article, we propose an adaptive IoT streaming data analytics framework for anomaly detection use cases based on optimized LightGBM and concept drift adaptation. A novel drift adaptation method named Optimized Adaptive and Sliding Windowing (OASW) is proposed to adapt to the pattern changes of online IoT data streams. Experiments on two public datasets show the high accuracy and efficiency of our proposed adaptive LightGBM model compared against other state-of-the-art approaches. The proposed adaptive LightGBM model can perform continuous learning and drift adaptation on IoT data streams without human intervention.

[31]  arXiv:2104.10544 [pdf, other]
Title: Lossless Compression with Latent Variable Models
Authors: James Townsend
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Computation (stat.CO); Machine Learning (stat.ML)

We develop a simple and elegant method for lossless compression using latent variable models, which we call 'bits back with asymmetric numeral systems' (BB-ANS). The method involves interleaving encode and decode steps, and achieves an optimal rate when compressing batches of data. We demonstrate it firstly on the MNIST test set, showing that state-of-the-art lossless compression is possible using a small variational autoencoder (VAE) model. We then make use of a novel empirical insight, that fully convolutional generative models, trained on small images, are able to generalize to images of arbitrary size, and extend BB-ANS to hierarchical latent variable models, enabling state-of-the-art lossless compression of full-size colour images from the ImageNet dataset. We describe 'Craystack', a modular software framework which we have developed for rapid prototyping of compression using deep generative models.

[32]  arXiv:2104.10555 [pdf, other]
Title: MLDS: A Dataset for Weight-Space Analysis of Neural Networks
Authors: John Clemens
Comments: For further information and download links, see this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Neural networks are powerful models that solve a variety of complex real-world problems. However, the stochastic nature of training and the large number of parameters in a typical neural model make them difficult to evaluate via inspection. Research shows this opacity can hide latent undesirable behavior, be it from poorly representative training data or via malicious intent to subvert the behavior of the network, and that this behavior is difficult to detect via traditional indirect evaluation criteria such as loss. Therefore, it is time to explore direct ways to evaluate a trained neural model via its structure and weights. In this paper we present MLDS, a new dataset consisting of thousands of trained neural networks with carefully controlled parameters and generated via a global volunteer-based distributed computing platform. This dataset enables new insights into both model-to-model and model-to-training-data relationships. We use this dataset to show clustering of models in weight-space with identical training data and meaningful divergence in weight-space with even a small change to the training data, suggesting that weight-space analysis is a viable and effective alternative to loss for evaluating neural networks.

[33]  arXiv:2104.10569 [pdf, other]
Title: GraphTheta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy
Comments: 15 pages, 9 figures, submitted to VLDB 2022
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Graph neural networks (GNNs) have been demonstrated as a powerful tool for analysing non-Euclidean graph data. However, the lack of efficient distributed graph learning systems severely hinders applications of GNNs, especially when graphs are big, of high density or with highly skewed node degree distributions. In this paper, we present a new distributed graph learning system GraphTheta, which supports multiple training strategies and enables efficient and scalable learning on big graphs. GraphTheta implements both localized and globalized graph convolutions on graphs, where a new graph learning abstraction NN-TGAR is designed to bridge the gap between graph processing and graph learning frameworks. A distributed graph engine is proposed to conduct the stochastic gradient descent optimization with hybrid-parallel execution. Moreover, we add support for a new cluster-batched training strategy in addition to the conventional global-batched and mini-batched ones. We evaluate GraphTheta using a number of network data with network size ranging from small-, modest- to large-scale. Experimental results show that GraphTheta scales almost linearly to 1,024 workers and trains an in-house developed GNN model within 26 hours on Alipay dataset of 1.4 billion nodes and 4.1 billion attributed edges. Moreover, GraphTheta also obtains better prediction results than the state-of-the-art GNN methods. To the best of our knowledge, this work represents the largest edge-attributed GNN learning task conducted on a billion-scale network in the literature.

[34]  arXiv:2104.10586 [pdf, other]
Title: Mixture of Robust Experts (MoRE): A Flexible Defense Against Multiple Perturbations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

To tackle the susceptibility of deep neural networks to adversarial examples, adversarial training has been proposed, which provides a notion of security through an inner maximization problem presenting the first-order adversaries embedded within the outer minimization of the training loss. To generalize adversarial robustness over different perturbation types, the adversarial training method has been augmented with an improved inner maximization presenting a union of multiple perturbations, e.g., various $\ell_p$ norm-bounded perturbations. However, the improved inner maximization only enjoys limited flexibility in terms of the allowable perturbation types. In this work, through a gating mechanism, we assemble a set of expert networks, each one either adversarially trained to deal with a particular perturbation type or normally trained for boosting accuracy on clean data. The gating module assigns weights dynamically to each expert to achieve superior accuracy under various data types, e.g., adversarial examples, adverse weather perturbations, and clean input. In order to deal with the obfuscated gradients issue, the training of the gating module is conducted together with fine-tuning of the last fully connected layers of expert networks through an adversarial training approach. Using extensive experiments, we show that our Mixture of Robust Experts (MoRE) approach enables flexible integration of a broad range of robust experts with superior performance.

[35]  arXiv:2104.10610 [pdf, ps, other]
Title: Policy Fusion for Adaptive and Customizable Reinforcement Learning Agents
Subjects: Machine Learning (cs.LG)

In this article we study the problem of training intelligent agents using Reinforcement Learning for the purpose of game development. Unlike systems built to replace human players and to achieve super-human performance, our agents aim to produce meaningful interactions with the player, and at the same time demonstrate behavioral traits as desired by game designers. We show how to combine distinct behavioral policies to obtain a meaningful "fusion" policy which comprises all these behaviors. To this end, we propose four different policy fusion methods for combining pre-trained policies. We further demonstrate how these methods can be used in combination with Inverse Reinforcement Learning in order to create intelligent agents with specific behavioral styles as chosen by game designers, without having to define many and possibly poorly-designed reward functions. Experiments on two different environments indicate that entropy-weighted policy fusion significantly outperforms all others. We provide several practical examples and use-cases for how these methods are indeed useful for video game production and designers.

[36]  arXiv:2104.10625 [pdf, other]
Title: Searching to Sparsify Tensor Decomposition for N-ary Relational Data
Comments: WebConf 2021
Subjects: Machine Learning (cs.LG)

Tensor, an extension of the vector and matrix to the multi-dimensional case, is a natural way to describe the N-ary relational data. Recently, tensor decomposition methods have been introduced into N-ary relational data and become state-of-the-art on embedding learning. However, the performance of existing tensor decomposition methods is not as good as desired. First, they suffer from the data-sparsity issue since they can only learn from the N-ary relational data with a specific arity, i.e., parts of common N-ary relational data. Besides, they are neither effective nor efficient enough to be trained due to the over-parameterization problem. In this paper, we propose a novel method, i.e., S2S, for effectively and efficiently learning from the N-ary relational data. Specifically, we propose a new tensor decomposition framework, which allows embedding sharing to learn from facts with mixed arity. Since the core tensors may still suffer from the over-parameterization, we propose to reduce parameters by sparsifying the core tensors while retaining their expressive power using neural architecture search (NAS) techniques, which can search for data-dependent architectures. As a result, the proposed S2S not only guarantees to be expressive but also efficiently learns from mixed arity. Finally, empirical results have demonstrated that S2S is efficient to train and achieves state-of-the-art performance.

[37]  arXiv:2104.10631 [pdf, other]
Title: MetricOpt: Learning to Optimize Black-Box Evaluation Metrics
Comments: CVPR 2021 (Oral), Supplementary Materials added
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

We study the problem of directly optimizing arbitrary non-differentiable task evaluation metrics such as misclassification rate and recall. Our method, named MetricOpt, operates in a black-box setting where the computational details of the target metric are unknown. We achieve this by learning a differentiable value function, which maps compact task-specific model parameters to metric observations. The learned value function is easily pluggable into existing optimizers like SGD and Adam, and is effective for rapidly finetuning a pre-trained model. This leads to consistent improvements since the value function provides effective metric supervision during finetuning, and helps to correct the potential bias of loss-only supervision. MetricOpt achieves state-of-the-art performance on a variety of metrics for (image) classification, image retrieval and object detection. Solid benefits are found over competing methods, which often involve complex loss design or adaptation. MetricOpt also generalizes well to new tasks and model architectures.

[38]  arXiv:2104.10637 [pdf, ps, other]
Title: Robust Kernel-based Distribution Regression
Comments: 29 pages
Subjects: Machine Learning (cs.LG); Functional Analysis (math.FA); Machine Learning (stat.ML)

Regularization schemes for regression have been widely studied in learning theory and inverse problems. In this paper, we study distribution regression (DR) which involves two stages of sampling, and aims at regressing from probability measures to real-valued responses over a reproducing kernel Hilbert space (RKHS). Recently, theoretical analysis on DR has been carried out via kernel ridge regression and several learning behaviors have been observed. However, the topic has not been explored and understood beyond least-squares-based DR. By introducing a robust loss function $l_{\sigma}$ for two-stage sampling problems, we present a novel robust distribution regression (RDR) scheme. With a windowing function $V$ and a scaling parameter $\sigma$ which can be appropriately chosen, $l_{\sigma}$ can include a wide range of popularly used loss functions that enrich the theme of DR. Moreover, the loss $l_{\sigma}$ is not necessarily convex, hence largely improving the former regression class (least squares) in the literature of DR. The learning rates under different regularity ranges of the regression function $f_{\rho}$ are comprehensively studied and derived via integral operator techniques. The scaling parameter $\sigma$ is shown to be crucial in providing robustness and satisfactory learning rates of RDR.

[39]  arXiv:2104.10644 [pdf, other]
Title: A Comparative Study of Using Spatial-Temporal Graph Convolutional Networks for Predicting Availability in Bike Sharing Schemes
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

Accurately forecasting transportation demand is crucial for efficient urban traffic guidance, control and management. One solution to enhance the level of prediction accuracy is to leverage graph convolutional networks (GCN), a neural network based modelling approach with the ability to process data contained in graph based structures. As a powerful extension of GCN, a spatial-temporal graph convolutional network (ST-GCN) aims to capture the relationship of data contained in the graphical nodes across both spatial and temporal dimensions, which presents a novel deep learning paradigm for the analysis of complex time-series data that also involves spatial information as present in transportation use cases. In this paper, we present an Attention-based ST-GCN (AST-GCN) for predicting the number of available bikes in bike-sharing systems in cities, where the attention-based mechanism is introduced to further improve the performance of an ST-GCN. Furthermore, we also discuss the impacts of different modelling methods of adjacency matrices on the proposed architecture. Our experimental results are presented using two real-world datasets, Dublinbikes and NYC-Citi Bike, to illustrate the efficacy of our proposed model which outperforms the majority of existing approaches.

[40]  arXiv:2104.10645 [pdf, other]
Title: Measuring what Really Matters: Optimizing Neural Networks for TinyML
Subjects: Machine Learning (cs.LG)

With the surge of inexpensive computational and memory resources, neural networks (NNs) have experienced an unprecedented growth in architectural and computational complexity. Introducing NNs to resource-constrained devices enables cost-efficient deployments, widespread availability, and the preservation of sensitive data. This work addresses the challenges of bringing Machine Learning to MCUs, where we focus on the ubiquitous ARM Cortex-M architecture. The detailed effects and trade-offs that optimization methods, software frameworks, and MCU hardware architecture have on key performance metrics such as inference latency and energy consumption have not been previously studied in depth for state-of-the-art frameworks such as TensorFlow Lite Micro. We find that empirical investigations which measure the perceptible metrics - performance as experienced by the user - are indispensable, as the impact of specialized instructions and layer types can be subtle. To this end, we propose an implementation-aware design as a cost-effective method for verification and benchmarking. Employing our developed toolchain, we demonstrate how existing NN deployments on resource-constrained devices can be improved by systematically optimizing NNs to their targeted application scenario.

[41]  arXiv:2104.10680 [pdf, other]
Title: Causal-TGAN: Generating Tabular Data Using Causal Generative Adversarial Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Synthetic data generation has become prevalent as a solution to privacy leakage and data shortage. Generative models are designed to generate a realistic synthetic dataset, which can precisely express the data distribution of the real dataset. Generative adversarial networks (GANs), which have had great success in computer vision, are unsurprisingly used for synthetic data generation. Though there are prior works that have demonstrated great progress, most of them learn the correlations in the data distributions rather than the true processes by which the datasets are naturally generated. Correlation is not reliable, for it is a statistical measure that only captures linear dependencies and is easily affected by the dataset's bias. Causality, which encodes all underlying factors of how the real data are naturally generated, is more reliable than correlation. In this work, we propose a causal model named Causal Tabular Generative Neural Network (Causal-TGAN) to generate synthetic tabular data using the tabular data's causal information. Extensive experiments on both simulated datasets and real datasets demonstrate the better performance of our method when given the true causal graph and a comparable performance when using the estimated causal graph.

Cross-lists for Thu, 22 Apr 21

[42]  arXiv:2104.10180 (cross-list from physics.data-an) [pdf]
Title: Robust Feature Disentanglement in Imaging Data via Joint Invariant Variational Autoencoders: from Cards to Atoms
Subjects: Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (cs.LG)

Recent advances in imaging from celestial objects in astronomy visualized via optical and radio telescopes to atoms and molecules resolved via electron and probe microscopes are generating immense volumes of imaging data, containing information about the structure of the universe from atomic to astronomic levels. The classical deep convolutional neural network architectures traditionally perform poorly on the data sets having a significant orientational disorder, that is, having multiple copies of the same or similar object in arbitrary orientation in the image plane. Similarly, while clustering methods are well suited for classification into discrete classes and manifold learning and variational autoencoders methods can disentangle representations of the data, the combined problem is ill-suited to a classical non-supervised learning paradigm. Here we introduce a joint rotationally (and translationally) invariant variational autoencoder (j-trVAE) that is ideally suited to the solution of such a problem. The performance of this method is validated on several synthetic data sets and extended to high-resolution imaging data of electron and scanning probe microscopy. We show that latent space behaviors directly comport to the known physics of ferroelectric materials and quantum systems. We further note that the engineering of the latent space structure via imposed topological structure or directed graph relationship allows for applications in topological discovery and causal physical learning.

[43]  arXiv:2104.10207 (cross-list from cond-mat.dis-nn) [pdf]
Title: Decoding the shift-invariant data: applications for band-excitation scanning probe microscopy
Comments: 17 pages, 7 figures
Subjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

A shift-invariant variational autoencoder (shift-VAE) is developed as an unsupervised method for the analysis of spectral data in the presence of shifts along the parameter axis, disentangling the physically-relevant shifts from other latent variables. Using synthetic data sets, we show that the shift-VAE latent variables closely match the ground truth parameters. The shift VAE is extended towards the analysis of band-excitation piezoresponse force microscopy (BE-PFM) data, disentangling the resonance frequency shifts from the peak shape parameters in a model-free unsupervised manner. The extensions of this approach towards denoising of data and model-free dimensionality reduction in imaging and spectroscopic data are further demonstrated. This approach is universal and can also be extended to analysis of X-ray diffraction, photoluminescence, Raman spectra, and other data sets.

[44]  arXiv:2104.10213 (cross-list from cs.CL) [pdf, ps, other]
Title: Machine Learning Meets Natural Language Processing -- The story so far
Comments: 13 pages, 5 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Natural Language Processing (NLP) has evolved significantly over the last decade. This paper highlights the most important milestones of this period while trying to pinpoint the contribution of each individual model and algorithm to the overall progress. Furthermore, it focuses on issues still remaining to be solved, emphasizing the groundbreaking proposals of Transformers, BERT, and all the similar attention-based models.

[45]  arXiv:2104.10215 (cross-list from cs.CL) [pdf, other]
Title: Evaluating the Impact of a Hierarchical Discourse Representation on Entity Coreference Resolution Performance
Comments: Also contains the Appendix. Accepted to NAACL 2021 as a short paper
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recent work on entity coreference resolution (CR) follows current trends in Deep Learning applied to embeddings and relatively simple task-related features. SOTA models do not make use of hierarchical representations of discourse structure. In this work, we leverage automatically constructed discourse parse trees within a neural approach and demonstrate a significant improvement on two benchmark entity coreference-resolution datasets. We explore how the impact varies depending upon the type of mention.

[46]  arXiv:2104.10217 (cross-list from eess.AS) [pdf, other]
Title: Bias-Aware Loss for Training Image and Speech Quality Prediction Models from Multiple Datasets
Comments: Accepted at QoMEX 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)

The ground truth used for training image, video, or speech quality prediction models is based on the Mean Opinion Scores (MOS) obtained from subjective experiments. Usually, it is necessary to conduct multiple experiments, mostly with different test participants, to obtain enough data to train quality models based on machine learning. Each of these experiments is subject to an experiment-specific bias, where the rating of the same file may be substantially different in two experiments (e.g. depending on the overall quality distribution). These different ratings for the same distortion levels confuse neural networks during training and lead to lower performance. To overcome this problem, we propose a bias-aware loss function that estimates each dataset's biases during training with a linear function and considers it while optimising the network weights. We prove the efficiency of the proposed method by training and validating quality prediction models on synthetic and subjective image and speech quality datasets.
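The idea of estimating a per-dataset linear bias can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `bias_aware_mse`, the least-squares fit via `np.polyfit`, and the direction of the mapping (predictions mapped onto each dataset's MOS scale) are all choices made for this sketch.

```python
import numpy as np

def bias_aware_mse(predictions, mos, dataset_ids):
    """Mean squared error after a per-dataset linear mapping of the predictions.

    Assumption of this sketch: for each dataset, a line (slope, intercept) is
    fitted by least squares from the model predictions to that dataset's MOS,
    and the error is measured on the mapped predictions.
    Each dataset needs at least two distinct prediction values for the fit.
    """
    predictions = np.asarray(predictions, dtype=float)
    mos = np.asarray(mos, dtype=float)
    dataset_ids = np.asarray(dataset_ids)
    squared_errors = np.empty_like(predictions)
    for d in np.unique(dataset_ids):
        mask = dataset_ids == d
        # Estimate this dataset's linear bias from the current predictions.
        slope, intercept = np.polyfit(predictions[mask], mos[mask], deg=1)
        mapped = slope * predictions[mask] + intercept
        squared_errors[mask] = (mapped - mos[mask]) ** 2
    return squared_errors.mean()

# Toy data: dataset 0 is unbiased, dataset 1 rates on a compressed scale.
preds = [1, 2, 3, 4, 1, 2, 3, 4]
scores = [1, 2, 3, 4, 1.5, 2, 2.5, 3]
ids = [0, 0, 0, 0, 1, 1, 1, 1]
plain_loss = float(np.mean((np.asarray(preds) - np.asarray(scores)) ** 2))
aware_loss = float(bias_aware_mse(preds, scores, ids))
```

In the toy data the second dataset differs from the predictions only by a linear rescaling, so the plain MSE penalises the model while the bias-aware variant does not.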

[47]  arXiv:2104.10219 (cross-list from eess.SY) [pdf, other]
Title: Scalable Synthesis of Verified Controllers in Deep Reinforcement Learning
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

There has been significant recent interest in devising verification techniques for learning-enabled controllers (LECs) that manage safety-critical systems. Given the opacity and lack of interpretability of the neural policies that govern the behavior of such controllers, many existing approaches enforce safety properties through the use of shields, a dynamic monitoring and repair mechanism that ensures a LEC does not emit actions that would violate desired safety conditions. These methods, however, have been shown to have significant scalability limitations because verification costs grow as problem dimensionality and objective complexity increase. In this paper, we propose a new automated verification pipeline capable of synthesizing high-quality safety shields even when the problem domain involves hundreds of dimensions, or when the desired objective involves stochastic perturbations, liveness considerations, and other complex non-functional properties. Our key insight involves separating safety verification from the neural controller, using pre-computed verified safety shields to constrain neural controller training, which does not focus only on safety. Experimental results over a range of realistic high-dimensional deep RL benchmarks demonstrate the effectiveness of our approach.

[48]  arXiv:2104.10241 (cross-list from cs.AI) [pdf, other]
Title: Predicting Human Trajectories by Learning and Matching Patterns
Authors: Dapeng Zhao
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Thesis document of the degree of Master of Science in Robotics of Carnegie Mellon University School of Computer Science.

[49]  arXiv:2104.10249 (cross-list from cs.CV) [pdf, other]
Title: Superpixels and Graph Convolutional Neural Networks for Efficient Detection of Nutrient Deficiency Stress from Aerial Imagery
Comments: 10 pages, 3 figures, 1 table, 1 algorithm
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Advances in remote sensing technology have led to the capture of massive amounts of data. Increased image resolution, more frequent revisit times, and additional spectral channels have created an explosion in the amount of data that is available to provide analyses and intelligence across domains, including agriculture. However, the processing of this data comes with a cost in terms of computation time and money, both of which must be considered when the goal of an algorithm is to provide real-time intelligence to improve efficiencies. Specifically, we seek to identify nutrient-deficient areas from remotely sensed data to alert farmers to regions that require attention; detection of nutrient-deficient areas is a key task in precision agriculture as farmers must quickly respond to struggling areas to protect their harvests. Past methods have focused on pixel-level classification (i.e. semantic segmentation) of the field to achieve these tasks, often using deep learning models with tens of millions of parameters. In contrast, we propose a much lighter graph-based method to perform node-based classification. We first use Simple Linear Iterative Clustering (SLIC) to produce superpixels across the field. Then, to perform segmentation across the non-Euclidean domain of superpixels, we leverage a Graph Convolutional Neural Network (GCN). This model has four orders of magnitude fewer parameters than a CNN model and trains in a matter of minutes.

[50]  arXiv:2104.10299 (cross-list from cs.GR) [pdf, other]
Title: Voice2Mesh: Cross-Modal 3D Face Model Generation from Voices
Comments: Project page: this https URL
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

This work analyzes whether 3D face models can be learned from only the speech inputs of speakers. Previous works on cross-modal face synthesis study image generation from voices. However, image synthesis includes variations such as hairstyles, backgrounds, and facial textures that are arguably irrelevant to voice or lack direct studies showing correlations. We instead investigate the ability to reconstruct 3D faces to concentrate on only geometry, which is more physiologically grounded. We propose both supervised and unsupervised learning frameworks. In particular, we demonstrate how unsupervised learning is possible in the absence of a direct voice-to-3D-face dataset under limited availability of 3D face scans when the model is equipped with knowledge distillation. To evaluate the performance, we also propose several metrics to measure the geometric fitness of two 3D faces based on points, lines, and regions. Experimental results suggest that 3D faces can be reconstructed from voices, and our method can improve the performance over the baseline. The best performance gains (15% - 20%) on the ear-to-ear distance ratio metric (ER) coincide with the intuition that one can roughly envision whether a speaker's face is overall wider or thinner only from a person's voice. See our project page for code and data.

[51]  arXiv:2104.10328 (cross-list from eess.AS) [pdf, ps, other]
Title: Label-Synchronous Speech-to-Text Alignment for ASR Using Forward and Backward Transformers
Comments: Submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

This paper proposes a novel label-synchronous speech-to-text alignment technique for automatic speech recognition (ASR). Speech-to-text alignment is the problem of splitting long audio recordings with unaligned transcripts into utterance-wise pairs of speech and text. Unlike conventional methods based on frame-synchronous prediction, the proposed method redefines speech-to-text alignment as a label-synchronous text mapping problem. This enables an accurate alignment benefiting from the strong inference ability of the state-of-the-art attention-based encoder-decoder models, which cannot be applied to the conventional methods. Two different Transformer models, named forward Transformer and backward Transformer, are used for estimating the initial and final tokens, respectively, of a given speech segment based on end-of-sentence prediction with teacher-forcing. Experiments using the corpus of spontaneous Japanese (CSJ) demonstrate that the proposed method provides an accurate utterance-wise alignment that matches the manually annotated alignment with as few as 0.2% errors. It is also confirmed that a Transformer-based hybrid CTC/Attention ASR model using the aligned speech and text pairs as additional training data reduces character error rates by up to 59.0% relative, which is significantly better than the 39.0% reduction achieved by a conventional alignment method based on a connectionist temporal classification model.

[52]  arXiv:2104.10343 (cross-list from cs.CL) [pdf, other]
Title: Sensitivity as a Complexity Measure for Sequence Classification Tasks
Comments: Accepted by TACL. This is a pre-MIT Press publication version
Subjects: Computation and Language (cs.CL); Computational Complexity (cs.CC); Machine Learning (cs.LG)

We introduce a theoretical framework for understanding and predicting the complexity of sequence classification tasks, using a novel extension of the theory of Boolean function sensitivity. The sensitivity of a function, given a distribution over input sequences, quantifies the number of disjoint subsets of the input sequence that can each be individually changed to change the output. We argue that standard sequence classification methods are biased towards learning low-sensitivity functions, so that tasks requiring high sensitivity are more difficult. To that end, we show analytically that simple lexical classifiers can only express functions of bounded sensitivity, and we show empirically that low-sensitivity functions are easier to learn for LSTMs. We then estimate sensitivity on 15 NLP tasks, finding that sensitivity is higher on challenging tasks collected in GLUE than on simple text classification tasks, and that sensitivity predicts the performance both of simple lexical classifiers and of vanilla BiLSTMs without pretrained contextualized embeddings. Within a task, sensitivity predicts which inputs are hard for such simple models. Our results suggest that the success of massively pretrained contextual representations stems in part from the fact that they provide representations from which information can be extracted by low-sensitivity decoders.
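For orientation, the classical notion that the abstract extends can be illustrated in a few lines. This is a sketch of standard single-bit Boolean sensitivity under a uniform input distribution, not the subset-based measure the paper introduces; the helper names `sensitivity_at` and `average_sensitivity` are chosen for this example.

```python
from itertools import product

def sensitivity_at(f, x):
    """Number of single-bit flips of the tuple x that change f(x)."""
    return sum(
        f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:])
        for i in range(len(x))
    )

def average_sensitivity(f, n):
    """Mean sensitivity of f over all 2**n binary inputs (uniform distribution)."""
    inputs = list(product((0, 1), repeat=n))
    return sum(sensitivity_at(f, x) for x in inputs) / len(inputs)

def parity(x):
    return sum(x) % 2       # every bit flip changes the output

def conjunction(x):
    return int(all(x))      # most inputs are insensitive to any single flip

parity_score = average_sensitivity(parity, 3)
conjunction_score = average_sensitivity(conjunction, 3)
```

Parity reaches the maximum possible value (one per input bit), while the conjunction scores far lower, which matches the intuition that parity-like tasks are harder for simple models.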

[53]  arXiv:2104.10347 (cross-list from stat.ML) [pdf, ps, other]
Title: A class of network models recoverable by spectral clustering
Comments: 15 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Finding communities in networks is a problem that remains difficult, in spite of the amount of attention it has recently received. The Stochastic Block-Model (SBM) is a generative model for graphs with "communities" for which, because of its simplicity, the theoretical understanding has advanced fast in recent years. In particular, there have been various results showing that simple versions of spectral clustering using the Normalized Laplacian of the graph can recover the communities almost perfectly with high probability. Here we show that essentially the same algorithm used for the SBM and for its extension called Degree-Corrected SBM, works on a wider class of Block-Models, which we call Preference Frame Models, with essentially the same guarantees. Moreover, the parametrization we introduce clearly exhibits the free parameters needed to specify this class of models, and results in bounds that expose with more clarity the parameters that control the recovery error in this model class.
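The kind of spectral clustering the abstract refers to can be sketched for the simplest case of two communities. This is an illustrative simplification, not the paper's algorithm: it splits nodes by the sign of the eigenvector belonging to the second-smallest eigenvalue of the normalized Laplacian, where a full method for more communities would typically cluster several eigenvectors. The function names and the toy graph are assumptions of this sketch.

```python
import numpy as np

def two_way_spectral_partition(adjacency):
    """Split a graph into two groups using the normalized Laplacian.

    Assumes a symmetric adjacency matrix with no isolated nodes.
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(L)
    second = eigenvectors[:, 1]
    return (second > 0).astype(int)

def two_cliques(size):
    """Toy graph: two cliques of `size` nodes joined by a single edge."""
    n = 2 * size
    A = np.zeros((n, n))
    A[:size, :size] = 1
    A[size:, size:] = 1
    np.fill_diagonal(A, 0)
    A[size - 1, size] = A[size, size - 1] = 1
    return A

labels = two_way_spectral_partition(two_cliques(4)).tolist()
```

Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful.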

[54]  arXiv:2104.10376 (cross-list from cs.CV) [pdf, other]
Title: Towards Corruption-Agnostic Robust Domain Adaptation
Comments: The first literature to investigate the topic of corruption-agnostic robust domain adaptation, a new practical and challenging domain adaptation setting
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Substantial progress has been achieved in domain adaptation in recent decades. Existing works are always based on the ideal assumption that testing target domains are i.i.d. with training target domains. However, due to unpredictable corruptions (e.g., noise and blur) in real data like web images, domain adaptation methods are increasingly required to be corruption robust on target domains. In this paper, we investigate a new task, Corruption-agnostic Robust Domain Adaptation (CRDA): to be accurate on original data and robust against unavailable-for-training corruptions on target domains. This task is non-trivial due to large domain discrepancy and unsupervised target domains. We observe that simple combinations of popular methods of domain adaptation and corruption robustness have sub-optimal CRDA results. We propose a new approach based on two technical insights into CRDA: 1) an easy-to-plug module called Domain Discrepancy Generator (DDG) that generates samples that enlarge domain discrepancy to mimic unpredictable corruptions; 2) a simple but effective teacher-student scheme with contrastive loss to enhance the constraints on target domains. Experiments verify that DDG keeps or even improves performance on original data and achieves better corruption robustness than baselines.

[55]  arXiv:2104.10378 (cross-list from cs.IT) [pdf, ps, other]
Title: Wireless Sensing With Deep Spectrogram Network and Primitive Based Autoregressive Hybrid Channel Model
Comments: 12 pages, 5 pages, submitted to IEEE SPAWC 2021
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)

Human motion recognition (HMR) based on wireless sensing is a low-cost technique for scene understanding. Current HMR systems adopt support vector machines (SVMs) and convolutional neural networks (CNNs) to classify radar signals. However, whether a deeper learning model could improve the system performance is currently not known. On the other hand, training a machine learning model requires a large dataset, but gathering data from experiments is costly and time-consuming. Although wireless channel models can be adopted for dataset generation, current channel models are mostly designed for communication rather than sensing. To address the above problems, this paper proposes a deep spectrogram network (DSN) that leverages the residual mapping technique to enhance the HMR performance. Furthermore, a primitive based autoregressive hybrid (PBAH) channel model is developed, which facilitates efficient training and testing dataset generation for HMR in a virtual environment. Experimental results demonstrate that the proposed PBAH channel model matches the actual experimental data very well and the proposed DSN achieves significantly smaller recognition error than that of CNN.

[56]  arXiv:2104.10401 (cross-list from cs.CV) [pdf, ps, other]
Title: Multi-Attention-Based Soft Partition Network for Vehicle Re-Identification
Comments: 10 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Vehicle re-identification (Re-ID) distinguishes between the same vehicle and other vehicles in images. It is challenging due to significant intra-instance differences between identical vehicles from different views and subtle inter-instance differences of similar vehicles. Researchers have tried to address this problem by extracting features robust to variations of viewpoints and environments. More recently, they tried to improve performance by using additional metadata such as key points, orientation, and temporal information. Although these attempts have been relatively successful, they all require expensive annotations. Therefore, this paper proposes a novel deep neural network called a multi-attention-based soft partition (MUSP) network to solve this problem. This network does not use metadata and only uses multiple soft attentions to identify a specific vehicle area, a function that was performed by metadata in previous studies. Experiments verified that MUSP achieved state-of-the-art (SOTA) performance on the VehicleID dataset without any additional annotations and achieved comparable performance on VeRi-776 and VERI-Wild.

[57]  arXiv:2104.10403 (cross-list from cs.IT) [pdf, ps, other]
Title: Model-aided Deep Reinforcement Learning for Sample-efficient UAV Trajectory Design in IoT Networks
Comments: 6 pages, 2 figures, submitted to GLOBECOM 2021
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)

Deep Reinforcement Learning (DRL) has become a prominent paradigm to design trajectories for autonomous unmanned aerial vehicles (UAV) used as flying access points in the context of cellular or Internet of Things (IoT) connectivity. However, the prohibitively high training data demand severely restricts the applicability of RL-based trajectory planning in real-world missions. We propose a model-aided deep Q-learning approach that, in contrast to previous work, requires a minimum of expensive training data samples and is able to guide a flight-time restricted UAV on a data harvesting mission without prior knowledge of wireless channel characteristics and with limited knowledge of wireless node locations. By exploiting some known reference wireless node positions and channel gain measurements, we seek to learn a model of the environment by estimating unknown node positions and learning the wireless channel characteristics. Interaction with the model allows us to train a deep Q-network (DQN) to approximate the optimal UAV control policy. We show that in comparison with standard DRL approaches, the proposed model-aided approach requires at least one order of magnitude fewer training data samples to reach identical data collection performance, hence offering a first step towards making DRL a viable solution to the problem.

[58]  arXiv:2104.10415 (cross-list from cs.AR) [pdf, other]
Title: Tackling Variabilities in Autonomous Driving
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)

The state-of-the-art driving automation system demands extreme computational resources to meet rigorous accuracy and latency requirements. Though emerging driving automation computing platforms are based on ASIC to provide better performance and power guarantees, building such an accelerator-based computing platform for driving automation still presents challenges. First, the workload mix and performance requirements exposed to driving automation systems present significant variability. Second, with more cameras/sensors integrated in a future fully autonomous driving vehicle, a heterogeneous multi-accelerator architecture substrate is needed that requires a design space exploration for a new form of parallelism. In this work, we extensively explore the above system design challenges, which motivate us to propose a comprehensive framework that synergistically handles the heterogeneous hardware accelerator design principles, system design criteria, and task scheduling mechanism. Specifically, we propose a novel heterogeneous multi-core AI accelerator (HMAI) to provide the hardware substrate for the driving automation tasks with variability. We also define system design criteria to better utilize hardware resources and achieve increased throughput while satisfying the performance and energy restrictions. Finally, we propose a deep reinforcement learning (RL)-based task scheduling mechanism, FlexAI, to resolve the task mapping issue. Experimental results show that with FlexAI scheduling, nearly 100% of tasks in each driving route can be processed by HMAI within their required period to ensure safety, and FlexAI can also reduce the braking distance by up to 96% compared to typical heuristics and guided random-search-based algorithms.

[59]  arXiv:2104.10441 (cross-list from cs.CL) [pdf, ps, other]
Title: Should we Stop Training More Monolingual Models, and Simply Use Machine Translation Instead?
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Most work in NLP makes the assumption that it is desirable to develop solutions in the native language in question. There is consequently a strong trend towards building native language models even for low-resource languages. This paper questions this development, and explores the idea of simply translating the data into English, thereby enabling the use of pretrained, and large-scale, English language models. We demonstrate empirically that a large English language model coupled with modern machine translation outperforms native language models in most Scandinavian languages. The exception to this is Finnish, which we assume is due to inferior translation quality. Our results suggest that machine translation is a mature technology, which raises a serious counter-argument for training native language models for low-resource languages. This paper therefore strives to make a provocative but important point. As English language models are improving at an unprecedented pace, which in turn improves machine translation, it is from an empirical and environmental stand-point more effective to translate data from low-resource languages into English, than to build language models for such languages.

[60]  arXiv:2104.10454 (cross-list from cs.CL) [pdf, other]
Title: Text Summarization of Czech News Articles Using Named Entities
Journal-ref: The Prague Bulletin of Mathematical Linguistics 2021 116
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The foundation for research on summarization in the Czech language was laid by the work of Straka et al. (2018). They published SumeCzech, a large Czech news-based summarization dataset, and proposed several baseline approaches. However, it is clear from the achieved results that there is considerable room for improvement. In our work, we focus on the impact of named entities on the summarization of Czech news articles. First, we annotate SumeCzech with named entities. We propose a new metric ROUGE_NE that measures the overlap of named entities between the true and generated summaries, and we show that it is still challenging for summarization systems to reach a high score in it. We propose an extractive summarization approach, Named Entity Density, that selects the sentence with the highest ratio between the number of entities and the length of the sentence as the summary of the article. The experiments show that the proposed approach reached results close to the solid baseline in the domain of news articles, which selects the first sentence. Moreover, we demonstrate that the selected sentence reflects the style of reports concisely identifying to whom, when, where, and what happened. We propose that such a summary is beneficial in combination with the first sentence of an article in voice applications presenting news articles. We propose two abstractive summarization approaches based on the Seq2Seq architecture. The first approach uses the tokens of the article. The second approach has access to the named entity annotations. The experiments show that both approaches exceed state-of-the-art results previously reported by Straka et al. (2018), with the latter achieving slightly better results on SumeCzech's out-of-domain testing set.
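The Named Entity Density selection rule described in the abstract can be sketched as follows. This is a minimal illustration under assumptions: entity counts are taken as given (no tagger is run), sentence length is measured in whitespace-separated tokens, and the function name and example sentences are invented for this sketch.

```python
from typing import List, Tuple

def select_summary_sentence(sentences: List[Tuple[str, int]]) -> str:
    """Return the sentence with the highest entities-per-token ratio.

    Each item is (sentence_text, number_of_named_entities). Measuring length
    in whitespace tokens is an assumption of this sketch.
    """
    def density(item: Tuple[str, int]) -> float:
        text, n_entities = item
        n_tokens = len(text.split())
        return n_entities / n_tokens if n_tokens else 0.0

    return max(sentences, key=density)[0]

# Hypothetical article with pre-computed entity counts per sentence.
article = [
    ("The weather was pleasant on Monday morning in Prague", 1),
    ("Jana Novakova met Karel Dvorak in Brno", 3),
    ("They talked for a long time", 0),
]
summary = select_summary_sentence(article)
```

The second sentence wins because three entities in seven tokens give a higher density than one entity in nine tokens.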

[61]  arXiv:2104.10481 (cross-list from cs.CV) [pdf, other]
Title: SSLM: Self-Supervised Learning for Medical Diagnosis from MR Video
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In medical image analysis, the cost of acquiring high-quality data and their annotation by experts is a barrier in many medical applications. Most of the techniques used are based on supervised learning framework and need a large amount of annotated data to achieve satisfactory performance. As an alternative, in this paper, we propose a self-supervised learning approach to learn the spatial anatomical representations from the frames of magnetic resonance (MR) video clips for the diagnosis of knee medical conditions. The pretext model learns meaningful spatial context-invariant representations. The downstream task in our paper is a class imbalanced multi-label classification. Different experiments show that the features learnt by the pretext model provide explainable performance in the downstream task. Moreover, the efficiency and reliability of the proposed pretext model in learning representations of minority classes without applying any strategy towards imbalance in the dataset can be seen from the results. To the best of our knowledge, this work is the first work of its kind in showing the effectiveness and reliability of self-supervised learning algorithms in class imbalanced multi-label classification tasks on MR video.
The code for evaluation of the proposed work is available at https://github.com/anonymous-cvpr/sslm

[62]  arXiv:2104.10488 (cross-list from eess.IV) [pdf, other]
Title: A Two-Stage Attentive Network for Single Image Super-Resolution
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Recently, deep convolutional neural networks (CNNs) have been widely explored in single image super-resolution (SISR) and contribute remarkable progress. However, most of the existing CNNs-based SISR methods do not adequately explore contextual information in the feature extraction stage and pay little attention to the final high-resolution (HR) image reconstruction step, hence hindering the desired SR performance. To address the above two issues, in this paper, we propose a two-stage attentive network (TSAN) for accurate SISR in a coarse-to-fine manner. Specifically, we design a novel multi-context attentive block (MCAB) to make the network focus on more informative contextual features. Moreover, we present an essential refined attention block (RAB) which could explore useful cues in HR space for reconstructing fine-detailed HR image. Extensive evaluations on four benchmark datasets demonstrate the efficacy of our proposed TSAN in terms of quantitative metrics and visual effects. Code is available at https://github.com/Jee-King/TSAN.

[63]  arXiv:2104.10489 (cross-list from cs.HC) [pdf, other]
Title: Eye Know You: Metric Learning for End-to-end Biometric Authentication Using Eye Movements from a Longitudinal Dataset
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

While numerous studies have explored eye movement biometrics since the modality's inception in 2004, the permanence of eye movements remains largely unexplored as most studies utilize datasets collected within a short time frame. This paper presents a convolutional neural network for authenticating users using their eye movements. The network is trained with an established metric learning loss function, multi-similarity loss, which seeks to form a well-clustered embedding space and directly enables the enrollment and authentication of out-of-sample users. Performance measures are computed on GazeBase, a task-diverse and publicly-available dataset collected over a 37-month period. This study includes an exhaustive analysis of the effects of training on various tasks and downsampling from 1000 Hz to several lower sampling rates. Our results reveal that reasonable authentication accuracy may be achieved even during a low-cognitive-load task or at low sampling rates. Moreover, we find that eye movements are quite resilient against template aging after 3 years.

[64]  arXiv:2104.10501 (cross-list from cs.DC) [pdf]
Title: A Survey on Federated Learning and its Applications for Accelerating Industrial Internet of Things
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Federated learning (FL) brings collaborative intelligence into industries without centralized training data to accelerate the process of Industry 4.0 on the edge computing level. FL solves the dilemma in which enterprises wish to make use of data intelligence but have security concerns. To accelerate the industrial Internet of Things through further use of FL, existing achievements in FL are reviewed from three aspects: 1) defining terminology and elaborating a general framework of FL for accommodating various scenarios; 2) discussing the state of the art of FL in fundamental research, including data partitioning, privacy preservation, model optimization, local model transportation, personalization, motivation mechanism, platforms and tools, and benchmarks; 3) discussing the impacts of FL from the economic perspective. To attract more attention from industrial academia and practice, an FL-transformed manufacturing paradigm is presented, future research directions of FL are given, and possible immediate applications in the Industry 4.0 domain are proposed.

[65]  arXiv:2104.10511 (cross-list from cs.CV) [pdf, other]
Title: Hierarchical Convolutional Neural Network with Feature Preservation and Autotuned Thresholding for Crack Detection
Journal-ref: IEEE Access, 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Drone imagery is increasingly used in automated inspection for infrastructure surface defects, especially in hazardous or unreachable environments. In machine vision, the key to crack detection rests with robust and accurate algorithms for image processing. To this end, this paper proposes a deep learning approach using hierarchical convolutional neural networks with feature preservation (HCNNFP) and an intercontrast iterative thresholding algorithm for image binarization. First, a set of branch networks is proposed, wherein the output of previous convolutional blocks is half-sizedly concatenated to the current ones to reduce the obscuration in the down-sampling stage, taking into account the overall information loss. Next, to extract the feature map generated from the enhanced HCNN, a binary contrast-based autotuned thresholding (CBAT) approach is developed at the post-processing step, where patterns of interest are clustered within the probability map of the identified features. The proposed technique is then applied to identify cracks on the surface of roads, bridges or pavements. An extensive comparison with existing techniques is conducted on various datasets and subject to a number of evaluation criteria, including the average F-measure (AFβ) introduced here for dynamic quantification of the performance. Experiments are conducted on crack images, including those captured by unmanned aerial vehicles inspecting a monorail bridge. The proposed technique outperforms the existing methods on various tested datasets, especially the GAPs dataset, with an increase of about 1.4% in terms of AFβ while the mean percentage error drops by 2.2%. Such performance demonstrates the merits of the proposed HCNNFP architecture for surface defect inspection.

[66]  arXiv:2104.10513 (cross-list from cs.CL) [pdf, other]
Title: How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies
Comments: Published in 2021 IEEE 15th International Conference on Semantic Computing (ICSC)
Journal-ref: 2021 IEEE 15th International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, 2021, pp. 356-359
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Twitter sentiment analysis, which often focuses on predicting the polarity of tweets, has attracted increasing attention over the last years, in particular with the rise of deep learning (DL). In this paper, we propose a new task: predicting the predominant sentiment among (first-order) replies to a given tweet. Therefore, we created RETWEET, a large dataset of tweets and replies manually annotated with sentiment labels. As a strong baseline, we propose a two-stage DL-based method: first, we create automatically labeled training data by applying a standard sentiment classifier to tweet replies and aggregating its predictions for each original tweet; our rationale is that individual errors made by the classifier are likely to cancel out in the aggregation step. Second, we use the automatically labeled data for supervised training of a neural network to predict reply sentiment from the original tweets. The resulting classifier is evaluated on the new RETWEET dataset, showing promising results, especially considering that it has been trained without any manually labeled data. Both the dataset and the baseline implementation are publicly available.
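The first stage of the baseline, turning per-reply classifier outputs into one label per original tweet, can be sketched as below. The abstract only says the predictions are aggregated; the majority vote used here, the label strings, and the tweet identifiers are assumptions of this sketch.

```python
from collections import Counter
from typing import Dict, List, Optional

def aggregate_reply_sentiment(reply_labels: List[str]) -> Optional[str]:
    """Majority vote over the predicted labels of a tweet's replies.

    Returns None when there are no replies. Ties resolve to the label
    that appears first in the list.
    """
    if not reply_labels:
        return None
    return Counter(reply_labels).most_common(1)[0][0]

# Hypothetical classifier outputs for the replies to two tweets.
predicted: Dict[str, List[str]] = {
    "tweet_a": ["positive", "negative", "positive"],
    "tweet_b": ["negative", "negative", "neutral"],
}
auto_labels = {
    tweet_id: aggregate_reply_sentiment(labels)
    for tweet_id, labels in predicted.items()
}
```

A single misclassified reply (for example the lone "negative" under `tweet_a`) does not change the aggregated label, which is the error-cancelling effect the abstract relies on.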

[67]  arXiv:2104.10516 (cross-list from cs.CL) [pdf, other]
Title: Improving BERT Pretraining with Syntactic Supervision
Comments: 4 pages, rejected by IWCS due to "not fitting the conference theme"
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Bidirectional masked Transformers have become the core theme in the current NLP landscape. Despite their impressive benchmarks, a recurring theme in recent research has been to question such models' capacity for syntactic generalization. In this work, we seek to address this question by adding a supervised, token-level supertagging objective to standard unsupervised pretraining, enabling the explicit incorporation of syntactic biases into the network's training dynamics. Our approach is straightforward to implement, induces a marginal computational overhead and is general enough to adapt to a variety of settings. We apply our methodology to Lassy Large, an automatically annotated corpus of written Dutch. Our experiments suggest that our syntax-aware model performs on par with established baselines, despite Lassy Large being one order of magnitude smaller than commonly used corpora.

[68]  arXiv:2104.10558 (cross-list from cs.RO) [pdf, other]
Title: Contingencies from Observations: Tractable Contingency Planning with Learned Behavior Models
Comments: To be published at ICRA 2021. Project page: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Humans have a remarkable ability to make decisions by accurately reasoning about future events, including the future behaviors and states of mind of other agents. Consider driving a car through a busy intersection: it is necessary to reason about the physics of the vehicle, the intentions of other drivers, and their beliefs about your own intentions. If you signal a turn, another driver might yield to you, or if you enter the passing lane, another driver might decelerate to give you room to merge in front. Competent drivers must plan how they can safely react to a variety of potential future behaviors of other agents before they make their next move. This requires contingency planning: explicitly planning a set of conditional actions that depend on the stochastic outcome of future events. In this work, we develop a general-purpose contingency planner that is learned end-to-end using high-dimensional scene observations and low-dimensional behavioral observations. We use a conditional autoregressive flow model to create a compact contingency planning space, and show how this model can tractably learn contingencies from behavioral observations. We developed a closed-loop control benchmark of realistic multi-agent scenarios in a driving simulator (CARLA), on which we compare our method to various noncontingent methods that reason about multi-agent future behavior, including several state-of-the-art deep learning-based planning approaches. We illustrate that these noncontingent planning methods fundamentally fail on this benchmark, and find that our deep contingency planning method achieves significantly superior performance. Code to run our benchmark and reproduce our results is available at https://sites.google.com/view/contingency-planning

[69]  arXiv:2104.10561 (cross-list from cs.CR) [pdf, other]
Title: Covert Channel Attack to Federated Learning Systems
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Federated learning (FL) goes beyond traditional, centralized machine learning by distributing model training among a large collection of edge clients. These clients cooperatively train a global, e.g., cloud-hosted, model without disclosing their local, private training data. The global model is then shared among all the participants which use it for local predictions. In this paper, we put forward a novel attacker model aiming at turning FL systems into covert channels to implement a stealth communication infrastructure. The main intuition is that, during federated training, a malicious sender can poison the global model by submitting purposely crafted examples. Although the effect of the model poisoning is negligible to other participants, and does not alter the overall model performance, it can be observed by a malicious receiver and used to transmit a single bit.

[70]  arXiv:2104.10584 (cross-list from cs.IR) [pdf, other]
Title: Deep Learning for Click-Through Rate Estimation
Comments: Paper accepted at IJCAI 2021 (Survey Track)
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)

Click-through rate (CTR) estimation serves as a core functional module in various personalized online services, including online advertising, recommender systems, and web search. Since 2015, the success of deep learning has benefited CTR estimation performance, and deep CTR models are now widely applied on many industrial platforms. In this survey, we provide a comprehensive review of deep learning models for CTR estimation tasks. First, we review the transition from shallow to deep CTR models and explain why going deep is a necessary trend of development. Second, we concentrate on explicit feature interaction learning modules of deep CTR models. Then, as an important perspective on large platforms with abundant user histories, deep behavior models are discussed. Moreover, the recently emerged automated methods for deep CTR architecture design are presented. Finally, we summarize the survey and discuss the future prospects of this field.

[71]  arXiv:2104.10596 (cross-list from eess.IV) [pdf]
Title: Using CNNs for AD classification based on spatial correlation of BOLD signals during the observation
Comments: 11 pages
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Resting state functional magnetic resonance images (fMRI) are commonly used for classification of patients as having Alzheimer's disease (AD), mild cognitive impairment (MCI), or being cognitively normal (CN). Most methods use time-series correlation of voxel signals during the observation period as a basis for the classification. In this paper we show that Convolutional Neural Network (CNN) classification based on spatial correlation of time-averaged signals yields a classification accuracy of up to 82% (sensitivity 86%, specificity 80%) for a data set with 429 subjects (246 cognitively normal and 183 Alzheimer patients). For the spatial correlation of time-averaged signal values we use voxel subdomains around center points of the 90-region AAL atlas. We form the subdomains as sets of voxels along a Hilbert curve of a bounding box in which the brain is embedded, with the AAL region center points serving as subdomain seeds. The spatial correlation of the 90 arrays formed by the subdomain segments of the Hilbert curve yields a symmetric 90x90 matrix that is used for the classification based on two different CNN networks: a 4-layer CNN network with 3x3 filters and with 4, 8, 16, and 32 output channels respectively, and a 2-layer CNN network with 3x3 filters and with 4 and 8 output channels respectively. The results of the two networks are reported and compared.
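
As a rough illustration of how 90 subdomain arrays give a symmetric 90x90 matrix, the sketch below computes pairwise Pearson correlations with NumPy. It assumes all segments have the same length and that Pearson correlation is the measure used; neither is stated in the abstract, and the function name and the random input are hypothetical.

```python
import numpy as np


def spatial_correlation_matrix(segments):
    """Pairwise Pearson correlation between region segments.

    segments: array-like of shape (n_regions, segment_length), where each row
    holds the time-averaged voxel values of one subdomain. Equal segment
    length and Pearson correlation are assumptions for this sketch.
    """
    segments = np.asarray(segments, dtype=float)
    # np.corrcoef treats each row as one variable and returns an
    # (n_regions, n_regions) symmetric matrix with ones on the diagonal.
    return np.corrcoef(segments)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_segments = rng.normal(size=(90, 64))  # placeholder data, not fMRI
    matrix = spatial_correlation_matrix(fake_segments)
    print(matrix.shape)
```

The resulting matrix can be treated as a single-channel 90x90 image, which is the kind of input a CNN with 3x3 filters expects.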

[72]  arXiv:2104.10611 (cross-list from eess.IV) [pdf, other]
Title: Programmable 3D snapshot microscopy with Fourier convolutional networks
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

3D snapshot microscopy enables volumetric imaging as fast as a camera allows by capturing a 3D volume in a single 2D camera image, and has found a variety of biological applications such as whole brain imaging of fast neural activity in larval zebrafish. The optimal microscope design for this optical 3D-to-2D encoding to preserve as much 3D information as possible is generally unknown and sample-dependent. Highly-programmable optical elements create new possibilities for sample-specific computational optimization of microscope parameters, e.g. tuning the collection of light for a given sample structure, especially using deep learning. This involves a differentiable simulation of light propagation through the programmable microscope and a neural network to reconstruct volumes from the microscope image. We introduce a class of global kernel Fourier convolutional neural networks which can efficiently integrate the globally mixed information encoded in a 3D snapshot image. We show in silico that our proposed global Fourier convolutional networks succeed in large field-of-view volume reconstruction and microscope parameter optimization where traditional networks fail.

[73]  arXiv:2104.10615 (cross-list from cs.CV) [pdf, ps, other]
Title: Recurrent Feedback Improves Recognition of Partially Occluded Objects
Comments: 6 pages, 2 figures, 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2020). arXiv admin note: substantial text overlap with arXiv:1909.06175
Journal-ref: Proceedings of the 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2020) 327-332
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Recurrent connectivity in the visual cortex is believed to aid object recognition for challenging conditions such as occlusion. Here we investigate if and how artificial neural networks also benefit from recurrence. We compare architectures composed of bottom-up, lateral and top-down connections and evaluate their performance using two novel stereoscopic occluded object datasets. We find that classification accuracy is significantly higher for recurrent models when compared to feedforward models of matched parametric complexity. Additionally we show that for challenging stimuli, the recurrent feedback is able to correctly revise the initial feedforward guess.

[74]  arXiv:2104.10638 (cross-list from eess.SP) [pdf, other]
Title: Deep Gaussian Processes for Biogeophysical Parameter Retrieval and Model Inversion
Journal-ref: ISPRS Journal of Photogrammetry and Remote Sensing 166 (2020): 68-81
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Parameter retrieval and model inversion are key problems in remote sensing and Earth observation. Currently, different approximations exist: a direct, yet costly, inversion of radiative transfer models (RTMs); the statistical inversion with in situ data that often results in problems with extrapolation outside the study area; and the most widely adopted hybrid modeling by which statistical models, mostly nonlinear and non-parametric machine learning algorithms, are applied to invert RTM simulations. We will focus on the latter. Among the different existing algorithms, in the last decade kernel based methods, and Gaussian Processes (GPs) in particular, have provided useful and informative solutions to such RTM inversion problems. This is in large part due to the confidence intervals they provide, and their predictive accuracy. However, RTMs are very complex, highly nonlinear, and typically hierarchical models, so that often a shallow GP model cannot capture complex feature relations for inversion. This motivates the use of deeper hierarchical architectures, while still preserving the desirable properties of GPs. This paper introduces the use of deep Gaussian Processes (DGPs) for bio-geo-physical model inversion. Unlike shallow GP models, DGPs account for complicated (modular, hierarchical) processes, provide an efficient solution that scales well to big datasets, and improve prediction accuracy over their single layer counterpart. In the experimental section, we provide empirical evidence of performance for the estimation of surface temperature and dew point temperature from infrared sounding data, as well as for the prediction of chlorophyll content, inorganic suspended matter, and coloured dissolved matter from multispectral data acquired by the Sentinel-3 OLCI sensor. The presented methodology allows for more expressive forms of GPs in remote sensing model inversion problems.

[75]  arXiv:2104.10640 (cross-list from cs.CL) [pdf]
Title: The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures
Comments: 27 pages and 28 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

In recent years, Natural Language Processing (NLP) models have achieved phenomenal success in linguistic and semantic tasks like machine translation, cognitive dialogue systems, information retrieval via Natural Language Understanding (NLU), and Natural Language Generation (NLG). This feat is primarily attributed to the seminal Transformer architecture, leading to designs such as BERT, GPT (I, II, III), etc. Although these large-size models have achieved unprecedented performances, they come at high computational costs. Consequently, some of the recent NLP architectures have utilized concepts of transfer learning, pruning, quantization, and knowledge distillation to achieve moderate model sizes while keeping performance close to that of their predecessors. Additionally, to mitigate the data size challenge raised by language models from a knowledge extraction perspective, Knowledge Retrievers have been built to extricate explicit data documents from a large corpus of databases with greater efficiency and accuracy. Recent research has also focused on superior inference by providing efficient attention on longer input sequences. In this paper, we summarize and examine the current state-of-the-art (SOTA) NLP models that have been employed for numerous NLP tasks for optimal performance and efficiency. We provide a detailed understanding and functioning of the different architectures, the taxonomy of NLP designs, comparative evaluations, and future directions in NLP.

[76]  arXiv:2104.10652 (cross-list from cs.CL) [pdf, other]
Title: TransICD: Transformer Based Code-wise Attention Model for Explainable ICD Coding
Comments: 10 pages, 4 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

International Classification of Disease (ICD) coding, the procedure of tagging medical notes with diagnosis codes, has been shown to be effective and crucial to the billing system in the medical sector. Currently, ICD codes are assigned to a clinical note manually, which is likely to cause many errors. Moreover, training skilled coders also requires time and human resources. Therefore, automating the ICD code determination process is an important task. With the advancement of artificial intelligence theory and computational hardware, machine learning approaches have emerged as a suitable solution to automate this process. In this project, we apply a transformer-based architecture to capture the interdependence among the tokens of a document and then use a code-wise attention mechanism to learn code-specific representations of the entire document. Finally, they are fed to separate dense layers for corresponding code prediction. Furthermore, to handle the imbalance in the code frequency of clinical datasets, we employ a label distribution aware margin (LDAM) loss function. The experimental results on the MIMIC-III dataset show that our proposed model outperforms other baselines by a significant margin. In particular, our best setting achieves a micro-AUC score of 0.923 compared to 0.868 of bidirectional recurrent neural networks. We also show that by using the code-wise attention mechanism, the model can provide more insights about its prediction, and thus it can support clinicians to make reliable decisions. Our code is available online (https://github.com/biplob1ly/TransICD)
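
To make the idea of code-wise attention concrete, here is a minimal NumPy sketch in which each ICD code has its own query vector and attends over the token representations of a document. The abstract does not give the exact formulation, so the dot-product scoring, the function name, and the array names H and U are assumptions for illustration, not the TransICD implementation.

```python
import numpy as np


def code_wise_attention(H, U):
    """Build one document representation per code.

    H: (n_tokens, d) token representations, e.g. from a transformer encoder.
    U: (n_codes, d) one query vector per code (hypothetical parameters).
    Returns (code_reprs, weights) with shapes (n_codes, d) and
    (n_codes, n_tokens). Dot-product scoring is an assumed choice.
    """
    H = np.asarray(H, dtype=float)
    U = np.asarray(U, dtype=float)

    scores = U @ H.T                                  # (n_codes, n_tokens)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax per code

    code_reprs = weights @ H                          # (n_codes, d)
    return code_reprs, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reprs, attn = code_wise_attention(rng.normal(size=(12, 8)),
                                      rng.normal(size=(5, 8)))
    print(reprs.shape, attn.shape)
```

Each row of the returned weights shows which tokens a given code focused on, which is the kind of per-code evidence the abstract points to for explainability.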

[77]  arXiv:2104.10658 (cross-list from cs.CL) [pdf]
Title: Using GPT-2 to Create Synthetic Data to Improve the Prediction Performance of NLP Machine Learning Classification Models
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Classification Models use input data to predict the likelihood that the subsequent input data will fall into predetermined categories. To perform effective classifications, these models require large datasets for training. It is becoming common practice to utilize synthetic data to boost the performance of Machine Learning Models. It is reported that Shell is using synthetic data to build models to detect problems that rarely occur; for example Shell created synthetic data to help models to identify deteriorating oil lines. It is common practice for Machine Learning Practitioners to generate synthetic data by rotating, flipping, and cropping images to increase the volume of image data to train Convolutional Neural Networks. The purpose of this paper is to explore creating and utilizing synthetic NLP data to improve the performance of Natural Language Processing Machine Learning Classification Models. In this paper I used a Yelp pizza restaurant reviews dataset and transfer learning to fine-tune a pre-trained GPT-2 Transformer Model to generate synthetic pizza reviews data. I then combined this synthetic data with the original genuine data to create a new joint dataset. The new combined model significantly outperformed the original model in accuracy and precision.

[78]  arXiv:2104.10667 (cross-list from eess.SP) [pdf, other]
Title: Modeling Classroom Occupancy using Data of WiFi Infrastructure in a University Campus
Comments: 23 pages, 20 figures, 8 tables
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Universities worldwide are experiencing a surge in enrollments, and campus estate managers are therefore seeking continuous data on attendance patterns to optimize the usage of classroom space. As a result, there is an increasing trend to measure classroom attendance by employing various sensing technologies, among which pervasive WiFi infrastructure is seen as a low-cost method. In a dense campus environment, the number of connected WiFi users is a poor estimate of room occupancy, since connection counts are polluted by adjoining rooms, outdoor walkways, and network load balancing.
In this paper, we develop machine learning based models to infer classroom occupancy from WiFi sensing infrastructure. Our contributions are three-fold: (1) We analyze metadata from a dense and dynamic wireless network comprising thousands of access points (APs) to draw insights into coverage of APs, behavior of WiFi connected users, and challenges of estimating room occupancy; (2) We propose a method to automatically map APs to classrooms using unsupervised clustering algorithms; and (3) We model classroom occupancy using a combination of classification and regression methods of varying algorithms. We achieve 84.6% accuracy in mapping APs to classrooms, while the accuracy of our estimation for room occupancy is comparable to beam counter sensors with a symmetric Mean Absolute Percentage Error (sMAPE) of 13.10%.
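
For readers unfamiliar with sMAPE, the sketch below shows one common way to compute it. Several variants of the formula exist and the abstract does not say which one the authors use, so the definition here (mean of |predicted - actual| divided by the average of |actual| and |predicted|, times 100) is an assumption, as are the function name and the example counts.

```python
def smape(actual, predicted):
    """Symmetric Mean Absolute Percentage Error, in percent.

    Uses the variant 100/n * sum(|p - a| / ((|a| + |p|) / 2)). Pairs where
    both values are zero contribute zero error. This is one common
    definition and may differ from the one used in the paper.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for a, p in zip(actual, predicted):
        denominator = (abs(a) + abs(p)) / 2.0
        if denominator == 0:
            continue  # both values are zero: no error for this pair
        total += abs(p - a) / denominator
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical head counts: sensor ground truth vs. model estimates.
    print(round(smape([100, 200], [110, 180]), 2))
```

Because the denominator averages the actual and predicted values, over- and under-estimates of the same size are penalized more evenly than with plain MAPE.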

[79]  arXiv:2104.10671 (cross-list from cs.IR) [pdf, other]
Title: User-oriented Fairness in Recommendation
Comments: Accepted to the 30th Web Conference (WWW 2021)
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Social and Information Networks (cs.SI)

As highly data-driven applications, recommender systems can be affected by data bias, resulting in unfair results for different data groups, which in turn can degrade system performance. Therefore, it is important to identify and solve the unfairness issues in recommendation scenarios. In this paper, we address the unfairness problem in recommender systems from the user perspective. We group users into advantaged and disadvantaged groups according to their level of activity, and conduct experiments to show that current recommender systems treat the two groups of users unfairly. Specifically, the advantaged (active) users, who account for only a small proportion of the data, enjoy much higher recommendation quality than the disadvantaged (inactive) users. Such bias can also affect the overall performance since the disadvantaged users are the majority. To solve this problem, we provide a re-ranking approach that mitigates this unfairness by adding constraints over evaluation metrics. The experiments we conducted on several real-world datasets with various recommendation algorithms show that our approach can not only improve group fairness of users in recommender systems, but also achieve better overall recommendation performance.

Replacements for Thu, 22 Apr 21

[80]  arXiv:1904.01831 (replaced) [pdf, other]
Title: Model Slicing for Supporting Complex Analytics with Elastic Inference Cost and Resource Constraints
Comments: 14 pages, 8 figures
Subjects: Machine Learning (cs.LG); Databases (cs.DB); Performance (cs.PF)
[81]  arXiv:1909.03306 (replaced) [pdf, other]
Title: A scalable constructive algorithm for the optimization of neural network architectures
Comments: 12 pages, 15 figures, 3 tables
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[82]  arXiv:2005.04986 (replaced) [pdf, ps, other]
Title: Symplectic Neural Networks in Taylor Series Form for Hamiltonian Systems
Journal-ref: Journal of Computational Physics, p.110325 (2021)
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS)
[83]  arXiv:2006.16469 (replaced) [pdf, other]
Title: Model-Targeted Poisoning Attacks with Provable Convergence
Comments: 32 pages, code available at: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[84]  arXiv:2007.01179 (replaced) [pdf, other]
Title: Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[85]  arXiv:2010.00636 (replaced) [pdf, ps, other]
Title: Universal consistency and rates of convergence of multiclass prototype algorithms in metric spaces
Comments: To appear in JMLR
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[86]  arXiv:2010.03522 (replaced) [pdf, other]
Title: A Survey of Deep Meta-Learning
Comments: Published in the AI Review (AIRE) Journal (2021)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[87]  arXiv:2011.13786 (replaced) [pdf, other]
Title: Navigating the GAN Parameter Space for Semantic Image Editing
Comments: Supplementary code: this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[88]  arXiv:2012.03488 (replaced) [pdf, other]
Title: Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[89]  arXiv:2012.07386 (replaced) [pdf, other]
Title: Phase Retrieval with Holography and Untrained Priors: Tackling the Challenges of Low-Photon Nanoscale Imaging
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Optics (physics.optics); Machine Learning (stat.ML)
[90]  arXiv:2012.07723 (replaced) [pdf, other]
Title: Evolutionary learning of interpretable decision trees
Comments: 69 pages, 31 figures, code available at: this https URL
Subjects: Machine Learning (cs.LG)
[91]  arXiv:2101.05974 (replaced) [pdf, other]
Title: Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks
Comments: Accepted at ICLR 2021
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
[92]  arXiv:2101.06255 (replaced) [pdf, ps, other]
Title: Harmonization and the Worst Scanner Syndrome
Comments: Med-NeurIPS 2020 Workshop Paper, updated 4/2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
[93]  arXiv:2101.06861 (replaced) [pdf, other]
Title: Discrete Graph Structure Learning for Forecasting Multiple Time Series
Comments: ICLR 2021. Code is available at this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[94]  arXiv:2102.01356 (replaced) [pdf, ps, other]
Title: Recent Advances in Adversarial Training for Adversarial Robustness
Comments: accepted by International Joint Conference on Artificial Intelligence (IJCAI-21)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[95]  arXiv:2102.02969 (replaced) [pdf, other]
Title: Implicit Regularization of Sub-Gradient Method in Robust Matrix Recovery: Don't be Afraid of Outliers
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[96]  arXiv:2102.09890 (replaced) [pdf, other]
Title: Condensed Composite Memory Continual Learning
Comments: Paper accepted for publication at IJCNN2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[97]  arXiv:2103.00582 (replaced) [pdf, other]
Title: Symbiotic Hybrid Neural Network Watchdog For Outlier Detection
Comments: 10 Pages, 25 References, 10 Figures, 5 Tables
Subjects: Machine Learning (cs.LG)
[98]  arXiv:2103.04511 (replaced) [pdf, other]
Title: An Energy-Saving Snake Locomotion Gait Policy Obtained Using Deep Reinforcement Learning
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)
[99]  arXiv:2104.02726 (replaced) [pdf, other]
Title: Creativity and Machine Learning: A Survey
Comments: 25 pages, 3 figures, 2 tables; uppercase typos corrected; paragraph about char-RNN and folk-RNN revised
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[100]  arXiv:2104.02865 (replaced) [pdf, other]
Title: Quasi-Newton Quasi-Monte Carlo for variational Bayes
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Machine Learning (stat.ML)
[101]  arXiv:2104.03528 (replaced) [pdf, other]
Title: Neural Temporal Point Processes: A Review
Comments: International Joint Conference on Artificial Intelligence (IJCAI) 2021
Subjects: Machine Learning (cs.LG)
[102]  arXiv:2104.05743 (replaced) [pdf, other]
Title: Practical Defences Against Model Inversion Attacks for Split Neural Networks
Comments: ICLR 2021 Workshop on Distributed and Private Machine Learning (DPML 2021)
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
[103]  arXiv:2104.06050 (replaced) [pdf, other]
Title: Sequential Ski Rental Problem
Comments: Accepted at AAMAS 2021, Added proof of Theorem 3. Updated argument in Theorem 6
Subjects: Machine Learning (cs.LG)
[104]  arXiv:2104.09325 (replaced) [pdf, other]
Title: Modelling the COVID-19 virus evolution with Incremental Machine Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[105]  arXiv:2104.09582 (replaced) [pdf, other]
Title: Robust Uncertainty Bounds in Reproducing Kernel Hilbert Spaces: A Convex Optimization Approach
Comments: 19 pages, 5 figures
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
[106]  arXiv:1804.07101 (replaced) [pdf, other]
Title: Dictionary learning -- from local towards global and adaptive
Comments: 11 figures, 5 pages per figure including pseudocode
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[107]  arXiv:1904.01385 (replaced) [src]
Title: UAFS: Uncertainty-Aware Feature Selection for Problems with Missing Data
Comments: Withdrawn due to errors in theoretical derivations
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Methodology (stat.ME)
[108]  arXiv:2003.00381 (replaced) [pdf]
Title: Statistical power for cluster analysis
Comments: 32 pages, 11 figures, 3 tables; for code and data see: this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[109]  arXiv:2005.08129 (replaced) [pdf, other]
Title: Neural Collaborative Reasoning
Comments: Accepted to the 30th Web Conference (WWW 2021)
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Symbolic Computation (cs.SC)
[110]  arXiv:2005.09260 (replaced) [pdf, other]
Title: Cross-lingual Approaches for Task-specific Dialogue Act Recognition
Comments: Accepted for 17th International Conference on Artificial Intelligence Applications and Innovations (AIAI 2021), 25-27 June
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[111]  arXiv:2006.07976 (replaced) [pdf, other]
Title: Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization
Comments: Accepted in CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[112]  arXiv:2006.16915 (replaced) [pdf, other]
Title: HGKT: Introducing Hierarchical Exercise Graph for Knowledge Tracing
Comments: 10 pages, 11 figures, submitted to SIGIR 2021
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[113]  arXiv:2008.10526 (replaced) [pdf, ps, other]
Title: Stochastic Multi-level Composition Optimization Algorithms with Level-Independent Convergence Rates
Comments: Refined the convergence analysis in Section 3 under weaker assumptions
Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[114]  arXiv:2010.03408 (replaced) [pdf, other]
Title: Machine learning for recovery factor estimation of an oil reservoir: a tool for de-risking at a hydrocarbon asset evaluation
Subjects: Applications (stat.AP); Machine Learning (cs.LG)
[115]  arXiv:2010.04855 (replaced) [pdf, other]
Title: Reproducing Kernel Methods for Nonparametric and Semiparametric Treatment Effects
Comments: Formerly "Kernel Methods for Policy Evaluation: Treatment Effects, Mediation Analysis, and Off-Policy Planning" (2020)
Subjects: Econometrics (econ.EM); Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[116]  arXiv:2010.09125 (replaced) [pdf, other]
Title: Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering
Comments: Accepted to ICLR 2021 as an Oral paper
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[117]  arXiv:2010.15658 (replaced) [pdf, other]
Title: Compressive Sensing and Neural Networks from a Statistical Learning Perspective
Comments: 29 pages, 4 figures
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
[118]  arXiv:2011.00871 (replaced) [pdf, other]
Title: Machine Learning Lie Structures & Applications to Physics
Comments: 6 pages, 7 figures
Subjects: High Energy Physics - Theory (hep-th); Machine Learning (cs.LG); High Energy Physics - Phenomenology (hep-ph); Representation Theory (math.RT); Machine Learning (stat.ML)
[119]  arXiv:2011.01709 (replaced) [pdf, other]
Title: Small footprint Text-Independent Speaker Verification for Embedded Systems
Journal-ref: Acoustics, Speech and Signal Processing (ICASSP), 2021 IEEE International Conference
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[120]  arXiv:2011.11201 (replaced) [pdf, other]
Title: Concept Grounding with Modular Action-Capsules in Semantic Video Prediction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[121]  arXiv:2012.01558 (replaced) [pdf, other]
Title: From a Fourier-Domain Perspective on Adversarial Examples to a Wiener Filter Defense for Semantic Segmentation
Comments: Accepted by The International Joint Conference on Neural Network (IJCNN) 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[122]  arXiv:2102.01815 (replaced) [pdf, ps, other]
Title: TAD: Trigger Approximation based Black-box Trojan Detection for AI
Comments: 6 body pages
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[123]  arXiv:2103.00070 (replaced) [pdf, ps, other]
Title: Knowledge-aware Zero-Shot Learning: Survey and Perspective
Comments: Accepted by IJCAI'21 Survey Track
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[124]  arXiv:2103.06695 (replaced) [pdf, other]
Title: BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation
Comments: IJCNN 2021, 8 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[125]  arXiv:2103.06720 (replaced) [pdf, other]
Title: Variational inference with a quantum computer
Comments: 17 pages, 9 figures; Minor revisions
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
[126]  arXiv:2103.08280 (replaced) [pdf, ps, other]
Title: Lower Complexity Bounds of Finite-Sum Optimization Problems: The Results and Construction
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
[127]  arXiv:2103.10681 (replaced) [pdf, other]
Title: Learning the Superpixel in a Non-iterative and Lifelong Manner
Comments: Accepted by CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[128]  arXiv:2103.15017 (replaced) [pdf, other]
Title: H-GAN: the power of GANs in your Hands
Comments: Paper accepted at The International Joint Conference on Neural Networks (IJCNN) 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[129]  arXiv:2103.15561 (replaced) [pdf, other]
Title: Pyfectious: An individual-level simulator to discover optimal containment policies for epidemic diseases
Subjects: Populations and Evolution (q-bio.PE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
[130]  arXiv:2104.01086 (replaced) [pdf, other]
Title: Defending Against Image Corruptions Through Adversarial Augmentations
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[131]  arXiv:2104.02466 (replaced) [pdf, other]
Title: A Review of Formal Methods applied to Machine Learning
Subjects: Programming Languages (cs.PL); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
[132]  arXiv:2104.03002 (replaced) [pdf, other]
Title: CNN Based Segmentation of Infarcted Regions in Acute Cerebral Stroke Patients From Computed Tomography Perfusion Imaging
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[133]  arXiv:2104.03017 (replaced) [pdf, other]
Title: Utilizing Self-supervised Representations for MOS Prediction
Comments: Submitted to Interspeech 2021. We acknowledge the support of AWS Machine Learning Research Awards program
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[134]  arXiv:2104.09420 (replaced) [pdf, other]
Title: Everything Has a Cause: Leveraging Causal Inference in Legal Text Analysis
Comments: Accepted by NAACL 2021. Code is available at this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[135]  arXiv:2104.09648 (replaced) [pdf, other]
Title: Memory Efficient 3D U-Net with Reversible Mobile Inverted Bottlenecks for Brain Tumor Segmentation
Comments: 11 pages, 5 figures, Published at MICCAI BrainLes 2020
Journal-ref: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries (2021) 388-397
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[136]  arXiv:2104.09874 (replaced) [pdf, other]
Title: Boosting Masked Face Recognition with Multi-Task ArcFace
Comments: 6 pages, 4 figures. The paper is under consideration at Pattern Recognition Letters
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[137]  arXiv:2104.09958 (replaced) [pdf, other]
Title: GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[138]  arXiv:2104.10095 (replaced) [pdf, other]
Title: Turning Channel Noise into an Accelerator for Over-the-Air Principal Component Analysis
Comments: 30 pages, 9 figures
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)