Machine Learning

New submissions

New submissions for Fri, 15 Jan 21

[1]  arXiv:2101.05304 [pdf, other]
Title: Spatial-Temporal Convolutional Network for Spread Prediction of COVID-19
Comments: IEEE BigData 2020
Subjects: Machine Learning (cs.LG)

In this work we present a spatial-temporal convolutional neural network for predicting the future severity of COVID-19-related symptoms among a population, per region, given its past reported symptoms. This can help approximate the number of future COVID-19 patients in each region, thus enabling a faster response, e.g., preparing the local hospital or declaring a local lockdown where necessary. Our model is based on a national symptom survey distributed in Israel and can predict symptom severity for different regions daily. The model includes two main parts: (1) learned region-based survey-responder profiles used for aggregating questionnaire data into features, and (2) a spatial-temporal 3D convolutional neural network that uses these features to predict symptom progression.

[2]  arXiv:2101.05317 [pdf, other]
Title: Learning and Fast Adaptation for Grid Emergency Control via Deep Meta Reinforcement Learning
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)

As power systems undergo a significant transformation, with more uncertainties, less inertia, and operation closer to their limits, the risk of large outages is increasing. Thus, there is an imperative need to enhance grid emergency control to maintain system reliability and security. Towards this end, great progress has been made in developing deep reinforcement learning (DRL) based grid control solutions in recent years. However, existing DRL-based solutions have two main limitations: 1) they cannot cope well with a wide range of grid operation conditions, system parameters, and contingencies; 2) they generally lack the ability to adapt quickly to new grid operation conditions, system parameters, and contingencies, limiting their applicability to real-world applications. In this paper, we mitigate these limitations by developing a novel deep meta reinforcement learning (DMRL) algorithm. The DMRL combines meta strategy optimization with DRL, and trains policies modulated by a latent space that can quickly adapt to new scenarios. We test the developed DMRL algorithm on the IEEE 300-bus system. We demonstrate fast adaptation of the meta-trained DRL policies with latent variables to new operating conditions and scenarios using the proposed method, and achieve superior performance compared to state-of-the-art DRL and model predictive control (MPC) methods.

[3]  arXiv:2101.05328 [pdf, other]
Title: Uniform Error and Posterior Variance Bounds for Gaussian Process Regression with Application to Safe Control
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)

In application areas where data generation is expensive, Gaussian processes are a preferred supervised learning model due to their high data-efficiency. Particularly in model-based control, Gaussian processes allow the derivation of performance guarantees using probabilistic model error bounds. To make these approaches applicable in practice, two open challenges must be solved: (i) existing error bounds rely on prior knowledge, which might not be available for many real-world tasks; (ii) the relationship between training data and the posterior variance, which mainly drives the error bound, is not well understood, which prevents asymptotic analysis. This article addresses these issues by presenting a novel uniform error bound using Lipschitz continuity and an analysis of the posterior variance function for a large class of kernels. Additionally, we show how these results can be used to guarantee safe control of an unknown dynamical system and provide illustrative numerical examples.

[4]  arXiv:2101.05346 [pdf, other]
Title: X-CAL: Explicit Calibration for Survival Analysis
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Survival analysis models the distribution of time until an event of interest, such as discharge from the hospital or admission to the ICU. When a model's predicted number of events within any time interval is similar to the observed number, it is called well-calibrated. A survival model's calibration can be measured using, for instance, distributional calibration (D-CALIBRATION) [Haider et al., 2020] which computes the squared difference between the observed and predicted number of events within different time intervals. Classically, calibration is addressed in post-training analysis. We develop explicit calibration (X-CAL), which turns D-CALIBRATION into a differentiable objective that can be used in survival modeling alongside maximum likelihood estimation and other objectives. X-CAL allows practitioners to directly optimize calibration and strike a desired balance between predictive power and calibration. In our experiments, we fit a variety of shallow and deep models on simulated data, a survival dataset based on MNIST, on length-of-stay prediction using MIMIC-III data, and on brain cancer data from The Cancer Genome Atlas. We show that the models we study can be miscalibrated. We give experimental evidence on these datasets that X-CAL improves D-CALIBRATION without a large decrease in concordance or likelihood.
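
The binning idea behind D-CALIBRATION described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes every event is observed (no censoring), takes as input the predicted CDF value at each observed event time, and uses equal-width bins on [0, 1]. The function name `d_calibration` and the parameter `num_bins` are hypothetical.

```python
def d_calibration(cdf_values, num_bins=10):
    """Sum of squared gaps between observed bin fractions and 1/num_bins.

    cdf_values: predicted CDF evaluated at each observed event time.
    Assumption: no censored observations (a simplification).
    """
    if not cdf_values:
        raise ValueError("need at least one CDF value")
    counts = [0] * num_bins
    for u in cdf_values:
        # Clamp so that u == 1.0 falls into the last bin.
        idx = min(int(u * num_bins), num_bins - 1)
        counts[idx] += 1
    n = len(cdf_values)
    # A well-calibrated model puts about 1/num_bins of the events in each bin.
    return sum((c / n - 1.0 / num_bins) ** 2 for c in counts)
```

A score near zero means the predicted CDF values are spread roughly uniformly over [0, 1]. X-CAL, per the abstract, replaces this hard binning with a differentiable objective; that step is not shown here.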

[5]  arXiv:2101.05348 [pdf, other]
Title: Gaussian Mixture Graphical Lasso with Application to Edge Detection in Brain Networks
Subjects: Machine Learning (cs.LG)

Sparse inverse covariance estimation (i.e., edge detection) has been an important research problem in recent years, where the goal is to discover the direct connections between a set of nodes in a networked system based upon the observed node activities. Existing works mainly focus on unimodal distributions, where it is usually assumed that the observed activities are generated from a single Gaussian distribution (i.e., one graph). However, this assumption is too strong for many real-world applications. In many real-world applications (e.g., brain networks), the node activities usually exhibit much more complex patterns that are difficult to capture with one single Gaussian distribution. In this work, we are inspired by Latent Dirichlet Allocation (LDA) [4] and consider modeling the edge detection problem as estimating a mixture of multiple Gaussian distributions, where each corresponds to a separate sub-network. To address this problem, we propose a novel model called Gaussian Mixture Graphical Lasso (MGL). It learns the proportions of signals generated by each mixture component and their parameters iteratively via an EM framework. To obtain more interpretable networks, MGL imposes a special regularization, called Mutual Exclusivity Regularization (MER), to minimize the overlap between different sub-networks. MER also addresses the common issues in real-world data sets, i.e., noisy observations and small sample size. Through extensive experiments on synthetic and real brain data sets, the results demonstrate that MGL can effectively discover multiple connectivity structures from the observed node activities.

[6]  arXiv:2101.05357 [pdf, other]
Title: Towards Creating a Deployable Grasp Type Probability Estimator for a Prosthetic Hand
Journal-ref: CyPhy 2019, WESE 2019. Lecture Notes in Computer Science, vol 11971. Springer, Cham
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)

For lower-arm amputees, prosthetic hands promise to restore most physical interaction capabilities. This requires accurately predicting hand gestures capable of grasping varying objects and executing them in a timely manner, as intended by the user. Current approaches often rely on physiological signal inputs, such as electromyography (EMG) signals from residual limb muscles, to infer the intended motion. However, limited signal quality, user diversity, and high variability adversely affect system robustness. Instead of relying solely on EMG signals, our work augments EMG intent inference with physical state probability through machine learning and computer vision methods. To this end, we: (1) study state-of-the-art deep neural network architectures to select a performant source of knowledge transfer for the prosthetic hand, and (2) use a dataset containing object images and probability distributions over grasp types as a new form of labeling: instead of the absolute values of zero and one used as conventional classification labels, our labels are a set of probabilities whose sum is 1. The proposed method generates probabilistic predictions, which could be fused with EMG predictions of probabilities over grasps, by using the visual information from the palm camera of a prosthetic hand. Our results demonstrate that InceptionV3 achieves the highest accuracy with 0.95 angular similarity, followed by 1.4 MobileNetV2 with 0.93 at ~20% the amount of operations.
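
Comparing a predicted grasp distribution against a probabilistic label with an angular similarity score might look like the sketch below. The abstract does not define the metric, so this assumes one common definition, 1 - arccos(cosine similarity) / pi; the paper may normalize differently. The function name `angular_similarity` is hypothetical.

```python
import math

def angular_similarity(p, q):
    """Angular similarity between two non-zero vectors of equal length.

    Assumption: defined as 1 - angle / pi, where angle is the arccos of
    the cosine similarity. Other normalizations exist.
    """
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / norm))
    return 1.0 - math.acos(cosine) / math.pi
```

Under this definition, identical distributions score 1.0 and distributions with disjoint support score 0.5, since non-negative vectors are at most 90 degrees apart.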

[7]  arXiv:2101.05360 [pdf, other]
Title: Preferential Mixture-of-Experts: Interpretable Models that Rely on Human Expertise as much as Possible
Comments: 10 pages, 5 figures, 4 tables, AMIA 2021 Virtual Informatics Summit
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We propose Preferential MoE, a novel human-ML mixture-of-experts model that augments human expertise in decision making with a data-based classifier only when necessary for predictive performance. Our model exhibits an interpretable gating function that provides information on when human rules should be followed or avoided. The gating function is maximized for using human-based rules, and classification errors are minimized. We propose solving a coupled multi-objective problem with convex subproblems. We develop approximate algorithms and study their performance and convergence. Finally, we demonstrate the utility of Preferential MoE on two clinical applications for the treatment of Human Immunodeficiency Virus (HIV) and management of Major Depressive Disorder (MDD).

[8]  arXiv:2101.05363 [pdf, other]
Title: NetCut: Real-Time DNN Inference Using Layer Removal
Subjects: Machine Learning (cs.LG); Performance (cs.PF)

Deep Learning plays a significant role in assisting humans in many aspects of their lives. As these networks tend to get deeper over time, they extract more features to increase accuracy at the cost of additional inference latency. This accuracy-performance trade-off makes it more challenging for Embedded Systems, as resource-constrained processors with strict deadlines, to deploy them efficiently. This can lead to selection of networks that can prematurely meet a specified deadline with excess slack time that could have potentially contributed to increased accuracy.
In this work, we propose: (i) the concept of layer removal as a means of constructing TRimmed Networks (TRNs) that are based on removing problem-specific features of a pretrained network used in transfer learning, and (ii) NetCut, a methodology based on an empirical or an analytical latency estimator, which only proposes and retrains TRNs that can meet the application's deadline, hence reducing the exploration time significantly.
We demonstrate that TRNs can expand the Pareto frontier that trades off latency and accuracy to provide networks that can meet arbitrary deadlines with potential accuracy improvement over off-the-shelf networks. Our experimental results show that such utilization of TRNs, while transferring to a simpler dataset, in combination with NetCut, can lead to the proposal of networks that can achieve relative accuracy improvement of up to 10.43% among existing off-the-shelf neural architectures while meeting a specific deadline, and 27x speedup in exploration time.

[9]  arXiv:2101.05371 [pdf]
Title: Anomaly Detection Support Using Process Classification
Comments: 14 pages, 6 figures
Journal-ref: Proceedings of the 5th International Conference on Software Security and Assurance (ICSSA 2019), 2019, 27-40
Subjects: Machine Learning (cs.LG)

Anomaly detection systems need to consider a lot of information when scanning for anomalies. One example is the context of the process in which an anomaly might occur, because anomalies for one process might not be anomalies for a different one. Therefore data -- such as system events -- need to be assigned to the program they originate from. This paper investigates whether it is possible to infer from a list of system events the program whose behavior caused the occurrence of these system events. To that end, we model transition probabilities between non-equivalent events and apply the $k$-nearest neighbors algorithm. This system is evaluated on non-malicious, real-world data using four different evaluation scores. Our results suggest that the approach proposed in this paper is capable of correctly inferring program names from system events.
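
The pipeline sketched in this abstract, transition statistics followed by k-nearest neighbors, could look roughly like this. It is an illustration under stated assumptions, not the paper's method: "non-equivalent events" is read here as consecutive events that differ, profiles hold relative transition frequencies, and distance is Euclidean. All function names and the example event names are hypothetical.

```python
import math
from collections import Counter

def transition_profile(events):
    """Relative frequency of each transition between differing consecutive events."""
    pairs = [(a, b) for a, b in zip(events, events[1:]) if a != b]
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()} if total else {}

def distance(p, q):
    """Euclidean distance between two sparse transition profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))

def knn_predict(train, events, k=3):
    """train: list of (profile, program_name); returns the majority label of the k nearest."""
    query = transition_profile(events)
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, with labeled profiles built from traces such as `["open", "read", "close"]` for one program and `["socket", "connect", "send"]` for another, `knn_predict` assigns a new trace to whichever program's transition profiles it most resembles.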

[10]  arXiv:2101.05388 [pdf, other]
Title: Evaluating Soccer Player: from Live Camera to Deep Reinforcement Learning
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)

Scientifically evaluating soccer players represents a challenging Machine Learning problem. Unfortunately, most existing answers have very opaque algorithm training procedures; relevant data are scarcely accessible and almost impossible to generate. In this paper, we introduce a two-part solution: an open-source Player Tracking model and a new approach to evaluate these players based solely on Deep Reinforcement Learning, without training on human data or human guidance. Our tracking model was trained in a supervised fashion on datasets we will also release, and our Evaluation Model relies only on simulations of virtual soccer games. Combining those two architectures allows one to evaluate soccer players directly from a live camera without the constraints of large datasets. We term our new approach Expected Discounted Goal (EDG), as it represents the number of goals a team can score or concede from a particular state. This approach leads to more meaningful results than the existing ones that are based on real-world data, and could easily be extended to other sports.

[11]  arXiv:2101.05390 [pdf, ps, other]
Title: Quantitative Rates and Fundamental Obstructions to Non-Euclidean Universal Approximation with Deep Narrow Feed-Forward Networks
Subjects: Machine Learning (cs.LG); Functional Analysis (math.FA); General Topology (math.GN); Geometric Topology (math.GT)

By incorporating structured pairs of non-trainable input and output layers, the universal approximation property of feed-forward networks has recently been extended across a broad range of non-Euclidean input spaces X and output spaces Y. We quantify the number of narrow layers required for these "deep geometric feed-forward neural networks" (DGNs) to approximate any continuous function in $C(X,Y)$, uniformly on compacts. The DGN architecture is then extended to accommodate complete Riemannian manifolds, where the input and output layers are only defined locally, and we obtain local analogs of our results. In this case, we find that both the global and local universal approximation guarantees can only coincide when approximating null-homotopic functions. Consequently, we show that if Y is a compact Riemannian manifold, then there exists a function that cannot be uniformly approximated on large compact subsets of X. Nevertheless, we obtain lower bounds on the maximum diameter of any geodesic ball in X wherein our local universal approximation results hold. Applying our results, we build universal approximators between spaces of non-degenerate Gaussian measures. We also obtain a quantitative version of the universal approximation theorem for classical deep narrow feed-forward networks with general activation functions.

[12]  arXiv:2101.05397 [pdf, other]
Title: Should Ensemble Members Be Calibrated?
Authors: Xixin Wu, Mark Gales
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Underlying the use of statistical approaches for a wide range of applications is the assumption that the probabilities obtained from a statistical model are representative of the "true" probability that an event, or outcome, will occur. Unfortunately, for modern deep neural networks this is not the case: they are often observed to be poorly calibrated. Additionally, these deep learning approaches make use of large numbers of model parameters, motivating the use of Bayesian, or ensemble approximation, approaches to handle issues with parameter estimation. This paper explores the application of calibration schemes to deep ensembles from both a theoretical perspective and empirically on a standard image classification task, CIFAR-100. The underlying theoretical requirements for calibration, and associated calibration criteria, are first described. It is shown that well calibrated ensemble members will not necessarily yield a well calibrated ensemble prediction, and if the ensemble prediction is well calibrated its performance cannot exceed the average performance of the calibrated ensemble members. On CIFAR-100, the impact of calibration on ensemble prediction, and the associated calibration, is evaluated. Additionally, the situation where multiple different topologies are combined is discussed.

[13]  arXiv:2101.05428 [pdf, other]
Title: Federated Learning: Opportunities and Challenges
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Federated Learning (FL) is a concept first introduced by Google in 2016, in which multiple devices collaboratively learn a machine learning model without sharing their private data under the supervision of a central server. This offers ample opportunities in critical domains such as healthcare and finance, where it is risky to share private user information with other organisations or devices. While FL appears to be a promising Machine Learning (ML) technique to keep the local data private, it is also vulnerable to attacks like other ML models. Given the growing interest in the FL domain, this report discusses the opportunities and challenges in federated learning.

[14]  arXiv:2101.05453 [pdf, ps, other]
Title: On the quantization of recurrent neural networks
Subjects: Machine Learning (cs.LG)

Integer quantization of neural networks can be defined as the approximation of the high precision computation of the canonical neural network formulation, using reduced integer precision. It plays a significant role in the efficient deployment and execution of machine learning (ML) systems, reducing memory consumption and leveraging typically faster computations. In this work, we present an integer-only quantization strategy for Long Short-Term Memory (LSTM) neural network topologies, which themselves are the foundation of many production ML systems. Our quantization strategy is accurate (e.g. works well with quantization post-training), efficient and fast to execute (utilizing 8 bit integer weights and mostly 8 bit activations), and is able to target a variety of hardware (by leveraging instruction sets available in common CPU architectures, as well as available neural accelerators).
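
To make "approximation using reduced integer precision" concrete, here is a minimal sketch of symmetric 8-bit quantization of a weight vector. It shows the generic per-tensor scheme only, not the LSTM-specific strategy of the paper; the function names and the choice of the range [-127, 127] are assumptions made for illustration.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale.

    Assumption: symmetric per-tensor quantization with zero point 0.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]
```

The reconstruction error per value is bounded by about half the scale, which is the precision given up in exchange for 8-bit storage and integer arithmetic.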

[15]  arXiv:2101.05467 [pdf, other]
Title: Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The drastic increase in data quantity often brings a severe decrease in data quality, such as incorrect label annotations, which poses a great challenge for robustly training Deep Neural Networks (DNNs). Existing learning methods with label noise either employ ad-hoc heuristics or are restricted to specific noise assumptions. However, more general situations, such as instance-dependent label noise, have not been fully explored, as few studies focus on their label corruption process. By categorizing instances into confusing and unconfusing instances, this paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances. The resultant model can be realized by DNNs, where the training procedure is accomplished by employing an alternating optimization algorithm. Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements in robustness over state-of-the-art counterparts.

[16]  arXiv:2101.05471 [pdf, other]
Title: Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration
Comments: 44 Pages. arXiv admin note: substantial text overlap with arXiv:1811.09358
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)

Adam is one of the most influential adaptive stochastic algorithms for training deep neural networks, yet it has been shown to diverge even in the simple convex setting via a few simple counterexamples. Many attempts, such as decreasing an adaptive learning rate, adopting a big batch size, incorporating a temporal decorrelation technique, seeking an analogous surrogate, etc., have been made to make Adam-type algorithms converge. In contrast with existing approaches, we introduce an alternative easy-to-check sufficient condition, which merely depends on the parameters of the base learning rate and combinations of historical second-order moments, to guarantee the global convergence of generic Adam for solving large-scale non-convex stochastic optimization. This observation coupled with this sufficient condition gives much deeper interpretations of the divergence of Adam. On the other hand, in practice, mini-Adam and distributed-Adam are widely used without theoretical guarantee. We further analyze how the batch size or the number of nodes in the distributed system affects the convergence of Adam, which theoretically shows that mini-batch and distributed Adam can be linearly accelerated by using a larger mini-batch size or more nodes. Finally, we apply generic Adam and mini-batch Adam under the sufficient condition to solve the counterexample and to train several different neural networks on various real-world datasets. Experimental results are exactly in accord with our theoretical analysis.

[17]  arXiv:2101.05484 [pdf]
Title: 4D Attention-based Neural Network for EEG Emotion Recognition
Subjects: Machine Learning (cs.LG)

Electroencephalograph (EEG) emotion recognition is a significant task in the brain-computer interface field. Although many deep learning methods have been proposed recently, it is still challenging to make full use of the information contained in different domains of EEG signals. In this paper, we present a novel method, called four-dimensional attention-based neural network (4D-aNN), for EEG emotion recognition. First, raw EEG signals are transformed into 4D spatial-spectral-temporal representations. Then, the proposed 4D-aNN adopts spectral and spatial attention mechanisms to adaptively assign the weights of different brain regions and frequency bands, and a convolutional neural network (CNN) is utilized to deal with the spectral and spatial information of the 4D representations. Moreover, a temporal attention mechanism is integrated into a bidirectional Long Short-Term Memory (LSTM) to explore temporal dependencies of the 4D representations. Our model achieves state-of-the-art performance on the SEED dataset under intra-subject splitting. The experimental results show the effectiveness of the attention mechanisms in different domains for EEG emotion recognition.

[18]  arXiv:2101.05486 [pdf, other]
Title: Label Contrastive Coding based Graph Neural Network for Graph Classification
Comments: Accepted by DASFAA'21
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Graph classification is a critical research problem in many applications from different domains. In order to learn a graph classification model, the most widely used supervision component is an output layer together with a classification loss (e.g., cross-entropy loss together with softmax, or margin loss). In fact, the discriminative information among instances is more fine-grained, which can benefit graph classification tasks. In this paper, we propose the novel Label Contrastive Coding based Graph Neural Network (LCGNN) to utilize label information more effectively and comprehensively. LCGNN still uses the classification loss to ensure the discriminability of classes. Meanwhile, LCGNN leverages the proposed Label Contrastive Loss derived from self-supervised learning to encourage instance-level intra-class compactness and inter-class separability. To power the contrastive learning, LCGNN introduces a dynamic label memory bank and a momentum updated encoder. Our extensive evaluations with eight benchmark graph datasets demonstrate that LCGNN can outperform state-of-the-art graph classification models. Experimental results also verify that LCGNN can achieve competitive performance with less training data because LCGNN exploits label information comprehensively.

[19]  arXiv:2101.05490 [pdf, other]
Title: Neural networks behave as hash encoders: An empirical study
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

The input space of a neural network with ReLU-like activations is partitioned into multiple linear regions, each corresponding to a specific activation pattern of the included ReLU-like activations. We demonstrate that this partition exhibits the following encoding properties across a variety of deep learning models: (1) determinism: almost every linear region contains at most one training example. We can therefore represent almost every training example by a unique activation pattern, which is parameterized by a neural code; and (2) categorization: according to the neural code, simple algorithms, such as $K$-Means, $K$-NN, and logistic regression, can achieve fairly good performance on both training and test data. These encoding properties surprisingly suggest that normal neural networks well-trained for classification behave as hash encoders without any extra efforts. In addition, the encoding properties exhibit variability in different scenarios. Further experiments demonstrate that model size, training time, training sample size, regularization, and label noise contribute in shaping the encoding properties, while the impacts of the first three are dominant. We then define an activation hash phase chart to represent the space expanded by model size, training time, training sample size, and the encoding properties, which is divided into three canonical regions: under-expressive regime, critically-expressive regime, and sufficiently-expressive regime. The source code package is available at https://github.com/LeavesLei/activation-code.
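
The notion of an activation pattern used as a code can be illustrated with a tiny fully connected ReLU network. This is a sketch under assumptions: the function name `neural_code`, the layer format (a list of weight-matrix and bias-vector pairs), and the use of plain ReLU are all hypothetical, and the authors' repository should be consulted for the actual construction.

```python
def neural_code(x, layers):
    """Return the on/off pattern of every ReLU unit for input x.

    layers: list of (weights, biases), where weights is a list of rows.
    Inputs that share a code lie in the same linear region of the network.
    """
    code = []
    hidden = x
    for weights, biases in layers:
        pre = [sum(w * v for w, v in zip(row, hidden)) + b
               for row, b in zip(weights, biases)]
        code.extend(1 if z > 0 else 0 for z in pre)
        hidden = [max(0.0, z) for z in pre]  # ReLU
    return tuple(code)
```

Because the code is a hashable tuple, it can serve directly as a dictionary key, which is the sense in which the network acts like a hash encoder: the determinism property in the abstract says almost every training example receives its own key.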

[20]  arXiv:2101.05500 [pdf, other]
Title: Joint Dimensionality Reduction for Separable Embedding Estimation
Subjects: Machine Learning (cs.LG)

Low-dimensional embeddings for data from disparate sources play critical roles in multi-modal machine learning, multimedia information retrieval, and bioinformatics. In this paper, we propose a supervised dimensionality reduction method that learns linear embeddings jointly for two feature vectors representing data of different modalities or data from distinct types of entities. We also propose an efficient feature selection method that complements, and can be applied prior to, our joint dimensionality reduction method. Assuming that there exist true linear embeddings for these features, our analysis of the error in the learned linear embeddings provides theoretical guarantees that the dimensionality reduction method accurately estimates the true embeddings when certain technical conditions are satisfied and the number of samples is sufficiently large. The derived sample complexity results are echoed by numerical experiments. We apply the proposed dimensionality reduction method to gene-disease association, and predict unknown associations using kernel regression on the dimension-reduced feature vectors. Our approach compares favorably against other dimensionality reduction methods, and against a state-of-the-art method of bilinear regression for predicting gene-disease associations.

[21]  arXiv:2101.05504 [pdf, other]
Title: Reliability Check via Weight Similarity in Privacy-Preserving Multi-Party Machine Learning
Subjects: Machine Learning (cs.LG)

Multi-party machine learning is a paradigm in which multiple participants collaboratively train a machine learning model to achieve a common learning objective without sharing their privately owned data. The paradigm has recently received a lot of attention from the research community aimed at addressing its associated privacy concerns. In this work, we focus on addressing the concerns of data privacy, model privacy, and data quality associated with privacy-preserving multi-party machine learning, i.e., we present a scheme for privacy-preserving collaborative learning that checks the participants' data quality while guaranteeing data and model privacy. In particular, we propose a novel metric called weight similarity that is securely computed and used to check whether a participant can be categorized as a reliable participant (holds good quality data) or not. The problems of model and data privacy are tackled by integrating homomorphic encryption in our scheme and uploading encrypted weights, which prevent leakages to the server and malicious participants, respectively. The analytical and experimental evaluations of our scheme demonstrate that it is accurate and ensures data and model privacy.

[22]  arXiv:2101.05507 [pdf, other]
Title: Evaluating the Robustness of Collaborative Agents
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA)

In order for agents trained by deep reinforcement learning to work alongside humans in realistic settings, we will need to ensure that the agents are robust. Since the real world is very diverse, and human behavior often changes in response to agent deployment, the agent will likely encounter novel situations that have never been seen during training. This results in an evaluation challenge: if we cannot rely on the average training or validation reward as a metric, then how can we effectively evaluate robustness? We take inspiration from the practice of unit testing in software engineering. Specifically, we suggest that when designing AI agents that collaborate with humans, designers should search for potential edge cases in possible partner behavior and possible states encountered, and write tests which check that the behavior of the agent in these edge cases is reasonable. We apply this methodology to build a suite of unit tests for the Overcooked-AI environment, and use this test suite to evaluate three proposals for improving robustness. We find that the test suite provides significant insight into the effects of these proposals that were generally not revealed by looking solely at the average validation reward.

[23]  arXiv:2101.05514 [pdf, other]
Title: Entangled Kernels -- Beyond Separability
Subjects: Machine Learning (cs.LG); Quantum Physics (quant-ph); Machine Learning (stat.ML)

We consider the problem of operator-valued kernel learning and investigate the possibility of going beyond the well-known separable kernels. Borrowing tools and concepts from the field of quantum computing, such as partial trace and entanglement, we propose a new view on operator-valued kernels and define a general family of kernels that encompasses previously known operator-valued kernels, including separable and transformable kernels. Within this framework, we introduce another novel class of operator-valued kernels called entangled kernels that are not separable. We propose an efficient two-step algorithm for this framework, where the entangled kernel is learned based on a novel extension of kernel alignment to operator-valued kernels. We illustrate our algorithm with an application to supervised dimensionality reduction, and demonstrate its effectiveness with both artificial and real data for multi-output regression.

[24]  arXiv:2101.05519 [pdf, other]
Title: BiGCN: A Bi-directional Low-Pass Filtering Graph Neural Network
Subjects: Machine Learning (cs.LG)

Graph convolutional networks have achieved great success on graph-structured data. Many graph convolutional networks can be regarded as low-pass filters for graph signals. In this paper, we propose a new model, BiGCN, which represents a graph neural network as a bi-directional low-pass filter. Specifically, we consider not only the original graph structure information but also the latent correlation between features, so BiGCN can filter the signals along both the original graph and a latent feature-connection graph. Our model outperforms previous graph neural networks in the tasks of node classification and link prediction on most of the benchmark datasets, especially when we add noise to the node features.

[25]  arXiv:2101.05536 [pdf, other]
Title: Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias
Comments: NeurIPS 2020 Workshop : "Beyond Backpropagation Novel Ideas for Training Neural Architectures". arXiv admin note: substantial text overlap with arXiv:2006.03824
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Equilibrium Propagation (EP) is a biologically-inspired counterpart of Backpropagation Through Time (BPTT) which, owing to its strong theoretical guarantees and the locality in space of its learning rule, fosters the design of energy-efficient hardware dedicated to learning. In practice, however, EP does not scale to visual tasks harder than MNIST. In this work, we show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon and that cancelling it allows training deep ConvNets by EP, including architectures with distinct forward and backward connections. These results highlight EP as a scalable approach to compute error gradients in deep neural networks, thereby motivating its hardware implementation.

[26]  arXiv:2101.05544 [pdf, other]
Title: DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation
Comments: Published as a conference paper at ICLR 2021. 9 main pages, 13 figures, 12 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

Deep ensembles perform better than a single network thanks to the diversity among their members. Recent approaches regularize predictions to increase diversity; however, they also drastically decrease individual members' performances. In this paper, we argue that learning strategies for deep ensembles need to tackle the trade-off between ensemble diversity and individual accuracies. Motivated by arguments from information theory and leveraging recent advances in neural estimation of conditional mutual information, we introduce a novel training criterion called DICE: it increases diversity by reducing spurious correlations among features. The main idea is that features extracted from pairs of members should only share information useful for target class prediction without being conditionally redundant. Therefore, besides the classification loss with information bottleneck, we adversarially prevent features from being conditionally predictable from each other. We manage to reduce simultaneous errors while protecting class information. We obtain state-of-the-art accuracy results on CIFAR-10/100: for example, an ensemble of 5 networks trained with DICE matches an ensemble of 7 networks trained independently. We further analyze the consequences on calibration, uncertainty estimation, out-of-distribution detection and online co-distillation.

[27]  arXiv:2101.05605 [pdf, other]
Title: A Physics-Informed Machine Learning Model for Porosity Analysis in Laser Powder Bed Fusion Additive Manufacturing
Comments: 14 pages
Subjects: Machine Learning (cs.LG)

To control part quality, it is critical to analyze pore generation mechanisms, laying a theoretical foundation for future porosity control. Current porosity analysis models use machine setting parameters, such as laser angle and part pose. However, these setting-based models are machine dependent, hence they often do not transfer to analysis of porosity for a different machine. To address this problem, a physics-informed, data-driven model (PIM) is proposed which, instead of directly using machine setting parameters to predict porosity levels of printed parts, first interprets machine settings into physical effects, such as laser energy density and laser radiation pressure. Then, these physical, machine-independent effects are used to predict porosity levels according to pass, flag, fail categories instead of focusing on quantitative pore size prediction. Evaluated with six learning methods, PIM achieved good performance, with prediction errors of 10$\sim$26%. Finally, pore-encouraging influence and pore-suppressing influence were analyzed for quality analysis.

[28]  arXiv:2101.05608 [pdf]
Title: Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)

Efficient processing of large-scale time series data is an intricate problem in machine learning. Conventional sensor signal processing pipelines with hand-engineered feature extraction often involve huge computational cost with high-dimensional data. Deep recurrent neural networks have shown promise in automated feature learning for improved time-series processing. However, generic deep recurrent models grow in scale and depth with increased complexity of the data. This is particularly challenging in the presence of high-dimensional data with temporal and spatial characteristics. Consequently, this work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to efficiently process complex multi-dimensional time series data with spatial information. The cellular recurrent architecture in the proposed model allows for location-aware synchronous processing of time series data from spatially distributed sensor signal sources. Extensive trainable parameter sharing due to cellularity in the proposed architecture ensures efficiency in the use of recurrent processing units with high-dimensional inputs. This study also investigates the versatility of the proposed DCRNN model for classification of multi-class time series data from different application domains. Consequently, the proposed DCRNN architecture is evaluated using two time-series datasets: a multichannel scalp EEG dataset for seizure detection, and a machine fault detection dataset obtained in-house. The results suggest that the proposed architecture achieves state-of-the-art performance while utilizing substantially fewer trainable parameters than comparable methods in the literature.

[29]  arXiv:2101.05612 [pdf, other]
Title: A SOM-based Gradient-Free Deep Learning Method with Convergence Analysis
Subjects: Machine Learning (cs.LG)

As the gradient descent method in deep learning raises a series of issues, this paper proposes a novel gradient-free deep learning structure. By adding a new module to the traditional Self-Organizing Map and introducing residuals into the map, a Deep Valued Self-Organizing Map network is constructed. The convergence of such a Deep Valued Self-Organizing Map network is analyzed and proved in this paper, yielding an inequality that relates the designed parameters to the dimension of the inputs and the prediction loss.

[30]  arXiv:2101.05615 [pdf, other]
Title: FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference
Subjects: Machine Learning (cs.LG); Performance (cs.PF)

Deep learning models typically use single-precision (FP32) floating point data types for representing activations and weights, but a slew of recent research work has shown that computations with reduced-precision data types (FP16, 16-bit integers, 8-bit integers or even 4- or 2-bit integers) are enough to achieve the same accuracy as FP32 and are much more efficient. Therefore, we designed fbgemm, a high-performance kernel library, from the ground up to perform high-performance quantized inference on current generation CPUs. fbgemm achieves efficiency by fusing common quantization operations with a high-performance gemm implementation and by shape- and size-specific kernel code generation at runtime. The library has been deployed at Facebook, where it delivers greater than 2x performance gains with respect to our current production baseline.
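As a rough illustration of what "quantized inference" means here, the sketch below applies generic affine 8-bit quantization to two small matrices, multiplies them using integer accumulation, and rescales the result back to floating point. It is not fbgemm's API or implementation: the function names (`quantize`, `quantized_matmul`, `float_matmul`), the per-matrix scale and zero point, and the example matrices are all assumptions chosen for illustration, and the runtime kernel generation described in the abstract is not modeled.

```python
def quantize(matrix, num_bits=8):
    """Affine-quantize a float matrix to unsigned integers (hypothetical helper).

    Returns the integer matrix together with the scale and zero point
    needed to map the integers back to approximate float values.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    flat = [v for row in matrix for v in row]
    # Include 0.0 in the range so that zero is exactly representable.
    lo, hi = min(min(flat), 0.0), max(max(flat), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(-lo / scale)
    q = [[max(qmin, min(qmax, round(v / scale) + zero_point)) for v in row]
         for row in matrix]
    return q, scale, zero_point


def quantized_matmul(a, b):
    """Multiply two float matrices via integer accumulation, then rescale."""
    qa, sa, za = quantize(a)
    qb, sb, zb = quantize(b)
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # Integer-only inner product on zero-point-corrected values.
            acc = sum((qa[i][k] - za) * (qb[k][j] - zb) for k in range(inner))
            # Rescaling folded into the same loop that produces the output.
            out_row.append(sa * sb * acc)
        out.append(out_row)
    return out


def float_matmul(a, b):
    """Reference floating-point matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[0.5, -1.0, 0.25], [0.1, 0.9, -0.3]]
B = [[1.0, -0.5], [0.2, 0.7], [-0.8, 0.4]]
approx = quantized_matmul(A, B)
exact = float_matmul(A, B)
max_error = max(abs(approx[i][j] - exact[i][j])
                for i in range(len(exact)) for j in range(len(exact[0])))
```

Here `max_error` measures how far the 8-bit result drifts from the floating-point product. Applying the rescaling inside the accumulation loop is only a loose analogue of the fusion of quantization operations with gemm that the abstract mentions.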

[31]  arXiv:2101.05620 [pdf]
Title: A Framework for Assurance of Medication Safety using Machine Learning
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

Medication errors continue to be the leading cause of avoidable patient harm in hospitals. This paper sets out a framework to assure medication safety that combines machine learning and safety engineering methods. It uses safety analysis to proactively identify potential causes of medication error, based on expert opinion. As healthcare is now data rich, it is possible to augment safety analysis with machine learning to discover actual causes of medication error from the data, and to identify where they deviate from what was predicted in the safety analysis. Combining these two views has the potential to enable the risk of medication errors to be managed proactively and dynamically. We apply the framework to a case study involving thoracic surgery, e.g. oesophagectomy, where errors in giving beta-blockers can be critical to control atrial fibrillation. This case study combines a HAZOP-based safety analysis method known as SHARD with Bayesian network structure learning and process mining to produce the analysis results, showing the potential of the framework for ensuring patient safety, and for transforming the way that safety is managed in complex healthcare environments.

[32]  arXiv:2101.05623 [pdf, other]
Title: Design of borehole resistivity measurement acquisition systems using deep learning
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Numerical Analysis (math.NA)

Borehole resistivity measurements recorded with logging-while-drilling (LWD) instruments are widely used for characterizing the earth's subsurface properties. They facilitate the extraction of natural resources such as oil and gas. LWD instruments require real-time inversions of electromagnetic measurements to estimate the electrical properties of the earth's subsurface near the well and possibly correct the well trajectory. Deep Neural Network (DNN)-based methods are suitable for the rapid inversion of borehole resistivity measurements as they approximate the forward and inverse problem offline during the training phase and they only require a fraction of a second for the evaluation (aka prediction). However, the inverse problem generally admits multiple solutions. DNNs with traditional loss functions based on data misfit are ill-equipped for solving an inverse problem. This can be partially overcome by adding regularization terms to a loss function specifically designed for encoder-decoder architectures. But adding regularization seriously limits the number of possible solutions to a set of a priori desirable physical solutions. To avoid this, we use a two-step loss function without any regularization. In addition, to guarantee an inverse solution, we need a carefully selected measurement acquisition system with a sufficient number of measurements. In this work, we propose a DNN-based iterative algorithm for designing such a measurement acquisition system. We illustrate our DNN-based iterative algorithm via several synthetic examples. Numerical results show that the obtained measurement acquisition system is sufficient to identify and characterize both resistive and conductive layers above and below the logging instrument. Numerical results are promising, although further improvements are required to make our method amenable for industrial purposes.

[33]  arXiv:2101.05624 [pdf, other]
Title: Adversarially robust and explainable model compression with on-device personalization for NLP applications
Subjects: Machine Learning (cs.LG)

On-device Deep Neural Networks (DNNs) have recently gained more attention due to the increasing computing power of mobile devices and the number of applications in Computer Vision (CV), Natural Language Processing (NLP), and the Internet of Things (IoT). Unfortunately, the existing efficient convolutional neural network (CNN) architectures designed for CV tasks are not directly applicable to NLP tasks, and the tiny Recurrent Neural Network (RNN) architectures have been designed primarily for IoT applications. In NLP applications, although model compression has seen initial success in on-device text classification, there are at least three major challenges yet to be addressed: adversarial robustness, explainability, and personalization. Here we attempt to tackle these challenges by designing a new training scheme for model compression and adversarial robustness, including the optimization of an explainable feature mapping objective, a knowledge distillation objective, and an adversarial robustness objective. The resulting compressed model is personalized using on-device private training data via fine-tuning. We perform extensive experiments to compare our approach with both compact RNN (e.g., FastGRNN) and compressed RNN (e.g., PRADO) architectures in both natural and adversarial NLP test settings.

[34]  arXiv:2101.05631 [pdf]
Title: Parkinson's Disease Diagnosis Using Deep Learning
Authors: Mohamad Alissa
Comments: Master Research Project
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Parkinson's Disease (PD) is a chronic, degenerative disorder which leads to a range of motor and cognitive symptoms. PD diagnosis is a challenging task since its symptoms are very similar to other diseases such as normal ageing and essential tremor. Much research has been applied to diagnosing this disease. This project aims to automate the PD diagnosis process using deep learning, Recursive Neural Networks (RNN) and Convolutional Neural Networks (CNN), to differentiate between healthy and PD patients. Besides that, since different datasets may capture different aspects of this disease, this project aims to explore which PD test is more effective in the discrimination process by analysing different imaging and movement datasets (notably cube and spiral pentagon datasets). In addition, this project evaluates which dataset type, imaging or time series, is more effective in diagnosing PD.

[35]  arXiv:2101.05633 [pdf, other]
Title: Enhanced Audit Techniques Empowered by the Reinforcement Learning Pertaining to IFRS 16 Lease
Authors: Byungryul Choi
Comments: for codes, please refer to this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The purpose of an accounting audit is to gain a clear understanding of the financial activities of a company, which can be enhanced by machine learning or reinforcement learning, since numerical analysis can outperform manual analysis. Assessing the relevance, completeness and accuracy of the information produced by an entity under the newly implemented International Financial Reporting Standard 16 Lease (IFRS 16) is one such candidate, because it requires understanding the nature of contracts and analyzing them completely, starting from a list compiled without omission. Digitalization of contracts can help create such lists, but the cash flows of companies still need to be audited for possible omissions caused by errors at the data collection stage, especially for entities with various short- or medium-term business sites and related leases, such as construction entities.
Reinforcement learning and its well-known code are implemented to explore the possibility and utility of interpreters from domain knowledge to a numerical system, which can also be called 'gamification interpreters' or 'numericalization interpreters'. These can be compared to extrapolation with nondimensional numbers in physics, such as the Froude number, which was a source of inspiration for this study. Studies on such interpreters may increase the utility of artificial general intelligence in domain-specific and commercial areas.

[36]  arXiv:2101.05639 [pdf]
Title: Untargeted, Targeted and Universal Adversarial Attacks and Defenses on Time Series
Comments: Published at IJCNN 2020
Subjects: Machine Learning (cs.LG)

Deep learning based models are vulnerable to adversarial attacks. These attacks can be much more harmful in the case of targeted attacks, where an attacker tries not only to fool the deep learning model, but also to misguide the model to predict a specific class. Such targeted and untargeted attacks are specifically tailored for an individual sample and require the addition of imperceptible noise to the sample. In contrast, a universal adversarial attack calculates a special imperceptible noise which can be added to any sample of the given dataset so that the deep learning model is forced to predict a wrong class. To the best of our knowledge, these targeted and universal attacks on time series data have not been studied in any previous work. In this work, we have performed untargeted, targeted and universal adversarial attacks on UCR time series datasets. Our results show that deep learning based time series classification models are vulnerable to these attacks. We also show that universal adversarial attacks have a good generalization property, as they need only a fraction of the training data. We have also performed adversarial training based adversarial defense. Our results show that models trained adversarially using the Fast gradient sign method (FGSM), a single-step attack, are able to defend against FGSM as well as the Basic iterative method (BIM), a popular iterative attack.
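For readers unfamiliar with FGSM, the single-step attack named above perturbs each input value by a fixed amount epsilon in the direction of the sign of the loss gradient with respect to the input. The sketch below applies this to a tiny logistic-regression classifier over a four-point series. The model, its weights, the example series and epsilon are all hypothetical stand-ins chosen so that the gradient can be written analytically. The paper itself attacks deep time series classifiers on UCR datasets, which this sketch does not reproduce.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    tiny = 1e-12  # guards against log(0)
    loss = -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))
    grad = [(p - y) * wi for wi in w]  # dL/dx_i = (p - y) * w_i
    return loss, grad


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, epsilon):
    """Untargeted FGSM: one step of size epsilon along the sign of the input gradient."""
    _, grad = loss_and_input_grad(w, b, x, y)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model parameters and input series (illustrative values only).
w = [0.8, -0.5, 0.3, 0.0]
b = 0.1
series = [0.2, -0.1, 0.4, 0.3]
label = 1

adv = fgsm(w, b, series, label, epsilon=0.05)
clean_loss, _ = loss_and_input_grad(w, b, series, label)
adv_loss, _ = loss_and_input_grad(w, b, adv, label)
```

Every element of `adv` differs from `series` by at most epsilon, and `adv_loss` exceeds `clean_loss`. An iterative attack such as BIM would repeat this step with a smaller step size, clipping the accumulated perturbation to the epsilon bound after each iteration.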

[37]  arXiv:2101.05640 [pdf, other]
Title: Continuous Deep Q-Learning with Simulator for Stabilization of Uncertain Discrete-Time Systems
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)

Applications of reinforcement learning (RL) to stabilization problems of real systems are restricted since an agent needs many experiences to learn an optimal policy and may determine dangerous actions during its exploration. If we know a mathematical model of a real system, a simulator is useful because it predicts behaviors of the real system using the mathematical model with a given system parameter vector. We can collect many experiences more efficiently than through interactions with the real system. However, it is difficult to identify the system parameter vector accurately. If we have an identification error, experiences obtained by the simulator may degrade the performance of the learned policy. Thus, we propose a practical RL algorithm that consists of two stages. At the first stage, we choose multiple system parameter vectors. Then, we have a mathematical model for each system parameter vector, which is called a virtual system. We obtain optimal Q-functions for multiple virtual systems using the continuous deep Q-learning algorithm. At the second stage, we represent a Q-function for the real system by a linearly approximated function whose basis functions are the optimal Q-functions learned at the first stage. The agent learns the Q-function through interactions with the real system online. By numerical simulations, we show the usefulness of our proposed method.

[38]  arXiv:2101.05661 [pdf, other]
Title: A Pipeline for Vision-Based On-Orbit Proximity Operations Using Deep Learning and Synthetic Imagery
Comments: Accepted to IEEE Aerospace Conference 2021. 14 pages, 11 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE)

Deep learning has become the gold standard for image processing over the past decade. Simultaneously, we have seen growing interest in orbital activities such as satellite servicing and debris removal that depend on proximity operations between spacecraft. However, two key challenges currently pose a major barrier to the use of deep learning for vision-based on-orbit proximity operations. Firstly, efficient implementation of these techniques relies on an effective system for model development that streamlines data curation, training, and evaluation. Secondly, a scarcity of labeled training data (images of a target spacecraft) hinders creation of robust deep learning models. This paper presents an open-source deep learning pipeline, developed specifically for on-orbit visual navigation applications, that addresses these challenges. The core of our work consists of two custom software tools built on top of a cloud architecture that interconnects all stages of the model development process. The first tool leverages Blender, an open-source 3D graphics toolset, to generate labeled synthetic training data with configurable model poses (positions and orientations), lighting conditions, backgrounds, and commonly observed in-space image aberrations. The second tool is a plugin-based framework for effective dataset curation and model training; it provides common functionality like metadata generation and remote storage access to all projects while giving complete independence to project-specific code. Time-consuming, graphics-intensive processes such as synthetic image generation and model training run on cloud-based computational resources which scale to any scope and budget and allow development of even the largest datasets and models from any machine. The presented system has been used in the Texas Spacecraft Laboratory with marked benefits in development speed and quality.

[39]  arXiv:2101.05673 [pdf, other]
Title: Analysis of hidden feedback loops in continuous machine learning systems
Authors: Anton Khritankov
Comments: 6 pages, 5 figures
Journal-ref: Soft. Qual.: Fut. Persp. on Soft. Eng. Q. SWQD 2021. LNBIP, V. 404
Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE)

In this concept paper, we discuss the intricacies of specifying and verifying the quality of continuous and lifelong learning artificial intelligence systems as they interact with and influence their environment, causing so-called concept drift. We highlight the problem of implicit feedback loops and demonstrate how they interfere with user behavior in an exemplary housing price prediction system. Based on a preliminary model, we identify conditions under which such feedback loops arise and discuss possible solution approaches.

[40]  arXiv:2101.05684 [pdf, other]
Title: Generating coherent spontaneous speech and gesture from text
Comments: 3 pages, 2 figures, published at the ACM International Conference on Intelligent Virtual Agents (IVA) 2020
Journal-ref: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents (IVA '20), 2020, 3 pages
Subjects: Machine Learning (cs.LG); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Embodied human communication encompasses both verbal (speech) and non-verbal information (e.g., gesture and head movements). Recent advances in machine learning have substantially improved the technologies for generating synthetic versions of both of these types of data: On the speech side, text-to-speech systems are now able to generate highly convincing, spontaneous-sounding speech using unscripted speech audio as the source material. On the motion side, probabilistic motion-generation methods can now synthesise vivid and lifelike speech-driven 3D gesticulation. In this paper, we put these two state-of-the-art technologies together in a coherent fashion for the first time. Concretely, we demonstrate a proof-of-concept system trained on a single-speaker audio and motion-capture dataset, that is able to generate both speech and full-body gestures together from text input. In contrast to previous approaches for joint speech-and-gesture generation, we generate full-body gestures from speech synthesis trained on recordings of spontaneous speech from the same person as the motion-capture data. We illustrate our results by visualising gesture spaces and text-speech-gesture alignments, and through a demonstration video at https://simonalexanderson.github.io/IVA2020 .

[41]  arXiv:2101.05775 [pdf, other]
Title: $\text{O}^2$PF: Oversampling via Optimum-Path Forest for Breast Cancer Detection
Comments: 6 pages, 3 figures. 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS)
Subjects: Machine Learning (cs.LG)

Breast cancer is among the deadliest diseases, affecting mostly women worldwide. Although traditional methods for detection have presented themselves as valid for the task, they still commonly present low accuracies and demand considerable time and effort from professionals. Therefore, a computer-aided diagnosis (CAD) system capable of providing early detection becomes hugely desirable. In the last decade, machine learning-based techniques have been of paramount importance in this context, since they are capable of extracting essential information from data and reasoning about it. However, such approaches still suffer from imbalanced data, specifically on medical issues, where the number of samples from healthy people is, in general, considerably higher than the number from patients. Therefore, this paper proposes the $\text{O}^2$PF, a data oversampling method based on the unsupervised Optimum-Path Forest Algorithm. Experiments conducted over the full oversampling scenario demonstrate the robustness of the model, which is compared against three well-established oversampling methods on three breast cancer datasets and three general-purpose medical datasets.

[42]  arXiv:2101.05778 [pdf, other]
Title: Topological Deep Learning
Comments: 28 pages, 14 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

This work introduces the Topological CNN (TCNN), which encompasses several topologically defined convolutional methods. Manifolds with important relationships to the natural image space are used to parameterize image filters which are used as convolutional weights in a TCNN. These manifolds also parameterize slices in layers of a TCNN across which the weights are localized. We show evidence that TCNNs learn faster, on less data, with fewer learned parameters, and with greater generalizability and interpretability than conventional CNNs. We introduce and explore TCNN layers for both image and video data. We propose extensions to 3D images and 3D video.

[43]  arXiv:2101.05779 [pdf, other]
Title: Structured Prediction as Translation between Augmented Natural Languages
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. Instead of tackling the problem by training task-specific discriminative classifiers, we frame it as a translation task between augmented natural languages, from which the task-relevant information can be easily extracted. Our approach can match or outperform task-specific models on all tasks, and in particular, achieves new state-of-the-art results on joint entity and relation extraction (CoNLL04, ADE, NYT, and ACE2005 datasets), relation classification (FewRel and TACRED), and semantic role labeling (CoNLL-2005 and CoNLL-2012). We accomplish this while using the same architecture and hyperparameters for all tasks and even when training a single model to solve all tasks at the same time (multi-task learning). Finally, we show that our framework can also significantly improve the performance in a low-resource regime, thanks to better use of label semantics.

[44]  arXiv:2101.05795 [pdf, ps, other]
Title: A Metaheuristic-Driven Approach to Fine-Tune Deep Boltzmann Machines
Comments: 30 pages, 7 figures
Journal-ref: Applied Soft Computing 97 (2020): 105717
Subjects: Machine Learning (cs.LG)

Deep learning techniques, such as Deep Boltzmann Machines (DBMs), have received considerable attention over the past years due to their outstanding results across a wide range of domains. One of the main shortcomings of these techniques involves the choice of their hyperparameters, since they have a significant impact on the final results. This work addresses the issue of fine-tuning hyperparameters of Deep Boltzmann Machines using metaheuristic optimization techniques with different backgrounds, such as swarm intelligence, memory- and evolutionary-based approaches. Experiments conducted on three public datasets for binary image reconstruction showed that metaheuristic techniques can obtain reasonable results.

Cross-lists for Fri, 15 Jan 21

[45]  arXiv:2101.05272 (cross-list from cs.HC) [pdf, other]
Title: Real or Virtual? Using Brain Activity Patterns to differentiate Attended Targets during Augmented Reality Scenarios
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Augmented Reality is the fusion of virtual components and our real surroundings. The simultaneous visibility of generated and natural objects often requires users to direct their selective attention to a specific target that is either real or virtual. In this study, we investigated whether this target is real or virtual by using machine learning techniques to classify electroencephalographic (EEG) data collected in Augmented Reality scenarios. A shallow convolutional neural net classified 3 second data windows from 20 participants in a person-dependent manner with an average accuracy above 70\% if the testing data and training data came from different trials. Person-independent classification was possible above chance level for 6 out of 20 participants. Thus, the reliability of such a Brain-Computer Interface is high enough for it to be treated as a useful input mechanism for Augmented Reality applications.

[46]  arXiv:2101.05273 (cross-list from cs.HC) [pdf, other]
Title: AutoDS: Towards Human-Centered Automation of Data Science
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Data science (DS) projects often follow a lifecycle that consists of laborious tasks for data scientists and domain experts (e.g., data exploration, model training, etc.). Only recently have machine learning (ML) researchers developed promising automation techniques to aid data workers in these tasks. This paper introduces AutoDS, an automated machine learning (AutoML) system that aims to leverage the latest ML automation techniques to support data science projects. Data workers only need to upload their dataset; the system can then automatically suggest ML configurations, preprocess data, select algorithms, and train the model. These suggestions are presented to the user via a web-based graphical user interface and a notebook-based programming user interface.
We studied AutoDS with 30 professional data scientists, where one group used AutoDS, and the other did not, to complete a data science project. As expected, AutoDS improves productivity; yet surprisingly, we find that the models produced by the AutoDS group have higher quality and fewer errors, but lower human confidence scores. We reflect on the findings by presenting design implications for incorporating automation techniques into human work in the data science lifecycle.

[47]  arXiv:2101.05303 (cross-list from cs.AI) [pdf, other]
Title: Understanding the Effect of Out-of-distribution Examples and Interactive Explanations on Human-AI Decision Making
Comments: 42 pages, 22 figures
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Although AI holds promise for improving human decision making in societally critical domains, it remains an open question how human-AI teams can reliably outperform AI alone and humans alone in challenging prediction tasks (also known as complementary performance). We explore two directions to understand the gaps in achieving complementary performance. First, we argue that the typical experimental setup limits the potential of human-AI teams. To account for lower AI performance out-of-distribution than in-distribution because of distribution shift, we design experiments with different distribution types and investigate human performance for both in-distribution and out-of-distribution examples. Second, we develop novel interfaces to support interactive explanations so that humans can actively engage with AI assistance. Using an in-person user study and large-scale randomized experiments across three tasks, we demonstrate a clear difference between in-distribution and out-of-distribution examples, and observe mixed results for interactive explanations: while interactive explanations improve human perception of AI assistance's usefulness, they may magnify human biases and lead to limited performance improvement. Overall, our work points out critical challenges and future directions towards complementary performance.

[48]  arXiv:2101.05307 (cross-list from cs.CV) [pdf, other]
Title: Explainability of vision-based autonomous driving systems: Review and challenges
Comments: submitted to IJCV
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

This survey reviews explainability methods for vision-based self-driving systems. The concept of explainability has several facets and the need for explainability is strong in driving, a safety-critical application. Gathering contributions from several research fields, namely computer vision, deep learning, autonomous driving, and explainable AI (X-AI), this survey tackles several points. First, it discusses definitions, context, and motivation for gaining more interpretability and explainability from self-driving systems. Second, major recent state-of-the-art approaches to developing self-driving systems are briefly presented. Third, methods providing explanations to a black-box self-driving system in a post-hoc fashion are comprehensively organized and detailed. Fourth, approaches from the literature that aim at building more interpretable self-driving systems by design are presented and discussed in detail. Finally, remaining open challenges and potential future research directions are identified and examined.

[49]  arXiv:2101.05313 (cross-list from eess.AS) [pdf, other]
Title: Whispered and Lombard Neural Speech Synthesis
Comments: To appear in SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

It is desirable for a text-to-speech system to take into account the environment where synthetic speech is presented, and provide appropriate context-dependent output to the user. In this paper, we present and compare various approaches for generating different speaking styles, namely, normal, Lombard, and whisper speech, using only limited data. The following systems are proposed and assessed: 1) Pre-training and fine-tuning a model for each style. 2) Lombard and whisper speech conversion through a signal processing based approach. 3) Multi-style generation using a single model based on a speaker verification model. Our mean opinion score and AB preference listening tests show that 1) we can generate high quality speech through the pre-training/fine-tuning approach for all speaking styles. 2) Although our speaker verification (SV) model is not explicitly trained to discriminate different speaking styles, and no Lombard and whisper voice is used for pre-training this system, the SV model can be used as a style encoder for generating different style embeddings as input for the Tacotron system. We also show that the resulting synthetic Lombard speech has a significant positive impact on intelligibility gain.

[50]  arXiv:2101.05339 (cross-list from cond-mat.mtrl-sci) [pdf, other]
Title: Accelerating the screening of amorphous polymer electrolytes by learning to reduce random and systematic errors in molecular dynamics simulations
Comments: 25 pages, 5 figures + supplementary information
Subjects: Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Machine learning has been widely adopted to accelerate the screening of materials. Most existing studies implicitly assume that the training data are generated through a deterministic, unbiased process, but this assumption might not hold for the simulation of some complex materials. In this work, we aim to screen amorphous polymer electrolytes, which are promising candidates for next-generation lithium-ion battery technology but extremely expensive to simulate due to their structural complexity. We demonstrate that a multi-task graph neural network can learn from a large amount of noisy, biased data and a small amount of unbiased data and reduce both random and systematic errors in predicting the transport properties of polymer electrolytes. This observation allows us to achieve accurate predictions on the properties of complex materials by learning to reduce errors in the training data, instead of running repetitive, expensive simulations, which are conventionally used to reduce simulation errors. With this approach, we screen a space of 6247 polymer electrolytes, orders of magnitude larger than previous computational studies. We also find a good extrapolation performance to the top polymers from a larger space of 53362 polymers and 31 experimentally-realized polymers. The strategy employed in this work may be applicable to a broad class of material discovery problems that involve the simulation of complex, amorphous materials.

[51]  arXiv:2101.05400 (cross-list from cs.CL) [pdf, other]
Title: Machine-Assisted Script Curation
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We describe Machine-Aided Script Curator (MASC), a system for human-machine collaborative script authoring. Scripts produced with MASC include (1) English descriptions of sub-events that comprise a larger, complex event; (2) event types for each of those events; (3) a record of entities expected to participate in multiple sub-events; and (4) temporal sequencing between the sub-events. MASC automates portions of the script creation process with suggestions for event types, links to Wikidata, and sub-events that may have been forgotten. We illustrate how these automations are useful to the script writer with a few case-study scripts.

[52]  arXiv:2101.05402 (cross-list from math.ST) [pdf, other]
Title: Optimal Clustering in Anisotropic Gaussian Mixture Models
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the clustering task under anisotropic Gaussian Mixture Models where the covariance matrices from different clusters are unknown and are not necessarily the identity matrix. We characterize the dependence of signal-to-noise ratios on the cluster centers and covariance matrices and obtain the minimax lower bound for the clustering problem. In addition, we propose a computationally feasible procedure and prove it achieves the optimal rate within a few iterations. The proposed procedure is a hard EM type algorithm, and it can also be seen as a variant of Lloyd's algorithm that is adjusted to the anisotropic covariance matrices.
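
The hard-EM idea mentioned above can be illustrated with a minimal sketch. This is not the authors' procedure: it is a generic Lloyd-style loop for 2-D points, assuming that each point is reassigned to the cluster minimizing the squared Mahalanobis distance plus the log-determinant of that cluster's covariance. The function names, the ridge term, and the toy data are all hypothetical.

```python
import math

def mean2(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def cov2(points, center, ridge=1e-6):
    """Entries (a, b, d) of the 2x2 covariance [[a, b], [b, d]].
    The small ridge (an assumption) keeps the matrix invertible."""
    n = len(points)
    a = sum((p[0] - center[0]) ** 2 for p in points) / n + ridge
    d = sum((p[1] - center[1]) ** 2 for p in points) / n + ridge
    b = sum((p[0] - center[0]) * (p[1] - center[1]) for p in points) / n
    return (a, b, d)

def score(p, center, cov):
    """Squared Mahalanobis distance plus log-determinant (lower is better)."""
    a, b, d = cov
    det = a * d - b * b
    dx, dy = p[0] - center[0], p[1] - center[1]
    maha = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return maha + math.log(det)

def hard_em(points, labels, n_iter=10):
    """Alternate between re-estimating (center, covariance) per cluster
    and hard-reassigning every point to its best-scoring cluster."""
    k = max(labels) + 1
    for _ in range(n_iter):
        groups = [[p for p, z in zip(points, labels) if z == j] for j in range(k)]
        if any(len(g) < 2 for g in groups):
            break  # a cluster collapsed; stop rather than divide by zero
        centers = [mean2(g) for g in groups]
        covs = [cov2(g, c) for g, c in zip(groups, centers)]
        new_labels = [min(range(k), key=lambda j: score(p, centers[j], covs[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
    return labels

# Toy data: two clusters stretched along the x-axis, separated along y.
cluster_a = [(x, 0.1 * (-1) ** i) for i, x in enumerate(range(-4, 5))]
cluster_b = [(x, 1.5 + 0.1 * (-1) ** i) for i, x in enumerate(range(-4, 5))]
points = cluster_a + cluster_b
truth = [0] * 9 + [1] * 9
init = list(truth)
init[0], init[17] = 1, 0  # a deliberately imperfect initialization
result = hard_em(points, init)
```

Because the covariance enters the assignment step, the stretched direction is penalized less than the narrow one, which is what distinguishes this kind of loop from plain Euclidean Lloyd iterations.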

[53]  arXiv:2101.05403 (cross-list from cs.CV) [pdf]
Title: Image deblurring based on lightweight multi-information fusion network
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Recently, deep learning based image deblurring has been well developed. However, exploiting the detailed image features in a deep learning framework always requires a mass of parameters, which inevitably makes the network suffer from high computational burden. To solve this problem, we propose a lightweight multi-information fusion network (LMFN) for image deblurring. The proposed LMFN is designed as an encoder-decoder architecture. In the encoding stage, the image feature is reduced to various small-scale spaces for multi-scale information extraction and fusion without a large amount of information loss. Then, a distillation network is used in the decoding stage, which allows the network to benefit the most from residual learning while remaining sufficiently lightweight. Meanwhile, an information fusion strategy between distillation modules and feature channels is also carried out by an attention mechanism. By fusing different information in the proposed approach, our network can achieve state-of-the-art image deblurring results with a smaller number of parameters and outperforms existing methods in model complexity.

[54]  arXiv:2101.05404 (cross-list from cond-mat.quant-gas) [pdf, other]
Title: Machine-learning enhanced dark soliton detection in Bose-Einstein condensates
Comments: 17 pages, 5 figures
Subjects: Quantum Gases (cond-mat.quant-gas); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Quantum Physics (quant-ph)

Most data in cold-atom experiments comes from images, the analysis of which is limited by our preconceptions of the patterns that could be present in the data. We focus on the well-defined case of detecting dark solitons -- appearing as local density depletions in a BEC -- using a methodology that is extensible to the general task of pattern recognition in images of cold atoms. Studying soliton dynamics over a wide range of parameters requires the analysis of large datasets, making the existing human-inspection-based methodology a significant bottleneck. Here we describe an automated classification and positioning system for identifying localized excitations in atomic Bose-Einstein condensates (BECs) utilizing deep convolutional neural networks to eliminate the need for human image examination. Furthermore, we openly publish our labeled dataset of dark solitons, the first of its kind, for further machine learning research.

[55]  arXiv:2101.05405 (cross-list from cs.CR) [pdf, other]
Title: Privacy Analysis in Language Models via Training Data Leakage Report
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG)

Recent advances in neural network based language models lead to successful deployments of such models, improving user experience in various applications. It has been demonstrated that strong performance of language models may come along with the ability to memorize rare training samples, which poses serious privacy threats in case the model training is conducted on confidential user content. This necessitates privacy monitoring techniques to minimize the chance of possible privacy breaches for the models deployed in practice. In this work, we introduce a methodology that investigates identifying the user content in the training data that could be leaked under a strong and realistic threat model. We propose two metrics to quantify user-level data leakage by measuring a model's ability to produce unique sentence fragments within training data. Our metrics further enable comparing different models trained on the same data in terms of privacy. We demonstrate our approach through extensive numerical studies on real-world datasets such as email and forum conversations. We further illustrate how the proposed metrics can be utilized to investigate the efficacy of mitigations like differentially private training or API hardening.

[56]  arXiv:2101.05415 (cross-list from cs.LO) [pdf, other]
Title: Analysis of E-commerce Ranking Signals via Signal Temporal Logic
Authors: Tommaso Dreossi (Amazon Search), Giorgio Ballardin (Amazon Search), Parth Gupta (Amazon Search), Jan Bakus (Amazon Search), Yu-Hsiang Lin (Amazon Search), Vamsi Salaka (Amazon Search)
Comments: In Proceedings SNR 2020, arXiv:2101.05256
Journal-ref: EPTCS 331, 2021, pp. 33-42
Subjects: Logic in Computer Science (cs.LO); Formal Languages and Automata Theory (cs.FL); Information Retrieval (cs.IR); Machine Learning (cs.LG)

The timed position of documents retrieved by learning-to-rank models can be seen as signals. Signals carry useful information such as drop or rise of documents over time or user behaviors. In this work, we propose to use the logic formalism called Signal Temporal Logic (STL) to characterize document behaviors in ranking according to the specified formulas. Our analysis shows that interesting document behaviors can be easily formalized and detected thanks to STL formulas. We validate our idea on a dataset of 100K product signals. Through the presented framework, we uncover interesting patterns, such as cold start, warm start, and spikes, and inspect how they affect our learning-to-rank models.
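
To make the idea of checking a temporal formula against a ranking signal concrete, here is a minimal sketch with Boolean, discrete-time semantics. The "spike" formula, the thresholds, and all names below are hypothetical illustrations, not the formulas used in the paper. Windows that run past the end of the trace are simply truncated, which is also an assumption.

```python
def eventually(sat, lo, hi):
    """F[lo,hi]: true at step t if `sat` holds at some step in [t+lo, t+hi].
    The window is truncated at the end of the trace."""
    n = len(sat)
    return [any(sat[u] for u in range(t + lo, min(t + hi, n - 1) + 1))
            for t in range(n)]

def conj(a, b):
    """Pointwise conjunction of two Boolean traces."""
    return [x and y for x, y in zip(a, b)]

def has_spike(ranks, top=10, poor=50, window=2):
    """Hypothetical 'spike': the document sits at a poor position, reaches
    the top positions within `window` steps, and falls back to a poor
    position within `window` steps after that.
    Roughly: F( poor AND F[1,w]( top AND F[1,w] poor ) )."""
    in_top = [r <= top for r in ranks]      # rank 1 is the best position
    is_poor = [r > poor for r in ranks]
    peak_then_drop = conj(in_top, eventually(is_poor, 1, window))
    spike = conj(is_poor, eventually(peak_then_drop, 1, window))
    return any(spike)

spiky = [80, 75, 78, 5, 4, 82, 79]    # brief visit to the top positions
steady = [80, 79, 81, 80, 78, 80, 79]  # never leaves the poor positions
warm = [80, 60, 8, 7, 6, 5, 5]         # rises and stays near the top
```

In this toy setting only the first trace satisfies the formula; the other two lack either the rise or the subsequent drop.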

[57]  arXiv:2101.05442 (cross-list from eess.IV) [pdf, other]
Title: Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans
Comments: Accepted by AAAI 2021, COVID-19, Neural Architecture Search, AutoML
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

The COVID-19 pandemic has spread globally for several months. Because its transmissibility and high pathogenicity seriously threaten people's lives, it is crucial to accurately and quickly detect COVID-19 infection. Many recent studies have shown that deep learning (DL) based solutions can help detect COVID-19 based on chest CT scans. However, most existing work focuses on 2D datasets, which may result in low quality models as the real CT scans are 3D images. Besides, the reported results span a broad spectrum on different datasets with a relatively unfair comparison. In this paper, we first use three state-of-the-art 3D models (ResNet3D101, DenseNet3D121, and MC3_18) to establish the baseline performance on the three publicly available chest CT scan datasets. Then we propose a differentiable neural architecture search (DNAS) framework to automatically search for 3D DL models for 3D chest CT scan classification, using the Gumbel Softmax technique to improve the search efficiency. We further exploit the Class Activation Mapping (CAM) technique on our models to provide interpretability of the results. The experimental results show that our automatically searched models (CovidNet3D) outperform the baseline human-designed models on the three datasets with tens of times smaller model size and higher accuracy. Furthermore, the results also verify that CAM can be well applied in CovidNet3D for COVID-19 datasets to provide interpretability for medical diagnosis.
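
For readers unfamiliar with the Gumbel Softmax technique named above, the following is a minimal, generic sketch of the sampling step, written in plain Python. It is not the CovidNet3D search code; the function name, the example logits, and the idea of reading the output as weights over candidate operations are illustrative assumptions.

```python
import math
import random

def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw a relaxed one-hot sample: add Gumbel(0, 1) noise to each logit,
    divide by the temperature, and apply a softmax."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits over three candidate operations in a search cell.
op_logits = [0.5, 1.5, -0.3]
weights = gumbel_softmax(op_logits, temperature=0.5, rng=random.Random(0))
```

Lower temperatures push the weights closer to a one-hot choice, while higher temperatures keep them smoother; the softmax form is what keeps the sample differentiable with respect to the logits in an autodiff framework.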

[58]  arXiv:2101.05457 (cross-list from cs.NE) [pdf, other]
Title: A Multiple Classifier Approach for Concatenate-Designed Neural Networks
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

This article introduces a multiple classifier method to improve the performance of concatenate-designed neural networks, such as ResNet and DenseNet, with the aim of alleviating the pressure on the final classifier. We give the design of the classifiers, which collect the features produced between the network sets, and present the constituent layers and the activation function for the classifiers, to calculate the classification score of each classifier. We use the L2 normalization method to obtain the classifier score instead of the Softmax normalization. We also determine the conditions that can enhance convergence. As a result, the proposed classifiers significantly improve accuracy in the experimental cases, and the results show that the method not only performs better than the original models but also converges faster. Moreover, our classifiers are general and can be applied to all classification related concatenate-designed network models.
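
The contrast between the two normalizations mentioned above can be sketched as follows. How the paper applies L2 normalization inside its classifiers is not specified in the abstract, so this example only assumes the simplest reading: dividing a raw score vector by its Euclidean norm. The function names and the sample scores are hypothetical.

```python
import math

def softmax(scores):
    """Exponentiate and rescale so the outputs sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def l2_normalize(scores, eps=1e-12):
    """Divide by the Euclidean norm so the squared outputs sum to 1.
    `eps` guards against an all-zero score vector."""
    norm = math.sqrt(sum(s * s for s in scores))
    return [s / max(norm, eps) for s in scores]

raw = [3.0, 4.0]
soft = softmax(raw)
unit = l2_normalize(raw)
```

Unlike Softmax, L2 normalization is linear in the scores up to the scaling factor, keeps their signs, and does not exaggerate the gap between the largest score and the rest.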

[59]  arXiv:2101.05477 (cross-list from math.ST) [pdf, other]
Title: Optimal network online change point localisation
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG)

We study the problem of online network change point detection. In this setting, a collection of independent Bernoulli networks is collected sequentially, and the underlying distributions change when a change point occurs. The goal is to detect the change point as quickly as possible, if it exists, subject to a constraint on the number or probability of false alarms. In this paper, on the detection delay, we establish a minimax lower bound and two upper bounds based on NP-hard algorithms and polynomial-time algorithms, i.e., \[ \mbox{detection delay} \begin{cases} \gtrsim \log(1/\alpha) \frac{\max\{r^2/n, \, 1\}}{\kappa_0^2 n \rho},\\ \lesssim \log(\Delta/\alpha) \frac{\max\{r^2/n, \, \log(r)\}}{\kappa_0^2 n \rho}, & \mbox{with NP-hard algorithms},\\ \lesssim \log(\Delta/\alpha) \frac{r}{\kappa_0^2 n \rho}, & \mbox{with polynomial-time algorithms}, \end{cases} \] where $\kappa_0, n, \rho, r$ and $\alpha$ are the normalised jump size, network size, entrywise sparsity, rank sparsity and the overall Type-I error upper bound. All the model parameters are allowed to vary as $\Delta$, the location of the change point, diverges. The polynomial-time algorithms are novel procedures that we propose in this paper, designed for quick detection under two different forms of Type-I error control. The first is based on controlling the overall probability of a false alarm when there are no change points, and the second is based on specifying a lower bound on the expected time of the first false alarm. Extensive experiments show that, under different scenarios and the aforementioned forms of Type-I error control, our proposed approaches outperform state-of-the-art methods.

[60]  arXiv:2101.05479 (cross-list from cs.CV) [pdf, other]
Title: Understanding the Role of Scene Graphs in Visual Question Answering
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Visual Question Answering (VQA) is of tremendous interest to the research community with important applications such as aiding visually impaired users and image-based search. In this work, we explore the use of scene graphs for solving the VQA task. We conduct experiments on the GQA dataset which presents a challenging set of questions requiring counting, compositionality and advanced reasoning capability, and provides scene graphs for a large number of images. We adopt image + question architectures for use with scene graphs, evaluate various scene graph generation techniques for unseen images, propose a training curriculum to leverage human-annotated and auto-generated scene graphs, and build late fusion architectures to learn from multiple image representations. We present a multi-faceted study into the use of scene graphs for VQA, making this work the first of its kind.

[61]  arXiv:2101.05499 (cross-list from cs.CL) [pdf, other]
Title: ECOL: Early Detection of COVID Lies Using Content, Prior Knowledge and Source Information
Comments: to be published in Constraint-2021 Workshop @ AAAI
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Social media platforms are vulnerable to fake news dissemination, which causes negative consequences such as panic and wrong medication in the healthcare domain. Therefore, it is important to automatically detect fake news at an early stage before it spreads widely. This paper analyzes the impact of incorporating content information, prior knowledge, and credibility of sources into models for the early detection of fake news. We propose a framework modeling those features by using the BERT language model and external sources, namely Simple English Wikipedia and source reliability tags. The experiments conducted on the CONSTRAINT datasets demonstrated the benefit of integrating these features for the early detection of fake news in the healthcare domain.

[62]  arXiv:2101.05510 (cross-list from cs.SI) [pdf, other]
Title: Signal Processing on Higher-Order Networks: Livin' on the Edge ... and Beyond
Comments: 38 pages; 7 figures
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Physics and Society (physics.soc-ph); Machine Learning (stat.ML)

This tutorial paper presents a didactic treatment of the emerging topic of signal processing on higher-order networks. Drawing analogies from discrete and graph signal processing, we introduce the building blocks for processing data on simplicial complexes and hypergraphs, two common abstractions of higher-order networks that can incorporate polyadic relationships. We provide basic introductions to simplicial complexes and hypergraphs, with special emphasis on the concepts needed for processing signals on them. Leveraging these concepts, we discuss Fourier analysis, signal denoising, signal interpolation, node embeddings, and non-linear processing through neural networks in these two representations of polyadic relational structures. In the context of simplicial complexes, we specifically focus on signal processing using the Hodge Laplacian matrix, a multi-relational operator that leverages the special structure of simplicial complexes and generalizes desirable properties of the Laplacian matrix in graph signal processing. For hypergraphs, we present both matrix and tensor representations, and discuss the trade-offs in adopting one or the other. We also highlight limitations and potential research avenues, both to inform practitioners and to motivate the contribution of new researchers to the area.
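
As a small worked example of the Hodge Laplacian mentioned above, the sketch below builds the standard first-order operator L1 = B1^T B1 + B2 B2^T for a single filled triangle, where B1 is the node-to-edge incidence matrix and B2 the edge-to-triangle incidence matrix. The orientation convention (edges point from the lower to the higher node index) and the helper names are assumptions chosen for illustration.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Filled triangle on nodes 0, 1, 2 with edges ordered (0,1), (0,2), (1,2).
# B1: rows are nodes, columns are edges; -1 at the tail, +1 at the head.
B1 = [[-1, -1,  0],
      [ 1,  0, -1],
      [ 0,  1,  1]]
# B2: rows are edges, one column for the triangle (0,1,2), whose boundary
# is (1,2) - (0,2) + (0,1).
B2 = [[ 1],
      [-1],
      [ 1]]

def hodge_laplacian_1(b1, b2):
    """L1 = B1^T B1 + B2 B2^T: the lower term couples edges through shared
    nodes, the upper term couples edges through shared triangles."""
    lower = matmul(transpose(b1), b1)
    upper = matmul(b2, transpose(b2))
    return matadd(lower, upper)

L0 = matmul(B1, transpose(B1))   # the usual graph Laplacian on nodes
L1 = hodge_laplacian_1(B1, B2)   # the Hodge Laplacian on edge signals
boundary_check = matmul(B1, B2)  # boundary of a boundary should vanish
```

For this filled triangle L1 has no zero eigenvalue, reflecting that the complex has no one-dimensional hole; removing the triangle (dropping the B2 term) would leave the circulation around the cycle in the null space.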

[63]  arXiv:2101.05537 (cross-list from eess.SY) [pdf, other]
Title: Optimal Energy Shaping via Neural Approximators
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Dynamical Systems (math.DS)

We introduce optimal energy shaping as an enhancement of classical passivity-based control methods. A promising feature of passivity theory, alongside stability, has traditionally been claimed to be intuitive performance tuning along the execution of a given task. However, a systematic approach to adjust performance within a passive control framework has yet to be developed, as each method relies on few and problem-specific practical insights. Here, we cast the classic energy-shaping control design process in an optimal control framework; once a task-dependent performance metric is defined, an optimal solution is systematically obtained through an iterative procedure relying on neural networks and gradient-based optimization. The proposed method is validated on state-regulation tasks.

[64]  arXiv:2101.05546 (cross-list from q-bio.GN) [pdf]
Title: Feature reduction for machine learning on molecular features: The GeneScore
Comments: 11 pages, 9 figures, 4 tables
Subjects: Genomics (q-bio.GN); Machine Learning (cs.LG)

We present the GeneScore, a concept of feature reduction for Machine Learning analysis of biomedical data. Using expert knowledge, the GeneScore integrates different molecular data types into a single score. We show that the GeneScore is superior to a binary matrix in the classification of cancer entities from SNV, Indel, CNV, gene fusion and gene expression data. The GeneScore is a straightforward way to facilitate state-of-the-art analysis, while making use of the available scientific knowledge on the nature of molecular data features used.

[65]  arXiv:2101.05555 (cross-list from math.NA) [pdf, other]
Title: Non-intrusive surrogate modeling for parametrized time-dependent PDEs using convolutional autoencoders
Subjects: Numerical Analysis (math.NA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

This work presents a non-intrusive surrogate modeling scheme based on machine learning technology for predictive modeling of complex systems, described by parametrized time-dependent PDEs. For these problems, typical finite element approaches involve the spatiotemporal discretization of the PDE and the solution of the corresponding linear system of equations at each time step. Instead, the proposed method utilizes a convolutional autoencoder in conjunction with a feed-forward neural network to establish a low-cost and accurate mapping from the problem's parametric space to its solution space. For this purpose, time history response data are collected by solving the high-fidelity model via FEM for a reduced set of parameter values. Then, by applying the convolutional autoencoder to this data set, a low-dimensional representation of the high-dimensional solution matrices is provided by the encoder, while the reconstruction map is obtained by the decoder. Using the latent representation given by the encoder, a feed-forward neural network is efficiently trained to map points from the problem's parametric space to the compressed version of the respective solution matrices. This way, the encoded response of the system at new parameter values is given by the neural network, while the entire response is delivered by the decoder. This approach effectively bypasses the need to serially formulate and solve the system's governing equations at each time increment, thus resulting in a significant cost reduction and rendering the method ideal for problems requiring repeated model evaluations or 'real-time' computations. The elaborated methodology is demonstrated on the stochastic analysis of time-dependent PDEs solved with the Monte Carlo method; however, it can be straightforwardly applied to other similar-type problems, such as sensitivity analysis, design optimization, etc.

[66]  arXiv:2101.05564 (cross-list from cs.CV) [pdf, other]
Title: FabricNet: A Fiber Recognition Architecture Using Ensemble ConvNets
Comments: Accepted in IEEE Access
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Fabric is a planar material composed of textile fibers. Textile fibers come from many natural sources, including plants, animals, and minerals, and they can also be synthetic. A particular fabric may contain different types of fibers that pass through a complex production process. Fiber identification is usually carried out through chemical tests and microscopic tests. However, these testing processes are complicated as well as time-consuming. We propose FabricNet, a pioneering approach for image-based textile fiber recognition, which may have a revolutionary impact on fiber recognition processes from the individual to the industrial scale. FabricNet can recognize a wide range of fibers using only a surface image of the fabric. The recognition system is constructed using a distinct category of class-based ensemble convolutional neural network (CNN) architecture. The experiment is conducted on recognizing 50 different types of textile fibers. To the best of our knowledge, this experiment includes a significantly larger number of unique textile fibers than previous research endeavors. We experiment with popular CNN architectures that include Inception, ResNet, VGG, MobileNet, DenseNet, and Xception. Finally, the experimental results demonstrate that FabricNet outperforms the state-of-the-art popular CNN architectures by reaching an accuracy of 84% and F1-score of 90%.

[67]  arXiv:2101.05593 (cross-list from cs.CL) [pdf, ps, other]
Title: On the Temporality of Priors in Entity Linking
Journal-ref: 2020 European Conference on Information Retrieval
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Entity linking is a fundamental task in natural language processing which deals with the lexical ambiguity in texts. An important component in entity linking approaches is the mention-to-entity prior probability. Even though there is a large number of works in entity linking, the existing approaches do not explicitly consider the time aspect, specifically the temporality of an entity's prior probability. We posit that this prior probability is temporal in nature and affects the performance of entity linking systems. In this paper we systematically study the effect of the prior on the entity linking performance over the temporal validity of both texts and KBs.

[68]  arXiv:2101.05611 (cross-list from cs.IR) [pdf, other]
Title: TrNews: Heterogeneous User-Interest Transfer Learning for News Recommendation
Comments: EACL 2021
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

We investigate how to solve the cross-corpus news recommendation problem for unseen users in the future. This is a problem where traditional content-based recommendation techniques often fail. Luckily, in real-world recommendation services, some publishers (e.g., Daily news) may have accumulated a large corpus with many consumers, which can be used for a newly deployed publisher (e.g., Political news). To take advantage of the existing corpus, we propose a transfer learning model (dubbed TrNews) for news recommendation to transfer the knowledge from a source corpus to a target corpus. To tackle the heterogeneity of different user interests and of different word distributions across corpora, we design a translator-based transfer-learning strategy to learn a representation mapping between source and target corpora. The learned translator can be used to generate representations for unseen users in the future. We show through experiments on real-world datasets that TrNews is better than various baselines in terms of four metrics. We also show that our translator is effective among existing transfer strategies.

[69]  arXiv:2101.05625 (cross-list from cs.IR) [pdf, other]
Title: Learning Student Interest Trajectory for MOOCThread Recommendation
Subjects: Information Retrieval (cs.IR); Computers and Society (cs.CY); Machine Learning (cs.LG)

In recent years, Massive Open Online Courses (MOOCs) have witnessed immense growth in popularity. Now, due to the recent COVID-19 pandemic situation, it is important to push the limits of online education. Discussion forums are a primary means of interaction among learners and instructors. However, with growing class size, students face the challenge of finding useful and informative discussion forums. This problem can be solved by matching the interest of students with thread contents. The fundamental challenge is that student interests drift as they progress through the course, and forum contents evolve as students or instructors update them. In our paper, we propose to predict future interest trajectories of students. Our model consists of two key operations: 1) Update operation and 2) Projection operation. The update operation models the inter-dependency between the evolution of student and thread using coupled Recurrent Neural Networks when the student posts on the thread. The projection operation learns to estimate future embeddings of students and threads. For students, the projection operation learns the drift in their interests caused by the change in the course topic they study. The projection operation for threads exploits how different posts induce varying interest levels in a student according to the thread structure. Extensive experimentation on three real-world MOOC datasets shows that our model significantly outperforms other baselines for thread recommendation.

[70]  arXiv:2101.05626 (cross-list from cs.IR) [pdf, other]
Title: Eating Garlic Prevents COVID-19 Infection: Detecting Misinformation on the Arabic Content of Twitter
Comments: 18 pages, 4 figures
Subjects: Information Retrieval (cs.IR); Computers and Society (cs.CY); Machine Learning (cs.LG); Social and Information Networks (cs.SI)

The rapid growth of social media content during the current pandemic provides useful tools for disseminating information, but it has also become a source of misinformation. Therefore, there is an urgent need for fact-checking and effective techniques for detecting misinformation in social media. In this work, we study misinformation in the Arabic content of Twitter. We construct a large Arabic dataset related to COVID-19 misinformation and gold-annotate the tweets into two categories: misinformation or not. Then, we apply eight different traditional and deep machine learning models, with different features including word embeddings and word frequency. The word embedding models (FastText and word2vec) exploit more than two million Arabic tweets related to COVID-19. Experiments show that optimizing the area under the curve (AUC) improves the models' performance and that Extreme Gradient Boosting (XGBoost) achieves the highest accuracy in detecting COVID-19 misinformation online.

[71]  arXiv:2101.05634 (cross-list from cs.CL) [pdf, other]
Title: Better Together -- An Ensemble Learner for Combining the Results of Ready-made Entity Linking Systems
Comments: SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Entity linking (EL) is the task of automatically identifying entity mentions in text and resolving them to a corresponding entity in a reference knowledge base like Wikipedia. Throughout the past decade, a plethora of EL systems and pipelines have become available, where performance of individual systems varies heavily across corpora, languages or domains. Linking performance varies even between different mentions in the same text corpus, where, for instance, some EL approaches are better able to deal with short surface forms while others may perform better when more context information is available. To this end, we argue that performance may be optimised by exploiting results from distinct EL systems on the same corpus, thereby leveraging their individual strengths on a per-mention basis. In this paper, we introduce a supervised approach which exploits the output of multiple ready-made EL systems by predicting the correct link on a per-mention basis. Experimental results obtained on existing ground truth datasets and exploiting three state-of-the-art EL systems show the effectiveness of our approach and its capacity to significantly outperform the individual EL systems as well as a set of baseline methods.

[72]  arXiv:2101.05641 (cross-list from cs.IR) [pdf, other]
Title: $C^3DRec$: Cloud-Client Cooperative Deep Learning for Temporal Recommendation in the Post-GDPR Era
Authors: Jialiang Han, Yun Ma
Subjects: Information Retrieval (cs.IR); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Mobile devices enable users to retrieve information at any time and any place. Considering the occasional requirements and fragmented usage patterns of mobile users, temporal recommendation techniques have been proposed to improve the efficiency of information retrieval on mobile devices by accurately recommending items via learning temporal interests from short-term user interaction behaviors. However, the enforcement of privacy-preserving laws and regulations, such as GDPR, may overshadow the successful practice of temporal recommendation. The reason is that state-of-the-art recommendation systems need to gather and process user data on centralized servers, but the interaction behavior data used for temporal recommendation are usually non-transactional data that may not be gathered without the explicit permission of users according to GDPR. As a result, if users do not permit services to gather their interaction behavior data, temporal recommendation fails to work. To realize temporal recommendation in the post-GDPR era, this paper proposes $C^3DRec$, a cloud-client cooperative deep learning framework for mining interaction behaviors for recommendation while preserving user privacy. $C^3DRec$ constructs a global recommendation model on centralized servers using data collected before GDPR and fine-tunes the model directly on individual local devices using data collected after GDPR. We design two modes to accomplish the recommendation, i.e., pull mode, where candidate items are pulled down onto the devices and fed into the local model to get recommended items, and push mode, where the output of the local model is pushed onto the server and combined with candidate items to get recommended ones. Evaluation results show that $C^3DRec$ achieves recommendation accuracy comparable to the centralized approaches, with minimal privacy concern.

[73]  arXiv:2101.05646 (cross-list from cs.CR) [pdf]
Title: Malicious Code Detection: Run Trace Output Analysis by LSTM
Comments: 11 pages, 5 figures, 5 tables, accepted to IEEE Access
Subjects: Cryptography and Security (cs.CR); Computation and Language (cs.CL); Machine Learning (cs.LG)

Malicious software threats and their detection have been gaining importance as a subdomain of information security due to the expansion of ICT applications in daily settings. A major challenge in designing and developing anti-malware systems is the coverage of the detection, particularly the development of dynamic analysis methods that can detect polymorphic and metamorphic malware efficiently. In the present study, we propose a methodological framework for detecting malicious code by analyzing run trace outputs by Long Short-Term Memory (LSTM). We developed models of run traces of malicious and benign Portable Executable (PE) files. We created our dataset from run trace outputs obtained from dynamic analysis of PE files. The obtained dataset was in the instruction format as a sequence and was called Instruction as a Sequence Model (ISM). By splitting the first dataset into basic blocks, we obtained the second one called Basic Block as a Sequence Model (BSM). The experiments showed that the ISM achieved an accuracy of 87.51% and a false positive rate of 18.34%, while BSM achieved an accuracy of 99.26% and a false positive rate of 2.62%.
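
As an illustration of the step from an instruction sequence to a basic-block sequence described in this abstract, the following is a minimal sketch in Python. It is not the authors' implementation: the function name `split_into_basic_blocks`, the plain-text trace format, and the set of block-terminating mnemonics are all assumptions made for the example.

```python
# Hypothetical set of control-transfer mnemonics that end a basic block.
# The paper's actual splitting rule is not given in the abstract.
BLOCK_TERMINATORS = {"jmp", "je", "jne", "call", "ret"}


def split_into_basic_blocks(trace):
    """Group a run trace (list of instruction strings) into basic blocks.

    A block is closed as soon as an instruction whose mnemonic is in
    BLOCK_TERMINATORS has been appended to it.
    """
    blocks = []
    current = []
    for instruction in trace:
        if not instruction.strip():
            continue  # skip blank lines in the trace
        current.append(instruction)
        mnemonic = instruction.split()[0].lower()
        if mnemonic in BLOCK_TERMINATORS:
            blocks.append(current)
            current = []
    if current:
        # Trailing instructions with no terminator form a final block.
        blocks.append(current)
    return blocks


example_trace = ["mov eax, 1", "cmp eax, 2", "jne 0x40", "add eax, 1", "ret"]
example_blocks = split_into_basic_blocks(example_trace)
```

Each resulting block could then be treated as one token of the sequence fed to a sequence model, in place of the individual instructions.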

[74]  arXiv:2101.05652 (cross-list from cs.NE) [pdf, other]
Title: A Nature-Inspired Feature Selection Approach based on Hypercomplex Information
Comments: 17 pages, 7 figures
Journal-ref: APPLIED SOFT COMPUTING; v. 94, SEP 2020
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

Feature selection for a given model can be transformed into an optimization task. The essential idea behind it is to find the most suitable subset of features according to some criterion. Nature-inspired optimization can mitigate this problem by producing compelling yet straightforward solutions when dealing with complicated fitness functions. Additionally, new mathematical representations, such as quaternions and octonions, are being used to handle higher-dimensional spaces. In this context, we are introducing a meta-heuristic optimization framework in a hypercomplex-based feature selection, where hypercomplex numbers are mapped to real-valued solutions and then transferred onto a boolean hypercube by a sigmoid function. The intended hypercomplex feature selection is tested for several meta-heuristic algorithms and hypercomplex representations, achieving results comparable to some state-of-the-art approaches. The good results achieved by the proposed approach make it a promising tool amongst feature selection research.
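
The mapping described in this abstract (hypercomplex number to real value to boolean hypercube via a sigmoid) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: it assumes quaternions stored as 4-tuples, uses the Euclidean norm as the real-valued projection, subtracts a hypothetical `center` offset so that the sigmoid can fall below 0.5, and thresholds deterministically at 0.5.

```python
import math


def quaternion_norm(q):
    """Euclidean norm of a quaternion given as a 4-tuple (a, b, c, d)."""
    return math.sqrt(sum(component * component for component in q))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def to_feature_mask(position, center=1.0):
    """Map one quaternion per feature to a boolean feature mask.

    `center` is a hypothetical offset (an assumption of this sketch):
    the norm is non-negative, so without it every feature would have
    a sigmoid value of at least 0.5.
    """
    mask = []
    for q in position:
        real_value = quaternion_norm(q) - center
        mask.append(sigmoid(real_value) > 0.5)
    return mask


# Two features: the first has norm 0, the second has norm 2.
example_mask = to_feature_mask([(0.0, 0.0, 0.0, 0.0), (2.0, 0.0, 0.0, 0.0)])
```

In a meta-heuristic loop, the mask would select the feature subset whose fitness is evaluated, while the search itself moves in the hypercomplex space.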

[75]  arXiv:2101.05656 (cross-list from cs.CL) [pdf, other]
Title: On Informative Tweet Identification For Tracking Mass Events
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Twitter has been heavily used as an important channel for communicating about and discussing events in real time. During such major events, many uninformative tweets are also published rapidly by many users, making it hard to follow the events. In this paper, we address this problem by investigating machine learning methods for automatically identifying informative tweets among those that are relevant to a target event. We examine both traditional approaches with a rich set of handcrafted features and state-of-the-art approaches with automatically learned features. We further propose a hybrid model that leverages both the handcrafted features and the automatically learned ones. Our experiments on several large datasets of real-world events show that the latter approaches significantly outperform the former and that our proposed model performs the best, suggesting highly effective mechanisms for tracking mass events.

[76]  arXiv:2101.05657 (cross-list from math.OC) [pdf, other]
Title: No-go Theorem for Acceleration in the Hyperbolic Plane
Comments: 12 pages
Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)

In recent years there has been significant effort to adapt the key tools and ideas in convex optimization to the Riemannian setting. One key challenge has remained: Is there a Nesterov-like accelerated gradient method for geodesically convex functions on a Riemannian manifold? Recent work has given partial answers and the hope was that this ought to be possible.
Here we dash these hopes. We prove that in a noisy setting, there is no analogue of accelerated gradient descent for geodesically convex functions on the hyperbolic plane. Our results apply even when the noise is exponentially small. The key intuition behind our proof is short and simple: In negatively curved spaces, the volume of a ball grows so fast that information about the past gradients is not useful in the future.

[77]  arXiv:2101.05679 (cross-list from stat.ML) [pdf, other]
Title: Convex Smoothed Autoencoder-Optimal Transport model
Authors: Aratrika Mustafi
Comments: 26 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Generative modelling is a key tool in unsupervised machine learning which has achieved stellar success in recent years. Despite this huge success, even the best generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) come with their own shortcomings, mode collapse and mode mixture being the two most prominent problems. In this paper we develop a new generative model capable of generating samples which resemble the observed data, and is free from mode collapse and mode mixture. Our model is inspired by the recently proposed Autoencoder-Optimal Transport (AE-OT) model and tries to improve on it by addressing the problems faced by the AE-OT model itself, specifically with respect to the sample generation algorithm. Theoretical results concerning the bound on the error in approximating the non-smooth Brenier potential by its smoothed estimate, and approximating the discontinuous optimal transport map by a smoothed optimal transport map estimate have also been established in this paper.

[78]  arXiv:2101.05781 (cross-list from cs.CR) [pdf, other]
Title: Time-Based CAN Intrusion Detection Benchmark
Authors: Deborah H. Blevins (1), Pablo Moriano (2), Robert A. Bridges (2), Miki E. Verma (2), Michael D. Iannacone (2), Samuel C Hollifield (2) ((1) University of Kentucky, (2) Oak Ridge National Laboratory)
Comments: 7 pages, 2 figures
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Modern vehicles are complex cyber-physical systems made of hundreds of electronic control units (ECUs) that communicate over controller area networks (CANs). This inherited complexity has expanded the CAN attack surface, which is vulnerable to message injection attacks. These injections change the overall timing characteristics of messages on the bus, and thus, to detect these malicious messages, time-based intrusion detection systems (IDSs) have been proposed. However, time-based IDSs are usually trained and tested on low-fidelity datasets with unrealistic, labeled attacks. This makes it difficult to evaluate, compare, and validate IDSs. Here we detail and benchmark four time-based IDSs against the newly published ROAD dataset, the first open CAN IDS dataset with real (non-simulated) stealthy attacks with physically verified effects. We found that methods that perform hypothesis testing by explicitly estimating message timing distributions have lower performance than methods that seek anomalies in a distribution-related statistic. In particular, these "distribution-agnostic" methods outperform "distribution-based" methods by at least 55% in area under the precision-recall curve (AUC-PR). Our results expand the body of knowledge of CAN time-based IDSs by providing details of these methods and reporting their results when tested on datasets with real advanced attacks. Finally, we develop an after-market plug-in detector using lightweight hardware, which can be used to deploy the best performing IDS method on nearly any vehicle.
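
To make the idea of a time-based detector concrete, the sketch below flags windows of messages whose mean inter-arrival time deviates from a baseline. It is an illustrative assumption, not one of the four IDSs benchmarked in the paper: the function names, the window size, the tolerance, and the use of integer millisecond timestamps are all hypothetical choices.

```python
def inter_arrival_times(timestamps):
    """Gaps between consecutive message timestamps for one CAN ID."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def flag_windows(timestamps, baseline_mean, window=4, tolerance=0.5):
    """Return one boolean per non-overlapping window of `window` gaps.

    A window is flagged when its mean gap differs from `baseline_mean`
    by more than `tolerance * baseline_mean`. Both parameters are
    hypothetical defaults chosen for this sketch.
    """
    gaps = inter_arrival_times(timestamps)
    flags = []
    for start in range(0, len(gaps) - window + 1, window):
        window_mean = sum(gaps[start:start + window]) / window
        flags.append(abs(window_mean - baseline_mean) > tolerance * baseline_mean)
    return flags


# Timestamps in milliseconds: four normal 10 ms gaps, then four 2 ms gaps
# such as an injection of extra messages might produce.
example_flags = flag_windows([0, 10, 20, 30, 40, 42, 44, 46, 48], baseline_mean=10)
```

This corresponds loosely to seeking anomalies in a timing statistic rather than estimating the full timing distribution; a practical detector would learn the baseline per message ID from attack-free traffic.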

[79]  arXiv:2101.05783 (cross-list from cs.CL) [pdf, other]
Title: Persistent Anti-Muslim Bias in Large Language Models
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

It has been observed that large-scale language models capture undesirable societal biases, e.g. relating to race and gender; yet religious bias has been relatively unexplored. We demonstrate that GPT-3, a state-of-the-art contextual language model, captures persistent Muslim-violence bias. We probe GPT-3 in various ways, including prompt completion, analogical reasoning, and story generation, to understand this anti-Muslim bias, demonstrating that it appears consistently and creatively in different uses of the model and that it is severe even compared to biases about other religious groups. For instance, "Muslim" is analogized to "terrorist" in 23% of test cases, while "Jewish" is mapped to "money" in 5% of test cases. We quantify the positive distraction needed to overcome this bias with adversarial text prompts, and find that use of the most positive 6 adjectives reduces violent completions for "Muslims" from 66% to 20%, which is still higher than for other religious groups.

[80]  arXiv:2101.05796 (cross-list from cs.CV) [pdf, other]
Title: DeFlow: Learning Complex Image Degradations from Unpaired Data with Conditional Flows
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

The difficulty of obtaining paired data remains a major bottleneck for learning image restoration and enhancement models for real-world applications. Current strategies aim to synthesize realistic training data by modeling noise and degradations that appear in real-world settings. We propose DeFlow, a method for learning stochastic image degradations from unpaired data. Our approach is based on a novel unpaired learning formulation for conditional normalizing flows. We model the degradation process in the latent space of a shared flow encoder-decoder network. This allows us to learn the conditional distribution of a noisy image given the clean input by solely minimizing the negative log-likelihood of the marginal distributions. We validate our DeFlow formulation on the task of joint image restoration and super-resolution. The models trained with the synthetic data generated by DeFlow outperform previous learnable approaches on all three datasets.

Replacements for Fri, 15 Jan 21

[81]  arXiv:1202.0302 (replaced) [pdf, other]
Title: Kernels on Sample Sets via Nonparametric Divergence Estimates
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[82]  arXiv:1812.01995 (replaced) [pdf, other]
Title: Deep Learning Model for Finding New Superconductors
Comments: 10 pages in main text. Deep learning, Machine learning, Material search, Superconductors
Journal-ref: Phys. Rev. B 103, 014509, (2021)
Subjects: Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci); Superconductivity (cond-mat.supr-con); Computation and Language (cs.CL); Computational Physics (physics.comp-ph)
[83]  arXiv:1904.12054 (replaced) [pdf, other]
Title: Benchmark and Survey of Automated Machine Learning Frameworks
Comments: Revised version accepted for publication at Journal of Artificial Intelligence Research (JAIR)
Journal-ref: Journal of Artificial Intelligence Research 70 (2021) 411-474
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[84]  arXiv:1906.08189 (replaced) [pdf, other]
Title: Reward Prediction Error as an Exploration Objective in Deep RL
Comments: Published at IJCAI 2020, camera-ready version
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[85]  arXiv:1909.13377 (replaced) [pdf, other]
Title: Lane Attention: Predicting Vehicles' Moving Trajectories by Learning Their Attention over Lanes
Comments: IROS 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)
[86]  arXiv:1910.05177 (replaced) [pdf, other]
Title: IdBench: Evaluating Semantic Representations of Identifier Names in Source Code
Comments: Accepted as full research paper at International Conference on Software Engineering (ICSE) 2021
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL); Software Engineering (cs.SE); Machine Learning (stat.ML)
[87]  arXiv:2002.09049 (replaced) [pdf, other]
Title: Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision
Comments: Accepted by AAAI2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[88]  arXiv:2003.03351 (replaced) [pdf, ps, other]
Title: Tighter Bound Estimation of Sensitivity Analysis for Incremental and Decremental Data Modification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[89]  arXiv:2004.05916 (replaced) [pdf, other]
Title: Telling BERT's full story: from Local Attention to Global Aggregation
Comments: Accepted at EACL 2021
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[90]  arXiv:2004.07234 (replaced) [pdf, other]
Title: LOCA: LOcal Conformal Autoencoder for standardized data coordinates
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[91]  arXiv:2005.03448 (replaced) [pdf, other]
Title: Physics-informed learning of governing equations from scarce data
Comments: 46 pages; 1 table, 6 figures and 3 extended data figures in main text; 2 tables and 12 figures in supplementary information
Subjects: Machine Learning (cs.LG); Computational Physics (physics.comp-ph); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
[92]  arXiv:2006.04767 (replaced) [pdf, other]
Title: Motion Prediction using Trajectory Sets and Self-Driving Domain Knowledge
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Machine Learning (stat.ML)
[93]  arXiv:2006.16801 (replaced) [pdf, other]
Title: Random Partitioning Forest for Point-Wise and Collective Anomaly Detection -- Application to Intrusion Detection
Comments: arXiv admin note: text overlap with arXiv:1705.03800
Journal-ref: IEEE Transactions on Information Forensics and Security, pp1-16, 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[94]  arXiv:2007.05646 (replaced) [pdf, other]
Title: Transformations between deep neural networks
Comments: 14 pages, 10 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[95]  arXiv:2007.07206 (replaced) [pdf, other]
Title: Learning Robust State Abstractions for Hidden-Parameter Block MDPs
Comments: Accepted at the 9th International Conference on Learning Representations. 22 pages, 14 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[96]  arXiv:2007.11091 (replaced) [pdf, other]
Title: EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[97]  arXiv:2008.00181 (replaced) [pdf, other]
Title: Relation-aware Meta-learning for Market Segment Demand Prediction with Limited Records
Comments: First two authors contributed equally; Accepted by WSDM 2021
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)
[98]  arXiv:2009.02296 (replaced) [pdf, other]
Title: Variational Deep Learning for the Identification and Reconstruction of Chaotic and Stochastic Dynamical Systems from Noisy and Partial Observations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[99]  arXiv:2010.03409 (replaced) [pdf, other]
Title: Learning Mesh-Based Simulation with Graph Networks
Journal-ref: International Conference on Learning Representations (ICLR), 2021
Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
[100]  arXiv:2010.03658 (replaced) [pdf, other]
Title: Robust Semi-Supervised Learning with Out of Distribution Data
Comments: Preprint
Subjects: Machine Learning (cs.LG)
[101]  arXiv:2010.03934 (replaced) [pdf, other]
Title: Prioritized Level Replay
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[102]  arXiv:2010.08710 (replaced) [pdf, other]
Title: Causal Transfer Random Forest: Combining Logged Data and Randomized Experiments for Robust Prediction
Comments: 9 pages, 7 figures, 2 tables, accepted to WSDM 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[103]  arXiv:2010.15549 (replaced) [pdf, other]
Title: Multi-Constitutive Neural Network for Large Deformation Poromechanics Problem
Comments: Camera-ready (final) paper of the Third Workshop on Machine Learning and the Physical Sciences (NeurIPS 2020), Vancouver. More figures added although the workshop has closed
Subjects: Machine Learning (cs.LG); Geophysics (physics.geo-ph)
[104]  arXiv:2011.10996 (replaced) [pdf, other]
Title: Time series classification for predictive maintenance on event logs
Comments: 19 pages, 9 figures, submitted to ECMLPKDD 2021 Journal Track
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[105]  arXiv:2011.11194 (replaced) [pdf, other]
Title: V3H: View Variation and View Heredity for Incomplete Multi-view Clustering
Comments: Accepted by IEEE Transactions on Artificial Intelligence
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
[106]  arXiv:2012.05199 (replaced) [pdf, other]
Title: A Riemannian Block Coordinate Descent Method for Computing the Projection Robust Wasserstein Distance
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[107]  arXiv:2012.10203 (replaced) [pdf, other]
Title: Classification with Strategically Withheld Data
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT)
[108]  arXiv:2101.01152 (replaced) [pdf, other]
Title: Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise
Comments: 29 pages, 9 figures
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[109]  arXiv:2101.01867 (replaced) [pdf, other]
Title: dame-flame: A Python Library Providing Fast Interpretable Matching for Causal Inference
Authors: Neha R. Gupta (1), Vittorio Orlandi (1), Chia-Rui Chang (2), Tianyu Wang (1), Marco Morucci (1), Pritam Dey (1), Thomas J. Howell (1), Xian Sun (1), Angikar Ghosal (1), Sudeepa Roy (1), Cynthia Rudin (1), Alexander Volfovsky (1) ((1) Duke University, (2) Harvard University)
Comments: 5 pages, 1 figure; Reference and discussion of CEM corrected
Subjects: Machine Learning (cs.LG); Mathematical Software (cs.MS)
[110]  arXiv:2101.04223 (replaced) [pdf, other]
Title: Exploiting Multiple Timescales in Hierarchical Echo State Networks
Subjects: Machine Learning (cs.LG)
[111]  arXiv:2101.04562 (replaced) [pdf, other]
Title: Hyperbolic Deep Neural Networks: A Survey
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[112]  arXiv:1509.07553 (replaced) [pdf, other]
Title: Linear-time Learning on Distributions with Approximate Kernel Embeddings
Journal-ref: AAAI'16: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 2016, 2073-2079
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[113]  arXiv:1511.04150 (replaced) [pdf, other]
Title: Deep Mean Maps
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[114]  arXiv:1702.02982 (replaced) [pdf, other]
Title: Fixing an error in Caponnetto and de Vito (2007)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[115]  arXiv:1801.01401 (replaced) [pdf, other]
Title: Demystifying MMD GANs
Comments: Published at ICLR 2018: this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[116]  arXiv:1805.11565 (replaced) [pdf, other]
Title: On gradient regularizers for MMD GANs
Comments: Code at this https URL
Journal-ref: Advances in Neural Information Processing Systems 31 (NeurIPS 2018), 6700-6710
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[117]  arXiv:1811.08357 (replaced) [pdf, other]
Title: Learning deep kernels for exponential family densities
Journal-ref: Proceedings of the 36th International Conference on Machine Learning (ICML 2019), PMLR 97:6737-6746
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[118]  arXiv:1812.03894 (replaced) [pdf, other]
Title: Physics-Based Learning for Robotic Environmental Sensing
Comments: 20 pages, 26 figures
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Machine Learning (stat.ML)
[119]  arXiv:1906.02104 (replaced) [pdf, ps, other]
Title: Unbiased estimators for the variance of MMD estimators
Comments: Fixes and extends the appendices of arXiv:1611.04488 and arXiv:1511.04581
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[120]  arXiv:1906.05497 (replaced) [pdf, other]
Title: Deep Network Approximation Characterized by Number of Neurons
Journal-ref: Communications in Computational Physics, Volume 28, Issue 5, November 2020, Pages 1768-1811
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
[121]  arXiv:1906.10655 (replaced) [pdf, ps, other]
Title: Complexity of Highly Parallel Non-Smooth Convex Optimization
Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
[122]  arXiv:1910.04366 (replaced) [pdf, ps, other]
Title: Understanding Limitation of Two Symmetrized Orders by Worst-case Complexity
Comments: 31 pages, 9 tables
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[123]  arXiv:2001.00137 (replaced) [pdf, other]
Title: Stacked DeBERT: All Attention in Incomplete Data for Text Classification
Comments: Published (this https URL), Code (this https URL)
Journal-ref: Neural Networks 136 (2021) 87-96
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[124]  arXiv:2001.09326 (replaced) [pdf, other]
Title: Gesticulator: A framework for semantically-aware speech-driven gesture generation
Comments: ICMI 2020 Best Paper Award. Code is available. 9 pages, 6 figures
Journal-ref: Proceedings of the 2020 International Conference on Multimodal Interaction (ICMI '20)
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[125]  arXiv:2002.09116 (replaced) [pdf, other]
Title: Learning Deep Kernels for Non-Parametric Two-Sample Tests
Journal-ref: Proceedings of the 37th International Conference on Machine Learning (ICML 2020), PMLR 119:6316-6326
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
[126]  arXiv:2005.10400 (replaced) [pdf, other]
Title: Principal Fairness for Human and Algorithmic Decision-Making
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Machine Learning (stat.ML)
[127]  arXiv:2005.10881 (replaced) [pdf, other]
Title: Revisiting Membership Inference Under Realistic Assumptions
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[128]  arXiv:2006.03680 (replaced) [pdf, other]
Title: Evaluating the Disentanglement of Deep Generative Models through Manifold Topology
Comments: Published at ICLR 2021
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[129]  arXiv:2006.05942 (replaced) [pdf, ps, other]
Title: On Uniform Convergence and Low-Norm Interpolation Learning
Comments: v3: No content changes to this final version, as published at NeurIPS 2020: this https URL
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[130]  arXiv:2006.14154 (replaced) [pdf, other]
Title: Strictly Batch Imitation Learning by Energy-based Distribution Matching
Comments: In Proc. 34th International Conference on Neural Information Processing Systems (NeurIPS 2020)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[131]  arXiv:2008.01553 (replaced) [pdf, other]
Title: E-Tree Learning: A Novel Decentralized Model Learning Framework for Edge AI
Comments: IEEE Internet of Things Journal, 2020
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
[132]  arXiv:2008.12949 (replaced) [pdf, other]
Title: VR-Caps: A Virtual Environment for Capsule Endoscopy
Comments: 18 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[133]  arXiv:2009.05169 (replaced) [pdf, other]
Title: Sparsifying Transformer Models with Differentiable Representation Pooling
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[134]  arXiv:2009.07517 (replaced) [pdf, other]
Title: MATS: An Interpretable Trajectory Forecasting Representation for Planning and Control
Comments: 14 pages, 6 figures, 1 table. All code, models, and data can be found at this https URL . Conference on Robot Learning (CoRL) 2020
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Systems and Control (eess.SY)
[135]  arXiv:2011.08828 (replaced) [pdf, other]
Title: Uncertainty estimation for molecular dynamics and sampling
Comments: 17 pages, 9 figures
Subjects: Chemical Physics (physics.chem-ph); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
[136]  arXiv:2011.13388 (replaced) [pdf, other]
Title: 3DSNet: Unsupervised Shape-to-Shape 3D Style Transfer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[137]  arXiv:2012.02409 (replaced) [pdf, other]
Title: When does gradient descent with logistic loss find interpolating two-layer networks?
Comments: 43 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
[138]  arXiv:2012.08023 (replaced) [pdf, other]
Title: Friedrichs Learning: Weak Solutions of Partial Differential Equations via Deep Learning
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
[139]  arXiv:2012.11390 (replaced) [pdf, other]
Title: Adversarial Training for a Continuous Robustness Control Problem in Power Systems
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[140]  arXiv:2012.12820 (replaced) [pdf]
Title: Multiclass Spinal Cord Tumor Segmentation on MRI with Deep Learning
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[141]  arXiv:2012.13635 (replaced) [pdf, other]
Title: Logic Tensor Networks
Comments: 68 pages, 28 figures, 6 tables
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[142]  arXiv:2101.00401 (replaced) [pdf, other]
Title: Border Basis Computation with Gradient-Weighted Norm
Authors: Hiroshi Kera
Comments: 19 pages, 1 figure
Subjects: Symbolic Computation (cs.SC); Machine Learning (cs.LG); Commutative Algebra (math.AC)
[143]  arXiv:2101.01628 (replaced) [pdf]
Title: Local Translation Services for Neglected Languages
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[144]  arXiv:2101.02888 (replaced) [pdf, other]
Title: Predicting Semen Motility using three-dimensional Convolutional Neural Networks
Comments: Corrected typos. Made slight changes as per the comments
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[145]  arXiv:2101.03641 (replaced) [pdf, other]
Title: Learning Augmented Index Policy for Optimal Service Placement at the Network Edge
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
[146]  arXiv:2101.04899 (replaced) [pdf, ps, other]
Title: Experimental Evaluation of Deep Learning models for Marathi Text Classification
Comments: Accepted at ICMISC 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[147]  arXiv:2101.04904 (replaced) [pdf, other]
Title: EEC: Learning to Encode and Regenerate Images for Continual Learning
Comments: Accepted at ICLR 2021. A preliminary version of this work was presented at ICML 2020 Workshop on Lifelong Machine Learning: arXiv:2007.06637
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[148]  arXiv:2101.05091 (replaced) [pdf]
Title: MRI Images, Brain Lesions and Deep Learning
Comments: Submitted to: Computer Programs and Methods in Biomedicine update (2021)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
[149]  arXiv:2101.05217 (replaced) [pdf, other]
Title: Similarity-based prediction for channel mapping and user positioning
Authors: Luc Le Magoarou (IRT b-com, Hypermedia)
Comments: IEEE Communications Letters, Institute of Electrical and Electronics Engineers, In press
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
