We gratefully acknowledge support from
the Simons Foundation and member institutions.

Machine Learning

New submissions

[ total of 137 entries: 1-137 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Thu, 23 Sep 21

[1]  arXiv:2109.10380 [pdf, other]
Title: Deep Policies for Online Bipartite Matching: A Reinforcement Learning Approach
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

From assigning computing tasks to servers and advertisements to users, sequential online matching problems arise in a wide variety of domains. The challenge in online matching lies in making irrevocable assignments while there is uncertainty about future inputs. In the theoretical computer science literature, most policies are myopic or greedy in nature. In real-world applications where the matching process is repeated on a regular basis, the underlying data distribution can be leveraged for better decision-making. We present an end-to-end Reinforcement Learning framework for deriving better matching policies based on trial-and-error on historical data. We devise a set of neural network architectures, design feature representations, and empirically evaluate them across two online matching problems: Edge-Weighted Online Bipartite Matching and Online Submodular Bipartite Matching. We show that most of the learning approaches perform significantly better than classical greedy algorithms on four synthetic and real-world datasets. Our code is publicly available at https://github.com/lyeskhalil/CORL.git.

[2]  arXiv:2109.10431 [pdf, other]
Title: Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Information Theory (cs.IT); Machine Learning (stat.ML)

We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets.

[3]  arXiv:2109.10432 [pdf, other]
Title: Beyond Discriminant Patterns: On the Robustness of Decision Rule Ensembles
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Local decision rules are commonly understood to be more explainable, due to the local nature of the patterns involved. With numerical optimization methods such as gradient boosting, ensembles of local decision rules can gain good predictive performance on data involving global structure. Meanwhile, machine learning models are being increasingly used to solve problems in high-stake domains including healthcare and finance. Here, there is an emerging consensus regarding the need for practitioners to understand whether and how those models could perform robustly in the deployment environments, in the presence of distributional shifts. Past research on local decision rules has focused mainly on maximizing discriminant patterns, without due consideration of robustness against distributional shifts. In order to fill this gap, we propose a new method to learn and ensemble local decision rules, that are robust both in the training and deployment environments. Specifically, we propose to leverage causal knowledge by regarding the distributional shifts in subpopulations and deployment environments as the results of interventions on the underlying system. We propose two regularization terms based on causal knowledge to search for optimal and stable rules. Experiments on both synthetic and benchmark datasets show that our method is effective and robust against distributional shifts in multiple environments.

[4]  arXiv:2109.10436 [pdf, ps, other]
Title: Classification with Nearest Disjoint Centroids
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

In this paper, we develop a new classification method based on nearest centroid, and it is called the nearest disjoint centroid classifier. Our method differs from the nearest centroid classifier in the following two aspects: (1) the centroids are defined based on disjoint subsets of features instead of all the features, and (2) the distance is induced by the dimensionality-normalized norm instead of the Euclidean norm. We provide a few theoretical results regarding our method. In addition, we propose a simple algorithm based on adapted k-means clustering that can find the disjoint subsets of features used in our method, and extend the algorithm to perform feature selection. We evaluate and compare the performance of our method to other closely related classifiers on both simulated data and real-world gene expression datasets. The results demonstrate that our method is able to outperform other competing classifiers by having smaller misclassification rates and/or using fewer features in various settings and situations.

[5]  arXiv:2109.10442 [pdf, other]
Title: Selecting Datasets for Evaluating an Enhanced Deep Learning Framework
Comments: 5 pages, 2 figures, Submitted to SATNAC 2021, Drakensberg, South Africa
Subjects: Machine Learning (cs.LG)

A framework was developed to address limitations associated with existing techniques for analysing sequences. This work deals with the steps followed to select suitable datasets characterised by discrete irregular sequential patterns. To identify, select, explore and evaluate which datasets from various sources extracted from more than 400 research articles, an interquartile range method for outlier calculation and a qualitative Billauer's algorithm was adapted to provide periodical peak detection in such datasets.
The developed framework was then tested using the most appropriate datasets.
The research concluded that the financial market-daily currency exchange domain is the most suitable kind of data set for the evaluation of the designed deep learning framework, as it provides high levels of discrete irregular patterns.

[6]  arXiv:2109.10458 [pdf, other]
Title: Achieving Counterfactual Fairness for Causal Bandit
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

In online recommendation, customers arrive in a sequential and stochastic manner from an underlying distribution and the online decision model recommends a chosen item for each arriving individual based on some strategy. We study how to recommend an item at each step to maximize the expected reward while achieving user-side fairness for customers, i.e., customers who share similar profiles will receive a similar reward regardless of their sensitive attributes and items being recommended. By incorporating causal inference into bandits and adopting soft intervention to model the arm selection strategy, we first propose the d-separation based UCB algorithm (D-UCB) to explore the utilization of the d-separation set in reducing the amount of exploration needed to achieve low cumulative regret. Based on that, we then propose the fair causal bandit (F-UCB) for achieving the counterfactual individual fairness. Both theoretical analysis and empirical evaluation demonstrate effectiveness of our algorithms.

[7]  arXiv:2109.10469 [pdf, other]
Title: Differentiable Scaffolding Tree for Molecular Optimization
Subjects: Machine Learning (cs.LG)

The structural design of functional molecules, also called molecular optimization, is an essential chemical science and engineering task with important applications, such as drug discovery. Deep generative models and combinatorial optimization methods achieve initial success but still struggle with directly modeling discrete chemical structures and often heavily rely on brute-force enumeration. The challenge comes from the discrete and non-differentiable nature of molecule structures. To address this, we propose differentiable scaffolding tree (DST) that utilizes a learned knowledge network to convert discrete chemical structures to locally differentiable ones. DST enables a gradient-based optimization on a chemical graph structure by back-propagating the derivatives from the target properties through a graph neural network (GNN). Our empirical studies show the gradient-based molecular optimizations are both effective and sample efficient. Furthermore, the learned graph parameters can also provide an explanation that helps domain experts understand the model output.

[8]  arXiv:2109.10476 [pdf, other]
Title: Self-Supervised Learning to Prove Equivalence Between Programs via Semantics-Preserving Rewrite Rules
Comments: 18 pages
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL)

We target the problem of synthesizing proofs of semantic equivalence between two programs made of sequences of statements with complex symbolic expressions. We propose a neural network architecture based on the transformer to generate axiomatic proofs of equivalence between program pairs. We generate expressions which include scalars and vectors and support multi-typed rewrite rules to prove equivalence. For training the system, we develop an original training technique, which we call self-supervised sample selection. This incremental training improves the quality, generalizability and extensibility of the learned model. We study the effectiveness of the system to generate proofs of increasing length, and we demonstrate how transformer models learn to represent complex and verifiable symbolic reasoning. Our system, S4Eq, achieves 97% proof success on 10,000 pairs of programs while ensuring zero false positives by design.

[9]  arXiv:2109.10490 [pdf]
Title: Benchmarking Lane-changing Decision-making for Deep Reinforcement Learning
Comments: 10 pages, 5 figures, 3 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

The development of autonomous driving has attracted extensive attention in recent years, and it is essential to evaluate the performance of autonomous driving. However, testing on the road is expensive and inefficient. Virtual testing is the primary way to validate and verify self-driving cars, and the basis of virtual testing is to build simulation scenarios. In this paper, we propose a training, testing, and evaluation pipeline for the lane-changing task from the perspective of deep reinforcement learning. First, we design lane change scenarios for training and testing, where the test scenarios include stochastic and deterministic parts. Then, we deploy a set of benchmarks consisting of learning and non-learning approaches. We train several state-of-the-art deep reinforcement learning methods in the designed training scenarios and provide the benchmark metrics evaluation results of the trained models in the test scenarios. The designed lane-changing scenarios and benchmarks are both opened to provide a consistent experimental environment for the lane-changing task.

[10]  arXiv:2109.10502 [pdf, other]
Title: A Spectral Approach to Off-Policy Evaluation for POMDPs
Authors: Yash Nair, Nan Jiang
Subjects: Machine Learning (cs.LG)

We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes, where the evaluation policy depends only on observable variables but the behavior policy depends on latent states (Tennenholtz et al. (2020a)). Prior work on this problem uses a causal identification strategy based on one-step observable proxies of the hidden state, which relies on the invertibility of certain one-step moment matrices. In this work, we relax this requirement by using spectral methods and extending one-step proxies both into the past and future. We empirically compare our OPE methods to existing ones and demonstrate their improved prediction accuracy and greater generality. Lastly, we derive a separate Importance Sampling (IS) algorithm which relies on rank, distinctness, and positivity conditions, and not on the strict sufficiency conditions of observable trajectories with respect to the reward and hidden-state structure required by Tennenholtz et al. (2020a).

[11]  arXiv:2109.10512 [pdf, other]
Title: Backdoor Attacks on Federated Learning with Lottery Ticket Hypothesis
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

Edge devices in federated learning usually have much more limited computation and communication resources compared to servers in a data center. Recently, advanced model compression methods, like the Lottery Ticket Hypothesis, have already been implemented on federated learning to reduce the model size and communication cost. However, Backdoor Attack can compromise its implementation in the federated learning scenario. The malicious edge device trains the client model with poisoned private data and uploads parameters to the center, embedding a backdoor to the global shared model after unwitting aggregative optimization. During the inference phase, the model with backdoors classifies samples with a certain trigger as one target category, while shows a slight decrease in inference accuracy to clean samples. In this work, we empirically demonstrate that Lottery Ticket models are equally vulnerable to backdoor attacks as the original dense models, and backdoor attacks can influence the structure of extracted tickets. Based on tickets' similarities between each other, we provide a feasible defense for federated learning against backdoor attacks on various datasets.

[12]  arXiv:2109.10535 [pdf, other]
Title: Cramér-Rao bound-informed training of neural networks for quantitative MRI
Comments: Xiaoxia Zhang, Quentin Duchemin, and Kangning Liu contributed equally to this work
Subjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV); Medical Physics (physics.med-ph)

Neural networks are increasingly used to estimate parameters in quantitative MRI, in particular in magnetic resonance fingerprinting. Their advantages over the gold standard non-linear least square fitting are their superior speed and their immunity to the non-convexity of many fitting problems. We find, however, that in heterogeneous parameter spaces, i.e. in spaces in which the variance of the estimated parameters varies considerably, good performance is hard to achieve and requires arduous tweaking of the loss function, hyper parameters, and the distribution of the training data in parameter space. Here, we address these issues with a theoretically well-founded loss function: the Cram\'er-Rao bound (CRB) provides a theoretical lower bound for the variance of an unbiased estimator and we propose to normalize the squared error with respective CRB. With this normalization, we balance the contributions of hard-to-estimate and not-so-hard-to-estimate parameters and areas in parameter space, and avoid a dominance of the former in the overall training loss. Further, the CRB-based loss function equals one for a maximally-efficient unbiased estimator, which we consider the ideal estimator. Hence, the proposed CRB-based loss function provides an absolute evaluation metric. We compare a network trained with the CRB-based loss with a network trained with the commonly used means squared error loss and demonstrate the advantages of the former in numerical, phantom, and in vivo experiments.

[13]  arXiv:2109.10538 [pdf, other]
Title: Index $t$-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings
Comments: International Conference on Big Data Visual Analytics (ICBDVA), Venice, Italy, August 12-13 2021 this https URL Best paper award
Journal-ref: International Journal of Computer and Systems Engineering (2021), 15(8), 500 - 512
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

$t$-SNE is an embedding method that the data science community has widely Two interesting characteristics of t-SNE are the structure preservation property and the answer to the crowding problem, where all neighbors in high dimensional space cannot be represented correctly in low dimensional space. $t$-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to its size in number, and relationships between clusters are materialized by closeness on the embedding.
This algorithm is non-parametric, therefore two initializations of the algorithm would lead to two different embedding. In a forensic approach, analysts would like to compare two or more datasets using their embedding. An approach would be to learn a parametric model over an embedding built with a subset of data. While this approach is highly scalable, points could be mapped at the same exact position, making them indistinguishable. This type of model would be unable to adapt to new outliers nor concept drift.
This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the support embedding' match. The proposed algorithm has the same complexity than the original $t$-SNE to embed new items, and a lower one when considering the embedding of a dataset sliced into sub-pieces. The method showed promising results on a real-world dataset, allowing to observe the birth, evolution and death of clusters. The proposed approach facilitates identifying significant trends and changes, which empowers the monitoring high dimensional datasets' dynamics.

[14]  arXiv:2109.10552 [pdf, other]
Title: MEPG: A Minimalist Ensemble Policy Gradient Framework for Deep Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Ensemble reinforcement learning (RL) aims to mitigate instability in Q-learning and to learn a robust policy, which introduces multiple value and policy functions. In this paper, we consider finding a novel but simple ensemble Deep RL algorithm to solve the resource consumption issue. Specifically, we consider integrating multiple models into a single model. To this end, we propose the \underline{M}inimalist \underline{E}nsemble \underline{P}olicy \underline{G}radient framework (MEPG), which introduces minimalist ensemble consistent Bellman update. And we find one value network is sufficient in our framework. Moreover, we theoretically show that the policy evaluation phase in the MEPG is mathematically equivalent to a deep Gaussian Process. To verify the effectiveness of the MEPG framework, we conduct experiments on the gym simulator, which show that the MEPG framework matches or outperforms the state-of-the-art ensemble methods and model-free methods without additional computational resource costs.

[15]  arXiv:2109.10569 [pdf, other]
Title: The Curse Revisited: a Newly Quantified Concept of Meaningful Distances for Learning from High-Dimensional Noisy Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Distances between data points are widely used in point cloud representation learning. Yet, it is no secret that under the effect of noise, these distances-and thus the models based upon them-may lose their usefulness in high dimensions. Indeed, the small marginal effects of the noise may then accumulate quickly, shifting empirical closest and furthest neighbors away from the ground truth. In this paper, we characterize such effects in high-dimensional data using an asymptotic probabilistic expression. Furthermore, while it has been previously argued that neighborhood queries become meaningless and unstable when there is a poor relative discrimination between the furthest and closest point, we conclude that this is not necessarily the case when explicitly separating the ground truth data from the noise. More specifically, we derive that under particular conditions, empirical neighborhood relations affected by noise are still likely to be true even when we observe this discrimination to be poor. We include thorough empirical verification of our results, as well as experiments that interestingly show our derived phase shift where neighbors become random or not is identical to the phase shift where common dimensionality reduction methods perform poorly or well for finding low-dimensional representations of high-dimensional data with dense noise.

[16]  arXiv:2109.10573 [pdf, other]
Title: An automatic differentiation system for the age of differential privacy
Comments: 8 pages
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

We introduce Tritium, an automatic differentiation-based sensitivity analysis framework for differentially private (DP) machine learning (ML). Optimal noise calibration in this setting requires efficient Jacobian matrix computations and tight bounds on the L2-sensitivity. Our framework achieves these objectives by relying on a functional analysis-based method for sensitivity tracking, which we briefly outline. This approach interoperates naturally and seamlessly with static graph-based automatic differentiation, which enables order-of-magnitude improvements in compilation times compared to previous work. Moreover, we demonstrate that optimising the sensitivity of the entire computational graph at once yields substantially tighter estimates of the true sensitivity compared to interval bound propagation techniques. Our work naturally befits recent developments in DP such as individual privacy accounting, aiming to offer improved privacy-utility trade-offs, and represents a step towards the integration of accessible machine learning tooling with advanced privacy accounting systems.

[17]  arXiv:2109.10591 [pdf, other]
Title: High-dimensional Bayesian Optimization for CNN Auto Pruning with Clustering and Rollback
Comments: 7 pages with 1 page for references
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Pruning has been widely used to slim convolutional neural network (CNN) models to achieve a good trade-off between accuracy and model size so that the pruned models become feasible for power-constrained devices such as mobile phones. This process can be automated to avoid the expensive hand-crafted efforts and to explore a large pruning space automatically so that the high-performance pruning policy can be achieved efficiently. Nowadays, reinforcement learning (RL) and Bayesian optimization (BO)-based auto pruners are widely used due to their solid theoretical foundation, universality, and high compressing quality. However, the RL agent suffers from long training times and high variance of results, while the BO agent is time-consuming for high-dimensional design spaces. In this work, we propose an enhanced BO agent to obtain significant acceleration for auto pruning in high-dimensional design spaces. To achieve this, a novel clustering algorithm is proposed to reduce the dimension of the design space to speedup the searching process. Then, a roll-back algorithm is proposed to recover the high-dimensional design space so that higher pruning accuracy can be obtained. We validate our proposed method on ResNet, MobileNet, and VGG models, and our experiments show that the proposed method significantly improves the accuracy of BO when pruning very deep CNN models. Moreover, our method achieves lower variance and shorter time than the RL-based counterpart.

[18]  arXiv:2109.10593 [pdf, other]
Title: Emulating Aerosol Microphysics with a Machine Learning
Subjects: Machine Learning (cs.LG)

Aerosol particles play an important role in the climate system by absorbing and scattering radiation and influencing cloud properties. They are also one of the biggest sources of uncertainty for climate modeling. Many climate models do not include aerosols in sufficient detail. In order to achieve higher accuracy, aerosol microphysical properties and processes have to be accounted for. This is done in the ECHAM-HAM global climate aerosol model using the M7 microphysics model, but increased computational costs make it very expensive to run at higher resolutions or for a longer time. We aim to use machine learning to approximate the microphysics model at sufficient accuracy and reduce the computational cost by being fast at inference time. The original M7 model is used to generate data of input-output pairs to train a neural network on it. By using a special logarithmic transform we are able to learn the variables tendencies achieving an average $R^2$ score of $89\%$. On a GPU we achieve a speed-up of 120 compared to the original model.

[19]  arXiv:2109.10596 [pdf, other]
Title: Fully probabilistic design for knowledge fusion between Bayesian filters under uniform disturbances
Authors: Lenka Kuklišová Pavelková (1), Ladislav Jirsa (1), Anthony Quinn (1 and 2) ((1) Czech Academy of Sciences, Institute of Information Theory and Automation, Czech Republic, (2) Trinity College Dublin, the University of Dublin, Ireland)
Comments: 39 pages
Subjects: Machine Learning (cs.LG)

This paper considers the problem of Bayesian transfer learning-based knowledge fusion between linear state-space processes driven by uniform state and observation noise processes. The target task conditions on probabilistic state predictor(s) supplied by the source filtering task(s) to improve its own state estimate. A joint model of the target and source(s) is not required and is not elicited. The resulting decision-making problem for choosing the optimal conditional target filtering distribution under incomplete modelling is solved via fully probabilistic design (FPD), i.e. via appropriate minimization of Kullback-Leibler divergence (KLD). The resulting FPD-optimal target learner is robust, in the sense that it can reject poor-quality source knowledge. In addition, the fact that this Bayesian transfer learning (BTL) scheme does not depend on a model of interaction between the source and target tasks ensures robustness to the misspecification of such a model. The latter is a problem that affects conventional transfer learning methods. The properties of the proposed BTL scheme are demonstrated via extensive simulations, and in comparison with two contemporary alternatives.

[20]  arXiv:2109.10598 [pdf, other]
Title: Diarisation using Location tracking with agglomerative clustering
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Previous works have shown that spatial location information can be complementary to speaker embeddings for a speaker diarisation task. However, the models used often assume that speakers are fairly stationary throughout a meeting. This paper proposes to relax this assumption, by explicitly modelling the movements of speakers within an Agglomerative Hierarchical Clustering (AHC) diarisation framework. Kalman filters, which track the locations of speakers, are used to compute log-likelihood ratios that contribute to the cluster affinity computations for the AHC merging and stopping decisions. Experiments show that the proposed approach is able to yield improvements on a Microsoft rich meeting transcription task, compared to methods that do not use location information or that make stationarity assumptions.

[21]  arXiv:2109.10640 [pdf, other]
Title: LDC-VAE: A Latent Distribution Consistency Approach to Variational AutoEncoders
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)

Variational autoencoders (VAEs), as an important aspect of generative models, have received a lot of research interests and reached many successful applications. However, it is always a challenge to achieve the consistency between the learned latent distribution and the prior latent distribution when optimizing the evidence lower bound (ELBO), and finally leads to an unsatisfactory performance in data generation. In this paper, we propose a latent distribution consistency approach to avoid such substantial inconsistency between the posterior and prior latent distributions in ELBO optimizing. We name our method as latent distribution consistency VAE (LDC-VAE). We achieve this purpose by assuming the real posterior distribution in latent space as a Gibbs form, and approximating it by using our encoder. However, there is no analytical solution for such Gibbs posterior in approximation, and traditional approximation ways are time consuming, such as using the iterative sampling-based MCMC. To address this problem, we use the Stein Variational Gradient Descent (SVGD) to approximate the Gibbs posterior. Meanwhile, we use the SVGD to train a sampler net which can obtain efficient samples from the Gibbs posterior. Comparative studies on the popular image generation datasets show that our method has achieved comparable or even better performance than several powerful improvements of VAEs.

[22]  arXiv:2109.10642 [pdf, other]
Title: Decentralized Learning of Tree-Structured Gaussian Graphical Models from Noisy Data
Authors: Akram Hussain
Comments: 32 pages, there are more authors of this paper
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

This paper studies the decentralized learning of tree-structured Gaussian graphical models (GGMs) from noisy data. In decentralized learning, data set is distributed across different machines (sensors), and GGMs are widely used to model complex networks such as gene regulatory networks and social networks. The proposed decentralized learning uses the Chow-Liu algorithm for estimating the tree-structured GGM.
In previous works, upper bounds on the probability of incorrect tree structure recovery were given mostly without any practical noise for simplification. While this paper investigates the effects of three common types of noisy channels: Gaussian, Erasure, and binary symmetric channel. For Gaussian channel case, to satisfy the failure probability upper bound $\delta > 0$ in recovering a $d$-node tree structure, our proposed theorem requires only $\mathcal{O}(\log(\frac{d}{\delta}))$ samples for the smallest sample size ($n$) comparing to the previous literature \cite{Nikolakakis} with $\mathcal{O}(\log^4(\frac{d}{\delta}))$ samples by using the positive correlation coefficient assumption that is used in some important works in the literature. Moreover, the approximately bounded Gaussian random variable assumption does not appear in \cite{Nikolakakis}. Given some knowledge about the tree structure, the proposed Algorithmic Bound will achieve obviously better performance with small sample size (e.g., $< 2000$) comparing with formulaic bounds. Finally, we validate our theoretical results by performing simulations on synthetic data sets.

[23]  arXiv:2109.10683 [pdf, other]
Title: Adaptive Neural Message Passing for Inductive Learning on Hypergraphs
Subjects: Machine Learning (cs.LG); Multimedia (cs.MM)

Graphs are the most ubiquitous data structures for representing relational datasets and performing inferences in them. They model, however, only pairwise relations between nodes and are not designed for encoding the higher-order relations. This drawback is mitigated by hypergraphs, in which an edge can connect an arbitrary number of nodes. Most hypergraph learning approaches convert the hypergraph structure to that of a graph and then deploy existing geometric deep learning methods. This transformation leads to information loss, and sub-optimal exploitation of the hypergraph's expressive power. We present HyperMSG, a novel hypergraph learning framework that uses a modular two-level neural message passing strategy to accurately and efficiently propagate information within each hyperedge and across the hyperedges. HyperMSG adapts to the data and task by learning an attention weight associated with each node's degree centrality. Such a mechanism quantifies both local and global importance of a node, capturing the structural properties of a hypergraph. HyperMSG is inductive, allowing inference on previously unseen nodes. Further, it is robust and outperforms state-of-the-art hypergraph learning methods on a wide range of tasks and datasets. Finally, we demonstrate the effectiveness of HyperMSG in learning multimodal relations through detailed experimentation on a challenging multimedia dataset.

[24]  arXiv:2109.10696 [pdf, other]
Title: CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks -- small modifications of the input that change the predictions. Besides rigorously studied $\ell_p$-bounded additive perturbations, recently proposed semantic perturbations (e.g. rotation, translation) raise a serious concern on deploying ML systems in real-world. Therefore, it is important to provide provable guarantees for deep learning models against semantically meaningful input transformations. In this paper, we propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds that can be used in general attack settings. We estimate the probability of a model to fail if the attack is sampled from a certain distribution. Our theoretical findings are supported by experimental results on different datasets.

[25]  arXiv:2109.10736 [pdf, other]
Title: Estimation Error Correction in Deep Reinforcement Learning for Deterministic Actor-Critic Methods
Comments: Accepted at ICTAI 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

In value-based deep reinforcement learning methods, approximation of value functions induces overestimation bias and leads to suboptimal policies. We show that in deep actor-critic methods that aim to overcome the overestimation bias, if the reinforcement signals received by the agent have a high variance, a significant underestimation bias arises. To minimize the underestimation, we introduce a parameter-free, novel deep Q-learning variant. Our Q-value update rule combines the notions behind Clipped Double Q-learning and Maxmin Q-learning by computing the critic objective through the nested combination of maximum and minimum operators to bound the approximate value estimates. We evaluate our modification on the suite of several OpenAI Gym continuous control tasks, improving the state-of-the-art in every environment tested.

[26]  arXiv:2109.10757 [pdf, other]
Title: Unsupervised Movement Detection in Indoor Positioning Systems
Subjects: Machine Learning (cs.LG); Applications (stat.AP)

In recent years, the usage of indoor positioning systems for manufacturing processes became increasingly popular. Typically, the production hall is equipped with satellites which receive position data of sensors that can be pinned on components, load carriers or industrial trucks. This enables a company e.g. to reduce search efforts and to optimize individual system processes. In our research context, a sensor only sends position information when it is moved. However, various circumstances frequently affect that data is undesirably sent, e.g. due to disrupting factors nearby. This has a negative impact on the data quality, the energy consumption, and the reliability of the whole system. Motivated by this, we aim to distinguish between actual movements and signals that were undesirably sent which is in particular challenging due to the susceptibility of indoor systems in terms of noise and measuring errors. Therefore, we propose two novel unsupervised classification algorithms suitable for this task. Depending on the question of interest, they rely either on a distance-based or on a time-based criterion, which allows to make use of all essential information. Furthermore, we propose an approach to combine both classifications and to aggregate them on spatial production areas. This enables us to generate a comprehensive map of the underlying production hall with the sole usage of the position data. Aside from the analysis and detection of the underlying movement structure, the user benefits from a better understanding of own system processes and from the detection of problematic system areas which leads to a more efficient usage of positioning systems. Since all our approaches are constructed with unsupervised techniques, they are handily applicable in practice and do not require more information than the output data of the positioning system.

[27]  arXiv:2109.10770 [pdf, ps, other]
Title: Exploring Adversarial Examples for Efficient Active Learning in Machine Learning Classifiers
Subjects: Machine Learning (cs.LG)

Machine learning researchers have long noticed the phenomenon that the model training process will be more effective and efficient when the training samples are densely sampled around the underlying decision boundary. While this observation has already been widely applied in a range of machine learning security techniques, it lacks theoretical analyses of the correctness of the observation. To address this challenge, we first add particular perturbation to original training examples using adversarial attack methods so that the generated examples could lie approximately on the decision boundary of the ML classifiers. We then investigate the connections between active learning and these particular training examples. Through analyzing various representative classifiers such as k-NN classifiers, kernel methods as well as deep neural networks, we establish a theoretical foundation for the observation. As a result, our theoretical proofs provide support to more efficient active learning methods with the help of adversarial examples, contrary to previous works where adversarial examples are often used as destructive solutions. Experimental results show that the established theoretical foundation will guide better active learning strategies based on adversarial examples.

[28]  arXiv:2109.10781 [pdf, other]
Title: Introducing Symmetries to Black Box Meta Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

Meta reinforcement learning (RL) attempts to discover new RL algorithms automatically from environment interaction. In so-called black-box approaches, the policy and the learning algorithm are jointly represented by a single neural network. These methods are very flexible, but they tend to underperform in terms of generalisation to new, unseen environments. In this paper, we explore the role of symmetries in meta-generalisation. We show that a recent successful meta RL approach that meta-learns an objective for backpropagation-based learning exhibits certain symmetries (specifically the reuse of the learning rule, and invariance to input and output permutations) that are not present in typical black-box meta RL systems. We hypothesise that these symmetries can play an important role in meta-generalisation. Building off recent work in black-box supervised meta learning, we develop a black-box meta RL system that exhibits these same symmetries. We show through careful experimentation that incorporating these symmetries can lead to algorithms with a greater ability to generalise to unseen action & observation spaces, tasks, and environments.

[29]  arXiv:2109.10795 [pdf, other]
Title: Neural network relief: a pruning algorithm based on neural activity
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Current deep neural networks (DNNs) are overparameterized and use most of their neuronal connections during inference for each task. The human brain, however, developed specialized regions for different tasks and performs inference with a small fraction of its neuronal connections. We propose an iterative pruning strategy introducing a simple importance-score metric that deactivates unimportant connections, tackling overparameterization in DNNs and modulating the firing patterns. The aim is to find the smallest number of connections that is still capable of solving a given task with comparable accuracy, i.e. a simpler subnetwork. We achieve comparable performance for LeNet architectures on MNIST, and significantly higher parameter compression than state-of-the-art algorithms for VGG and ResNet architectures on CIFAR-10/100 and Tiny-ImageNet. Our approach also performs well for the two different optimizers considered -- Adam and SGD. The algorithm is not designed to minimize FLOPs when considering current hardware and software implementations, although it performs reasonably when compared to the state of the art.

[30]  arXiv:2109.10797 [pdf, ps, other]
Title: Improved Multi-label Classification with Frequent Label-set Mining and Association
Subjects: Machine Learning (cs.LG)

Multi-label (ML) data deals with multiple classes associated with individual samples at the same time. This leads to the co-occurrence of several classes repeatedly, which indicates some existing correlation among them. In this article, the correlation among classes has been explored to improve the classification performance of existing ML classifiers. A novel approach of frequent label-set mining has been proposed to extract these correlated classes from the label-sets of the data. Both co-presence (CP) and co-absence (CA) of classes have been taken into consideration. The rules mined from the ML data has been further used to incorporate class correlation information into existing ML classifiers. The soft scores generated by an ML classifier are modified through a novel approach using the CP-CA rules. A concept of certain and uncertain scores has been defined here, where the proposed method aims to improve the uncertain scores with the help of the certain scores and their corresponding CP-CA rules. This has been experimentally analysed on ten ML datasets for three ML existing classifiers which shows substantial improvement in their overall performance.

[31]  arXiv:2109.10803 [pdf, other]
Title: Multi-Slice Clustering for 3-order Tensor Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Several methods of triclustering of three dimensional data require the specification of the cluster size in each dimension. This introduces a certain degree of arbitrariness. To address this issue, we propose a new method, namely the multi-slice clustering (MSC) for a 3-order tensor data set. We analyse, in each dimension or tensor mode, the spectral decomposition of each tensor slice, i.e. a matrix. Thus, we define a similarity measure between matrix slices up to a threshold (precision) parameter, and from that, identify a cluster. The intersection of all partial clusters provides the desired triclustering. The effectiveness of our algorithm is shown on both synthetic and real-world data sets.

[32]  arXiv:2109.10813 [pdf, other]
Title: A Workflow for Offline Model-Free Robotic Reinforcement Learning
Comments: CoRL 2021. Project Website: this https URL First two authors contributed equally
Subjects: Machine Learning (cs.LG)

Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction. This can allow robots to acquire generalizable skills from large and diverse datasets, without any costly or unsafe online data collection. Despite recent algorithmic advances in offline RL, applying these methods to real-world problems has proven challenging. Although offline RL methods can learn from prior data, there is no clear and well-understood process for making various design choices, from model architecture to algorithm hyperparameters, without actually evaluating the learned policies online. In this paper, our aim is to develop a practical workflow for using offline RL analogous to the relatively well-understood workflows for supervised learning problems. To this end, we devise a set of metrics and conditions that can be tracked over the course of offline training, and can inform the practitioner about how the algorithm and model architecture should be adjusted to improve final performance. Our workflow is derived from a conceptual understanding of the behavior of conservative offline RL algorithms and cross-validation in supervised learning. We demonstrate the efficacy of this workflow in producing effective policies without any online tuning, both in several simulated robotic learning scenarios and for three tasks on two distinct real robots, focusing on learning manipulation skills with raw image observations with sparse binary rewards. Explanatory video and additional results can be found at sites.google.com/view/offline-rl-workflow

[33]  arXiv:2109.10817 [pdf, other]
Title: Causal Inference in Non-linear Time-series usingDeep Networks and Knockoff Counterfactuals
Journal-ref: IEEE International Conference on Machine Learning and Applications (ICMLA) 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Estimating causal relations is vital in understanding the complex interactions in multivariate time series. Non-linear coupling of variables is one of the major challenges inaccurate estimation of cause-effect relations. In this paper, we propose to use deep autoregressive networks (DeepAR) in tandem with counterfactual analysis to infer nonlinear causal relations in multivariate time series. We extend the concept of Granger causality using probabilistic forecasting with DeepAR. Since deep networks can neither handle missing input nor out-of-distribution intervention, we propose to use the Knockoffs framework (Barberand Cand`es, 2015) for generating intervention variables and consequently counterfactual probabilistic forecasting. Knockoff samples are independent of their output given the observed variables and exchangeable with their counterpart variables without changing the underlying distribution of the data. We test our method on synthetic as well as real-world time series datasets. Overall our method outperforms the widely used vector autoregressive Granger causality and PCMCI in detecting nonlinear causal dependency in multivariate time series.

[34]  arXiv:2109.10824 [pdf, other]
Title: Learning by Examples Based on Multi-level Optimization
Subjects: Machine Learning (cs.LG)

Learning by examples, which learns to solve a new problem by looking into how similar problems are solved, is an effective learning method in human learning. When a student learns a new topic, he/she finds out exemplar topics that are similar to this new topic and studies the exemplar topics to deepen the understanding of the new topic. We aim to investigate whether this powerful learning skill can be borrowed from humans to improve machine learning as well. In this work, we propose a novel learning approach called Learning By Examples (LBE). Our approach automatically retrieves a set of training examples that are similar to query examples and predicts labels for query examples by using class labels of the retrieved examples. We propose a three-level optimization framework to formulate LBE which involves three stages of learning: learning a Siamese network to retrieve similar examples; learning a matching network to make predictions on query examples by leveraging class labels of retrieved similar examples; learning the ``ground-truth'' similarities between training examples by minimizing the validation loss. We develop an efficient algorithm to solve the LBE problem and conduct extensive experiments on various benchmarks where the results demonstrate the effectiveness of our method on both supervised and few-shot learning.

[35]  arXiv:2109.10847 [pdf, other]
Title: Small-Bench NLP: Benchmark for small single GPU trained models in Natural Language Processing
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Recent progress in the Natural Language Processing domain has given us several State-of-the-Art (SOTA) pretrained models which can be finetuned for specific tasks. These large models with billions of parameters trained on numerous GPUs/TPUs over weeks are leading in the benchmark leaderboards. In this paper, we discuss the need for a benchmark for cost and time effective smaller models trained on a single GPU. This will enable researchers with resource constraints experiment with novel and innovative ideas on tokenization, pretraining tasks, architecture, fine tuning methods etc. We set up Small-Bench NLP, a benchmark for small efficient neural language models trained on a single GPU. Small-Bench NLP benchmark comprises of eight NLP tasks on the publicly available GLUE datasets and a leaderboard to track the progress of the community. Our ELECTRA-DeBERTa (15M parameters) small model architecture achieves an average score of 81.53 which is comparable to that of BERT-Base's 82.20 (110M parameters). Our models, code and leaderboard are available at https://github.com/smallbenchnlp

[36]  arXiv:2109.10888 [pdf, other]
Title: Quantifying Model Predictive Uncertainty with Perturbation Theory
Comments: 16 pages, 12 figures, 4 tables. arXiv admin note: text overlap with arXiv:2103.01374
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)

We propose a framework for predictive uncertainty quantification of a neural network that replaces the conventional Bayesian notion of weight probability density function (PDF) with a physics based potential field representation of the model weights in a Gaussian reproducing kernel Hilbert space (RKHS) embedding. This allows us to use perturbation theory from quantum physics to formulate a moment decomposition problem over the model weight-output relationship. The extracted moments reveal successive degrees of regularization of the weight potential field around the local neighborhood of the model output. Such localized moments represent well the PDF tails and provide significantly greater accuracy of the model's predictive uncertainty than the central moments characterized by Bayesian and ensemble methods or their variants. We show that this consequently leads to a better ability to detect false model predictions of test data that has undergone a covariate shift away from the training PDF learned by the model. We evaluate our approach against baseline uncertainty quantification methods on several benchmark datasets that are corrupted using common distortion techniques. Our approach provides fast model predictive uncertainty estimates with much greater precision and calibration.

[37]  arXiv:2109.10896 [pdf, other]
Title: Updating Embeddings for Dynamic Knowledge Graphs
Subjects: Machine Learning (cs.LG)

Data in Knowledge Graphs often represents part of the current state of the real world. Thus, to stay up-to-date the graph data needs to be updated frequently. To utilize information from Knowledge Graphs, many state-of-the-art machine learning approaches use embedding techniques. These techniques typically compute an embedding, i.e., vector representations of the nodes as input for the main machine learning algorithm. If a graph update occurs later on -- specifically when nodes are added or removed -- the training has to be done all over again. This is undesirable, because of the time it takes and also because downstream models which were trained with these embeddings have to be retrained if they change significantly. In this paper, we investigate embedding updates that do not require full retraining and evaluate them in combination with various embedding models on real dynamic Knowledge Graphs covering multiple use cases. We study approaches that place newly appearing nodes optimally according to local information, but notice that this does not work well. However, we find that if we continue the training of the old embedding, interleaved with epochs during which we only optimize for the added and removed parts, we obtain good results in terms of typical metrics used in link prediction. This performance is obtained much faster than with a complete retraining and hence makes it possible to maintain embeddings for dynamic Knowledge Graphs.

Cross-lists for Thu, 23 Sep 21

[38]  arXiv:2109.10360 (cross-list from astro-ph.CO) [pdf, other]
Title: Robust marginalization of baryonic effects for cosmological inference at the field level
Comments: 7 pages, 4 figures. Second paper of a series of four. The 2D maps, codes, and network weights used in this paper are publicly available at this https URL
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Astrophysics of Galaxies (astro-ph.GA); Instrumentation and Methods for Astrophysics (astro-ph.IM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

We train neural networks to perform likelihood-free inference from $(25\,h^{-1}{\rm Mpc})^2$ 2D maps containing the total mass surface density from thousands of hydrodynamic simulations of the CAMELS project. We show that the networks can extract information beyond one-point functions and power spectra from all resolved scales ($\gtrsim 100\,h^{-1}{\rm kpc}$) while performing a robust marginalization over baryonic physics at the field level: the model can infer the value of $\Omega_{\rm m} (\pm 4\%)$ and $\sigma_8 (\pm 2.5\%)$ from simulations completely different to the ones used to train it.

[39]  arXiv:2109.10376 (cross-list from cs.NE) [pdf, other]
Title: Learning through structure: towards deep neuromorphic knowledge graph embeddings
Comments: Accepted for publication at the International Conference on Neuromorphic Computing (ICNC 2021)
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)

Computing latent representations for graph-structured data is an ubiquitous learning task in many industrial and academic applications ranging from molecule synthetization to social network analysis and recommender systems. Knowledge graphs are among the most popular and widely used data representations related to the Semantic Web. Next to structuring factual knowledge in a machine-readable format, knowledge graphs serve as the backbone of many artificial intelligence applications and allow the ingestion of context information into various learning algorithms. Graph neural networks attempt to encode graph structures in low-dimensional vector spaces via a message passing heuristic between neighboring nodes. Over the recent years, a multitude of different graph neural network architectures demonstrated ground-breaking performances in many learning tasks. In this work, we propose a strategy to map deep graph learning architectures for knowledge graph reasoning to neuromorphic architectures. Based on the insight that randomly initialized and untrained (i.e., frozen) graph neural networks are able to preserve local graph structures, we compose a frozen neural network with shallow knowledge graph embedding models. We experimentally show that already on conventional computing hardware, this leads to a significant speedup and memory reduction while maintaining a competitive performance level. Moreover, we extend the frozen architecture to spiking neural networks, introducing a novel, event-based and highly sparse knowledge graph embedding algorithm that is suitable for implementation in neuromorphic hardware.

[40]  arXiv:2109.10393 (cross-list from cs.CV) [pdf, other]
Title: Towards a Real-Time Facial Analysis System
Comments: Accepted in IEEE MMSP 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Facial analysis is an active research area in computer vision, with many practical applications. Most of the existing studies focus on addressing one specific task and maximizing its performance. For a complete facial analysis system, one needs to solve these tasks efficiently to ensure a smooth experience. In this work, we present a system-level design of a real-time facial analysis system. With a collection of deep neural networks for object detection, classification, and regression, the system recognizes age, gender, facial expression, and facial similarity for each person that appears in the camera view. We investigate the parallelization and interplay of individual tasks. Results on common off-the-shelf architecture show that the system's accuracy is comparable to the state-of-the-art methods, and the recognition speed satisfies real-time requirements. Moreover, we propose a multitask network for jointly predicting the first three attributes, i.e., age, gender, and facial expression. Source code and trained models are available at https://github.com/mahehu/TUT-live-age-estimator.

[41]  arXiv:2109.10399 (cross-list from physics.ao-ph) [pdf, other]
Title: Learned Benchmarks for Subseasonal Forecasting
Comments: 15 pages of main paper and 18 pages of appendix text
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)

We develop a subseasonal forecasting toolkit of simple learned benchmark models that outperform both operational practice and state-of-the-art machine learning and deep learning methods. Our new models include (a) Climatology++, an adaptive alternative to climatology that, for precipitation, is 9% more accurate and 250% more skillful than the United States operational Climate Forecasting System (CFSv2); (b) CFSv2++, a learned CFSv2 correction that improves temperature and precipitation accuracy by 7-8% and skill by 50-275%; and (c) Persistence++, an augmented persistence model that combines CFSv2 forecasts with lagged measurements to improve temperature and precipitation accuracy by 6-9% and skill by 40-130%. Across the contiguous U.S., our Climatology++, CFSv2++, and Persistence++ toolkit consistently outperforms standard meteorological baselines, state-of-the-art machine and deep learning methods, and the European Centre for Medium-Range Weather Forecasts ensemble. Overall, we find that augmenting traditional forecasting approaches with learned enhancements yields an effective and computationally inexpensive strategy for building the next generation of subseasonal forecasting benchmarks.

[42]  arXiv:2109.10404 (cross-list from eess.SP) [pdf]
Title: Digital Signal Processing Using Deep Neural Networks
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Currently there is great interest in the utility of deep neural networks (DNNs) for the physical layer of radio frequency (RF) communications. In this manuscript, we describe a custom DNN specially designed to solve problems in the RF domain. Our model leverages the mechanisms of feature extraction and attention through the combination of an autoencoder convolutional network with a transformer network, to accomplish several important communications network and digital signals processing (DSP) tasks. We also present a new open dataset and physical data augmentation model that enables training of DNNs that can perform automatic modulation classification, infer and correct transmission channel effects, and directly demodulate baseband RF signals.

[43]  arXiv:2109.10410 (cross-list from cs.CL) [pdf, other]
Title: RETRONLU: Retrieval Augmented Task-Oriented Semantic Parsing
Comments: 12 pages, 9 figures, 5 Tables
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)

While large pre-trained language models accumulate a lot of knowledge in their parameters, it has been demonstrated that augmenting it with non-parametric retrieval-based memory has a number of benefits from accuracy improvements to data efficiency for knowledge-focused tasks, such as question answering. In this paper, we are applying retrieval-based modeling ideas to the problem of multi-domain task-oriented semantic parsing for conversational assistants. Our approach, RetroNLU, extends a sequence-to-sequence model architecture with a retrieval component, used to fetch existing similar examples and provide them as an additional input to the model. In particular, we analyze two settings, where we augment an input with (a) retrieved nearest neighbor utterances (utterance-nn), and (b) ground-truth semantic parses of nearest neighbor utterances (semparse-nn). Our technique outperforms the baseline method by 1.5% absolute macro-F1, especially at the low resource setting, matching the baseline model accuracy with only 40% of the data. Furthermore, we analyze the nearest neighbor retrieval component's quality, model sensitivity and break down the performance for semantic parses of different utterance complexity.

[44]  arXiv:2109.10452 (cross-list from stat.ML) [pdf, other]
Title: Personalized Online Machine Learning
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

In this work, we introduce the Personalized Online Super Learner (POSL) -- an online ensembling algorithm for streaming data whose optimization procedure accommodates varying degrees of personalization. Namely, POSL optimizes predictions with respect to baseline covariates, so personalization can vary from completely individualized (i.e., optimization with respect to baseline covariate subject ID) to many individuals (i.e., optimization with respect to common baseline covariates). As an online algorithm, POSL learns in real-time. POSL can leverage a diversity of candidate algorithms, including online algorithms with different training and update times, fixed algorithms that are never updated during the procedure, pooled algorithms that learn from many individuals' time-series, and individualized algorithms that learn from within a single time-series. POSL's ensembling of this hybrid of base learning strategies depends on the amount of data collected, the stationarity of the time-series, and the mutual characteristics of a group of time-series. In essence, POSL decides whether to learn across samples, through time, or both, based on the underlying (unknown) structure in the data. For a wide range of simulations that reflect realistic forecasting scenarios, and in a medical data application, we examine the performance of POSL relative to other current ensembling and online learning methods. We show that POSL is able to provide reliable predictions for time-series data and adjust to changing data-generating environments. We further cultivate POSL's practicality by extending it to settings where time-series enter/exit dynamically over chronological time.

[45]  arXiv:2109.10462 (cross-list from cs.SI) [pdf, other]
Title: A Hierarchical Network-Oriented Analysis of User Participation in Misinformation Spread on WhatsApp
Comments: Paper Accepted in Information Processing & Management, Elsevier
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Computation (stat.CO)

WhatsApp emerged as a major communication platform in many countries in the recent years. Despite offering only one-to-one and small group conversations, WhatsApp has been shown to enable the formation of a rich underlying network, crossing the boundaries of existing groups, and with structural properties that favor information dissemination at large. Indeed, WhatsApp has reportedly been used as a forum of misinformation campaigns with significant social, political and economic consequences in several countries. In this article, we aim at complementing recent studies on misinformation spread on WhatsApp, mostly focused on content properties and propagation dynamics, by looking into the network that connects users sharing the same piece of content. Specifically, we present a hierarchical network-oriented characterization of the users engaged in misinformation spread by focusing on three perspectives: individuals, WhatsApp groups and user communities, i.e., groupings of users who, intentionally or not, share the same content disproportionately often. By analyzing sharing and network topological properties, our study offers valuable insights into how WhatsApp users leverage the underlying network connecting different groups to gain large reach in the spread of misinformation on the platform.

[46]  arXiv:2109.10465 (cross-list from cs.CL) [pdf, other]
Title: Scalable and Efficient MoE Training for Multitask Multilingual Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

The Mixture of Experts (MoE) models are an emerging class of sparsely activated deep learning models that have sublinear compute costs with respect to their parameters. In contrast with dense models, the sparse architecture of MoE offers opportunities for drastically growing model size with significant accuracy gain while consuming much lower compute budget. However, supporting large scale MoE training also has its own set of system and modeling challenges. To overcome the challenges and embrace the opportunities of MoE, we first develop a system capable of scaling MoE models efficiently to trillions of parameters. It combines multi-dimensional parallelism and heterogeneous memory technologies harmoniously with MoE to empower 8x larger models on the same hardware compared with existing work. Besides boosting system efficiency, we also present new training methods to improve MoE sample efficiency and leverage expert pruning strategy to improve inference time efficiency. By combining the efficient system and training methods, we are able to significantly scale up large multitask multilingual models for language generation which results in a great improvement in model accuracy. A model trained with 10 billion parameters on 50 languages can achieve state-of-the-art performance in Machine Translation (MT) and multilingual natural language generation tasks. The system support of efficient MoE training has been implemented and open-sourced with the DeepSpeed library.

[47]  arXiv:2109.10471 (cross-list from cs.CY) [pdf, other]
Title: The First Vision For Vitals (V4V) Challenge for Non-Contact Video-Based Physiological Estimation
Comments: ICCVw'21. V4V Dataset and Challenge: this https URL
Subjects: Computers and Society (cs.CY); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

Telehealth has the potential to offset the high demand for help during public health emergencies, such as the COVID-19 pandemic. Remote Photoplethysmography (rPPG) - the problem of non-invasively estimating blood volume variations in the microvascular tissue from video - would be well suited for these situations. Over the past few years a number of research groups have made rapid advances in remote PPG methods for estimating heart rate from digital video and obtained impressive results. How these various methods compare in naturalistic conditions, where spontaneous behavior, facial expressions, and illumination changes are present, is relatively unknown. To enable comparisons among alternative methods, the 1st Vision for Vitals Challenge (V4V) presented a novel dataset containing high-resolution videos time-locked with varied physiological signals from a diverse population. In this paper, we outline the evaluation protocol, the data used, and the results. V4V is to be held in conjunction with the 2021 International Conference on Computer Vision.

[48]  arXiv:2109.10477 (cross-list from cs.CV) [pdf, other]
Title: Generating Compositional Color Representations from Text
Comments: Accepted as a full paper at CIKM 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Machine Learning (cs.LG)

We consider the cross-modal task of producing color representations for text phrases. Motivated by the fact that a significant fraction of user queries on an image search engine follow an (attribute, object) structure, we propose a generative adversarial network that generates color profiles for such bigrams. We design our pipeline to learn composition - the ability to combine seen attributes and objects to unseen pairs. We propose a novel dataset curation pipeline from existing public sources. We describe how a set of phrases of interest can be compiled using a graph propagation technique, and then mapped to images. While this dataset is specialized for our investigations on color, the method can be extended to other visual dimensions where composition is of interest. We provide detailed ablation studies that test the behavior of our GAN architecture with loss functions from the contrastive learning literature. We show that the generative model achieves lower Frechet Inception Distance than discriminative ones, and therefore predicts color profiles that better match those from real images. Finally, we demonstrate improved performance in image retrieval and classification, indicating the crucial role that color plays in these downstream tasks.

[49]  arXiv:2109.10478 (cross-list from cs.CV) [pdf, other]
Title: AI in Osteoporosis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In this chapter we explore and evaluate methods for trabecular bone characterization and osteoporosis diagnosis with increased interest in sparse approximations. We first describe texture representation and classification techniques, patch-based methods such as Bag of Keypoints, and more recent deep neural networks. Then we introduce the concept of sparse representations for pattern recognition and we detail integrative sparse analysis methods and classifier decision fusion methods. We report cross-validation results on osteoporosis datasets of bone radiographs and compare the results produced by the different categories of methods. We conclude that advances in the AI and machine learning fields have enabled the development of methods that can be used as diagnostic tools in clinical settings.

[50]  arXiv:2109.10503 (cross-list from astro-ph.EP) [pdf, other]
Title: Identifying Potential Exomoon Signals with Convolutional Neural Networks
Comments: 14 pages, 13 figures, 1 table. Accepted for publication in Monthly Notices of the Royal Astronomical Society, 15 September 2021
Subjects: Earth and Planetary Astrophysics (astro-ph.EP); Machine Learning (cs.LG)

Targeted observations of possible exomoon host systems will remain difficult to obtain and time-consuming to analyze in the foreseeable future. As such, time-domain surveys such as Kepler, K2 and TESS will continue to play a critical role as the first step in identifying candidate exomoon systems, which may then be followed-up with premier ground- or space-based telescopes. In this work, we train an ensemble of convolutional neural networks (CNNs) to identify candidate exomoon signals in single-transit events observed by Kepler. Our training set consists of ${\sim}$27,000 examples of synthetic, planet-only and planet+moon single transits, injected into Kepler light curves. We achieve up to 88\% classification accuracy with individual CNN architectures and 97\% precision in identifying the moons in the validation set when the CNN ensemble is in total agreement. We then apply the CNN ensemble to light curves from 1880 Kepler Objects of Interest with periods $>10$ days ($\sim$57,000 individual transits), and further test the accuracy of the CNN classifier by injecting planet transits into each light curve, thus quantifying the extent to which residual stellar activity may result in false positive classifications. We find a small fraction of these transits contain moon-like signals, though we caution against strong inferences of the exomoon occurrence rate from this result. We conclude by discussing some ongoing challenges to utilizing neural networks for the exomoon search.

[51]  arXiv:2109.10506 (cross-list from cs.CL) [pdf, other]
Title: Tecnologica cosa: Modeling Storyteller Personalities in Boccaccio's Decameron
Comments: The 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (co-located with EMNLP 2021)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

We explore Boccaccio's Decameron to see how digital humanities tools can be used for tasks that have limited data in a language no longer in contemporary use: medieval Italian. We focus our analysis on the question: Do the different storytellers in the text exhibit distinct personalities? To answer this question, we curate and release a dataset based on the authoritative edition of the text. We use supervised classification methods to predict storytellers based on the stories they tell, confirming the difficulty of the task, and demonstrate that topic modeling can extract thematic storyteller "profiles."

[52]  arXiv:2109.10509 (cross-list from cs.CL) [pdf, other]
Title: Unsupervised Contextualized Document Representation
Comments: 9 Pages, 4 Figures, 7 tables, SustaiNLP2021 @ EMNLP-2021
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)

Several NLP tasks need the effective representation of text documents. Arora et. al., 2017 demonstrate that simple weighted averaging of word vectors frequently outperforms neural models. SCDV (Mekala et. al., 2017) further extends this from sentences to documents by employing soft and sparse clustering over pre-computed word vectors. However, both techniques ignore the polysemy and contextual character of words. In this paper, we address this issue by proposing SCDV+BERT(ctxd), a simple and effective unsupervised representation that combines contextualized BERT (Devlin et al., 2019) based word embedding for word sense disambiguation with SCDV soft clustering approach. We show that our embeddings outperform original SCDV, pre-train BERT, and several other baselines on many classification datasets. We also demonstrate our embeddings effectiveness on other tasks, such as concept matching and sentence similarity. In addition, we show that SCDV+BERT(ctxd) outperforms fine-tune BERT and different embedding approaches in scenarios with limited data and only few shots examples.

[53]  arXiv:2109.10514 (cross-list from cs.CL) [pdf]
Title: Towards The Automatic Coding of Medical Transcripts to Improve Patient-Centered Communication
Comments: Society for Design and Process Science (SDPS) 2016
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

This paper aims to provide an approach for automatic coding of physician-patient communication transcripts to improve patient-centered communication (PCC). PCC is a central part of high-quality health care. To improve PCC, dialogues between physicians and patients have been recorded and tagged with predefined codes. Trained human coders have manually coded the transcripts. Since it entails huge labor costs and poses possible human errors, automatic coding methods should be considered for efficiency and effectiveness. We adopted three machine learning algorithms (Na\"ive Bayes, Random Forest, and Support Vector Machine) to categorize lines in transcripts into corresponding codes. The result showed that there is evidence to distinguish the codes, and this is considered to be sufficient for training of human annotators.

[54]  arXiv:2109.10523 (cross-list from cs.SI) [pdf, other]
Title: Investigating and Modeling the Dynamics of Long Ties
Comments: 46 pages, 18 figures
Subjects: Social and Information Networks (cs.SI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Physics and Society (physics.soc-ph)

Long ties, the social ties that bridge different communities, are widely believed to play crucial roles in spreading novel information in social networks. However, some existing network theories and prediction models indicate that long ties might dissolve quickly or eventually become redundant, thus putting into question the long-term value of long ties. Our empirical analysis of real-world dynamic networks shows that contrary to such reasoning, long ties are more likely to persist than other social ties, and that many of them constantly function as social bridges without being embedded in local networks. Using a novel cost-benefit analysis model combined with machine learning, we show that long ties are highly beneficial, which instinctively motivates people to expend extra effort to maintain them. This partly explains why long ties are more persistent than what has been suggested by many existing theories and models. Overall, our study suggests the need for social interventions that can promote the formation of long ties, such as mixing people with diverse backgrounds.

[55]  arXiv:2109.10528 (cross-list from cs.CR) [pdf, other]
Title: A unified interpretation of the Gaussian mechanism for differential privacy through the sensitivity index
Comments: Under review at PETS 2022
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (cs.LG)

The Gaussian mechanism (GM) represents a universally employed tool for achieving differential privacy (DP), and a large body of work has been devoted to its analysis. We argue that the three prevailing interpretations of the GM, namely $(\varepsilon, \delta)$-DP, f-DP and R\'enyi DP can be expressed by using a single parameter $\psi$, which we term the sensitivity index. $\psi$ uniquely characterises the GM and its properties by encapsulating its two fundamental quantities: the sensitivity of the query and the magnitude of the noise perturbation. With strong links to the ROC curve and the hypothesis-testing interpretation of DP, $\psi$ offers the practitioner a powerful method for interpreting, comparing and communicating the privacy guarantees of Gaussian mechanisms.

[56]  arXiv:2109.10581 (cross-list from eess.SP) [pdf, other]
Title: Deep Augmented MUSIC Algorithm for Data-Driven DoA Estimation
Comments: Submitted to ICASSP2022
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

Direction of arrival (DoA) estimation is a crucial task in sensor array signal processing, giving rise to various successful model-based (MB) algorithms as well as recently developed data-driven (DD) methods. This paper introduces a new hybrid MB/DD DoA estimation architecture, based on the classical multiple signal classification (MUSIC) algorithm. Our approach augments crucial aspects of the original MUSIC structure with specifically designed neural architectures, allowing it to overcome certain limitations of the purely MB method, such as its inability to successfully localize coherent sources. The deep augmented MUSIC algorithm is shown to outperform its unaltered version with a superior resolution.

[57]  arXiv:2109.10617 (cross-list from cs.AI) [pdf, other]
Title: Solving Large Steiner Tree Problems in Graphs for Cost-Efficient Fiber-To-The-Home Network Expansion
Comments: Submitted to ICAART 2022, 10 pages, 18 figures
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

The expansion of Fiber-To-The-Home (FTTH) networks creates high costs due to expensive excavation procedures. Optimizing the planning process and minimizing the cost of the earth excavation work therefore lead to large savings. Mathematically, the FTTH network problem can be described as a minimum Steiner Tree problem. Even though the Steiner Tree problem has already been investigated intensively in the last decades, it might be further optimized with the help of new computing paradigms and emerging approaches. This work studies upcoming technologies, such as Quantum Annealing, Simulated Annealing and nature-inspired methods like Evolutionary Algorithms or slime-mold-based optimization. Additionally, we investigate partitioning and simplifying methods. Evaluated on several real-life problem instances, we could outperform a traditional, widely-used baseline (NetworkX Approximate Solver) on most of the domains. Prior partitioning of the initial graph and the presented slime-mold-based approach were especially valuable for a cost-efficient approximation. Quantum Annealing seems promising, but was limited by the number of available qubits.

[58]  arXiv:2109.10623 (cross-list from stat.ML) [pdf, ps, other]
Title: Sharp Analysis of Random Fourier Features in Classification
Authors: Zhu Li
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We study the theoretical properties of random Fourier features classification with Lipschitz continuous loss functions such as support vector machine and logistic regression. Utilizing the regularity condition, we show for the first time that random Fourier features classification can achieve $O(1/\sqrt{n})$ learning rate with only $\Omega(\sqrt{n} \log n)$ features, as opposed to $\Omega(n)$ features suggested by previous results. Our study covers the standard feature sampling method for which we reduce the number of features required, as well as a problem-dependent sampling method which further reduces the number of features while still keeping the optimal generalization property. Moreover, we prove that the random Fourier features classification can obtain a fast $O(1/n)$ learning rate for both sampling schemes under Massart's low noise assumption. Our results demonstrate the potential effectiveness of random Fourier features approximation in reducing the computational complexity (roughly from $O(n^3)$ in time and $O(n^2)$ in space to $O(n^2)$ and $O(n\sqrt{n})$ respectively) without having to trade-off the statistical prediction accuracy. In addition, the achieved trade-off in our analysis is at least the same as the optimal results in the literature under the worst case scenario and significantly improves the optimal results under benign regularity conditions.

[59]  arXiv:2109.10632 (cross-list from cs.AI) [pdf, other]
Title: Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Systems and Control (eess.SY); Machine Learning (stat.ML)

Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents. As environments grow in size, effective credit assignment becomes increasingly harder and often results in infeasible learning times. Still, in many real-world settings, there exist simplified underlying dynamics that can be leveraged for more scalable solutions. In this work, we exploit such locality structures effectively whilst maintaining global cooperation. We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm. Additionally, we provide a direct reward decomposition method for finding these local rewards when only a global signal is provided. We test our method empirically, showing it scales well compared to other methods, significantly improving performance and convergence speed.

[60]  arXiv:2109.10641 (cross-list from eess.IV) [pdf, other]
Title: Uncertainty-Aware Training for Cardiac Resynchronisation Therapy Response Prediction
Comments: STACOM 2021 Workshop
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Evaluation of predictive deep learning (DL) models beyond conventional performance metrics has become increasingly important for applications in sensitive environments like healthcare. Such models might have the capability to encode and analyse large sets of data but they often lack comprehensive interpretability methods, preventing clinical trust in predictive outcomes. Quantifying uncertainty of a prediction is one way to provide such interpretability and promote trust. However, relatively little attention has been paid to how to include such requirements into the training of the model. In this paper we: (i) quantify the data (aleatoric) and model (epistemic) uncertainty of a DL model for Cardiac Resynchronisation Therapy response prediction from cardiac magnetic resonance images, and (ii) propose and perform a preliminary investigation of an uncertainty-aware loss function that can be used to retrain an existing DL image-based classification model to encourage confidence in correct predictions and reduce confidence in incorrect predictions. Our initial results are promising, showing a significant increase in the (epistemic) confidence of true positive predictions, with some evidence of a reduction in false negative confidence.

[61]  arXiv:2109.10656 (cross-list from cs.RO) [pdf, other]
Title: Vehicle Behavior Prediction and Generalization Using Imbalanced Learning Techniques
Comments: Accepted for 2021 IEEE 24th International Conference on Intelligent Transportation Systems (ITSC)
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

The use of learning-based methods for vehicle behavior prediction is a promising research topic. However, many publicly available data sets suffer from class distribution skews which limits learning performance if not addressed. This paper proposes an interaction-aware prediction model consisting of an LSTM autoencoder and SVM classifier. Additionally, an imbalanced learning technique, the multiclass balancing ensemble is proposed. Evaluations show that the method enhances model performance, resulting in improved classification accuracy. Good generalization properties of learned models are important and therefore a generalization study is done where models are evaluated on unseen traffic data with dissimilar traffic behavior stemming from different road configurations. This is realized by using two distinct highway traffic recordings, the publicly available NGSIM US-101 and I80 data sets. Moreover, methods for encoding structural and static features into the learning process for improved generalization are evaluated. The resulting methods show substantial improvements in classification as well as generalization performance.

[62]  arXiv:2109.10679 (cross-list from physics.flu-dyn) [pdf, ps, other]
Title: Application of Video-to-Video Translation Networks to Computational Fluid Dynamics
Authors: Hiromitsu Kigure
Comments: Published in Frontiers in Artificial Intelligence
Subjects: Fluid Dynamics (physics.flu-dyn); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In recent years, the evolution of artificial intelligence, especially deep learning, has been remarkable, and its application to various fields has been growing rapidly. In this paper, I report the results of the application of generative adversarial networks (GANs), specifically video-to-video translation networks, to computational fluid dynamics (CFD) simulations. The purpose of this research is to reduce the computational cost of CFD simulations with GANs. The architecture of GANs in this research is a combination of the image-to-image translation networks (the so-called "pix2pix") and Long Short-Term Memory (LSTM). It is shown that the results of high-cost and high-accuracy simulations (with high-resolution computational grids) can be estimated from those of low-cost and low-accuracy simulations (with low-resolution grids). In particular, the time evolution of density distributions in the cases of a high-resolution grid is reproduced from that in the cases of a low-resolution grid through GANs, and the density inhomogeneity estimated from the image generated by GANs recovers the ground truth with good accuracy. Qualitative and quantitative comparisons of the results of the proposed method with those of several super-resolution algorithms are also presented.

[63]  arXiv:2109.10681 (cross-list from stat.ML) [pdf, ps, other]
Title: A Latent Restoring Force Approach to Nonlinear System Identification
Comments: 18 pages, 11 figures, preprint submitted to Mechanical Systems and Signal Processing
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Systems and Control (eess.SY)

Identification of nonlinear dynamic systems remains a significant challenge across engineering. This work suggests an approach based on Bayesian filtering to extract and identify the contribution of an unknown nonlinear term in the system which can be seen as an alternative viewpoint on restoring force surface type approaches. To achieve this identification, the contribution which is the nonlinear restoring force is modelled, initially, as a Gaussian process in time. That Gaussian process is converted into a state-space model and combined with the linear dynamic component of the system. Then, by inference of the filtering and smoothing distributions, the internal states of the system and the nonlinear restoring force can be extracted. In possession of these states a nonlinear model can be constructed. The approach is demonstrated to be effective in both a simulated case study and on an experimental benchmark dataset.

[64]  arXiv:2109.10686 (cross-list from cs.CL) [pdf, other]
Title: Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

There remain many open questions pertaining to the scaling behaviour of Transformer architectures. These scaling decisions and findings can be critical, as training runs often come with an associated computational cost which have both financial and/or environmental impact. The goal of this paper is to present scaling insights from pretraining and finetuning Transformers. While Kaplan et al. presents a comprehensive study of the scaling behaviour of Transformer language models, the scope is only on the upstream (pretraining) loss. Therefore, it is still unclear if these set of findings transfer to downstream task within the context of the pretrain-finetune paradigm. The key findings of this paper are as follows: (1) we show that aside from only the model size, model shape matters for downstream fine-tuning, (2) scaling protocols operate differently at different compute regions, (3) widely adopted T5-base and T5-large sizes are Pareto-inefficient. To this end, we present improved scaling protocols whereby our redesigned models achieve similar downstream fine-tuning quality while having 50\% fewer parameters and training 40\% faster compared to the widely adopted T5-base model. We publicly release over 100 pretrained checkpoints of different T5 configurations to facilitate future research and analysis.

[65]  arXiv:2109.10697 (cross-list from cs.AI) [pdf, other]
Title: Towards Automatic Bias Detection in Knowledge Graphs
Comments: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: Findings (EMNLP 2021). Nov 7--11, 2021
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

With the recent surge in social applications relying on knowledge graphs, the need for techniques to ensure fairness in KG based methods is becoming increasingly evident. Previous works have demonstrated that KGs are prone to various social biases, and have proposed multiple methods for debiasing them. However, in such studies, the focus has been on debiasing techniques, while the relations to be debiased are specified manually by the user. As manual specification is itself susceptible to human cognitive bias, there is a need for a system capable of quantifying and exposing biases, that can support more informed decisions on what to debias. To address this gap in the literature, we describe a framework for identifying biases present in knowledge graph embeddings, based on numerical bias metrics. We illustrate the framework with three different bias measures on the task of profession prediction, and it can be flexibly extended to further bias definitions and applications. The relations flagged as biased can then be handed to decision makers for judgement upon subsequent debiasing.

[66]  arXiv:2109.10742 (cross-list from cs.CV) [pdf, other]
Title: Early Lane Change Prediction for Automated Driving Systems Using Multi-Task Attention-based Convolutional Neural Networks
Comments: 12 pages, 12 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Lane change (LC) is one of the safety-critical manoeuvres in highway driving according to various road accident records. Thus, reliably predicting such manoeuvre in advance is critical for the safe and comfortable operation of automated driving systems. The majority of previous studies rely on detecting a manoeuvre that has been already started, rather than predicting the manoeuvre in advance. Furthermore, most of the previous works do not estimate the key timings of the manoeuvre (e.g., crossing time), which can actually yield more useful information for the decision making in the ego vehicle. To address these shortcomings, this paper proposes a novel multi-task model to simultaneously estimate the likelihood of LC manoeuvres and the time-to-lane-change (TTLC). In both tasks, an attention-based convolutional neural network (CNN) is used as a shared feature extractor from a bird's eye view representation of the driving environment. The spatial attention used in the CNN model improves the feature extraction process by focusing on the most relevant areas of the surrounding environment. In addition, two novel curriculum learning schemes are employed to train the proposed approach. The extensive evaluation and comparative analysis of the proposed method in existing benchmark datasets show that the proposed method outperforms state-of-the-art LC prediction models, particularly considering long-term prediction performance.

[67]  arXiv:2109.10743 (cross-list from cs.HC) [pdf]
Title: Natural Typing Recognition vis Surface Electromyography
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

By using a computer keyboard as a finger recording device, we construct the largest existing dataset for gesture recognition via surface electromyography (sEMG), and use deep learning to achieve over 90% character-level accuracy on reconstructing typed text entirely from measured muscle potentials. We prioritize the temporal structure of the EMG signal instead of the spatial structure of the electrode layout, using network architectures inspired by those used for real-time spoken language transcription. Our architecture recognizes the rapid movements of natural computer typing, which occur at irregular intervals and often overlap in time. The extensive size of our dataset also allows us to study gesture recognition after synthetically downgrading the spatial or temporal resolution, showing the system capabilities necessary for real-time gesture recognition.

[68]  arXiv:2109.10765 (cross-list from physics.flu-dyn) [pdf, other]
Title: An artificial neural network approach to bifurcating phenomena in computational fluid dynamics
Comments: 28 pages, 22 figures
Subjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)

This work deals with the investigation of bifurcating fluid phenomena using a reduced order modelling setting aided by artificial neural networks. We discuss the POD-NN approach dealing with non-smooth solutions set of nonlinear parametrized PDEs. Thus, we study the Navier-Stokes equations describing: (i) the Coanda effect in a channel, and (ii) the lid driven triangular cavity flow, in a physical/geometrical multi-parametrized setting, considering the effects of the domain's configuration on the position of the bifurcation points. Finally, we propose a reduced manifold-based bifurcation diagram for a non-intrusive recovery of the critical points evolution. Exploiting such detection tool, we are able to efficiently obtain information about the pattern flow behaviour, from symmetry breaking profiles to attaching/spreading vortices, even at high Reynolds numbers.

[69]  arXiv:2109.10777 (cross-list from cs.CV) [pdf, other]
Title: Deep Variational Clustering Framework for Self-labeling of Large-scale Medical Images
Comments: arXiv admin note: text overlap with arXiv:2109.05232
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

We propose a Deep Variational Clustering (DVC) framework for unsupervised representation learning and clustering of large-scale medical images. DVC simultaneously learns the multivariate Gaussian posterior through the probabilistic convolutional encoder and the likelihood distribution with the probabilistic convolutional decoder; and optimizes cluster labels assignment. Here, the learned multivariate Gaussian posterior captures the latent distribution of a large set of unlabeled images. Then, we perform unsupervised clustering on top of the variational latent space using a clustering loss. In this approach, the probabilistic decoder helps to prevent the distortion of data points in the latent space and to preserve the local structure of data generating distribution. The training process can be considered as a self-training process to refine the latent space and simultaneously optimizing cluster assignments iteratively. We evaluated our proposed framework on three public datasets that represented different medical imaging modalities. Our experimental results show that our proposed framework generalizes better across different datasets. It achieves compelling results on several medical imaging benchmarks. Thus, our approach offers potential advantages over conventional deep unsupervised learning in real-world applications. The source code of the method and all the experiments are available publicly at: https://github.com/csfarzin/DVC

[70]  arXiv:2109.10778 (cross-list from cs.CV) [pdf, other]
Title: Label Cleaning Multiple Instance Learning: Refining Coarse Annotations on Single Whole-Slide Images
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Annotating cancerous regions in whole-slide images (WSIs) of pathology samples plays a critical role in clinical diagnosis, biomedical research, and machine learning algorithms development. However, generating exhaustive and accurate annotations is labor-intensive, challenging, and costly. Drawing only coarse and approximate annotations is a much easier task, less costly, and it alleviates pathologists' workload. In this paper, we study the problem of refining these approximate annotations in digital pathology to obtain more accurate ones. Some previous works have explored obtaining machine learning models from these inaccurate annotations, but few of them tackle the refinement problem where the mislabeled regions should be explicitly identified and corrected, and all of them require a - often very large - number of training samples. We present a method, named Label Cleaning Multiple Instance Learning (LC-MIL), to refine coarse annotations on a single WSI without the need of external training data. Patches cropped from a WSI with inaccurate labels are processed jointly with a MIL framework, and a deep-attention mechanism is leveraged to discriminate mislabeled instances, mitigating their impact on the predictive model and refining the segmentation. Our experiments on a heterogeneous WSI set with breast cancer lymph node metastasis, liver cancer, and colorectal cancer samples show that LC-MIL significantly refines the coarse annotations, outperforming the state-of-the-art alternatives, even while learning from a single slide. These results demonstrate the LC-MIL is a promising, lightweight tool to provide fine-grained annotations from coarsely annotated pathology sets.

[71]  arXiv:2109.10793 (cross-list from math.OC) [pdf, other]
Title: Physics-informed Neural Networks-based Model Predictive Control for Multi-link Manipulators
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Robotics (cs.RO)

We discuss nonlinear model predictive control (NMPC) for multi-body dynamics via physics-informed machine learning methods. Physics-informed neural networks (PINNs) are a promising tool to approximate (partial) differential equations. PINNs are not suited for control tasks in their original form since they are not designed to handle variable control actions or variable initial values. We thus present the idea of enhancing PINNs by adding control actions and initial conditions as additional network inputs. The high-dimensional input space is subsequently reduced via a sampling strategy and a zero-hold assumption. This strategy enables the controller design based on a PINN as an approximation of the underlying system dynamics. The additional benefit is that the sensitivities are easily computed via automatic differentiation, thus leading to efficient gradient-based algorithms. Finally, we present our results using our PINN-based MPC to solve a tracking problem for a complex mechanical system, a multi-link manipulator.

[72]  arXiv:2109.10794 (cross-list from stat.ML) [pdf, ps, other]
Title: Entropic Issues in Likelihood-Based OOD Detection
Comments: NeurIPS Workshop Submission
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Deep generative models trained by maximum likelihood remain very popular methods for reasoning about data probabilistically. However, it has been observed that they can assign higher likelihoods to out-of-distribution (OOD) data than in-distribution data, thus calling into question the meaning of these likelihood values. In this work we provide a novel perspective on this phenomenon, decomposing the average likelihood into a KL divergence term and an entropy term. We argue that the latter can explain the curious OOD behaviour mentioned above, suppressing likelihood values on datasets with higher entropy. Although our idea is simple, we have not seen it explored yet in the literature. This analysis provides further explanation for the success of OOD detection methods based on likelihood ratios, as the problematic entropy term cancels out in expectation. Finally, we discuss how this observation relates to recent success in OOD detection with manifold-supported models, for which the above decomposition does not hold.

[73]  arXiv:2109.10834 (cross-list from astro-ph.SR) [pdf, other]
Title: SCSS-Net: Solar Corona Structures Segmentation by Deep Learning
Comments: accepted for publication in Monthly Notices of the Royal Astronomical Society; for associated code, see this https URL
Subjects: Solar and Stellar Astrophysics (astro-ph.SR); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Space Physics (physics.space-ph)

Structures in the solar corona are the main drivers of space weather processes that might directly or indirectly affect the Earth. Thanks to the most recent space-based solar observatories, with capabilities to acquire high-resolution images continuously, the structures in the solar corona can be monitored over the years with a time resolution of minutes. For this purpose, we have developed a method for automatic segmentation of solar corona structures observed in EUV spectrum that is based on a deep learning approach utilizing Convolutional Neural Networks. The available input datasets have been examined together with our own dataset based on the manual annotation of the target structures. Indeed, the input dataset is the main limitation of the developed model's performance. Our \textit{SCSS-Net} model provides results for coronal holes and active regions that could be compared with other generally used methods for automatic segmentation. Even more, it provides a universal procedure to identify structures in the solar corona with the help of the transfer learning technique. The outputs of the model can be then used for further statistical studies of connections between solar activity and the influence of space weather on Earth.

[74]  arXiv:2109.10852 (cross-list from cs.CV) [pdf, other]
Title: Pix2seq: A Language Modeling Framework for Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

This paper presents Pix2Seq, a simple and generic framework for object detection. Unlike existing approaches that explicitly integrate prior knowledge about the task, we simply cast object detection as a language modeling task conditioned on the observed pixel inputs. Object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and we train a neural net to perceive the image and generate the desired sequence. Our approach is based mainly on the intuition that if a neural net knows about where and what the objects are, we just need to teach it how to read them out. Beyond the use of task-specific data augmentations, our approach makes minimal assumptions about the task, yet it achieves competitive results on the challenging COCO dataset, compared to highly specialized and well optimized detection algorithms.

[75]  arXiv:2109.10854 (cross-list from math.OC) [pdf, ps, other]
Title: Imitation Learning of Stabilizing Policies for Nonlinear Systems
Authors: Sebastian East
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

There has been a recent interest in imitation learning methods that are guaranteed to produce a stabilizing control law with respect to a known system. Work in this area has generally considered linear systems and controllers, for which stabilizing imitation learning takes the form of a biconvex optimization problem. In this paper it is demonstrated that the same methods developed for linear systems and controllers can be readily extended to polynomial systems and controllers using sum of squares techniques. A projected gradient descent algorithm and an alternating direction method of multipliers algorithm are proposed as heuristics for solving the stabilizing imitation learning problem, and their performance is illustrated through numerical experiments.

[76]  arXiv:2109.10855 (cross-list from cs.CL) [pdf, other]
Title: BFClass: A Backdoor-free Text Classification Framework
Comments: Accepted to appear in Findings of EMNLP 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Backdoor attack introduces artificial vulnerabilities into the model by poisoning a subset of the training data via injecting triggers and modifying labels. Various trigger design strategies have been explored to attack text classifiers, however, defending such attacks remains an open problem. In this work, we propose BFClass, a novel efficient backdoor-free training framework for text classification. The backbone of BFClass is a pre-trained discriminator that predicts whether each token in the corrupted input was replaced by a masked language model. To identify triggers, we utilize this discriminator to locate the most suspicious token from each training sample and then distill a concise set by considering their association strengths with particular labels. To recognize the poisoned subset, we examine the training samples with these identified triggers as the most suspicious token, and check if removing the trigger will change the poisoned model's prediction. Extensive experiments demonstrate that BFClass can identify all the triggers, remove 95% poisoned training samples with very limited false alarms, and achieve almost the same performance as the models trained on the benign training data.

[77]  arXiv:2109.10856 (cross-list from cs.CL) [pdf, other]
Title: Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data
Comments: Accepted to appear in EMNLP 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Existing text classification methods mainly focus on a fixed label set, whereas many real-world applications require extending to new fine-grained classes as the number of samples per label increases. To accommodate such requirements, we introduce a new problem called coarse-to-fine grained classification, which aims to perform fine-grained classification on coarsely annotated data. Instead of asking for new fine-grained human annotations, we opt to leverage label surface names as the only human guidance and weave in rich pre-trained generative language models into the iterative weak supervision strategy. Specifically, we first propose a label-conditioned finetuning formulation to attune these generators for our task. Furthermore, we devise a regularization objective based on the coarse-fine label constraints derived from our problem setting, giving us even further improvements over the prior formulation. Our framework uses the fine-tuned generative models to sample pseudo-training data for training the classifier, and bootstraps on real unlabeled data for model refinement. Extensive experiments and case studies on two real-world datasets demonstrate superior performance over SOTA zero-shot classification baselines.

[78]  arXiv:2109.10862 (cross-list from cs.CL) [pdf, other]
Title: Recursively Summarizing Books with Human Feedback
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

A major challenge for scaling machine learning is training models to perform tasks that are very difficult or time-consuming for humans to evaluate. We present progress on this problem on the task of abstractive summarization of entire fiction novels. Our method combines learning from human feedback with recursive task decomposition: we use models trained on smaller parts of the task to assist humans in giving feedback on the broader task. We collect a large volume of demonstrations and comparisons from human labelers, and fine-tune GPT-3 using behavioral cloning and reward modeling to do summarization recursively. At inference time, the model first summarizes small sections of the book and then recursively summarizes these summaries to produce a summary of the entire book. Our human labelers are able to supervise and evaluate the models quickly, despite not having read the entire books themselves. Our resulting model generates sensible summaries of entire books, even matching the quality of human-written summaries in a few cases ($\sim5\%$ of books). We achieve state-of-the-art results on the recent BookSum dataset for book-length summarization. A zero-shot question-answering model using these summaries achieves state-of-the-art results on the challenging NarrativeQA benchmark for answering questions about books and movie scripts. We release datasets of samples from our model.

[79]  arXiv:2109.10870 (cross-list from cs.CR) [pdf, other]
Title: SoK: Machine Learning Governance
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Software Engineering (cs.SE)

The application of machine learning (ML) in computer systems introduces not only many benefits but also risks to society. In this paper, we develop the concept of ML governance to balance such benefits and risks, with the aim of achieving responsible applications of ML. Our approach first systematizes research towards ascertaining ownership of data and models, thus fostering a notion of identity specific to ML systems. Building on this foundation, we use identities to hold principals accountable for failures of ML systems through both attribution and auditing. To increase trust in ML systems, we then survey techniques for developing assurance, i.e., confidence that the system meets its security requirements and does not exhibit certain known failures. This leads us to highlight the need for techniques that allow a model owner to manage the life cycle of their system, e.g., to patch or retire their ML system. Put altogether, our systematization of knowledge standardizes the interactions between principals involved in the deployment of ML throughout its life cycle. We highlight opportunities for future work, e.g., to formalize the resulting game between ML principals.

[80]  arXiv:2109.10883 (cross-list from cs.NI) [pdf, other]
Title: ENERO: Efficient Real-Time Routing Optimization
Comments: 12 pages, 10 figures
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)

Wide Area Networks (WAN) are a key infrastructure in today's society. During the last years, WANs have seen a considerable increase in network's traffic as well as in the number of network applications. To enable the deployment of emergent network applications (e.g., Vehicular networks, Internet of Things), existing Traffic Engineering (TE) solutions must be able to achieve high performance real-time network operation. In addition, TE solutions must be able to adapt to dynamic scenarios (e.g., changes in the traffic matrix or topology link failures). However, current TE technologies rely on hand-crafted heuristics or computationally expensive solvers, which are not suitable for highly dynamic TE scenarios.
In this paper we propose Enero, an efficient real-time TE engine. Enero is based on a two-stage optimization process. In the first one, it leverages Deep Reinforcement Learning (DRL) to optimize the routing configuration by generating a long-term TE strategy. We integrated a Graph Neural Network (GNN) into the DRL agent to enable efficient TE on dynamic networks. In the second stage, Enero uses a Local Search algorithm to improve DRL's solution without adding computational overhead to the optimization process. Enero offers a lower bound in performance, enabling the network operator to know the worst-case performance of the DRL agent. We believe that the lower bound in performance will lighten the path of deploying DRL-based solutions in real-world network scenarios. The experimental results indicate that Enero is able to operate in real-world dynamic network topologies in 4.5 seconds on average for topologies up to 100 edges.

[81]  arXiv:2109.10895 (cross-list from cs.HC) [pdf, other]
Title: Geo-Context Aware Study of Vision-Based Autonomous Driving Models and Spatial Video Data
Comments: 11 pages, 8 figures, and 1 table. This paper is accepted and to be published in IEEE Transactions on Visualization and Computer Graphics
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)

Vision-based deep learning (DL) methods have made great progress in learning autonomous driving models from large-scale crowd-sourced video datasets. They are trained to predict instantaneous driving behaviors from video data captured by on-vehicle cameras. In this paper, we develop a geo-context aware visualization system for the study of Autonomous Driving Model (ADM) predictions together with large-scale ADM video data. The visual study is seamlessly integrated with the geographical environment by combining DL model performance with geospatial visualization techniques. Model performance measures can be studied together with a set of geospatial attributes over map views. Users can also discover and compare prediction behaviors of multiple DL models in both city-wide and street-level analysis, together with road images and video contents. Therefore, the system provides a new visual exploration platform for DL model designers in autonomous driving. Use cases and domain expert evaluation show the utility and effectiveness of the visualization system.

[82]  arXiv:2109.10898 (cross-list from stat.ML) [pdf, other]
Title: A Robust Asymmetric Kernel Function for Bayesian Optimization, with Application to Image Defect Detection in Manufacturing Systems
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Some response surface functions in complex engineering systems are usually highly nonlinear, unformed, and expensive-to-evaluate. To tackle this challenge, Bayesian optimization, which conducts sequential design via a posterior distribution over the objective function, is a critical method used to find the global optimum of black-box functions. Kernel functions play an important role in shaping the posterior distribution of the estimated function. The widely used kernel function, e.g., radial basis function (RBF), is very vulnerable and susceptible to outliers; the existence of outliers is causing its Gaussian process surrogate model to be sporadic. In this paper, we propose a robust kernel function, Asymmetric Elastic Net Radial Basis Function (AEN-RBF). Its validity as a kernel function and computational complexity are evaluated. When compared to the baseline RBF kernel, we prove theoretically that AEN-RBF can realize smaller mean squared prediction error under mild conditions. The proposed AEN-RBF kernel function can also realize faster convergence to the global optimum. We also show that the AEN-RBF kernel function is less sensitive to outliers, and hence improves the robustness of the corresponding Bayesian optimization with Gaussian processes. Through extensive evaluations carried out on synthetic and real-world optimization problems, we show that AEN-RBF outperforms existing benchmark kernel functions.

Replacements for Thu, 23 Sep 21

[83]  arXiv:1906.08823 (replaced) [pdf, other]
Title: Cross-Subject Statistical Shift Estimation for Generalized Electroencephalography-based Mental Workload Assessment
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Machine Learning (stat.ML)
[84]  arXiv:2002.10631 (replaced) [pdf, other]
Title: Batch norm with entropic regularization turns deterministic autoencoders into generative models
Journal-ref: Published in the Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[85]  arXiv:2004.06383 (replaced) [pdf, other]
Title: Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions
Comments: 31 pages, 10 figures, 3 tables, 1 algorithms
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[86]  arXiv:2006.01738 (replaced) [pdf, other]
Title: Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[87]  arXiv:2006.05826 (replaced) [pdf, other]
Title: Transient Non-Stationarity and Generalisation in Deep Reinforcement Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[88]  arXiv:2007.12911 (replaced) [pdf, other]
Title: Tighter risk certificates for neural networks
Comments: New version includes: i) experiment showing the potential of the risk certificate for neural architecture search (Fig. 2); ii) experiments spanning uncertainty quantification and analysis of prior/posterior (Section 7.8); iii) an outline of the strengths of probabilistic neural networks trained by PBB (Section 7.9) and iv) a strengthened discussion on the connection to Bayesian learning
Journal-ref: Journal of Machine Learning Research, 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[89]  arXiv:2010.11773 (replaced) [pdf, other]
Title: On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks
Comments: Accepted at ICPR 2020, fixed Figure 5
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[90]  arXiv:2011.03186 (replaced) [pdf, other]
Title: Revisiting Model-Agnostic Private Learning: Faster Rates and Active Learning
Comments: The paper was published in AISTATS-2021 and the longer version of the paper is under review at JMLR
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[91]  arXiv:2011.07018 (replaced) [pdf, other]
Title: Synthetic Data -- Anonymisation Groundhog Day
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[92]  arXiv:2011.08558 (replaced) [pdf, other]
Title: On the Transferability of Adversarial Attacksagainst Neural Text Classifier
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
[93]  arXiv:2101.07415 (replaced) [pdf, other]
Title: ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution
Comments: 22 pages. This is an updated version of a previous submission which can be found at arXiv:1907.06511. See this https URL for associated code
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO)
[94]  arXiv:2102.01621 (replaced) [pdf, ps, other]
Title: Depth separation beyond radial functions
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[95]  arXiv:2103.05134 (replaced) [pdf, other]
Title: Constrained Learning with Non-Convex Losses
Comments: arXiv admin note: text overlap with arXiv:2006.05487
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[96]  arXiv:2105.13327 (replaced) [pdf, other]
Title: Encoders and Ensembles for Task-Free Continual Learning
Subjects: Machine Learning (cs.LG)
[97]  arXiv:2105.14103 (replaced) [pdf, other]
Title: An Attention Free Transformer
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
[98]  arXiv:2106.00730 (replaced) [pdf, other]
Title: Enabling Efficiency-Precision Trade-offs for Label Trees in Extreme Classification
Comments: To appear in CIKM 2021
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
[99]  arXiv:2106.10934 (replaced) [pdf, other]
Title: GRAND: Graph Neural Diffusion
Comments: 15 pages, 4 figures. Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[100]  arXiv:2107.02168 (replaced) [pdf, other]
Title: DPPIN: A Biological Repository of Dynamic Protein-Protein Interaction Network Data
Authors: Dongqi Fu, Jingrui He
Subjects: Machine Learning (cs.LG)
[101]  arXiv:2107.11774 (replaced) [pdf, other]
Title: SGD May Never Escape Saddle Points
Comments: Typoes fixed
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[102]  arXiv:2107.14171 (replaced) [pdf, other]
Title: Tianshou: a Highly Modularized Deep Reinforcement Learning Library
Comments: 10 pages, 4 figures, 4 tables
Subjects: Machine Learning (cs.LG)
[103]  arXiv:2107.14432 (replaced) [pdf, other]
Title: Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction
Comments: 24 pages. Published as a conference paper at ECML PKDD 2021. This version includes Appendix which was not included in the published version because of page limit
Journal-ref: Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2021, Bilbao, Spain, September 13-17, 2021, Proceedings, Part III
Subjects: Machine Learning (cs.LG)
[104]  arXiv:2108.04417 (replaced) [pdf, other]
Title: Privacy-Preserving Machine Learning: Methods, Challenges and Directions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
[105]  arXiv:2108.09020 (replaced) [pdf, other]
Title: Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data
Comments: Accepted to ICCV 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[106]  arXiv:2109.08356 (replaced) [pdf, other]
Title: Accurate, Interpretable, and Fast Animation: An Iterative, Sparse, and Nonconvex Approach
Subjects: Machine Learning (cs.LG); Graphics (cs.GR)
[107]  arXiv:2109.08381 (replaced) [pdf, other]
Title: From Known to Unknown: Knowledge-guided Transformer for Time-Series Sales Forecasting in Alibaba
Comments: 8 pages, 7 figure
Subjects: Machine Learning (cs.LG)
[108]  arXiv:2109.09307 (replaced) [pdf, other]
Title: Assisted Learning for Organizations with Limited Data
Comments: 16 pages, 18 figures
Subjects: Machine Learning (cs.LG)
[109]  arXiv:2109.09705 (replaced) [pdf, other]
Title: Neural forecasting at scale
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
[110]  arXiv:2109.09888 (replaced) [pdf, other]
Title: Chemical-Reaction-Aware Molecule Representation Learning
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Quantitative Methods (q-bio.QM)
[111]  arXiv:2109.09948 (replaced) [pdf, other]
Title: Neural networks with trainable matrix activation functions
Subjects: Machine Learning (cs.LG)
[112]  arXiv:2109.10020 (replaced) [pdf, other]
Title: Online Multi-horizon Transaction Metric Estimation with Multi-modal Learning in Payment Networks
Comments: 10 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
[113]  arXiv:1906.06427 (replaced) [pdf, other]
Title: Real-Time Privacy-Preserving Data Release for Smart Meters
Journal-ref: in IEEE Transactions on Smart Grid, vol. 11, no. 6, pp. 5174-5183, Nov. 2020
Subjects: Signal Processing (eess.SP); Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
[114]  arXiv:1908.05715 (replaced) [pdf, other]
Title: Automated classification of plasma regions using 3D particle energy distributions
Comments: Accepted to JGR: Space Physics
Subjects: Space Physics (physics.space-ph); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[115]  arXiv:1912.09526 (replaced) [pdf, other]
Title: Inference for Hit Enrichment Curves, with Applications to Drug Discovery
Comments: 34 pages, 6 figures, 2 tables. Original version was a chapter in Jeremy Ash's dissertation. Current version is the paper submitted to Journal of American Statistical Association: Applications and Case Studies
Subjects: Applications (stat.AP); Information Retrieval (cs.IR); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Methodology (stat.ME)
[116]  arXiv:2007.08970 (replaced) [pdf, other]
Title: Compositional Generalization in Semantic Parsing: Pre-training vs. Specialized Architectures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[117]  arXiv:2008.10087 (replaced) [pdf, other]
Title: Blindness of score-based methods to isolated components and mixing proportions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[118]  arXiv:2012.07483 (replaced) [pdf, other]
Title: On the Treatment of Optimization Problems with L1 Penalty Terms via Multiobjective Continuation
Comments: Accepted by IEEE TPAMI 2021
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[119]  arXiv:2012.08496 (replaced) [pdf, other]
Title: Spectral Methods for Data Science: A Statistical Perspective
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP); Statistics Theory (math.ST)
[120]  arXiv:2012.15739 (replaced) [pdf, other]
Title: Uncertainty Bounds for Multivariate Machine Learning Predictions on High-Strain Brittle Fracture
Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
[121]  arXiv:2101.12190 (replaced) [pdf, other]
Title: Practical distributed quantum information processing with LOCCNet
Comments: 19 pages
Subjects: Quantum Physics (quant-ph); Disordered Systems and Neural Networks (cond-mat.dis-nn); Information Theory (cs.IT); Machine Learning (cs.LG); High Energy Physics - Theory (hep-th)
[122]  arXiv:2102.00310 (replaced) [pdf, other]
Title: Symmetry-Aware Reservoir Computing
Comments: 10 pages, 7 Figures
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)
[123]  arXiv:2102.07252 (replaced) [pdf, other]
Title: On Topology Optimization and Routing in Integrated Access and Backhaul Networks: A Genetic Algorithm-based Approach
Comments: Revised manuscript in IEEE Open Journal of the Communications Society
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Machine Learning (cs.LG)
[124]  arXiv:2102.10242 (replaced) [pdf, other]
Title: Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach
Comments: Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[125]  arXiv:2102.10556 (replaced) [pdf, other]
Title: Inductive logic programming at 30
Comments: Extension of IJCAI20 survey paper. Accepted for the MLJ. arXiv admin note: substantial text overlap with arXiv:2002.11002, arXiv:2008.07912
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[126]  arXiv:2103.09448 (replaced) [pdf, other]
Title: Adversarial Attacks on Camera-LiDAR Models for 3D Car Detection
Comments: arXiv admin note: text overlap with arXiv:2101.10747 Updates in v2: Expanded conclusion and future work, reduced Figure 5's size, and a small correction in Table 3
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Graphics (cs.GR); Machine Learning (cs.LG)
[127]  arXiv:2105.09266 (replaced) [pdf, ps, other]
Title: Copyright in Generative Deep Learning
Comments: 16 pages. Second version contains updates after entry into force of EU's directive on copyright in the Digital Single Market, and corrections of typos. Third version contains a new section about GitHub Copilot and its copyright implications. Fourth version contains improvements in abstract, introduction and conclusions, and a general rearrangement of the central sections
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[128]  arXiv:2106.02036 (replaced) [pdf, other]
Title: Anticipative Video Transformer
Comments: ICCV 2021. Ranked #1 in CVPR'21 EPIC-Kitchens-100 Action Anticipation challenge. Webpage/code/models: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
[129]  arXiv:2107.01347 (replaced) [pdf, other]
Title: Traffic Signal Control with Communicative Deep Reinforcement Learning Agents: a Case Study
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[130]  arXiv:2107.07634 (replaced) [pdf, other]
Title: Multi-task Learning with Cross Attention for Keyword Spotting
Comments: Accepted at ASRU 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[131]  arXiv:2107.08209 (replaced) [pdf, other]
Title: Minimising quantifier variance under prior probability shift
Authors: Dirk Tasche
Comments: 12 pages, 1 figure
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[132]  arXiv:2108.07636 (replaced) [pdf, other]
Title: Semi-parametric Bayesian Additive Regression Trees
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[133]  arXiv:2108.12211 (replaced) [pdf, other]
Title: Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation
Comments: 8 pages, 5 figures, 3 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[134]  arXiv:2109.00590 (replaced) [pdf, other]
Title: WebQA: Multihop and Multimodal QA
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[135]  arXiv:2109.04699 (replaced) [pdf, other]
Title: EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[136]  arXiv:2109.06148 (replaced) [pdf, other]
Title: DAFNe: A One-Stage Anchor-Free Deep Model for Oriented Object Detection
Comments: Main paper: 7 pages, References: 2 pages, Appendix: 5 pages; Main paper: 5 figures, Appendix: 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[137]  arXiv:2109.10115 (replaced) [pdf, other]
Title: StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation
Comments: ICCV 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
[ total of 137 entries: 1-137 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2109, contact, help  (Access key information)