Machine Learning

New submissions

New submissions for Thu, 13 May 21

[1]  arXiv:2105.05275 [pdf, other]
Title: Hermitian Symmetric Spaces for Graph Embeddings
Comments: 13 pages, 1 figure. Accepted at NeurIPS 2020 workshop on Differential Geometry meets Deep Learning
Subjects: Machine Learning (cs.LG); Computational Geometry (cs.CG)

Learning faithful graph representations as sets of vertex embeddings has become a fundamental intermediary step in a wide range of machine learning applications. The quality of the embeddings is usually determined by how well the geometry of the target space matches the structure of the data. In this work we learn continuous representations of graphs in spaces of symmetric matrices over C. These spaces offer a rich geometry that simultaneously admits hyperbolic and Euclidean subspaces, and are amenable to analysis and explicit computations. We implement an efficient method to learn embeddings and compute distances, and develop the tools to operate with such spaces. The proposed models are able to automatically adapt to very dissimilar arrangements without any a priori estimates of graph features. On various datasets with very diverse structural properties and reconstruction measures our model ties the results of competitive baselines for geometrically pure graphs and outperforms them for graphs with mixed geometric features, showcasing the versatility of our approach.

[2]  arXiv:2105.05316 [pdf, other]
Title: A Computational Framework for Modeling Complex Sensor Network Data Using Graph Signal Processing and Graph Neural Networks in Structural Health Monitoring
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Complex networks lend themselves to the modeling of multidimensional data, such as relational and/or temporal data. In particular, when such complex data and their inherent relationships need to be formalized, complex network modeling and its resulting graph representations enable a wide range of powerful options. In this paper, we target this setting in connection with specific machine learning approaches on graphs for structural health monitoring, from an analysis and predictive (maintenance) perspective. Specifically, we present a framework based on Complex Network Modeling, integrating Graph Signal Processing (GSP) and Graph Neural Network (GNN) approaches. We demonstrate this framework in our targeted application domain of Structural Health Monitoring (SHM). In particular, we focus on a prominent real-world structural health monitoring use case, i.e., modeling and analyzing sensor data (strain, vibration) of a large bridge in the Netherlands. In our experiments, we show that GSP enables the identification of the most important sensors, for which we investigate a set of search and optimization approaches. Furthermore, GSP enables the detection of specific graph signal patterns (mode shapes), capturing physical functional properties of the sensors in the applied complex network. In addition, we show the efficacy of applying GNNs for strain prediction on this kind of data.

[3]  arXiv:2105.05326 [pdf, other]
Title: Multi-version Tensor Completion for Time-delayed Spatio-temporal Data
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR)

Real-world spatio-temporal data is often incomplete or inaccurate due to various data loading delays. For example, a location-disease-time tensor of case counts can have multiple delayed updates of recent temporal slices for some locations or diseases. Recovering such missing or noisy (under-reported) elements of the input tensor can be viewed as a generalized tensor completion problem. Existing tensor completion methods usually assume that i) missing elements are randomly distributed and ii) noise for each tensor element is i.i.d. zero-mean. Both assumptions can be violated for spatio-temporal tensor data. We often observe multiple versions of the input tensor with different under-reporting noise levels. The amount of noise can be time- or location-dependent as more updates are progressively introduced to the tensor. We model such dynamic data as a multi-version tensor with an extra tensor mode capturing the data updates. We propose a low-rank tensor model to predict the updates over time. We demonstrate that our method can accurately predict the ground-truth values of many real-world tensors. We obtain up to 27.2% lower root mean-squared-error compared to the best baseline method. Finally, we extend our method to track the tensor data over time, leading to significant computational savings.

[4]  arXiv:2105.05328 [pdf, other]
Title: Comparing interpretability and explainability for feature selection
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A common approach for feature selection is to examine the variable importance scores for a machine learning model, as a way to understand which features are the most relevant for making predictions. Given the significance of feature selection, it is crucial for the calculated importance scores to reflect reality. Falsely overestimating the importance of irrelevant features can lead to false discoveries, while underestimating the importance of relevant features may lead us to discard important features, resulting in poor model performance. Additionally, black-box models like XGBoost provide state-of-the-art predictive performance, but cannot be easily understood by humans, and thus we rely on variable importance scores or methods for explainability like SHAP to offer insight into their behavior.
In this paper, we investigate the performance of variable importance as a feature selection method across various black-box and interpretable machine learning methods. We compare the ability of CART, Optimal Trees, XGBoost and SHAP to correctly identify the relevant subset of variables across a number of experiments. The results show that regardless of whether we use the native variable importance method or SHAP, XGBoost fails to clearly distinguish between relevant and irrelevant features. On the other hand, the interpretable methods are able to correctly and efficiently identify irrelevant features, and thus offer significantly better performance for feature selection.

[5]  arXiv:2105.05347 [pdf, other]
Title: Return-based Scaling: Yet Another Normalisation Trick for Deep RL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Scaling issues are mundane yet irritating for practitioners of reinforcement learning. Error scales vary across domains, tasks, and stages of learning; sometimes by many orders of magnitude. This can be detrimental to learning speed and stability, create interference between learning tasks, and necessitate substantial tuning. We revisit this topic for agents based on temporal-difference learning, sketch out some desiderata and investigate scenarios where simple fixes fall short. The mechanism we propose requires neither tuning, clipping, nor adaptation. We validate its effectiveness and robustness on the suite of Atari games. Our scaling method turns out to be particularly helpful at mitigating interference, when training a shared neural network on multiple targets that differ in reward scale or discounting.

[6]  arXiv:2105.05381 [pdf, other]
Title: Accuracy-Privacy Trade-off in Deep Ensemble
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

Deep ensemble learning aims to improve the classification accuracy by training several neural networks and fusing their outputs. It has been widely shown to improve accuracy. At the same time, ensemble learning has also been proposed to mitigate privacy leakage in terms of membership inference (MI), where the goal of an attacker is to infer whether a particular data sample has been used to train a target model. In this paper, we show that these two goals of ensemble learning, namely improving accuracy and privacy, directly conflict with each other. Using a wide range of datasets and model architectures, we empirically demonstrate the trade-off between privacy and accuracy in deep ensemble learning. We find that ensembling can improve either privacy or accuracy, but not both simultaneously -- when ensembling improves the classification accuracy, the effectiveness of the MI attack also increases. We analyze various factors that contribute to such privacy leakage in ensembling such as prediction confidence and agreement between models that constitute the ensemble. Our evaluation of defenses against MI attacks, such as regularization and differential privacy, shows that they can mitigate the effectiveness of the MI attack but simultaneously degrade ensemble accuracy. The source code is available at https://github.com/shrezaei/MI-on-EL.

[7]  arXiv:2105.05400 [pdf, ps, other]
Title: Homogeneous vector bundles and $G$-equivariant convolutional neural networks
Authors: Jimmy Aronsson
Comments: 23 pages
Subjects: Machine Learning (cs.LG); Representation Theory (math.RT); Machine Learning (stat.ML)

$G$-equivariant convolutional neural networks (GCNNs) are a geometric deep learning model for data defined on a homogeneous $G$-space $\mathcal{M}$. GCNNs are designed to respect the global symmetry in $\mathcal{M}$, thereby facilitating learning. In this paper, we analyze GCNNs on homogeneous spaces $\mathcal{M} = G/K$ in the case of unimodular Lie groups $G$ and compact subgroups $K \leq G$. We demonstrate that homogeneous vector bundles are the natural setting for GCNNs. We also use reproducing kernel Hilbert spaces to obtain a precise criterion for expressing $G$-equivariant layers as convolutional layers. This criterion is then rephrased as a bandwidth criterion, leading to even stronger results for some groups.

[8]  arXiv:2105.05449 [pdf, ps, other]
Title: An efficient projection neural network for $\ell_1$-regularized logistic regression
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

$\ell_1$ regularization has been used for logistic regression to circumvent overfitting and to use the estimated sparse coefficients for feature selection. However, the challenge of such a regularization is that the $\ell_1$ norm is not differentiable, making the standard algorithms for convex optimization not applicable to this problem. This paper presents a simple projection neural network for $\ell_1$-regularized logistic regression. In contrast to many available solvers in the literature, the proposed neural network does not require any extra auxiliary variable or smooth approximation, and its complexity is almost identical to that of gradient descent for logistic regression without $\ell_1$ regularization, thanks to the projection operator. We also investigate the convergence of the proposed neural network using Lyapunov theory and show that it converges to a solution of the problem from any initial value. The proposed neural solution significantly outperforms state-of-the-art methods with respect to execution time and is competitive in terms of accuracy and AUROC.
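
To illustrate why the non-differentiable $\ell_1$ term needs special handling, here is a minimal sketch of proximal gradient descent (ISTA) with soft-thresholding. This is a standard textbook technique swapped in for illustration, not the projection neural network proposed in the paper. The function names, toy data, and hyperparameters are all assumptions.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_l1_logistic(X, y, lam=0.1, step=0.5, iters=500):
    """Minimise mean logistic loss + lam * ||w||_1 (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Gradient of the smooth part (mean logistic loss).
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        # Gradient step, then the proximal step for the l1 penalty.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad)]
    return w


# Toy data (hypothetical): feature 0 determines the label,
# feature 1 is uncorrelated with it.
X = [[1.0, 1.0], [2.0, -1.0], [-1.0, 1.0], [-2.0, -1.0]]
y = [1, 1, 0, 0]
w = fit_l1_logistic(X, y)
```

On this toy data the coefficient of the uninformative feature is driven to exactly zero, which is the sparsity property that makes $\ell_1$ regularization useful for feature selection.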

[9]  arXiv:2105.05458 [pdf, other]
Title: Learning Graphs from Smooth Signals under Moment Uncertainty
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Optimization and Control (math.OC)

We consider the problem of inferring the graph structure from a given set of smooth graph signals. The number of perceived graph signals is always finite and possibly noisy, thus the statistical properties of the data distribution are ambiguous. Traditional graph learning models do not take this distributional uncertainty into account, thus performance may be sensitive to different sets of data. In this paper, we propose a distributionally robust approach to graph learning, which incorporates the first and second moment uncertainty into the smooth graph learning model. Specifically, we cast our graph learning model as a minimax optimization problem, and further reformulate it as a nonconvex minimization problem with linear constraints. In our proposed formulation, we find a theoretical interpretation of the Laplacian regularizer, which is adopted in many existing works in an intuitive manner. Although the first moment uncertainty leads to an annoying square root term in the objective function, we prove that it enjoys the smoothness property with probability 1 over the entire constraint. We develop an efficient projected gradient descent (PGD) method and establish its global iterate convergence to a critical point. We conduct extensive experiments on both synthetic and real data to verify the effectiveness of our model and the efficiency of the PGD algorithm. Compared with the state-of-the-art smooth graph learning methods, our approach exhibits superior and more robust performance across different populations of signals in terms of various evaluation metrics.

[10]  arXiv:2105.05473 [pdf, other]
Title: Interpretable performance analysis towards offline reinforcement learning: A dataset perspective
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Offline reinforcement learning (RL) has increasingly become the focus of artificial intelligence research due to its wide real-world applications where the collection of data may be difficult, time-consuming, or costly. In this paper, we first propose a two-fold taxonomy for existing offline RL algorithms from the perspective of exploration and exploitation tendency. Secondly, we derive the explicit expression of the upper bound of extrapolation error and explore the correlation between the performance of different types of algorithms and the distribution of actions under states. Specifically, we relax the strict assumption on the sufficiently large amount of state-action tuples. Accordingly, we provably explain why batch constrained Q-learning (BCQ) performs better than other existing techniques. Thirdly, after identifying the weakness of BCQ on datasets with low mean episode returns, we propose a modified variant based on a top-return selection mechanism, which is proved to be able to gain state-of-the-art performance on various datasets. Lastly, we create a benchmark platform on the Atari domain, entitled RL easy go (RLEG), at an estimated cost of more than 0.3 million dollars. We make it open-source for fair and comprehensive competitions between offline RL algorithms with complete datasets and checkpoints being provided.

[11]  arXiv:2105.05482 [pdf, other]
Title: On the reproducibility of fully convolutional neural networks for modeling time-space evolving physical systems
Subjects: Machine Learning (cs.LG)

Reproducibility of a deep-learning fully convolutional neural network is evaluated by training the same network several times under identical conditions (database, hyperparameters, hardware) with non-deterministic Graphics Processing Unit (GPU) operations. The propagation of two-dimensional acoustic waves, typical of time-space evolving physical systems, is studied on both recursive and non-recursive tasks. Significant changes in model properties (weights, featured fields) are observed. When tested on various propagation benchmarks, these models systematically returned estimations with a high level of deviation, especially for the recurrent analysis which strongly amplifies variability due to the non-determinism. Trainings performed with double floating-point precision provide slightly better estimations and a significant reduction of the variability of both the network parameters and its testing error range.

[12]  arXiv:2105.05495 [pdf, other]
Title: LipBaB: Computing exact Lipschitz constant of ReLU networks
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)

The Lipschitz constant of neural networks plays an important role in several contexts of deep learning ranging from robustness certification and regularization to stability analysis of systems with neural network controllers. Obtaining tight bounds of the Lipschitz constant is therefore important. We introduce LipBaB, a branch and bound framework to compute certified bounds of the local Lipschitz constant of deep neural networks with ReLU activation functions up to any desired precision. We achieve this by bounding the norm of the Jacobians, corresponding to different activation patterns of the network caused within the input domain. Our algorithm can provide provably exact computation of the Lipschitz constant for any p-norm.
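
The idea of bounding Jacobian norms over activation patterns can be sketched for a tiny one-hidden-layer ReLU network with scalar output, f(x) = w2 · relu(W1 x + b1). On any region where the activation pattern is fixed, the gradient is the sum of the active rows of W1 weighted by w2. The sketch below enumerates every pattern exhaustively, which only yields an upper bound on the l2 Lipschitz constant because some patterns may be infeasible within the input domain. It is not the LipBaB branch-and-bound procedure. The function name and example weights are assumptions.

```python
import itertools
import math


def jacobian_norm_bound(W1, w2):
    """Upper bound on the l2 Lipschitz constant of w2 . relu(W1 x + b1).

    The bias b1 does not appear because it only decides which patterns
    are feasible, and this naive sketch treats all of them as feasible.
    """
    h, d = len(W1), len(W1[0])
    best = 0.0
    for pattern in itertools.product((0, 1), repeat=h):
        # Gradient of f on the region where `pattern` marks active units.
        grad = [sum(s * w2[i] * W1[i][j] for i, s in enumerate(pattern))
                for j in range(d)]
        best = max(best, math.sqrt(sum(g * g for g in grad)))
    return best


# Hypothetical 2-2-1 network: identity first layer, output weights (3, 4).
bound = jacobian_norm_bound([[1, 0], [0, 1]], [3, 4])
print(bound)  # 5.0
```

The cost grows as 2^h in the number of hidden units, which is why a method that prunes infeasible or dominated patterns is needed for networks of practical size.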

[13]  arXiv:2105.05504 [pdf, ps, other]
Title: An Empirical Experiment on Deep Learning Models for Predicting Traffic Data
Comments: 6 pages, 3 figures, accepted at 37th IEEE International Conference on Data Engineering (ICDE 2021)
Subjects: Machine Learning (cs.LG)

To tackle ever-increasing city traffic congestion problems, researchers have proposed deep learning models to aid decision-makers in the traffic control domain. Although the proposed models have been remarkably improved in recent years, there are still questions that need to be answered before deploying models. For example, it is difficult to figure out which models provide state-of-the-art performance, as recently proposed models have often been evaluated with different datasets and experiment environments. It is also difficult to determine which models would work when traffic conditions change abruptly (e.g., rush hour). In this work, we conduct two experiments to answer these two questions. In the first experiment, we evaluate state-of-the-art models on identical public datasets to compare model performance under a consistent experimental environment. We then extract a set of temporal regions in the datasets whose speeds change abruptly and use these regions to explore model performance on difficult intervals. The experimental results indicate that Graph-WaveNet and GMAN show better performance in general. We also find that prediction models tend to have varying performance across data and intervals, which calls for in-depth analysis of models on difficult intervals for real-world deployment.

[14]  arXiv:2105.05530 [pdf, other]
Title: Winograd Algorithm for AdderNet
Comments: 9 pages, accepted by ICML2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

Adder neural network (AdderNet) is a new kind of deep model that replaces the original massive multiplications in convolutions by additions while preserving the high performance. Since the hardware complexity of additions is much lower than that of multiplications, the overall energy consumption is thus reduced significantly. To further optimize the hardware overhead of using AdderNet, this paper studies the Winograd algorithm, which is a widely used fast algorithm for accelerating convolution and reducing computational cost. Unfortunately, the conventional Winograd algorithm cannot be directly applied to AdderNets since the distributive law in multiplication is not valid for the l1-norm. Therefore, we replace the element-wise multiplication in the Winograd equation by additions and then develop a new set of transform matrices that can enhance the representation ability of output features to maintain the performance. Moreover, we propose the l2-to-l1 training strategy to mitigate the negative impacts caused by formal inconsistency. Experimental results on both FPGA and benchmarks show that the new method can further reduce the energy consumption without affecting the accuracy of the original AdderNet.

[15]  arXiv:2105.05539 [pdf]
Title: A new framework for experimental design using Bayesian Evidential Learning: the case of wellhead protection area
Subjects: Machine Learning (cs.LG)

In this contribution, we predict the wellhead protection area (WHPA, target), the shape and extent of which is influenced by the distribution of hydraulic conductivity (K), from a small number of tracing experiments (predictor). Our first objective is to make stochastic predictions of the WHPA within the Bayesian Evidential Learning (BEL) framework, which aims to find a direct relationship between predictor and target using machine learning. This relationship is learned from a small set of training models (400) sampled from the prior distribution of K. The associated 400 pairs of simulated predictors and targets are obtained through forward modelling. Newly collected field data can then be directly used to predict the approximate posterior distribution of the corresponding WHPA. The uncertainty range of the posterior WHPA distribution is affected by the number and position of data sources (injection wells). Our second objective is to extend BEL to identify the optimal design of data source locations that minimizes the posterior uncertainty of the WHPA. This can be done explicitly, without averaging or approximating because once trained, the BEL model allows the computation of the posterior uncertainty corresponding to any new input data. We use the Modified Hausdorff Distance and the Structural Similarity index metrics to estimate the posterior uncertainty range of the WHPA. Increasing the number of injection wells effectively reduces the derived posterior WHPA uncertainty. Our approach can also estimate which injection wells are more informative than others, as validated through a k-fold cross-validation procedure. Overall, the application of BEL to experimental design makes it possible to identify the data sources maximizing the information content of any measurement data.
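
For readers unfamiliar with the Modified Hausdorff Distance mentioned above, the following is a minimal sketch using the usual definition: the larger of the two directed mean nearest-neighbour distances between point sets. Representing each WHPA outline as a list of 2D points is an assumption made for illustration, as are the function names and sample coordinates.

```python
import math


def directed_mean_min(A, B):
    """Mean over points a in A of the distance to the nearest point in B."""
    return sum(min(math.dist(a, b) for b in B) for a in A) / len(A)


def modified_hausdorff(A, B):
    """Modified Hausdorff Distance between two non-empty point sets."""
    return max(directed_mean_min(A, B), directed_mean_min(B, A))


# Two hypothetical outlines; the second has one extra, distant point.
outline_a = [(0.0, 0.0), (1.0, 0.0)]
outline_b = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
print(modified_hausdorff(outline_a, outline_b))  # 1.0
```

Because the directed terms are averages and not maxima, a single outlying point shifts the distance only in proportion to its share of the set, which makes the measure less sensitive to outliers than the classical Hausdorff distance.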

[16]  arXiv:2105.05553 [pdf, other]
Title: Convergence Analysis of Over-parameterized Deep Linear Networks, and the Principal Components Bias
Subjects: Machine Learning (cs.LG)

Convolutional neural networks of different architectures seem to learn to classify images in the same order. To understand this phenomenon, we revisit the over-parametrized deep linear network model. Our analysis of this model's learning dynamics reveals that the convergence rate of its parameters is exponentially faster along directions corresponding to the larger principal components of the data, at a rate governed by the singular values. We term this convergence pattern the Principal Components bias (PC-bias). We show how the PC-bias streamlines the order of learning of both linear and non-linear networks, more prominently in earlier stages of learning. We then compare our results to the spectral bias, showing that both biases can be seen independently, and affect the order of learning in different ways. Finally, we discuss how the PC-bias can explain several phenomena, including the benefits of prevalent initialization schemes, how early stopping may be related to PCA, and why deep networks converge more slowly when given random labels.
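
The intuition behind faster convergence along larger principal components can be seen in a much simpler setting than the one analysed in the paper: plain gradient descent on a single-layer linear model with squared loss, written in the eigenbasis of the data covariance. There the error along direction i shrinks by a factor (1 - lr * lambda_i) per step. The sketch below is only this simplified one-layer case, and the eigenvalues, learning rate, and function name are assumptions.

```python
def gd_errors(eigvals, w_star, lr=0.1, steps=20):
    """Run gradient descent on 0.5 * sum_i eigvals[i] * (w_i - w_star_i)^2.

    Returns the absolute error per eigen-direction after `steps` updates,
    starting from w = 0.
    """
    w = [0.0] * len(eigvals)
    for _ in range(steps):
        w = [wi - lr * lam * (wi - ws)
             for wi, lam, ws in zip(w, eigvals, w_star)]
    return [abs(wi - ws) for wi, ws in zip(w, w_star)]


# Direction 0 has a large covariance eigenvalue, direction 1 a small one.
errors = gd_errors(eigvals=[4.0, 0.25], w_star=[1.0, 1.0])
```

After 20 steps the error along the high-variance direction is several orders of magnitude smaller than along the low-variance one, so the model effectively fits the leading principal direction first.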

[17]  arXiv:2105.05555 [pdf, ps, other]
Title: Robust Learning of Fixed-Structure Bayesian Networks in Nearly-Linear Time
Authors: Yu Cheng, Honghao Lin
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST); Machine Learning (stat.ML)

We study the problem of learning Bayesian networks where an $\epsilon$-fraction of the samples are adversarially corrupted. We focus on the fully-observable case where the underlying graph structure is known. In this work, we present the first nearly-linear time algorithm for this problem with a dimension-independent error guarantee. Previous robust algorithms with comparable error guarantees are slower by at least a factor of $(d/\epsilon)$, where $d$ is the number of variables in the Bayesian network and $\epsilon$ is the fraction of corrupted samples.
Our algorithm and analysis are considerably simpler than those in previous work. We achieve this by establishing a direct connection between robust learning of Bayesian networks and robust mean estimation. As a subroutine in our algorithm, we develop a robust mean estimation algorithm whose runtime is nearly-linear in the number of nonzeros in the input samples, which may be of independent interest.

[18]  arXiv:2105.05559 [pdf, other]
Title: Learning Uncertainty with Artificial Neural Networks for Improved Remaining Time Prediction of Business Processes
Comments: Accepted for the main conference at the Business Process Management Conferences 2021, 6-10 September 2021, Rome, Italy
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Artificial neural networks will always make a prediction, even when completely uncertain and regardless of the consequences. This obliviousness of uncertainty is a major obstacle towards their adoption in practice. Techniques exist, however, to estimate the two major types of uncertainty: model uncertainty and observation noise in the data. Bayesian neural networks are theoretically well-founded models that can learn the model uncertainty of their predictions. Minor modifications to these models and their loss functions allow learning the observation noise for individual samples as well. This paper is the first to apply these techniques to predictive process monitoring. We found that they contribute towards more accurate predictions and work quickly. However, their main benefit resides with the uncertainty estimates themselves that allow the separation of higher-quality from lower-quality predictions and the building of confidence intervals. This leads to many interesting applications, enables an earlier adoption of prediction systems with smaller datasets and fosters a better cooperation with humans.

[19]  arXiv:2105.05612 [pdf, other]
Title: Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features and can ignore complex, equally-predictive ones. This simplicity bias can explain their lack of robustness out of distribution (OOD). The more complex the task to learn, the more likely it is that statistical artifacts (i.e. selection biases, spurious correlations) are simpler than the mechanisms to learn.
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved. We train a set of similar models to fit the data in different ways using a penalty on the alignment of their input gradients. We show theoretically and empirically that this induces the learning of more complex predictive patterns.
OOD generalization fundamentally requires information beyond i.i.d. examples, such as multiple training environments, counterfactual examples, or other side information. Our approach shows that we can defer this requirement to an independent model selection stage. We obtain SOTA results in visual recognition on biased data and generalization across visual domains. The method - the first to evade the simplicity bias - highlights the need for a better understanding and control of inductive biases in deep learning.

[20]  arXiv:2105.05622 [pdf, other]
Title: On risk-based active learning for structural health monitoring
Comments: 28 pages. 23 figures. Under review, preprint submitted to Mechanical Systems and Signal Processing
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A primary motivation for the development and implementation of structural health monitoring systems is the prospect of gaining the ability to make informed decisions regarding the operation and maintenance of structures and infrastructure. Unfortunately, descriptive labels for measured data corresponding to health-state information for the structure of interest are seldom available prior to the implementation of a monitoring system. This issue limits the applicability of the traditional supervised and unsupervised approaches to machine learning in the development of statistical classifiers for decision-supporting SHM systems.
The current paper presents a risk-based formulation of active learning, in which the querying of class-label information is guided by the expected value of said information for each incipient data point. When applied to structural health monitoring, the querying of class labels can be mapped onto the inspection of a structure of interest in order to determine its health state. In the current paper, the risk-based active learning process is explained and visualised via a representative numerical example and subsequently applied to the Z24 Bridge benchmark. The results of the case studies indicate that a decision-maker's performance can be improved via the risk-based active learning of a statistical classifier, such that the decision process itself is taken into account.

[21]  arXiv:2105.05631 [pdf, other]
Title: Cross-Modal and Multimodal Data Analysis Based on Functional Mapping of Spectral Descriptors and Manifold Regularization
Comments: 37 pages
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Multimodal manifold modeling methods extend the spectral geometry-aware data analysis to learning from several related and complementary modalities. Most of these methods work based on two major assumptions: 1) there are the same number of homogeneous data samples in each modality, and 2) at least partial correspondences between modalities are given in advance as prior knowledge. This work proposes two new multimodal modeling methods. The first method establishes a general analysis framework to deal with the multimodal information problem for heterogeneous data without any specific prior knowledge. For this purpose, first, we identify the localities of each manifold by extracting local descriptors via spectral graph wavelet signatures (SGWS). Then, we propose a manifold regularization framework based on the functional mapping between SGWS descriptors (FMBSD) for finding the pointwise correspondences. The second method is a manifold regularized multimodal classification based on pointwise correspondences (M$^2$CPC), used for the problem of multiclass classification of multimodal heterogeneous data, in which the correspondences between modalities are determined by the FMBSD method. The experimental results of evaluating the FMBSD method on three common cross-modal retrieval datasets and evaluating the M$^2$CPC method on three benchmark multimodal multiclass classification datasets indicate their effectiveness and superiority over state-of-the-art methods.

[22]  arXiv:2105.05674 [pdf]
Title: Automatic Classification of Games using Support Vector Machine
Comments: 7 pages, 7 figures
Subjects: Machine Learning (cs.LG)

Game developers benefit from the availability of custom game genres when doing game market analysis. This information can help them spot opportunities in the market and plan a new game more successfully. In this paper we find a good classifier for predicting the category of a game. Prediction is based on the description and title of a game. We use 2443 iOS App Store games as a data set to generate a document-term matrix. To mitigate the curse of dimensionality we use Latent Semantic Indexing, which reduces the term dimension to approximately 1/9. A Support Vector Machine supervised learning model is fit to the pre-processed data. Model parameters are optimized using grid search and 20-fold cross validation. The best model yields 77% mean accuracy, or roughly 70% accuracy with 95% confidence. The developed classifier has been used in-house to assist games market research.

[23]  arXiv:2105.05682 [pdf, other]
Title: Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning
Comments: 7 pages, 5 figures, 3 tables. Accepted by the 30th International Joint Conference on Artificial Intelligence (IJCAI-21)
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI)

Graph representation learning plays a vital role in processing graph-structured data. However, prior work on graph representation learning relies heavily on label information. To overcome this problem, inspired by the recent success of graph contrastive learning and Siamese networks in visual representation learning, we propose a novel self-supervised approach in this paper to learn node representations by enhancing Siamese self-distillation with multi-scale contrastive learning. Specifically, we first generate two augmented views from the input graph based on local and global perspectives. Then, we employ two objectives called cross-view and cross-network contrastiveness to maximize the agreement between node representations across different views and networks. To demonstrate the effectiveness of our approach, we perform empirical experiments on five real-world datasets. Our method not only achieves new state-of-the-art results but also surpasses some semi-supervised counterparts by large margins.

[24]  arXiv:2105.05717 [pdf, other]
Title: An Efficient Learning Framework For Federated XGBoost Using Secret Sharing And Distributed Optimization
Comments: 24 pages, Special issue of ACM Transactions on Intelligent Systems and Technology
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

XGBoost is one of the most widely used machine learning models in the industry due to its superior learning accuracy and efficiency. To address data isolation issues in big data problems, it is crucial to deploy a secure and efficient federated XGBoost (FedXGB) model. Existing FedXGB models either have data leakage issues or are only applicable to the two-party setting with heavy communication and computation overheads. In this paper, a lossless multi-party federated XGB learning framework is proposed with a security guarantee, which reshapes XGBoost's split criterion calculation process under a secret sharing setting and solves the leaf weight calculation problem by leveraging distributed optimization. Remarkably, a thorough analysis of model security is provided as well, and multiple numerical results showcase the superiority of the proposed FedXGB compared with the state-of-the-art models on benchmark datasets.

[25]  arXiv:2105.05728 [pdf, other]
Title: Early prediction of respiratory failure in the intensive care unit
Comments: 14 pages, 5 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The development of respiratory failure is common among patients in intensive care units (ICU). Large data quantities from ICU patient monitoring systems make timely and comprehensive analysis by clinicians difficult but are ideal for automatic processing by machine learning algorithms. Early prediction of respiratory system failure could alert clinicians to patients at risk of respiratory failure and allow for early patient reassessment and treatment adjustment. We propose an early warning system that predicts moderate/severe respiratory failure up to 8 hours in advance. Our system was trained on HiRID-II, a data-set containing more than 60,000 admissions to a tertiary care ICU. An alarm is typically triggered several hours before the beginning of respiratory failure. Our system outperforms a clinical baseline mimicking traditional clinical decision-making based on pulse-oximetric oxygen saturation and the fraction of inspired oxygen. To provide model introspection and diagnostics, we developed an easy-to-use web browser-based system to explore model input data and predictions visually.

[26]  arXiv:2105.05734 [pdf]
Title: The FeatureCloud AI Store for Federated Learning in Biomedicine and Beyond
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

Machine Learning (ML) and Artificial Intelligence (AI) have shown promising results in many areas and are driven by the increasing amount of available data. However, this data is often distributed across different institutions and cannot be shared due to privacy concerns. Privacy-preserving methods, such as Federated Learning (FL), allow for training ML models without sharing sensitive data, but their implementation is time-consuming and requires advanced programming skills. Here, we present the FeatureCloud AI Store for FL as an all-in-one platform for biomedical research and other applications. It removes large parts of this complexity for developers and end-users by providing an extensible AI Store with a collection of ready-to-use apps. We show that the federated apps produce similar results to centralized ML, scale well for a typical number of collaborators and can be combined with Secure Multiparty Computation (SMPC), thereby making FL algorithms safely and easily applicable in biomedical and clinical environments.

[27]  arXiv:2105.05735 [pdf, other]
Title: Autoencoding Under Normalization Constraints
Comments: Accepted to ICML 2021. The code is released in this https URL . Any updates regarding the work, e.g. the release of the code, will be broadcast through the mailing list, see this https URL
Subjects: Machine Learning (cs.LG)

Likelihood is a standard estimate for outlier detection. The specific role of the normalization constraint is to ensure that the out-of-distribution (OOD) regime has a small likelihood when samples are learned using maximum likelihood. Because autoencoders do not possess such a process of normalization, they often fail to recognize outliers even when they are obviously OOD. We propose the Normalized Autoencoder (NAE), a normalized probabilistic model constructed from an autoencoder. The probability density of NAE is defined using the reconstruction error of an autoencoder, which differs from how the density is defined in conventional energy-based models. In our model, normalization is enforced by suppressing the reconstruction of negative samples, significantly improving the outlier detection performance. Our experimental results confirm the efficacy of NAE, both in detecting outliers and in generating in-distribution samples.

[28]  arXiv:2105.05736 [pdf, other]
Title: Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces
Comments: To appear in ICML 2021
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels. Further, we provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance. We empirically verify our findings on long-tail classification and retrieval benchmarks.

[29]  arXiv:2105.05757 [pdf, other]
Title: Exploring the Similarity of Representations in Model-Agnostic Meta-Learning
Comments: Learning to Learn workshop at ICLR 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)

In past years model-agnostic meta-learning (MAML) has been one of the most promising approaches in meta-learning. It can be applied to different kinds of problems, e.g., reinforcement learning, but also shows good results on few-shot learning tasks. Despite its tremendous success in these tasks, it has not yet been fully explained why it works so well. Recent work proposes that MAML reuses features rather than learning rapidly. In this paper, we want to inspire a deeper understanding of this question by analyzing MAML's representation. We apply representation similarity analysis (RSA), a well-established method in neuroscience, to the few-shot learning instantiation of MAML. Although some parts of our analysis support the general result of that work, namely that feature reuse is predominant, we also reveal arguments against its conclusion. The similarity-increase of layers closer to the input layers arises from the learning task itself and not from the model. In addition, the representations after inner gradient steps make a broader change to the representation than the changes during meta-training.

[30]  arXiv:2105.05758 [pdf, other]
Title: DEEMD: Drug Efficacy Estimation against SARS-CoV-2 based on cell Morphology with Deep multiple instance learning
Comments: Supplementary material is appended to the end of the paper
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)

Drug repurposing can accelerate the identification of effective compounds for clinical use against SARS-CoV-2, with the advantage of pre-existing clinical safety data and an established supply chain. RNA viruses such as SARS-CoV-2 manipulate cellular pathways and induce reorganization of subcellular structures to support their life cycle. These morphological changes can be quantified using bioimaging techniques. In this work, we developed DEEMD: a computational pipeline using deep neural network models within a multiple instance learning (MIL) framework, to identify putative treatments effective against SARS-CoV-2 based on morphological analysis of the publicly available RxRx19a dataset. This dataset consists of fluorescence microscopy images of SARS-CoV-2 non-infected cells and infected cells, with and without drug treatment. DEEMD first extracts discriminative morphological features to generate cell morphological profiles from the non-infected and infected cells. These morphological profiles are then used in a statistical model to estimate the applied treatment efficacy on infected cells based on similarities to non-infected cells. DEEMD is capable of localizing infected cells via weak supervision without any expensive pixel-level annotations. DEEMD identifies known SARS-CoV-2 inhibitors, such as Remdesivir and Aloxistatin, supporting the validity of our approach. DEEMD is scalable to process and screen thousands of treatments in parallel and can be explored for other emerging viruses and datasets to rapidly identify candidate antiviral treatments in the future.

[31]  arXiv:2105.05806 [pdf, other]
Title: High-Dimensional Experimental Design and Kernel Bandits
Subjects: Machine Learning (cs.LG)

In recent years methods from optimal linear experimental design have been leveraged to obtain state of the art results for linear bandits. A design returned from an objective such as $G$-optimal design is actually a probability distribution over a pool of potential measurement vectors. Consequently, one nuisance of the approach is the task of converting this continuous probability distribution into a discrete assignment of $N$ measurements. While sophisticated rounding techniques have been proposed, in $d$ dimensions they require $N$ to be at least $d$, $d \log(\log(d))$, or $d^2$ based on the sub-optimality of the solution. In this paper we are interested in settings where $N$ may be much less than $d$, such as in experimental design in an RKHS where $d$ may be effectively infinite. In this work, we propose a rounding procedure that frees $N$ of any dependence on the dimension $d$, while achieving nearly the same performance guarantees as existing rounding procedures. We evaluate the procedure against a baseline that projects the problem to a lower dimensional space and performs rounding, which requires $N$ only to be at least a notion of the effective dimension. We also leverage our new approach in a new algorithm for kernelized bandits to obtain state of the art results for regret minimization and pure exploration. An advantage of our approach over existing UCB-like approaches is that our kernel bandit algorithms are also robust to model misspecification.

[32]  arXiv:2105.05817 [pdf, other]
Title: Adversarial Reinforcement Learning in Dynamic Channel Access and Power Control
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)

Deep reinforcement learning (DRL) has recently been used to perform efficient resource allocation in wireless communications. In this paper, the vulnerabilities of such DRL agents to adversarial attacks are studied. In particular, we consider multiple DRL agents that perform both dynamic channel access and power control in wireless interference channels. For these victim DRL agents, we design a jammer, which is also a DRL agent. We propose an adversarial jamming attack scheme that utilizes a listening phase and significantly degrades the users' sum rate. Subsequently, we develop an ensemble policy defense strategy against such a jamming attacker by reloading models (saved during retraining) that have minimum transition correlation.

Cross-lists for Thu, 13 May 21

[33]  arXiv:2006.11812 (cross-list from cs.CV) [pdf, other]
Title: Subspace Clustering for Action Recognition with Covariance Representations and Temporal Pruning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This paper tackles the problem of human action recognition, defined as classifying which action is displayed in a trimmed sequence, from skeletal data. Although state-of-the-art approaches designed for this application are all supervised, in this paper we pursue a more challenging direction: solving the problem with unsupervised learning. To this end, we propose a novel subspace clustering method, which exploits covariance matrices to enhance the action's discriminability, and a timestamp pruning approach that allows us to better handle the temporal dimension of the data. Through a broad experimental validation, we show that our computational pipeline not only surpasses existing unsupervised approaches but can also achieve favorable performance compared to supervised methods.

[34]  arXiv:2105.05278 (cross-list from physics.chem-ph) [pdf]
Title: Polygrammar: Grammar for Digital Polymer Representation and Generation
Subjects: Chemical Physics (physics.chem-ph); Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)

Polymers are widely studied materials with diverse properties and applications determined by different molecular structures. It is essential to represent these structures clearly and explore the full space of achievable chemical designs. However, existing approaches are unable to offer comprehensive design models for polymers because of their inherent scale and structural complexity. Here, we present a parametric, context-sensitive grammar designed specifically for the representation and generation of polymers. As a demonstrative example, we implement our grammar for polyurethanes. Using our symbolic hypergraph representation and 14 simple production rules, our PolyGrammar is able to represent and generate all valid polyurethane structures. We also present an algorithm to translate any polyurethane structure from the popular SMILES string format into our PolyGrammar representation. We test the representative power of PolyGrammar by translating a dataset of over 600 polyurethane samples collected from literature. Furthermore, we show that PolyGrammar can be easily extended to other copolymers and homopolymers such as polyacrylates. By offering a complete, explicit representation scheme and an explainable generative model with validity guarantees, our PolyGrammar takes an important step toward a more comprehensive and practical system for polymer discovery and exploration. As the first bridge between formal languages and chemistry, PolyGrammar also serves as a critical blueprint to inform the design of similar grammars for other chemistries, including organic and inorganic molecules.

[35]  arXiv:2105.05303 (cross-list from stat.AP) [pdf]
Title: Development of an expected possession value model to analyse team attacking performances in rugby league
Subjects: Applications (stat.AP); Machine Learning (cs.LG)

This study aimed to provide a framework to evaluate team attacking performances in rugby league using 59,233 plays from 180 Super League matches via expected possession value (EPV) models. The EPV-308 split the pitch into 308 5m x 5m zones, the EPV-77 split the pitch into 77 10m x 10m zones and the EPV-19 split the pitch into 19 zones of variable size dependent on the total zone value generated during a match. Attacking possessions were considered as Markov Chains, allowing the value of each zone visited to be estimated based on the outcome of the possession. The Kullback-Leibler Divergence was used to evaluate the reproducibility of the value generated from each zone (the reward distribution) by teams between matches. The EPV-308 had the greatest variability and lowest reproducibility, compared to EPV-77 and EPV-19. When six previous matches were considered, the team's subsequent match attacking performances had a similar reward distribution for EPV-19, EPV-77 and EPV-308 on 95 +/- 4%, 51 +/- 12% and 0 +/- 0% of occasions, respectively. This study supports the use of EPV-19 to evaluate team attacking performance in rugby league and provides a simple framework through which attacking performances can be compared between teams.
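
As a rough illustration of the reproducibility measure named above, the following sketch computes the Kullback-Leibler divergence between two per-zone reward distributions. The zone values, the helper names `normalize` and `kl_divergence`, and the `eps` floor are assumptions made for this example; the abstract does not say how the authors normalised zone values or decided that two distributions were "similar".

```python
import math


def normalize(zone_values):
    """Turn non-negative per-zone values into a probability distribution."""
    total = sum(zone_values)
    if total <= 0:
        raise ValueError("zone values must sum to a positive number")
    return [v / total for v in zone_values]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same zones.

    eps is a hypothetical floor that avoids division by zero when a zone
    has no value in q; it is not taken from the paper.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same zones")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:  # terms with p_i == 0 contribute nothing
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


# Hypothetical value generated in four zones across three matches.
match_a = normalize([4.0, 3.0, 2.0, 1.0])
match_b = normalize([4.0, 3.0, 2.0, 1.0])
match_c = normalize([1.0, 2.0, 3.0, 4.0])

print(kl_divergence(match_a, match_b))  # identical distributions
print(kl_divergence(match_a, match_c))  # reversed distribution
```

A divergence of zero means the two matches distributed value across zones identically; larger values mean less reproducible attacking patterns. Finer grids such as EPV-308 spread the same plays over more zones, which is one plausible reason their distributions are harder to reproduce.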

[36]  arXiv:2105.05318 (cross-list from eess.IV) [pdf, other]
Title: GANs for Medical Image Synthesis: An Empirical Study
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Generative Adversarial Networks (GANs) have become increasingly powerful, generating mind-blowing photorealistic images that mimic the content of datasets they were trained to replicate. One recurrent theme in medical imaging is whether GANs can be as effective at generating workable medical data as they are at generating realistic RGB images. In this paper, we perform a multi-GAN and multi-application study to gauge the benefits of GANs in medical imaging. We tested various GAN architectures, from basic DCGAN to more sophisticated style-based GANs, on three medical imaging modalities and organs, namely cardiac cine-MRI, liver CT and RGB retina images. GANs were trained on well-known and widely utilized datasets, from which their FID scores were computed to measure the visual acuity of their generated images. We further tested their usefulness by measuring the segmentation accuracy of a U-Net trained on these generated images.
Results reveal that GANs are far from being equal, as some are ill-suited for medical imaging applications while others are much better off. The top-performing GANs are capable of generating realistic-looking medical images by FID standards that can fool trained experts in a visual Turing test and comply with some metrics. However, segmentation results suggest that no GAN is capable of reproducing the full richness of medical datasets.

[37]  arXiv:2105.05320 (cross-list from cs.SI) [pdf, ps, other]
Title: Seeing All From a Few: Nodes Selection Using Graph Pooling for Graph Clustering
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Graph clustering, which aims to obtain a partition of data using the graph information, has received considerable attention in recent years. However, noisy edges and nodes in the graph may make the clustering results worse. In this paper, we propose a novel dual graph embedding network (DGEN) to improve the robustness of graph clustering to noisy nodes and edges. DGEN is designed as a two-step graph encoder connected by a graph pooling layer, which learns the graph embedding of the selected nodes. Based on the assumption that a node and its nearest neighbors should belong to the same cluster, we devise the neighbor cluster pooling (NCPool) to select the most informative subset of vertices based on the clustering assignments of nodes and their nearest neighbors. This can effectively alleviate the impact of noisy edges on the clustering. After obtaining the clustering assignments of the selected nodes, a classifier is trained using these selected nodes, and the final clustering assignments for all the nodes can be obtained by this classifier. Experiments on three benchmark graph datasets demonstrate the superiority of DGEN over several state-of-the-art algorithms.

[38]  arXiv:2105.05330 (cross-list from cs.AI) [pdf, other]
Title: Neuro-Symbolic Artificial Intelligence Current Trends
Comments: under review
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Neuro-Symbolic Artificial Intelligence -- the combination of symbolic methods with methods that are based on artificial neural networks -- has a long-standing history. In this article, we provide a structured overview of current trends, by means of categorizing recent publications from key conferences. The article is meant to serve as a convenient starting point for research on the general topic.

[39]  arXiv:2105.05345 (cross-list from cs.CV) [pdf, other]
Title: Unsupervised Representation Learning from Pathology Images with Multi-directional Contrastive Predictive Coding
Comments: 5 pages, 4 figures, presented at IEEE International Symposium on Biomedical Imaging (ISBI) 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Digital pathology tasks have benefited greatly from modern deep learning algorithms. However, their need for large quantities of annotated data has been identified as a key challenge. This need for data can be countered by using unsupervised learning in situations where data are abundant but access to annotations is limited. Feature representations learned from unannotated data using contrastive predictive coding (CPC) have been shown to enable classifiers to obtain state of the art performance from relatively small amounts of annotated computer vision data. We present a modification to the CPC framework for use with digital pathology patches. This is achieved by introducing an alternative mask for building the latent context and using a multi-directional PixelCNN autoregressor. To demonstrate our proposed method we learn feature representations from the Patch Camelyon histology dataset. We show that our proposed modification can yield improved deep classification of histology patches.

[40]  arXiv:2105.05358 (cross-list from eess.SY) [pdf]
Title: Computational Simulation and Analysis of Major Control Parameters of Time-Dependent PV/T Collectors
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

In order to improve the performance of photovoltaic/thermal (or PV/T for simplicity) collectors, this paper first validated a previous computational thermal model and then introduced an improved computational thermal model to investigate the effects of the major control parameters on the thermal performance of PV/T collectors, including solar cell temperature, back surface temperature, and outlet water temperature. In addition, a computational electrical model of the PV/T system was introduced to elaborate the relationship of voltage, current and power of a PV module (MSX60 polycrystalline solar cell) used in an experiment in the literature. Simulation results agree with the experimental data very well. The effects of the time-steps from 1 hour to minute, which is close to real time, were also reported. Finally, several suggestions to improve the efficiency of the PV/T system were given.

[41]  arXiv:2105.05361 (cross-list from cs.CL) [pdf, other]
Title: The Summary Loop: Learning to Write Abstractive Summaries Without Examples
Comments: ACL2020, 16 pages, 9 figures
Journal-ref: Association for Computational Linguistics (2020) 5135-5150
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

This work presents a new approach to unsupervised abstractive summarization based on maximizing a combination of coverage and fluency for a given length constraint. It introduces a novel method that encourages the inclusion of key terms from the original document into the summary: key terms are masked out of the original document and must be filled in by a coverage model using the current generated summary. A novel unsupervised training procedure leverages this coverage model along with a fluency model to generate and score summaries. When tested on popular news summarization datasets, the method outperforms previous unsupervised methods by more than 2 R-1 points, and approaches results of competitive supervised methods. Our model attains higher levels of abstraction with copied passages roughly two times shorter than prior work, and learns to compress and merge sentences without supervision.

[42]  arXiv:2105.05432 (cross-list from eess.SY) [pdf, other]
Title: Discrete-time Contraction-based Control of Nonlinear Systems with Parametric Uncertainties using Neural Networks
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)

Flexible manufacturing in the process industry requires control systems to achieve time-varying setpoints (e.g., product specifications) based on market demand. Contraction theory provides a useful framework for reference-independent system analysis and tracking control for nonlinear systems. However, determination of the control contraction metrics and control laws can be very difficult for general nonlinear systems. This work develops an approach to discrete-time contraction analysis and control using neural networks. The methodology involves training a neural network to learn a contraction metric and feedback gain. The resulting contraction-based controller embeds the trained neural network and is capable of achieving efficient tracking of time-varying references, with a full range of model uncertainty, without the need for controller structure redesign. This is a robust approach that can deal with bounded parametric uncertainties in the process model, which are commonly encountered in industrial (chemical) processes. Simulation examples are provided to illustrate the above approach.

[43]  arXiv:2105.05489 (cross-list from stat.ML) [pdf, other]
Title: Multiscale Invertible Generative Networks for High-Dimensional Bayesian Inference
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Computation (stat.CO)

We propose a Multiscale Invertible Generative Network (MsIGN) and associated training algorithm that leverages multiscale structure to solve high-dimensional Bayesian inference. To address the curse of dimensionality, MsIGN exploits the low-dimensional nature of the posterior, and generates samples from coarse to fine scale (low to high dimension) by iteratively upsampling and refining samples. MsIGN is trained in a multi-stage manner to minimize the Jeffreys divergence, which avoids mode dropping in high-dimensional cases. On two high-dimensional Bayesian inverse problems, we show superior performance of MsIGN over previous approaches in posterior approximation and multiple mode capture. On the natural image synthesis task, MsIGN achieves superior performance in bits-per-dimension over baseline models and yields great interpretability of its neurons in intermediate layers.
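
For readers unfamiliar with the training objective named above, the Jeffreys divergence is commonly defined as the symmetrized Kullback-Leibler divergence. The form below is the textbook definition, with $p$ standing for the target posterior and $q$ for the model density; both symbols are notation chosen for this note, and the paper may use a scaled or otherwise modified variant.

```latex
D_J(p, q) \;=\; D_{\mathrm{KL}}(p \,\|\, q) + D_{\mathrm{KL}}(q \,\|\, p)
        \;=\; \int \bigl(p(x) - q(x)\bigr) \log \frac{p(x)}{q(x)} \, dx
```

The forward term $D_{\mathrm{KL}}(p \,\|\, q)$ penalizes a model that places little mass where the posterior has mass, which is consistent with the abstract's claim about avoiding mode dropping.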

[44]  arXiv:2105.05498 (cross-list from cs.CL) [pdf, other]
Title: Improving Lexically Constrained Neural Machine Translation with Source-Conditioned Masked Span Prediction
Comments: To appear in ACL 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Generating accurate terminology is a crucial component for the practicality and reliability of neural machine translation (NMT) systems. To address this, lexically constrained NMT explores various methods to ensure that pre-specified words and phrases appear in the translations. In many cases, however, those methods are evaluated on general domain corpora, where the terms are mostly uni- and bi-grams (>98%). In this paper, we instead tackle a more challenging setup consisting of domain-specific corpora with much longer n-grams and highly specialized terms. To encourage span-level representations in generation, we additionally impose a source-sentence conditioned masked span prediction loss in the decoder and observe improvements in both terminology translation and BLEU scores. Experimental results on three domain-specific corpora in two language pairs demonstrate that the proposed training scheme can improve the performance of existing lexically constrained methods that can operate either with or without a term dictionary at test time.

[45]  arXiv:2105.05540 (cross-list from cs.IT) [pdf, other]
Title: Cyclically Equivariant Neural Decoders for Cyclic Codes
Authors: Xiangyu Chen, Min Ye
Comments: Accepted for long presentation at ICML 2021. Code available at this https URL
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)

Neural decoders were introduced as a generalization of the classic Belief Propagation (BP) decoding algorithms, where the Trellis graph in the BP algorithm is viewed as a neural network, and the weights in the Trellis graph are optimized by training the neural network. In this work, we propose a novel neural decoder for cyclic codes by exploiting their cyclically invariant property. More precisely, we impose a shift invariant structure on the weights of our neural decoder so that any cyclic shift of inputs results in the same cyclic shift of outputs. Extensive simulations with BCH codes and punctured Reed-Muller (RM) codes show that our new decoder consistently outperforms previous neural decoders when decoding cyclic codes. Finally, we propose a list decoding procedure that can significantly reduce the decoding error probability for BCH codes and punctured RM codes. For certain high-rate codes, the gap between our list decoder and the Maximum Likelihood decoder is less than $0.1$dB. Code available at https://github.com/cyclicallyneuraldecoder/CyclicallyEquivariantNeuralDecoders
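
The equivariance property described above can be illustrated with a toy linear layer. The sketch below is not the authors' decoder: it assumes a single layer whose weight depends only on the index difference `(i - j) mod n` (a circulant structure), and checks that cyclically shifting the input shifts the output by the same amount. The names `cyclic_shift`, `circulant_layer`, `weights` and `received` are hypothetical.

```python
def cyclic_shift(values, k):
    """Shift a list right by k positions with wrap-around."""
    n = len(values)
    k %= n
    if k == 0:
        return list(values)
    return values[-k:] + values[:-k]


def circulant_layer(weights, x):
    """Linear layer with shift-invariant weights.

    Output i is sum_j weights[(i - j) mod n] * x[j], so the weight used
    for a pair (i, j) depends only on their cyclic distance.
    """
    n = len(x)
    if len(weights) != n:
        raise ValueError("need one weight per cyclic offset")
    return [
        sum(weights[(i - j) % n] * x[j] for j in range(n))
        for i in range(n)
    ]


# Hypothetical integer weights and input, so equality checks are exact.
weights = [2, -1, 0, 3, 1]
received = [1, 0, 1, 1, 0]

output = circulant_layer(weights, received)
for k in range(len(received)):
    shifted_in = cyclic_shift(received, k)
    # Shifting the input by k shifts the output by the same k.
    assert circulant_layer(weights, shifted_in) == cyclic_shift(output, k)
print("equivariant for all shifts:", output)
```

Tying weights this way also reduces the number of free parameters from n*n to n for a layer of this kind, which is a common motivation for building symmetry into a network.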

[46]  arXiv:2105.05541 (cross-list from cs.CL) [pdf, other]
Title: Evaluating Gender Bias in Natural Language Inference
Comments: NeurIPS 2020 Workshop on Dataset Curation and Security
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Gender-bias stereotypes have recently raised significant ethical concerns in natural language processing. However, progress in detection and evaluation of gender bias in natural language understanding through inference is limited and requires further investigation. In this work, we propose an evaluation methodology to measure these biases by constructing a challenge task that involves pairing gender-neutral premises against a gender-specific hypothesis. We use our challenge task to investigate state-of-the-art NLI models on the presence of gender stereotypes using occupations. Our findings suggest that three models (BERT, RoBERTa, BART) trained on MNLI and SNLI datasets are significantly prone to gender-induced prediction errors. We also find that debiasing techniques such as augmenting the training dataset to ensure a gender-balanced dataset can help reduce such bias in certain cases.

[47]  arXiv:2105.05564 (cross-list from cs.NI) [pdf, other]
Title: A Survey on Reinforcement Learning-Aided Caching in Mobile Edge Networks
Comments: 26 pages, 7 figures
Subjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT); Machine Learning (cs.LG)

Mobile networks are experiencing a tremendous increase in data volume and user density. An efficient technique to alleviate this issue is to bring the data closer to the users by exploiting the caches of edge network nodes, such as fixed or mobile access points and even user devices. Meanwhile, the fusion of machine learning and wireless networks offers a viable way for network optimization as opposed to traditional optimization approaches which incur high complexity, or fail to provide optimal solutions. Among the various machine learning categories, reinforcement learning operates in an online and autonomous manner without relying on large sets of historical data for training. In this survey, reinforcement learning-aided mobile edge caching is presented, aiming at highlighting the achieved network gains over conventional caching approaches. Taking into account the heterogeneity of sixth generation (6G) networks in various wireless settings, such as fixed, vehicular and flying networks, learning-aided edge caching is presented, departing from traditional architectures. Furthermore, a categorization according to the desirable performance metric, such as spectral, energy and caching efficiency, average delay, and backhaul and fronthaul offloading is provided. Finally, several open issues are discussed, aiming to stimulate further interest in this important research field.

[48]  arXiv:2105.05566 (cross-list from quant-ph) [pdf, other]
Title: Structural risk minimization for quantum linear classifiers
Comments: 28 pages, 3 figures
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)

Quantum machine learning (QML) stands out as one of the typically highlighted candidates for quantum computing's near-term "killer application". In this context, QML models based on parameterized quantum circuits comprise a family of machine learning models that are well suited for implementations on near-term devices and that can potentially harness computational powers beyond what is efficiently achievable on a classical computer. However, how to best use these models -- e.g., how to control their expressivity to best balance between training accuracy and generalization performance -- is far from understood. In this paper we investigate capacity measures of two closely related QML models called explicit and implicit quantum linear classifiers (also called the quantum variational method and quantum kernel estimator) with the objective of identifying new ways to implement structural risk minimization -- i.e., how to balance between training accuracy and generalization performance. In particular, we identify that the rank and Frobenius norm of the observables used in the QML model closely control the model's capacity. Additionally, we theoretically investigate the effect that these model parameters have on the training accuracy of the QML model. Specifically, we show that there exist datasets that require a high-rank observable for correct classification, and that there exist datasets that can only be classified with a given margin using an observable of at least a certain Frobenius norm. Our results provide new options for performing structural risk minimization for QML models.

[49]  arXiv:2105.05582 (cross-list from cs.CL) [pdf, other]
Title: Discrete representations in neural models of spoken language
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

The distributed and continuous representations used by neural networks are at odds with representations employed in linguistics, which are typically symbolic. Vector quantization has been proposed as a way to induce discrete neural representations that are closer in nature to their linguistic counterparts. However, it is not clear which metrics are the best-suited to analyze such discrete representations. We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language. We perform a systematic analysis of the impact of (i) architectural choices, (ii) the learning objective and training dataset, and (iii) the evaluation metric. We find that the different evaluation metrics can give inconsistent results. In particular, we find that the use of minimal pairs of phoneme triples as stimuli during evaluation disadvantages larger embeddings, unlike metrics applied to complete utterances.

[50]  arXiv:2105.05596 (cross-list from cs.CL) [pdf, other]
Title: Unsupervised Knowledge Graph Alignment by Probabilistic Reasoning and Semantic Embedding
Comments: Accepted by IJCAI 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Knowledge Graph (KG) alignment aims to discover the mappings (i.e., equivalent entities, relations, and others) between two KGs. The existing methods can be divided into embedding-based models and conventional systems based on reasoning and lexical matching. The former compute the similarity of entities via their cross-KG embeddings, but they usually rely on an ideal supervised learning setting for good performance and lack appropriate reasoning to avoid logically wrong mappings, while the latter address the reasoning issue but are poor at utilizing the KG graph structures and the entity contexts. In this study, we aim at combining the above two solutions and thus propose an iterative framework named PRASE which is based on probabilistic reasoning and semantic embedding. It learns the KG embeddings via entity mappings from a probabilistic reasoning system named PARIS, and feeds the resultant entity mappings and embeddings back into PARIS for augmentation. The PRASE framework is compatible with different embedding-based models, and our experiments on multiple datasets have demonstrated its state-of-the-art performance.

[51]  arXiv:2105.05601 (cross-list from cs.CL) [pdf, other]
Title: OutFlip: Generating Out-of-Domain Samples for Unknown Intent Detection with Natural Language Attack
Comments: 9 pages, 3 figures; to appear in ACL Findings of ACL-IJCNLP 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Out-of-domain (OOD) input detection is vital in a task-oriented dialogue system since the acceptance of unsupported inputs could lead to an incorrect response of the system. This paper proposes OutFlip, a method to automatically generate out-of-domain samples using only the in-domain training dataset. HotFlip, a white-box natural language attack method, is revised to generate out-of-domain samples instead of adversarial examples. Our evaluation results showed that integrating OutFlip-generated out-of-domain samples into the training dataset could significantly improve an intent classification model's out-of-domain detection performance.

[52]  arXiv:2105.05605 (cross-list from cs.CL) [pdf, other]
Title: Priberam Labs at the NTCIR-15 SHINRA2020-ML: Classification Task
Comments: Presented at NTCIR-15 conference (2020)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Wikipedia is an online encyclopedia available in 285 languages. It constitutes an extremely relevant Knowledge Base (KB), which could be leveraged by automatic systems for several purposes. However, the structure and organisation of such information are not amenable to automatic parsing and understanding and it is, therefore, necessary to structure this knowledge. The goal of the current SHINRA2020-ML task is to leverage Wikipedia pages in order to categorise their corresponding entities across 268 hierarchical categories, belonging to the Extended Named Entity (ENE) ontology. In this work, we propose three distinct models based on the contextualised embeddings yielded by Multilingual BERT. We explore the performances of a linear layer with and without explicit usage of the ontology's hierarchy, and a Gated Recurrent Units (GRU) layer. We also test several pooling strategies to leverage BERT's embeddings and selection criteria based on the labels' scores. We were able to achieve good performance across a large variety of languages, including those not seen during the fine-tuning process (zero-shot languages).

[53]  arXiv:2105.05614 (cross-list from cs.CL) [pdf, other]
Title: Priberam at MESINESP Multi-label Classification of Medical Texts Task
Comments: Presented at CLEF2020 conference (2020)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Medical articles provide current state of the art treatments and diagnostics to many medical practitioners and professionals. Existing public databases such as MEDLINE contain over 27 million articles, making it difficult to extract relevant content without the use of efficient search engines. Information retrieval tools are crucial in order to navigate and provide meaningful recommendations for articles and treatments. Classifying these articles into broader medical topics can improve the retrieval of related articles. The set of medical labels considered for the MESINESP task is on the order of several thousands of labels (DeCS codes), which falls under the extreme multi-label classification problem. The heterogeneous and highly hierarchical structure of medical topics makes the task of manually classifying articles extremely laborious and costly. It is, therefore, crucial to automate the process of classification. Typical machine learning algorithms become computationally demanding with such a large number of labels and achieving better recall on such datasets becomes an unsolved problem.
This work presents Priberam's participation at the BioASQ task Mesinesp. We address the large multi-label classification problem through the use of four different models: a Support Vector Machine (SVM), a customised search engine (Priberam Search), a BERT-based classifier, and an SVM-rank ensemble of all the previous models. Results demonstrate that all three individual models perform well and the best performance is achieved by their ensemble, granting Priberam the 6th place in the present challenge and making it the 2nd best team.

[54]  arXiv:2105.05633 (cross-list from cs.CV) [pdf, other]
Title: Segmenter: Transformer for Semantic Segmentation
Comments: Code available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Image segmentation is often ambiguous at the level of individual image patches and requires contextual information to reach label consensus. In this paper we introduce Segmenter, a transformer model for semantic segmentation. In contrast to convolution-based approaches, our approach models global context already at the first layer and throughout the network. We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation. To do so, we rely on the output embeddings corresponding to image patches and obtain class labels from these embeddings with a point-wise linear decoder or a mask transformer decoder. We leverage models pre-trained for image classification and show that we can fine-tune them on moderate-sized datasets available for semantic segmentation. The linear decoder already yields excellent results, but the performance can be further improved by a mask transformer generating class masks. We conduct an extensive ablation study to show the impact of the different parameters; in particular, the performance is better for large models and small patch sizes. Segmenter attains excellent results for semantic segmentation. It outperforms the state of the art on the challenging ADE20K dataset and performs on par on Pascal Context and Cityscapes.

[55]  arXiv:2105.05641 (cross-list from cs.CL) [pdf, other]
Title: How Reliable are Model Diagnostics?
Comments: ACL 2021 Findings
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

In the pursuit of a deeper understanding of a model's behaviour, there is recent impetus for developing suites of probes aimed at diagnosing models beyond simple metrics like accuracy or BLEU. This paper takes a step back and asks an important and timely question: how reliable are these diagnostics in providing insight into models and training setups? We critically examine three recent diagnostic tests for pre-trained language models, and find that likelihood-based and representation-based model diagnostics are not yet as reliable as previously assumed. Based on our empirical findings, we also formulate recommendations for practitioners and researchers.

[56]  arXiv:2105.05648 (cross-list from stat.ML) [pdf, other]
Title: Look-Ahead Screening Rules for the Lasso
Authors: Johan Larsson
Comments: EYSM 2021 short paper; 6 pages, 2 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)

The lasso is a popular method to induce shrinkage and sparsity in the solution vector (coefficients) of regression problems, particularly when there are many predictors relative to the number of observations. Solving the lasso in this high-dimensional setting can, however, be computationally demanding. Fortunately, this demand can be alleviated via the use of screening rules that discard predictors prior to fitting the model, leading to a reduced problem to be solved. In this paper, we present a new screening strategy: look-ahead screening. Our method uses safe screening rules to find a range of penalty values for which a given predictor cannot enter the model, thereby screening predictors along the remainder of the path. In experiments we show that these look-ahead screening rules improve the performance of existing screening strategies.

[57]  arXiv:2105.05650 (cross-list from cond-mat.stat-mech) [pdf, other]
Title: Unbiased Monte Carlo Cluster Updates with Autoregressive Neural Networks
Comments: 8 pages, 5 figures
Subjects: Statistical Mechanics (cond-mat.stat-mech); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG); Machine Learning (stat.ML)

Efficient sampling of complex high-dimensional probability densities is a central task in computational science. Machine Learning techniques based on autoregressive neural networks have been recently shown to provide good approximations of probability distributions of interest in physics. In this work, we propose a systematic way to remove the intrinsic bias associated with these variational approximations, combining it with Markov-chain Monte Carlo in an automatic scheme to efficiently generate cluster updates, which is particularly useful for models for which no efficient cluster update scheme is known. Our approach is based on symmetry-enforced cluster updates building on the neural-network representation of conditional probabilities. We demonstrate that such finite-cluster updates are crucial to circumvent ergodicity problems associated with global neural updates. We test our method for first- and second-order phase transitions in classical spin systems, proving in particular its viability for critical systems, or in the presence of metastable states.

[58]  arXiv:2105.05686 (cross-list from cs.IR) [pdf, ps, other]
Title: Yes, BM25 is a Strong Baseline for Legal Case Retrieval
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG)

We describe our single submission to task 1 of COLIEE 2021. Our vanilla BM25 got second place, well above the median of submissions. Code is available at https://github.com/neuralmind-ai/coliee.
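As a rough illustration of what a "vanilla BM25" ranker computes, the sketch below scores tokenized documents against a tokenized query with a common textbook form of the BM25 formula. It is not taken from the linked repository: the function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the toy documents are all assumptions made for this example.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    k1 and b are assumed defaults, not values reported in the abstract.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus, for illustration only.
docs = [
    "the court dismissed the appeal".split(),
    "the tenant filed an appeal against eviction".split(),
    "weather was pleasant".split(),
]
scores = bm25_scores("appeal eviction".split(), docs)
```

The second document matches both query terms and therefore ranks first, while the third shares no term with the query and scores zero.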

[59]  arXiv:2105.05690 (cross-list from math.NA) [pdf, ps, other]
Title: Machine learning moment closure models for the radiative transfer equation I: directly learning a gradient based closure
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)

In this paper, we take a data-driven approach and apply machine learning to the moment closure problem for the radiative transfer equation in slab geometry. Instead of learning the unclosed high order moment, we propose to directly learn the gradient of the high order moment using neural networks. This new approach is consistent with the exact closure we derive for the free streaming limit and also provides a natural output normalization. A variety of benchmark tests, including the variable scattering problem, the Gaussian source problem and the two material problem, show both good accuracy and generalizability of our machine learning closure model.

[60]  arXiv:2105.05699 (cross-list from cs.DB) [pdf, other]
Title: Automating Data Science: Prospects and Challenges
Comments: 19 pages, 3 figures. Accepted for publication (April 2021) in Communications of the ACM
Subjects: Databases (cs.DB); Machine Learning (cs.LG)

Given the complexity of typical data science projects and the associated demand for human expertise, automation has the potential to transform the data science process.
Key insights:
* Automation in data science aims to facilitate and transform the work of data scientists, not to replace them.
* Important parts of data science are already being automated, especially in the modeling stages, where techniques such as automated machine learning (AutoML) are gaining traction.
* Other aspects are harder to automate, not only because of technological challenges, but because open-ended and context-dependent tasks require human interaction.

[61]  arXiv:2105.05716 (cross-list from cs.AI) [pdf, other]
Title: Acting upon Imagination: when to trust imagined trajectories in model based reinforcement learning
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Model based reinforcement learning (MBRL) uses an imperfect model of the world to imagine trajectories of future states and plan the best actions to maximize a reward function. These trajectories are imperfect and MBRL attempts to overcome this by relying on model predictive control (MPC) to continuously re-imagine trajectories from scratch. Such re-generation of imagined trajectories carries the major computational cost, with complexity increasing in tasks with a longer receding horizon. This paper aims to investigate how far in the future the imagined trajectories can be relied upon while still maintaining acceptable reward. Firstly, an error analysis is presented for systematically skipping recalculations for a varying number of consecutive steps. Secondly, we propose two methods for deciding when to trust and act upon imagined trajectories, looking at recent errors with respect to expectations, or comparing the confidence in an action imagined against its execution. Thirdly, we evaluate the effects of acting upon imagination while training the model of the world. Results show that acting upon imagination can reduce calculations by at least 20% and up to 80%, depending on the environment, while retaining acceptable reward.

[62]  arXiv:2105.05733 (cross-list from cs.IR) [pdf, other]
Title: Thematic recommendations on knowledge graphs using multilayer networks
Comments: 20 pages, 5 figures
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)

We present a framework to generate and evaluate thematic recommendations based on multilayer network representations of knowledge graphs (KGs). In this representation, each layer encodes a different type of relationship in the KG, and directed interlayer couplings connect the same entity in different roles. The relative importance of different types of connections is captured by an intuitive salience matrix that can be estimated from data, tuned to incorporate domain knowledge, address different use cases, or respect business logic.
We apply an adaptation of the personalised PageRank algorithm to multilayer models of KGs to generate item-item recommendations. These recommendations reflect the knowledge we hold about the content and are suitable for thematic and/or cold-start recommendation settings. Evaluating thematic recommendations from user data presents unique challenges that we address by developing a method to evaluate recommendations relying on user-item ratings, yet respecting their thematic nature. We also show that the salience matrix can be estimated from user data. We demonstrate the utility of our methods by significantly improving consumption metrics in an AB test where collaborative filtering delivered subpar performance. We also apply our approach to movie recommendation using publicly-available data to ensure the reproducibility of our results. We demonstrate that our approach outperforms existing thematic recommendation methods and is even competitive with collaborative filtering approaches.
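For orientation, the following is a minimal sketch of plain personalised PageRank on a single-layer directed graph, computed by power iteration with restarts to one seed item. It is not the multilayer adaptation described above and uses no salience matrix; the function name `personalized_pagerank`, the damping value `alpha=0.85`, the dangling-node handling, and the toy graph are assumptions for this example.

```python
def personalized_pagerank(adj, seed, alpha=0.85, iters=100):
    """Personalised PageRank by power iteration.

    adj  : dict mapping each node to a list of its out-neighbours
           (every neighbour must itself be a key of adj).
    seed : node that the random walk restarts from.
    alpha: probability of following an edge rather than restarting (assumed).
    """
    nodes = list(adj)
    rank = {v: 1.0 if v == seed else 0.0 for v in nodes}
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for v in nodes:
            out = adj[v]
            if out:
                # Spread the walker's mass evenly over outgoing edges.
                share = alpha * rank[v] / len(out)
                for w in out:
                    nxt[w] += share
            else:
                # Dangling node: return its mass to the seed.
                nxt[seed] += alpha * rank[v]
        # Restart step: total mass is 1, so (1 - alpha) goes to the seed.
        nxt[seed] += 1 - alpha
        rank = nxt
    return rank

# Hypothetical item graph, for illustration only.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
ppr = personalized_pagerank(graph, seed="a")
ranking = sorted(ppr, key=ppr.get, reverse=True)
```

Sorting the other items by their score relative to the seed gives an item-item recommendation list for that seed; in this toy graph, `b` and `c` (direct neighbours of `a`) score above `d`.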

[63]  arXiv:2105.05737 (cross-list from cs.CL) [pdf, other]
Title: Encoding Explanatory Knowledge for Zero-shot Science Question Answering
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

This paper describes N-XKT (Neural encoding based on eXplanatory Knowledge Transfer), a novel method for the automatic transfer of explanatory knowledge through neural encoding mechanisms. We demonstrate that N-XKT is able to improve accuracy and generalization on science Question Answering (QA). Specifically, by leveraging facts from background explanatory knowledge corpora, the N-XKT model shows a clear improvement on zero-shot QA. Furthermore, we show that N-XKT can be fine-tuned on a target QA dataset, enabling faster convergence and more accurate results. A systematic analysis is conducted to quantitatively analyze the performance of the N-XKT model and the impact of different categories of knowledge on the zero-shot generalization task.

[64]  arXiv:2105.05790 (cross-list from cs.CL) [pdf, other]
Title: The Greedy and Recursive Search for Morphological Productivity
Comments: CogSci 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

As children acquire the knowledge of their language's morphology, they invariably discover the productive processes that can generalize to new words. Morphological learning is made challenging by the fact that even fully productive rules have exceptions, as in the well-known case of English past tense verbs, which features the -ed rule against the irregular verbs. The Tolerance Principle is a recent proposal that provides a precise threshold of exceptions that a productive rule can withstand. Its empirical application so far, however, requires the researcher to fully specify rules defined over a set of words. We propose a greedy search model that automatically hypothesizes rules and evaluates their productivity over a vocabulary. When the search for broader productivity fails, the model recursively subdivides the vocabulary and continues the search for productivity over narrower rules. Trained on psychologically realistic data from child-directed input, our model displays developmental patterns observed in child morphology acquisition, including the notoriously complex case of German noun pluralization. It also produces responses to nonce words that, despite receiving only a fraction of the training data, are more similar to those of human subjects than current neural network models' responses are.
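The abstract does not spell out the threshold, but the Tolerance Principle is commonly stated as follows: a rule that applies to N items is productive if its number of exceptions does not exceed N / ln N. The sketch below encodes only that check; the function names are hypothetical, and it does not reproduce the greedy recursive search proposed in the paper.

```python
import math

def tolerance_threshold(n):
    """Maximum number of exceptions a rule over n items can tolerate.

    Uses the commonly cited form n / ln(n); requires n >= 2.
    """
    if n < 2:
        raise ValueError("threshold is only defined for n >= 2")
    return n / math.log(n)

def is_productive(n, exceptions):
    """Return True if a rule over n items with this many exceptions is productive."""
    return exceptions <= tolerance_threshold(n)

# A rule covering 100 verbs tolerates at most about 21.7 exceptions.
threshold_100 = tolerance_threshold(100)
```

With 100 items, a rule with 21 exceptions counts as productive and one with 22 does not, which is the kind of sharp cut-off a search procedure can test at each step.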

[65]  arXiv:2105.05791 (cross-list from cs.SD) [pdf, other]
Title: Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms
Comments: Submitted to Signals (ISSN 2624-6120)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

This paper describes an automatic drum transcription (ADT) method that directly estimates a tatum-level drum score from a music signal, in contrast to most conventional ADT methods that estimate the frame-level onset probabilities of drums. To estimate a tatum-level score, we propose a deep transcription model that consists of a frame-level encoder for extracting the latent features from a music signal and a tatum-level decoder for estimating a drum score from the latent features pooled at the tatum level. To capture the global repetitive structure of drum scores, which is difficult to learn with a recurrent neural network (RNN), we introduce a self-attention mechanism with tatum-synchronous positional encoding into the decoder. To mitigate the difficulty of training the self-attention-based model from an insufficient amount of paired data and improve the musical naturalness of the estimated scores, we propose a regularized training method that uses a global structure-aware masked language (score) model with a self-attention mechanism pretrained from an extensive collection of drum scores. Experimental results showed that the proposed regularized model outperformed the conventional RNN-based model in terms of the tatum-level error rate and the frame-level F-measure, even when only a limited amount of paired data was available so that the non-regularized model underperformed the RNN-based model.

[66]  arXiv:2105.05821 (cross-list from cs.AR) [pdf, other]
Title: SimNet: Computer Architecture Simulation using Machine Learning
Subjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)

While cycle-accurate simulators are essential tools for architecture research, design, and development, their practicality is limited by an extremely long time-to-solution for realistic problems under investigation. This work describes a concerted effort, where machine learning (ML) is used to accelerate discrete-event simulation. First, an ML-based instruction latency prediction framework that accounts for both static instruction/architecture properties and dynamic execution context is constructed. Then, a GPU-accelerated parallel simulator is implemented based on the proposed instruction latency predictor, and its simulation accuracy and throughput are validated and evaluated against a state-of-the-art simulator. Leveraging modern GPUs, the ML-based simulator outperforms traditional simulators significantly.

[67]  arXiv:2105.05827 (cross-list from eess.IV) [pdf, other]
Title: 20-fold Accelerated 7T fMRI Using Referenceless Self-Supervised Deep Learning Reconstruction
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP); Medical Physics (physics.med-ph)

High spatial and temporal resolution across the whole brain is essential to accurately resolve neural activities in fMRI. Therefore, accelerated imaging techniques target improved coverage with high spatio-temporal resolution. Simultaneous multi-slice (SMS) imaging combined with in-plane acceleration is used in large studies that involve ultrahigh field fMRI, such as the Human Connectome Project. However, for even higher acceleration rates, these methods cannot be reliably utilized due to aliasing and noise artifacts. Deep learning (DL) reconstruction techniques have recently gained substantial interest for improving highly-accelerated MRI. Supervised learning of DL reconstructions generally requires fully-sampled training datasets, which are not available for high-resolution fMRI studies. To tackle this challenge, self-supervised learning has been proposed for training of DL reconstruction with only undersampled datasets, showing similar performance to supervised learning. In this study, we utilize a self-supervised physics-guided DL reconstruction on 5-fold SMS and 4-fold in-plane accelerated 7T fMRI data. Our results show that our self-supervised DL reconstruction produces high-quality images at this 20-fold acceleration, substantially improving on existing methods, while showing similar functional precision and temporal effects in the subsequent analysis compared to a standard 10-fold accelerated acquisition.

[68]  arXiv:2105.05837 (cross-list from cs.CV) [pdf, other]
Title: When Does Contrastive Visual Representation Learning Work?
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Recent self-supervised representation learning techniques have largely closed the gap between supervised and unsupervised learning on ImageNet classification. While the particulars of pretraining on ImageNet are now relatively well understood, the field still lacks widely accepted best practices for replicating this success on other datasets. As a first step in this direction, we study contrastive self-supervised learning on four diverse large-scale datasets. By looking through the lenses of data quantity, data domain, data quality, and task granularity, we provide new insights into the necessary conditions for successful self-supervised learning. Our key findings include observations such as: (i) the benefit of additional pretraining data beyond 500k images is modest, (ii) adding pretraining images from another domain does not lead to more general representations, (iii) corrupted pretraining images have a disparate impact on supervised and self-supervised pretraining, and (iv) contrastive learning lags far behind supervised learning on fine-grained visual classification tasks.

[69]  arXiv:2105.05842 (cross-list from stat.ML) [pdf, other]
Title: Kernel Thinning
Comments: 55 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

We introduce kernel thinning, a simple algorithm for generating better-than-Monte-Carlo approximations to distributions $\mathbb{P}$ on $\mathbb{R}^d$. Given $n$ input points, a suitable reproducing kernel $\mathbf{k}$, and $\mathcal{O}(n^2)$ time, kernel thinning returns $\sqrt{n}$ points with comparable integration error for every function in the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $\mathcal{O}_d(n^{-\frac{1}{2}}\sqrt{\log n})$ for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} \sqrt{(\log n)^{d+1}\log\log n})$ for sub-exponential $\mathbb{P}$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-\frac14})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning.
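The error measure in these guarantees, the maximum mean discrepancy (MMD), can be computed directly for small point sets. The sketch below does not implement kernel thinning; it only evaluates the MMD between a set of n input points and a subset of size sqrt(n), here chosen by standard thinning (every fourth point). The Gaussian kernel, the bandwidth of 1.0, the function names, and the toy points are assumptions for this example.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two points given as tuples (assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))

def mmd(xs, ys, kernel=gaussian_kernel):
    """Maximum mean discrepancy between the empirical distributions of xs and ys."""
    def mean_k(a, b):
        return sum(kernel(p, q) for p in a for q in b) / (len(a) * len(b))
    squared = mean_k(xs, xs) - 2 * mean_k(xs, ys) + mean_k(ys, ys)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(squared, 0.0))

# n = 16 evenly spaced points on [0, 1]; sqrt(n) = 4 points are kept.
points = [(i / 15,) for i in range(16)]
standard_thinned = points[::4]   # every fourth point
first_four = points[:4]          # a deliberately poor subset, for contrast

err_thinned = mmd(points, standard_thinned)
err_first_four = mmd(points, first_four)
```

A subset that covers the range of the input has a much smaller MMD than one clustered at one end; kernel thinning, as described above, is designed to choose the sqrt(n) points so that this quantity is small with high probability.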

[70]  arXiv:2105.05847 (cross-list from cs.CV) [pdf, other]
Title: Learning to Generate Novel Scene Compositions from Single Images and Videos
Comments: The AI for Content Creation (AICC) workshop at CVPR 2021. The full 8-page version of this submission is available at arXiv:2103.13389
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Training GANs in low-data regimes remains a challenge, as overfitting often leads to memorization or training divergence. In this work, we introduce One-Shot GAN, which can learn to generate samples from a training set as small as one image or one video. We propose a two-branch discriminator, with content and layout branches designed to judge the internal content separately from the scene layout realism. This allows synthesis of visually plausible, novel compositions of a scene, with varying content and layout, while preserving the context of the original sample. Compared to previous single-image GAN models, One-Shot GAN achieves higher diversity and quality of synthesis. It is also not restricted to the single image setting, successfully learning in the introduced setting of a single video.

Replacements for Thu, 13 May 21

[71]  arXiv:1907.12439 (replaced) [pdf, other]
Title: Hindsight Trust Region Policy Optimization
Comments: Accepted by IJCAI 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[72]  arXiv:2002.03129 (replaced) [pdf, other]
Title: GLSearch: Maximum Common Subgraph Detection via Learning to Search
Comments: Accepted by ICML 2021
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
[73]  arXiv:2006.03463 (replaced) [pdf, other]
Title: Sponge Examples: Energy-Latency Attacks on Neural Networks
Comments: Accepted at 6th IEEE European Symposium on Security and Privacy (EuroS&P)
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[74]  arXiv:2006.14026 (replaced) [pdf, other]
Title: Subpopulation Data Poisoning Attacks
Comments: May12 update: add sever + backdoor defenses, comparison to witches' brew attack, better comparison to related work, transferability of representations for cmatch
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[75]  arXiv:2007.14120 (replaced) [pdf, other]
Title: Reachable Sets of Classifiers and Regression Models: (Non-)Robustness Analysis and Robust Training
Comments: Published as a journal paper at ECML PKDD 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[76]  arXiv:2010.04683 (replaced) [pdf, other]
Title: Smooth Variational Graph Embeddings for Efficient Neural Architecture Search
Comments: 8 pages, 3 figures, 5 tables. Camera-Ready Version for IJCNN 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[77]  arXiv:2011.00467 (replaced) [pdf, other]
Title: Differentially Private Bayesian Inference for Generalized Linear Models
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
[78]  arXiv:2012.01118 (replaced) [pdf, other]
Title: Neural Teleportation
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[79]  arXiv:2012.13063 (replaced) [pdf]
Title: Decentralized Federated Learning via Mutual Knowledge Transfer
Comments: Published in IEEE Internet of Things Journal
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
[80]  arXiv:2012.15085 (replaced) [pdf, other]
Title: Is Pessimism Provably Efficient for Offline RL?
Comments: 60 pages, 3 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Statistics Theory (math.ST); Machine Learning (stat.ML)
[81]  arXiv:2101.11863 (replaced) [pdf, other]
Title: Exploiting the Hidden Tasks of GANs: Making Implicit Subproblems Explicit
Authors: Romann M. Weber
Comments: 12 pages, 3 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
[82]  arXiv:2102.07627 (replaced) [pdf, ps, other]
Title: A first look into the carbon footprint of federated learning
Comments: arXiv admin note: substantial text overlap with arXiv:2010.06537
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
[83]  arXiv:2102.07762 (replaced) [pdf, other]
Title: Reconstruction-Based Membership Inference Attacks are Easier on Difficult Problems
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
[84]  arXiv:2104.14320 (replaced) [pdf, other]
Title: Adversarial Multi-task Learning Enhanced Physics-informed Neural Networks for Solving Partial Differential Equations
Comments: Accepted by the International Joint Conference on Neural Networks (IJCNN) 2021, Oral presentation
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
[85]  arXiv:2105.04104 (replaced) [pdf, other]
Title: AppealNet: An Efficient and Highly-Accurate Edge/Cloud Collaborative Architecture for DNN Inference
Comments: Accepted by DAC2021
Subjects: Machine Learning (cs.LG)
[86]  arXiv:2105.05233 (replaced) [pdf, other]
Title: Diffusion Models Beat GANs on Image Synthesis
Comments: Updated proof in Appendix G and added more results in Table 5
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[87]  arXiv:1908.07749 (replaced) [pdf, other]
Title: Boosting the Rating Prediction with Click Data and Textual Contents
Comments: arXiv admin note: substantial text overlap with arXiv:1705.02085
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
[88]  arXiv:2004.07179 (replaced) [pdf, other]
Title: Interpretable Probabilistic Password Strength Meters via Deep Learning
Comments: An abridged version of this paper appears in the proceedings of the 25th European Symposium on Research in Computer Security (ESORICS) 2020
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[89]  arXiv:2009.04013 (replaced) [pdf, other]
Title: Attribute Privacy: Framework and Mechanisms
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
[90]  arXiv:2009.09523 (replaced) [pdf, other]
Title: VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware
Comments: 12 pages, 29 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[91]  arXiv:2010.09393 (replaced) [pdf, other]
Title: Locality Sensitive Hashing with Extended Differential Privacy
Subjects: Cryptography and Security (cs.CR); Databases (cs.DB); Information Retrieval (cs.IR); Information Theory (cs.IT); Machine Learning (cs.LG)
[92]  arXiv:2011.09349 (replaced) [pdf, ps, other]
Title: Bias-Variance Trade-off and Overlearning in Dynamic Decision Problems
Comments: 22 pages, 4 Tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
[93]  arXiv:2012.07598 (replaced) [pdf, other]
Title: StackRec: Efficient Training of Very Deep Sequential Recommender Models by Iterative Stacking
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
[94]  arXiv:2101.10799 (replaced) [pdf, other]
Title: ImageCHD: A 3D Computed Tomography Image Dataset for Classification of Congenital Heart Disease
Comments: 11 pages, 6 figures, 2 tables, published at MICCAI 2020. The diagnosis info of the dataset is updated (thanks to the help of Kadirbarut from Bilgiuzayi)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[95]  arXiv:2102.03200 (replaced) [pdf, other]
Title: Deep reinforcement learning for smart calibration of radio telescopes
Comments: MNRAS Accepted 2021 May 12. Received 2021 May 11; in original form 2021 February 5
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[96]  arXiv:2102.12206 (replaced) [pdf, other]
Title: PADA: A Prompt-based Autoregressive Approach for Adaptation to Unseen Domains
Comments: First two authors contributed equally to this work. Our code and data are available at: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[97]  arXiv:2102.13244 (replaced) [pdf, other]
Title: Cyclic Coordinate Dual Averaging with Extrapolation for Generalized Variational Inequalities
Comments: 20 pages, 8 figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
[98]  arXiv:2103.07960 (replaced) [pdf, ps, other]
Title: Diagrammatic Differentiation for Quantum Machine Learning
Comments: Accepted to QPL2021
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Category Theory (math.CT)
[99]  arXiv:2103.15990 (replaced) [pdf, other]
Title: An Overview of Human Activity Recognition Using Wearable Sensors: Healthcare and Artificial Intelligence
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[100]  arXiv:2104.02443 (replaced) [pdf]
Title: CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing
Comments: 28 pages, 6 tables and 1 figure
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Programming Languages (cs.PL)
[101]  arXiv:2104.08044 (replaced) [pdf, other]
Title: Holmes: An Efficient and Lightweight Semantic Based Anomalous Email Detector
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[102]  arXiv:2104.11892 (replaced) [pdf, other]
Title: A Survey of Modern Deep Learning based Object Detection Models
Comments: Preprint submitted to IET Computer Vision
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
[103]  arXiv:2104.13101 (replaced) [pdf, other]
Title: Initializing LSTM internal states via manifold learning
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Pattern Formation and Solitons (nlin.PS)
[104]  arXiv:2104.14543 (replaced) [pdf, other]
Title: Optimal training of variational quantum algorithms without barren plateaus
Authors: Tobias Haug, M.S. Kim
Comments: 13 pages, 14 figures
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)
[105]  arXiv:2105.03584 (replaced) [pdf, other]
Title: Adaptive Latent Space Tuning for Non-Stationary Distributions
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Accelerator Physics (physics.acc-ph)
[106]  arXiv:2105.03684 (replaced) [pdf, other]
Title: Quantum Machine Learning For Classical Data
Authors: Leonard Wossnig
Comments: PhD thesis. arXiv admin note: text overlap with arXiv:1905.09902
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)
[107]  arXiv:2105.03689 (replaced) [pdf, other]
Title: Self-Supervised Adversarial Example Detection by Disentangled Representation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[108]  arXiv:2105.03876 (replaced) [pdf, other]
Title: Selective Probabilistic Classifier Based on Hypothesis Testing
Comments: Accepted at the EUVIP 2021 conference. Copyright statement added to first page in new version
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[109]  arXiv:2105.04241 (replaced) [pdf, other]
Title: ReadTwice: Reading Very Large Documents with Memories
Comments: To appear in the proceedings of NAACL 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[110]  arXiv:2105.04979 (replaced) [pdf, other]
Title: Surrogate assisted active subspace and active subspace assisted surrogate -- A new paradigm for high dimensional structural reliability analysis
Comments: 19 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[111]  arXiv:2105.05165 (replaced) [pdf, other]
Title: AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)