Computer Science

New submissions

[ total of 636 entries: 1-636 ]

New submissions for Fri, 23 Oct 20

[1]  arXiv:2010.11188 [pdf]
Title: AttendAffectNet: Self-Attention based Networks for Predicting Affective Responses from Movies
Comments: 8 pages, 6 figures
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)

In this work, we propose different variants of the self-attention based network for emotion prediction from movies, which we call AttendAffectNet. We take both audio and video into account and incorporate the relation among multiple modalities by applying the self-attention mechanism in a novel manner to the extracted features for emotion prediction. We compare it to the typical temporal integration of the self-attention based model, which, in our case, captures the relation of temporal representations of the movie while considering the sequential dependencies of emotion responses. We demonstrate the effectiveness of our proposed architectures on the extended COGNIMUSE dataset [1], [2] and the MediaEval 2016 Emotional Impact of Movies Task [3], which consist of movies with emotion annotations. Our results show that applying the self-attention mechanism on the different audio-visual features, rather than in the time domain, is more effective for emotion prediction. Our approach is also proven to outperform many state-of-the-art models for emotion prediction. The code to reproduce our results with the models' implementation is available at: https://github.com/ivyha010/AttendAffectNet.

[2]  arXiv:2010.11218 [pdf]
Title: DC Microgrid State Estimation and Sensor Placement Based on Compressive Sensing
Authors: Shutang You
Comments: 6 pages, 9 figures
Subjects: Systems and Control (eess.SY)

This paper proposes a DC microgrid state estimation and sensor placement method based on compressive sensing. Formulations of various types of measurements and components are developed under the proposed framework. A measurement placing strategy to minimize the coherence of the measurement matrix and thus increase estimation accuracy is presented. Simulation results show that the proposed state estimation and sensor placing approach can effectively reduce the number of sensors to achieve a certain level of estimation accuracy.
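As a rough illustration of the coherence criterion mentioned in this abstract, the sketch below computes the mutual coherence of a small matrix, that is, the largest absolute normalized inner product between two distinct columns. The function name and the toy matrices are hypothetical and are not taken from the paper; the paper's measurement formulations and placement strategy are not reproduced here.

```python
import math


def mutual_coherence(matrix):
    """Return the largest absolute cosine between two distinct columns.

    `matrix` is a list of rows. Columns are assumed to be nonzero;
    a zero column would cause a division by zero.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # Extract columns so that each one can be normalized independently.
    cols = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    best = 0.0
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            dot = sum(a * b for a, b in zip(cols[i], cols[j]))
            best = max(best, abs(dot) / (norms[i] * norms[j]))
    return best


# Toy matrices (made up for illustration only).
orthogonal_columns = [[1.0, 0.0], [0.0, 1.0]]
parallel_columns = [[1.0, 1.0], [0.0, 0.0]]
print(mutual_coherence(orthogonal_columns))
print(mutual_coherence(parallel_columns))
```

A lower value means the columns are closer to orthogonal, which is the direction a coherence-minimizing placement would push the measurement matrix.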

[3]  arXiv:2010.11223 [pdf, other]
Title: Meta-trained agents implement Bayes-optimal agents
Comments: Published at 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution. A previous theoretical study has argued that this remarkable performance is because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but they even share a similar computational structure, in the sense that one agent system can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning might serve as a general technique for numerically approximating Bayes-optimal agents - that is, even for task distributions for which we currently don't possess tractable models.

[4]  arXiv:2010.11226 [pdf, ps, other]
Title: Dynamic Layer Customization for Noise Robust Speech Emotion Recognition in Heterogeneous Condition Training
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Robustness to environmental noise is important to creating automatic speech emotion recognition systems that are deployable in the real world. Prior work on noise robustness has assumed that systems would not make use of sample-by-sample training noise conditions, or that they would have access to unlabelled testing data to generalize across noise conditions. We avoid these assumptions and introduce the resulting task as heterogeneous condition training. We show that with full knowledge of the test noise conditions, we can improve performance by dynamically routing samples to specialized feature encoders for each noise condition, and with partial knowledge, we can use known noise conditions and domain adaptation algorithms to train systems that generalize well to unseen noise conditions. We then extend these improvements to the multimodal setting by dynamically routing samples to maintain temporal ordering, resulting in significant improvements over approaches that do not specialize or generalize based on noise type.

[5]  arXiv:2010.11228 [pdf, other]
Title: Trip Recovery in Lower-Limb Prostheses using Reachable Sets of Predicted Human Motion
Comments: 8 pages, 3 figures
Subjects: Robotics (cs.RO)

People with lower-limb loss, the majority of whom use passive prostheses, exhibit a high incidence of falls each year. Powered lower-limb prostheses have the potential to reduce fall rates by actively helping the user recover from a stumble, but the unpredictability of the human response makes it difficult to design controllers that ensure a successful recovery. This paper presents a method called TRIP-RTD (Trip Recovery in Prostheses via Reachability-based Trajectory Design) for online trajectory planning in a knee prosthesis during and after a stumble that can accommodate a set of possible predictions of human behavior. Using this predicted set of human behavior, the proposed method computes a parameterized reachable set of trajectories for the human-prosthesis system. To ensure safety at run-time, TRIP-RTD selects a trajectory for the prosthesis that guarantees that all possible states of the human-prosthesis system at touchdown arrive in the basin of attraction of the nominal behavior of the system. In simulated stumble experiments where a nominal phase-based controller was unable to help the system recover, TRIP-RTD produced trajectories in under 101 ms that led to successful recoveries for all feasible solutions found.

[6]  arXiv:2010.11230 [pdf, other]
Title: Self-Supervised Contrastive Learning for Efficient User Satisfaction Prediction in Conversational Agents
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Turn-level user satisfaction is one of the most important performance metrics for conversational agents. It can be used to monitor the agent's performance and provide insights about defective user experiences. Moreover, a powerful satisfaction model can be used as an objective function that a conversational agent continuously optimizes for. While end-to-end deep learning has shown promising results, having access to a large number of reliable annotated samples required by these methods remains challenging. In a large-scale conversational system, there is a growing number of newly developed skills, making the traditional data collection, annotation, and modeling process impractical due to the required annotation costs as well as the turnaround times. In this paper, we suggest a self-supervised contrastive learning approach that leverages the pool of unlabeled data to learn user-agent interactions. We show that the pre-trained models using the self-supervised objective are transferable to the user satisfaction prediction. In addition, we propose a novel few-shot transfer learning approach that ensures better transferability for very small sample sizes. The suggested few-shot method does not require any inner loop optimization process and is scalable to very large datasets and complex models. Based on our experiments using real-world data from a large-scale commercial system, the suggested approach is able to significantly reduce the required number of annotations, while improving the generalization on unseen out-of-domain skills.

[7]  arXiv:2010.11234 [pdf, other]
Title: Learning Spring Mass Locomotion: Guiding Policies with a Reduced-Order Model
Comments: 7 pages, 8 figures. Submitted to IEEE Robotics and Automation Letters (RA-L) with ICRA 2021 presentation option. Video supplement: this https URL
Subjects: Robotics (cs.RO)

In this paper, we describe an approach to achieve dynamic legged locomotion on physical robots which combines existing methods for control with reinforcement learning. Specifically, our goal is a control hierarchy in which highest-level behaviors are planned through reduced-order models, which describe the fundamental physics of legged locomotion, and lower-level controllers utilize a learned policy that can bridge the gap between the idealized, simple model and the complex, full-order robot. The high-level planner can use a model of the environment and be task specific, while the low-level learned controller can execute a wide range of motions so that it applies to many different tasks. In this letter we describe this learned dynamic walking controller and show that a range of walking motions from reduced-order models can be used as the command and primary training signal for learned policies. The resulting policies do not attempt to naively track the motion (as a traditional trajectory tracking controller would) but instead balance immediate motion tracking with long-term stability. The resulting controller is demonstrated on a human-scale, unconstrained, untethered bipedal robot at speeds up to 1.2 m/s. This letter builds the foundation of a generic, dynamic learned walking controller that can be applied to many different tasks.

[8]  arXiv:2010.11238 [pdf, other]
Title: Detection of COVID-19 informative tweets using RoBERTa
Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)

Social media such as Twitter is a hotspot of user-generated information. During the ongoing Covid-19 pandemic, there has been an abundance of data on social media, which can be classified as informative or uninformative content. In this paper, we present our work on detecting informative Covid-19 English tweets using a RoBERTa model as part of the W-NUT 2020 workshop. We show the efficacy of our model on a public dataset with an F1-score of 0.89 on the validation dataset and 0.87 on the leaderboard.

[9]  arXiv:2010.11242 [pdf, other]
Title: Uncovering the Hidden Dangers: Finding Unsafe Go Code in the Wild
Authors: Johannes Lauinger (1), Lars Baumgärtner (1), Anna-Katharina Wickert (1), Mira Mezini (1) ((1) Technische Universität Darmstadt)
Comments: This is a copy of the accepted version at The 19th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom 2020)
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

The Go programming language aims to provide memory and thread safety through measures such as automated memory management with garbage collection and a strict type system. However, it also offers a way of circumventing this safety net through the use of the unsafe package. While there are legitimate use cases for unsafe, developers must exercise caution to avoid introducing vulnerabilities like buffer overflows or memory corruption in general. Using go-geiger, we conducted a study on the usage of unsafe in the top 500 most popular open-source Go projects on GitHub, including a manual analysis of 1,400 code samples on how unsafe is used. From the projects using Go's module system, 38% directly contain at least one unsafe usage, and 91% contain at least one unsafe usage in the project itself or one of its transitive dependencies. Based on the usage patterns found, we present possible exploit vectors in different scenarios. Finally, we present go-safer, a novel static analysis tool to identify dangerous and common usage patterns that were previously undetected with existing tools.

[10]  arXiv:2010.11243 [pdf, other]
Title: Solving Zero-Sum One-Sided Partially Observable Stochastic Games
Subjects: Computer Science and Game Theory (cs.GT)

Many security and other real-world situations are dynamic in nature and can be modelled as strictly competitive (or zero-sum) dynamic games. In these domains, agents perform actions to affect the environment and receive observations -- possibly imperfect -- about the situation and the effects of the opponent's actions. Moreover, there is no limitation on the total number of actions an agent can perform -- that is, there is no fixed horizon. These settings can be modelled as partially observable stochastic games (POSGs). However, solving general POSGs is computationally intractable, so we focus on a broad subclass of POSGs called one-sided POSGs. In these games, only one agent has imperfect information while their opponent has full knowledge of the current situation. We provide a full picture for solving one-sided POSGs: we (1) give a theoretical analysis of one-sided POSGs and their value functions, (2) show that a variant of a value-iteration algorithm converges in this setting, (3) adapt the heuristic search value-iteration algorithm for solving one-sided POSGs, (4) describe how to use approximate value functions to derive strategies in the game, and (5) demonstrate that our algorithm can solve one-sided POSGs of non-trivial sizes and analyze the scalability of our algorithm in three different domains: pursuit-evasion, patrolling, and search games.

[11]  arXiv:2010.11246 [pdf, other]
Title: On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries
Comments: Findings of ACL: EMNLP 2020
Journal-ref: Findings of ACL: EMNLP 2020
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Large-scale semantic parsing datasets annotated with logical forms have enabled major advances in supervised approaches. But can richer supervision help even more? To explore the utility of fine-grained, lexical-level supervision, we introduce Squall, a dataset that enriches 11,276 WikiTableQuestions English-language questions with manually created SQL equivalents plus alignments between SQL and question fragments. Our annotation enables new training possibilities for encoder-decoder models, including approaches from machine translation previously precluded by the absence of alignments. We propose and test two methods: (1) supervised attention; (2) adopting an auxiliary objective of disambiguating references in the input queries to table columns. In 5-fold cross validation, these strategies improve over strong baselines by 4.4% execution accuracy. Oracle experiments suggest that annotated alignments can support further accuracy gains of up to 23.9%.

[12]  arXiv:2010.11247 [pdf, other]
Title: Improving Simultaneous Translation with Pseudo References
Comments: 6 pages
Subjects: Computation and Language (cs.CL)

Simultaneous translation is vastly different from full-sentence translation, in the sense that it starts translation before the source sentence ends, with a delay of only a few words. However, due to the lack of large-scale, publicly available simultaneous translation datasets, most simultaneous translation systems still train with ordinary full-sentence parallel corpora, which are not suitable for the simultaneous scenario due to the existence of unnecessary long-distance reorderings. Instead of expensive, time-consuming annotation, we propose a novel method that rewrites the target side of existing full-sentence corpora into simultaneous-style translation. Experiments on Chinese-to-English translation demonstrate about +2.7 BLEU improvements with the addition of newly generated pseudo references.

[13]  arXiv:2010.11248 [pdf, other]
Title: Neural Star Domain as Primitive Representation
Comments: Accepted to NeurIPS 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Reconstructing 3D objects from 2D images is a fundamental task in computer vision. Accurate structured reconstruction by parsimonious and semantic primitive representation further broadens its application. When reconstructing a target shape with multiple primitives, it is preferable that one can instantly access the union of basic properties of the shape such as collective volume and surface, treating the primitives as if they are one single shape. This becomes possible by primitive representation with unified implicit and explicit representations. However, primitive representations in current approaches do not satisfy all of the above requirements at the same time. To solve this problem, we propose a novel primitive representation named neural star domain (NSD) that learns primitive shapes in the star domain. We show that NSD is a universal approximator of the star domain and is not only parsimonious and semantic but also an implicit and explicit shape representation. We demonstrate that our approach outperforms existing methods in image reconstruction tasks, semantic capabilities, and speed and quality of sampling high-resolution meshes.

[14]  arXiv:2010.11251 [pdf, other]
Title: Learning Quadrupedal Locomotion over Challenging Terrain
Journal-ref: Science Robotics 2020 Vol. 5, Issue 47, eabc5986
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)

Some of the most challenging environments on our planet are accessible to quadrupedal animals but remain out of reach for autonomous machines. Legged locomotion can dramatically expand the operational domains of robotics. However, conventional controllers for legged locomotion are based on elaborate state machines that explicitly trigger the execution of motion primitives and reflexes. These designs have escalated in complexity while falling short of the generality and robustness of animal locomotion. Here we present a radically robust controller for legged locomotion in challenging natural environments. We present a novel solution to incorporating proprioceptive feedback in locomotion control and demonstrate remarkable zero-shot generalization from simulation to natural environments. The controller is trained by reinforcement learning in simulation. It is based on a neural network that acts on a stream of proprioceptive signals. The trained controller has taken two generations of quadrupedal ANYmal robots to a variety of natural environments that are beyond the reach of prior published work in legged locomotion. The controller retains its robustness under conditions that have never been encountered during training: deformable terrain such as mud and snow, dynamic footholds such as rubble, and overground impediments such as thick vegetation and gushing water. The presented work opens new frontiers for robotics and indicates that radical robustness in natural environments can be achieved by training in much simpler domains.

[15]  arXiv:2010.11252 [pdf, other]
Title: On Adaptive Distance Estimation
Subjects: Data Structures and Algorithms (cs.DS)

We provide a static data structure for distance estimation which supports {\it adaptive} queries. Concretely, given a dataset $X = \{x_i\}_{i = 1}^n$ of $n$ points in $\mathbb{R}^d$ and $0 < p \leq 2$, we construct a randomized data structure with low memory consumption and query time which, when later given any query point $q \in \mathbb{R}^d$, outputs a $(1+\epsilon)$-approximation of $\lVert q - x_i \rVert_p$ with high probability for all $i\in[n]$. The main novelty is that our data structure's correctness guarantee holds even when the sequence of queries can be chosen adaptively: an adversary is allowed to choose the $j$th query point $q_j$ in a way that depends on the answers reported by the data structure for $q_1,\ldots,q_{j-1}$. Previous randomized Monte Carlo methods do not provide error guarantees in the setting of adaptively chosen queries. Our memory consumption is $\tilde O((n+d)d/\epsilon^2)$, slightly more than the $O(nd)$ required to store $X$ in memory explicitly, but with the benefit that our time to answer queries is only $\tilde O(\epsilon^{-2}(n + d))$, much faster than the naive $\Theta(nd)$ time obtained from a linear scan in the case of $n$ and $d$ very large. Here $\tilde O$ hides $\log(nd/\epsilon)$ factors. We discuss applications to nearest neighbor search and nonparametric estimation.
Our method is simple and likely to be applicable to other domains: we describe a generic approach for transforming randomized Monte Carlo data structures which do not support adaptive queries to ones that do, and show that for the problem at hand, it can be applied to standard nonadaptive solutions to $\ell_p$ norm estimation with negligible overhead in query time and a factor $d$ overhead in memory.

[16]  arXiv:2010.11253 [pdf, other]
Title: Clustering-based Inference for Zero-Shot Biomedical Entity Linking
Subjects: Computation and Language (cs.CL)

Due to the large number of entities in biomedical knowledge bases, only a small fraction of entities have corresponding labelled training data. This necessitates a zero-shot entity linking model which is able to link mentions of unseen entities using learned representations of entities. Existing zero-shot entity linking models, however, link each mention independently, ignoring the inter/intra-document relationships between the entity mentions. These relations can be very useful for linking mentions in biomedical text, where linking decisions are often difficult due to mentions having a generic or a highly specialized form. In this paper, we introduce a model in which linking decisions can be made not merely by linking to a KB entity but also by grouping multiple mentions together via clustering and jointly making linking predictions. In experiments on the largest publicly available biomedical dataset, we improve the best independent prediction for zero-shot entity linking by 2.5 points of accuracy, and our joint inference model further improves entity linking by 1.8 points.

[17]  arXiv:2010.11255 [pdf, other]
Title: The IDLAB VoxSRC-20 Submission: Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification
Comments: Submitted to ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

In this paper we propose and analyse a large margin fine-tuning strategy and a quality-aware score calibration in text-independent speaker verification. Large margin fine-tuning is a secondary training stage for DNN based speaker verification systems trained with margin-based loss functions. It enables the network to create more robust speaker embeddings by enabling the use of longer training utterances in combination with a more aggressive margin penalty. Score calibration is a common practice in speaker verification systems to map output scores to well-calibrated log-likelihood-ratios, which can be converted to interpretable probabilities. By including quality features in the calibration system, the decision thresholds of the evaluation metrics become quality-dependent and more consistent across varying trial conditions. Applying both enhancements on the ECAPA-TDNN architecture leads to state-of-the-art results on all publicly available VoxCeleb1 test sets and contributed to our winning submissions in the supervised verification tracks of the VoxCeleb Speaker Recognition Challenge 2020.
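The abstract notes that well-calibrated log-likelihood-ratios can be converted to interpretable probabilities. A minimal sketch of that conversion via Bayes' rule follows; the function name, the use of natural logarithms, and the prior values are assumptions made for illustration and do not come from the paper.

```python
import math


def llr_to_posterior(llr, target_prior=0.5):
    """Convert a log-likelihood-ratio into a posterior target probability.

    Assumes `llr` is a natural-log ratio p(score | target) / p(score | non-target)
    and that `target_prior` lies strictly between 0 and 1.
    """
    # Posterior log-odds = log-likelihood-ratio + prior log-odds (Bayes' rule).
    prior_log_odds = math.log(target_prior / (1.0 - target_prior))
    return 1.0 / (1.0 + math.exp(-(llr + prior_log_odds)))


# With equal priors, an LLR of 0 is uninformative.
print(llr_to_posterior(0.0))
# The same evidence yields a lower posterior when targets are rare.
print(llr_to_posterior(math.log(3.0), target_prior=0.1))
```

The quality-aware calibration described in the abstract concerns how the scores are mapped to log-likelihood-ratios; this sketch only covers the final step from a calibrated ratio to a probability.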

[18]  arXiv:2010.11262 [pdf, other]
Title: Orthogonality sampling type methods for an inverse acoustic scattering problem
Authors: Dinh-Liem Nguyen
Comments: 17 pages
Subjects: Numerical Analysis (math.NA); Computational Engineering, Finance, and Science (cs.CE)

We consider the inverse acoustic scattering problem of determining the location and shape of penetrable scattering objects from multi-static Cauchy data of the scattered field. We propose two novel imaging functionals of orthogonality sampling type for solving the inverse problem. These imaging functionals, like the orthogonality sampling method, are fast, simple to implement, and robust with respect to noise in the data. A further advantage is that they are applicable to both near-field and far-field data. In particular, in the case of far-field data, the functionals can be easily modified such that only the scattered field data is needed. The theoretical analysis of the first imaging functional relies on the Factorization method along with a relation between the Cauchy data and the scattering amplitude of the scattered field. The second one is justified using the Helmholtz integral representation for the imaginary part of the Green's function of the direct scattering problem. Numerical examples are presented to illustrate the efficiency of the proposed imaging functionals.

[19]  arXiv:2010.11263 [pdf, other]
Title: A P4 Data Plane for the Quantum Internet
Subjects: Networking and Internet Architecture (cs.NI); Quantum Physics (quant-ph)

The quantum technology revolution brings with it the promise of a quantum internet. A new -- quantum -- network stack will be needed to account for the fundamentally new properties of quantum entanglement. The first realisations of quantum networks are imminent and research interest in quantum network protocols has started growing. In the non-quantum world, programmable data planes have broken the pattern of ossification of the protocol stack and enabled a new -- software-defined -- network software architecture. Similarly, a programmable quantum data plane could pave the way for a software-defined quantum network architecture. In this paper, we demonstrate how we use P4$_{16}$ to explore abstractions and device architectures for quantum networks.

[20]  arXiv:2010.11264 [pdf, ps, other]
Title: An Efficient Real-Time NMPC for Quadrotor Position Control under Communication Time-Delay
Comments: This paper has been accepted for publication at the 16th International Conference on Control, Automation, Robotics and Vision (ICARCV), Shenzhen, China, December 13-15, 2020, IEEE
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)

The advances in computer processor technology have enabled the application of nonlinear model predictive control (NMPC) to agile systems, such as quadrotors. These systems are characterized by their underactuation, nonlinearities, bounded inputs, and time-delays. Classical control solutions fall short in overcoming these difficulties and fully exploiting the capabilities offered by such platforms. This paper presents the design and implementation of an efficient position controller for quadrotors based on real-time NMPC with time-delay compensation and bounds enforcement on the actuators. To deal with the limited computational resources onboard, an offboard control architecture is proposed. It is implemented using the high-performance software package acados, which solves optimal control problems and implements a real-time iteration (RTI) variant of a sequential quadratic programming (SQP) scheme with Gauss-Newton Hessian approximation. The quadratic subproblems (QP) in the SQP scheme are solved with HPIPM, an interior-point method solver, built on top of the linear algebra library BLASFEO, finely tuned for multiple CPU architectures. Solution times are further reduced by reformulating the QPs using the efficient partial condensing algorithm implemented in HPIPM. We demonstrate the capabilities of our architecture using the Crazyflie 2.1 nanoquadrotor.

[21]  arXiv:2010.11265 [pdf, other]
Title: Sobolev training of thermodynamic-informed neural networks for smoothed elasto-plasticity models with level set hardening
Comments: 42 pages, 28 figures
Subjects: Machine Learning (cs.LG)

We introduce a deep learning framework designed to train smoothed elastoplasticity models with interpretable components, such as a smoothed stored elastic energy function, a yield surface, and a plastic flow that are evolved based on a set of deep neural network predictions. By recasting the yield function as an evolving level set, we introduce a machine learning approach to predict the solutions of the Hamilton-Jacobi equation that governs the hardening mechanism. This machine learning hardening law may recover classical hardening models and discover new mechanisms that are otherwise very difficult to anticipate and hand-craft. This treatment enables us to use supervised machine learning to generate models that are thermodynamically consistent and interpretable, yet also exhibit excellent learning capacity. Using a 3D FFT solver to create a polycrystal database, numerical experiments are conducted and the implementations of each component of the models are individually verified. Our numerical experiments reveal that this new approach provides more robust and accurate forward predictions of cyclic stress paths than those obtained from black-box deep neural network models such as a recurrent GRU neural network, a 1D convolutional neural network, and a multi-step feedforward model.

[22]  arXiv:2010.11266 [pdf, other]
Title: Convex Polytope Trees
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

A decision tree is commonly restricted to use a single hyperplane to split the covariate space at each of its internal nodes. It often requires a large number of nodes to achieve high accuracy, hurting its interpretability. In this paper, we propose convex polytope trees (CPT) to expand the family of decision trees by an interpretable generalization of their decision boundary. The splitting function at each node of CPT is based on the logical disjunction of a community of differently weighted probabilistic linear decision-makers, which also geometrically corresponds to a convex polytope in the covariate space. We use a nonparametric Bayesian prior at each node to infer the community's size, encouraging simpler decision boundaries by shrinking the number of polytope facets. We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given. We empirically demonstrate the efficiency of CPT over existing state-of-the-art decision trees in several real-world classification and regression tasks from diverse domains.
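To make the geometric picture in this abstract concrete, the sketch below tests whether a point lies inside a convex polytope written as an intersection of half-spaces; equivalently, the point is outside as soon as any single linear constraint is violated. This is a hard, deterministic simplification chosen for illustration: the probabilistic decision-makers, their weighting, and the Bayesian prior described in the abstract are not modeled, and all names are hypothetical.

```python
def inside_polytope(point, halfspaces):
    """Return True if `point` satisfies w . x + b <= 0 for every half-space.

    `halfspaces` is a list of (weights, bias) pairs, one per polytope facet.
    """
    return all(
        sum(w * x for w, x in zip(weights, point)) + bias <= 0.0
        for weights, bias in halfspaces
    )


# Unit square [0, 1] x [0, 1] described by four facets.
unit_square = [
    ((-1.0, 0.0), 0.0),   # x >= 0
    ((1.0, 0.0), -1.0),   # x <= 1
    ((0.0, -1.0), 0.0),   # y >= 0
    ((0.0, 1.0), -1.0),   # y <= 1
]
print(inside_polytope((0.5, 0.5), unit_square))
print(inside_polytope((1.5, 0.5), unit_square))
```

A single-hyperplane split is the special case with one facet, which is why a polytope split can replace several ordinary tree nodes.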

[23]  arXiv:2010.11267 [pdf, other]
Title: MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers
Comments: 10 pages, 8 figures, 3 tables, appendix
Subjects: Machine Learning (cs.LG)

Executing machine learning workloads locally on resource constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget. To address this challenge, neural architecture search (NAS) promises to help design accurate ML models that meet the tight MCU memory, latency and energy constraints. A key component of NAS algorithms is their latency/energy model, i.e., the mapping from a given neural network architecture to its inference latency/energy on an MCU. In this paper, we observe an intriguing property of NAS search spaces for MCU model design: on average, model latency varies linearly with model operation (op) count under a uniform prior over models in the search space. Exploiting this insight, we employ differentiable NAS (DNAS) to search for models with low memory usage and low op count, where op count is treated as a viable proxy to latency. Experimental results validate our methodology, yielding our MicroNet models, which we deploy on MCUs using Tensorflow Lite Micro, a standard open-source NN inference runtime widely used in the TinyML community. MicroNets demonstrate state-of-the-art results for all three TinyMLperf industry-standard benchmark tasks: visual wake words, audio keyword spotting, and anomaly detection.

[24]  arXiv:2010.11269 [pdf, other]
Title: Deep-Reinforcement-Learning-Based Scheduling with Contiguous Resource Allocation for Next-Generation Cellular Systems
Authors: Shu Sun, Xiaofeng Li
Comments: 6 pages, 4 figures
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG); Signal Processing (eess.SP)

In this work, we propose a novel scheduling algorithm with contiguous frequency-domain resource allocation (FDRA) based on deep reinforcement learning (DRL) that jointly selects users and allocates resource blocks (RBs). The scheduling problem is modeled as a Markov decision process, and a DRL agent determines which user and how many consecutive RBs for that user should be scheduled at each RB allocation step. The state, action, and reward sets are carefully designed to train the DRL network. More specifically, the originally quasi-continuous action space, which is inherent to contiguous FDRA, is refined into a finite and discrete action space to obtain a tradeoff between the inference latency and system performance. Simulation results show that the proposed DRL-based algorithm outperforms other representative baseline schemes while having lower online computational complexity.

[25]  arXiv:2010.11270 [pdf, other]
Title: Learning second order coupled differential equations that are subject to non-conservative forces
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

In this article we address the question of whether it is possible to learn the differential equations describing the physical properties of a dynamical system, subject to non-conservative forces, from observations of its real-space trajectory or trajectories only. We introduce a network that incorporates a difference approximation for the second order derivative in terms of residual connections between convolutional blocks, whose shared weights represent the coefficients of a second order ordinary differential equation. We further combine this solver-like architecture with a convolutional network that is capable of learning the relation between trajectories of coupled oscillators and therefore allows us to make a stable forecast even if the system is only partially observed. We optimize this map together with the solver network, while sharing their weights, to form a powerful framework capable of learning the complex physical properties of a dissipative dynamical system.
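To make the "difference approximation for the second order derivative" concrete, the sketch below applies the standard central-difference update to a hypothetical damped oscillator x'' = -k x - c x'. It is not the network described in the abstract; the system, parameter names and step function are assumptions chosen for illustration.

```python
def next_position(x_prev, x_curr, h, stiffness, damping):
    """One central-difference step for x'' = -stiffness * x - damping * x'.

    The term 2 * x_curr - x_prev carries the two previous states forward
    (comparable to a skip connection); h * h * acceleration is the learned
    or modelled correction added on top.
    """
    velocity = (x_curr - x_prev) / h  # backward-difference velocity estimate
    acceleration = -stiffness * x_curr - damping * velocity
    return 2.0 * x_curr - x_prev + h * h * acceleration


def rollout(x0, x1, h, stiffness, damping, steps):
    """Iterate the update to forecast a trajectory from two initial samples."""
    trajectory = [x0, x1]
    for _ in range(steps):
        trajectory.append(
            next_position(trajectory[-2], trajectory[-1], h, stiffness, damping)
        )
    return trajectory
```

In a learned setting, `stiffness` and `damping` would play the role of the ODE coefficients that the shared weights are said to represent.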

[26]  arXiv:2010.11271 [pdf]
Title: Robustness-aware 2-bit quantization with real-time performance for neural network
Comments: 14 pages
Subjects: Machine Learning (cs.LG)

A quantized neural network (NN) with reduced bit precision is an effective way to reduce computational and memory resource requirements and plays a vital role in machine learning. However, it is still challenging to avoid the significant accuracy degradation caused by its numerical approximation and lower redundancy. In this paper, a novel robustness-aware 2-bit quantization scheme is proposed for NNs based on binary NNs and a generative adversarial network (GAN), which improves performance by enriching the information of the binary NN, efficiently extracting structural information, and considering the robustness of the quantized NN. Specifically, shift-addition operations replace the multiply-accumulate operations in the quantization process, which can effectively speed up the NN. Meanwhile, a structural loss between the original NN and the quantized NN is proposed such that the structural information of the data is preserved after quantization. The structural information learned from the NN not only plays an important role in improving performance but also allows for further fine-tuning of the quantization network by applying a Lipschitz constraint to the structural loss. In addition, we take the robustness of the quantized NN into consideration for the first time and propose a non-sensitive perturbation loss function by introducing an extraneous spectral norm term. The experiments are conducted on the CIFAR-10 and ImageNet datasets with popular NNs (such as MobileNetV2, SqueezeNet, ResNet20, etc.). The experimental results show that the proposed algorithm is more competitive at 2-bit precision than state-of-the-art quantization methods. Meanwhile, the experimental results also demonstrate that the proposed method is robust under FGSM adversarial sample attacks.

[27]  arXiv:2010.11272 [pdf, other]
Title: Predicting Chemical Properties using Self-Attention Multi-task Learning based on SMILES Representation
Comments: Accepted at ICPR2020
Subjects: Machine Learning (cs.LG)

In the computational prediction of chemical compound properties, molecular descriptors and fingerprints encoded to low dimensional vectors are used. The selection of proper molecular descriptors and fingerprints is both important and challenging as the performance of such models is highly dependent on descriptors. To overcome this challenge, natural language processing models that utilize simplified molecular input line-entry system as input were studied, and several transformer-variant models achieved superior results when compared with conventional methods. In this study, we explored the structural differences of the transformer-variant model and proposed a new self-attention based model. The representation learning performance of the self-attention module was evaluated in a multi-task learning environment using imbalanced chemical datasets. The experiment results showed that our model achieved competitive outcomes on several benchmark datasets. The source code of our experiment is available at https://github.com/arwhirang/sa-mtl and the dataset is available from the same URL.

[28]  arXiv:2010.11273 [pdf, other]
Title: The Need for Standardized Explainability
Comments: Accepted in 2nd ICML 2020 Workshop on Human in the Loop Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Explainable AI (XAI) is paramount in industry-grade AI; however existing methods fail to address this necessity, in part due to a lack of standardisation of explainability methods. The purpose of this paper is to offer a perspective on the current state of the area of explainability, and to provide novel definitions for Explainability and Interpretability to begin standardising this area of research. To do so, we provide an overview of the literature on explainability, and of the existing methods that are already implemented. Finally, we offer a tentative taxonomy of the different explainability methods, opening the door to future research.

[29]  arXiv:2010.11274 [pdf]
Title: A novel method of fuzzy time series forecasting based on interval index number and membership value using support vector machine
Subjects: Machine Learning (cs.LG)

Fuzzy time series forecasting methods are very popular among researchers for predicting future values as they are not based on the strict assumptions of traditional time series forecasting methods. Non-stochastic methods of fuzzy time series forecasting are preferred by researchers as they provide more significant forecasting results. There are generally four factors that determine the performance of a forecasting method: (1) the number of intervals (NOIs) and the length of the intervals used to partition the universe of discourse (UOD); (2) the fuzzification rules or feature representation of the crisp time series; (3) the method of establishing fuzzy logic rules (FLRs) between input and target values; and (4) the defuzzification rule used to obtain the crisp forecasted value. Considering the first two factors to improve forecasting accuracy, we propose a novel non-stochastic method for fuzzy time series forecasting in which the interval index number and membership value are used as input features to predict the future value. We suggest a simple rounding-off range and suitable step size method to find the optimal number of intervals (NOIs) and use the fuzzy c-means clustering process to divide the UOD into intervals of unequal length. We implement a support vector machine (SVM) to establish FLRs. To test our proposed method, we conduct a simulated study on five widely used real time series and compare the performance with some recently developed models. We also examine the performance of the proposed model using a multi-layer perceptron (MLP) instead of the SVM. Two performance measures, RMSE and SMAPE, are used for performance analysis, and better forecasting accuracy is observed for the proposed model.

[30]  arXiv:2010.11278 [pdf, other]
Title: Deep Surrogate Q-Learning for Autonomous Driving
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

Challenging problems of deep reinforcement learning systems with regard to the application on real systems are their adaptivity to changing environments and their efficiency w.r.t. computational resources and data. In the application of learning lane-change behavior for autonomous driving, agents have to deal with a varying number of surrounding vehicles. Furthermore, the number of required transitions imposes a bottleneck, since test drivers cannot perform an arbitrary amount of lane changes in the real world. In the off-policy setting, additional information on solving the task can be gained by observing actions from others. While in the classical RL setup this knowledge remains unused, we use other drivers as surrogates to learn the agent's value function more efficiently. We propose Surrogate Q-learning that deals with the aforementioned problems and reduces the required driving time drastically. We further propose an efficient implementation based on a permutation-equivariant deep neural network architecture of the Q-function to estimate action-values for a variable number of vehicles in sensor range. We show that the architecture leads to a novel replay sampling technique we call Scene-centric Experience Replay and evaluate the performance of Surrogate Q-learning and Scene-centric Experience Replay in the open traffic simulator SUMO. Additionally, we show that our methods enhance real-world applicability of RL systems by learning policies on the real highD dataset.

[31]  arXiv:2010.11289 [pdf]
Title: Shedding Light on Blind Spots: Developing a Reference Architecture to Leverage Video Data for Process Mining
Comments: Submitted to Information Systems
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Process mining is one of the most active research streams in business process management. In recent years, numerous methods have been proposed for analyzing structured process data. Yet, in many cases, it is only the digitized parts of processes that are directly captured from process-aware information systems, and manual activities often result in blind spots. While the use of video cameras to observe these activities could help to fill this gap, a standardized approach to extracting event logs from unstructured video data remains lacking. Here, we propose a reference architecture to bridge the gap between computer vision and process mining. Various evaluation activities (i.e., competing artifact analysis, prototyping, and real-world application) ensured that the proposed reference architecture allows flexible, use-case-driven, and context-specific instantiations. Our results also show that an exemplary software prototype instantiation of the proposed reference architecture is capable of automatically extracting most of the process-relevant events from unstructured video data.

[32]  arXiv:2010.11290 [pdf, other]
Title: Unrolling of Deep Graph Total Variation for Image Denoising
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

While deep learning (DL) architectures like convolutional neural networks (CNNs) have enabled effective solutions in image denoising, in general their implementations overly rely on training data, lack interpretability, and require tuning of a large parameter set. In this paper, we combine classical graph signal filtering with deep feature learning into a competitive hybrid design---one that utilizes interpretable analytical low-pass graph filters and employs 80% fewer network parameters than the state-of-the-art DL denoising scheme DnCNN. Specifically, to construct a suitable similarity graph for graph spectral filtering, we first adopt a CNN to learn feature representations per pixel, and then compute feature distances to establish edge weights. Given a constructed graph, we next formulate a convex optimization problem for denoising using a graph total variation (GTV) prior. Via an $l_1$ graph Laplacian reformulation, we interpret its solution in an iterative procedure as a graph low-pass filter and derive its frequency response. For fast filter implementation, we realize this response using a Lanczos approximation. Experimental results show that in the case of statistical mismatch, our algorithm outperformed DnCNN by up to 3dB in PSNR.

[33]  arXiv:2010.11295 [pdf]
Title: Bidirectional Microrocker Bots with Sharp Tips Actuated by a Single Electromagnet
Comments: 7 pages, 3 figures
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

The recent advancements in nanoscale 3D printing and microfabrication techniques have reinvigorated research on microrobotics and nanomachines. However, precise control of robot motion and navigation in biological environments has remained challenging to date. This work presents the first demonstration of a magnetic microscale rocker robot (microrocker bot) capable of bidirectional movement on flat as well as biological surfaces when actuated by a single compact electromagnet. The 100um by 113um by 36um robot was 3D printed via two-photon lithography and subsequently coated with a nickel (Ni) thin film. When actuated by an externally applied magnetic sawtooth field, the robot demonstrated stick-slip motion enabled by its rockers. The controllable bidirectional motion is enabled by adjusting the DC offset of the waveform, which tilts the robot and biases it towards either forward or backward motion. The microrocker bots are further equipped with sharp tips that can be engaged via application of DC-only or low-frequency magnetic fields. This novel control method offers an attractive solution to replace the multiple bulky coils traditionally used for magnetic actuation and control, and allows for a more flexible and simple approach to microrobotic motion control. When the frequency and offset of the sawtooth waveform are optimized, the robot travels at up to 87 um/s (0.87 body lengths per second) forward and backward with minor deviation from linear trajectories. Finally, to prove the robot's capabilities in direct contact with biological environments, we demonstrate the microbot's ability to traverse forward and backward on the surface of a Dracaena Fragrans (corn plant), as well as upend on its mechanical tip.

[34]  arXiv:2010.11296 [pdf, other]
Title: System Design and Control of an Apple Harvesting Robot
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

There is a growing need for robotic apple harvesting due to the decreasing availability and rising cost of labor. Towards the goal of developing a viable robotic system for apple harvesting, this paper presents the synergistic mechatronic design and motion control of a robotic apple harvesting prototype, which lays a critical foundation for future advancements. Specifically, we develop a deep learning-based fruit detection and localization system using an RGB-D camera. A three degree-of-freedom manipulator is then designed with a hybrid pneumatic/motor actuation mechanism to achieve fast and dexterous movements. A vacuum-based end-effector is used for apple detaching. These three components are integrated into a robotic apple harvesting prototype with simplicity, compactness, and robustness. Moreover, a nonlinear velocity-based control scheme is developed for the manipulator to achieve accurate and agile motion control. Test experiments are conducted to demonstrate the performance of the developed apple harvesting robot.

[35]  arXiv:2010.11297 [pdf, ps, other]
Title: Performance Prediction for Convolutional Neural Networks in Edge Devices
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Running Convolutional Neural Network (CNN) based applications on edge devices near the source of data can meet latency and privacy challenges. However, due to their reduced computing resources and their energy constraints, these edge devices can hardly satisfy CNN needs in processing and data storage. For these platforms, choosing the CNN with the best trade-off between accuracy and execution time while respecting hardware constraints is crucial. In this paper, we present and compare five (5) of the widely used Machine Learning based methods for execution time prediction of CNNs on two (2) edge GPU platforms. For these 5 methods, we also explore the time needed for training them and tuning their corresponding hyperparameters. Finally, we compare the times to run the prediction models on different platforms. The use of these methods will greatly facilitate design space exploration by quickly providing the best CNN for a target edge GPU. Experimental results show that eXtreme Gradient Boosting (XGBoost) provides an average prediction error of less than 14.73%, even for unexplored and unseen CNN model architectures. Random Forest (RF) shows comparable accuracy but needs more effort and time to be trained. The other 3 approaches (OLS, MLP and SVR) are less accurate for CNN performance estimation.

[36]  arXiv:2010.11300 [pdf, ps, other]
Title: How Do Fair Decisions Fare in Long-term Qualification?
Comments: Accepted to the 34th Conference on Neural Information Processing Systems (NeurIPS)
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)

Although many fairness criteria have been proposed for decision making, their long-term impact on the well-being of a population remains unclear. In this work, we study the dynamics of population qualification and algorithmic decisions under a partially observed Markov decision problem setting. By characterizing the equilibrium of such dynamics, we analyze the long-term impact of static fairness constraints on the equality and improvement of group well-being. Our results show that static fairness constraints can either promote equality or exacerbate disparity depending on the driving factor of qualification transitions and the effect of sensitive attributes on feature distributions. We also consider possible interventions that can effectively improve group qualification or promote equality of group qualification. Our theoretical results and experiments on static real-world datasets with simulated dynamics show that our framework can be used to facilitate social science studies.

[37]  arXiv:2010.11302 [pdf]
Title: Frequency Response Study on the ERCOT under High Photovoltaic (PV) Penetration Conditions
Comments: 5 pages, 6 figures
Subjects: Systems and Control (eess.SY)

Solar photovoltaic (PV) generation is growing rapidly around the world. However, PV generation, being inverter-based, is fundamentally different from conventional synchronous generators. It is of vital importance to understand the impacts of increased penetration of PV generation on power system dynamic performance. This paper investigates the frequency response of the Electric Reliability Council of Texas (ERCOT) with high PV penetration in a future year. In this work, a realistic baseline dynamic model is validated using synchrophasor measurements. Then, dynamic simulation is performed to evaluate the impacts of high PV generation on frequency response.

[38]  arXiv:2010.11304 [pdf, other]
Title: Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling
Subjects: Computation and Language (cs.CL)

Document-level relation extraction (RE) poses new challenges compared to its sentence-level RE counterpart. One document commonly contains multiple entity pairs, and one entity pair occurs multiple times in the document associated with multiple possible relations. In this paper, we propose two novel techniques, adaptive thresholding and localized context pooling, to solve the multi-label and multi-entity problems. The adaptive thresholding replaces the global threshold for multi-label classification in the prior work with a learnable entities-dependent threshold. The localized context pooling directly transfers attention from pre-trained language models to locate relevant context that is useful to decide the relation. We experiment on three document-level RE benchmark datasets: DocRED, a recently released large-scale RE dataset, and two datasets CDR and GDA in the biomedical domain. Our ATLOP (Adaptive Thresholding and Localized cOntext Pooling) model achieves an F1 score of 63.4, and also significantly outperforms existing models on both CDR and GDA.
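The contrast between a global threshold and an entity-pair-dependent one can be sketched as follows. This is a minimal illustration, assuming the per-pair threshold is available as a single score produced alongside the relation scores; the function name, inputs and relation labels are hypothetical and not taken from the paper.

```python
def predict_relations(relation_scores, pair_threshold):
    """Return the relations whose score exceeds this entity pair's own threshold.

    `relation_scores` maps relation name -> score for one entity pair;
    `pair_threshold` is the threshold predicted for that same pair
    (assumed to come from the model rather than being a global constant).
    """
    return sorted(
        name for name, score in relation_scores.items() if score > pair_threshold
    )


# Two entity pairs with identical relation scores but different thresholds
# receive different label sets, which a single global cutoff cannot express.
scores = {"founded_by": 2.1, "located_in": 0.4, "employer": -1.3}
labels_strict = predict_relations(scores, pair_threshold=1.0)
labels_lenient = predict_relations(scores, pair_threshold=0.0)
```

An empty result corresponds to predicting that no relation holds for the pair.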

[39]  arXiv:2010.11305 [pdf, other]
Title: Mixed-Precision Embedding Using a Cache
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

In recommendation systems, practitioners have observed that an increase in the number of embedding tables and their sizes often leads to significant improvement in model performance. Given this and the business importance of these models to major internet companies, embedding tables for personalization tasks have grown to terabyte scale and continue to grow at a significant rate. Meanwhile, these large-scale models are often trained with GPUs where high-performance memory is a scarce resource, thus motivating numerous works on embedding table compression during training. We propose a novel change to embedding tables using a cache memory architecture, where the majority of rows in an embedding are trained in low precision, and the most frequently or recently accessed rows are cached and trained in full precision. The proposed architectural change works in conjunction with standard precision reduction and computer arithmetic techniques such as quantization and stochastic rounding. For an open source deep learning recommendation model (DLRM) running with the Criteo-Kaggle dataset, we achieve 3x memory reduction with INT8 precision embedding tables and a full-precision cache whose size is 5% of the embedding tables, while maintaining accuracy. For an industrial scale model and dataset, we achieve an even higher >7x memory reduction with INT4 precision and a cache size of 1% of the embedding tables, while maintaining accuracy, and a 16% end-to-end training speedup by reducing GPU-to-host data transfers.
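Stochastic rounding, one of the standard techniques this abstract mentions, can be sketched in a few lines. The version below rounds to the integer grid only and is a generic illustration, not the paper's implementation; the function name and the use of an explicit `random.Random` instance are choices made here.

```python
import math
import random


def stochastic_round(x, rng):
    """Round x down or up to an integer so that the expected value equals x.

    The fractional part of x is used as the probability of rounding up,
    which keeps small updates from being systematically lost in low precision.
    """
    lower = math.floor(x)
    fraction = x - lower
    return lower + (1 if rng.random() < fraction else 0)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [stochastic_round(2.3, rng) for _ in range(10000)]
average = sum(samples) / len(samples)  # close to 2.3 on average
```

Round-to-nearest would map every 2.3 to 2, whereas here the average of many rounded values stays near the original.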

[40]  arXiv:2010.11307 [pdf, other]
Title: Speculative Container Scheduling for Deep Learning Applications in a Kubernetes Cluster
Comments: Under Review
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

In the past decade, we have witnessed a dramatically increasing volume of data collected from varied sources. The explosion of data has transformed the world as more information is available for collection and analysis than ever before. To maximize the utilization, various machine and deep learning models have been developed, e.g. CNN [1] and RNN [2], to study data and extract valuable information from different perspectives. While data-driven applications improve countless products, training models for hyperparameter tuning is still a time-consuming and resource-intensive process. Cloud computing provides infrastructure support for the training of deep learning applications. The cloud service providers, such as Amazon Web Services [3], create an isolated virtual environment (virtual machines and containers) for clients, who share physical resources, e.g., CPU and memory. On the cloud, resource management schemes are implemented to enable better sharing among users and boost the system-wide performance. However, general scheduling approaches, such as spread priority and balanced resource schedulers, do not work well with deep learning workloads. In this project, we propose SpeCon, a novel container scheduler that is optimized for short-lived deep learning applications. Based on virtualized containers, such as Kubernetes [4] and Docker [5], SpeCon analyzes the common characteristics of training processes. We design a suite of algorithms to monitor the progress of the training and speculatively migrate the slow-growing models to release resources for fast-growing ones. Specifically, the extensive experiments demonstrate that SpeCon improves the completion time of an individual job by up to 41.5%, 14.8% system-wide and 24.7% in terms of makespan.

[41]  arXiv:2010.11310 [pdf, other]
Title: Uncertainty-Aware Deep Ensembles for Reliable and Explainable Predictions of Clinical Time Series
Comments: 11 pages, 9 figures, code at this https URL
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Deep learning-based support systems have demonstrated encouraging results in numerous clinical applications involving the processing of time series data. While such systems often are very accurate, they have no inherent mechanism for explaining what influenced the predictions, which is critical for clinical tasks. However, existing explainability techniques lack an important component for trustworthy and reliable decision support, namely a notion of uncertainty. In this paper, we address this lack of uncertainty by proposing a deep ensemble approach where a collection of DNNs are trained independently. A measure of uncertainty in the relevance scores is computed by taking the standard deviation across the relevance scores produced by each model in the ensemble, which in turn is used to make the explanations more reliable. The class activation mapping method is used to assign a relevance score for each time step in the time series. Results demonstrate that the proposed ensemble is more accurate in locating relevant time steps and is more consistent across random initializations, thus making the model more trustworthy. The proposed methodology paves the way for constructing trustworthy and dependable support systems for processing clinical time series for healthcare related tasks.
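The uncertainty measure described in this abstract, the standard deviation of relevance scores across ensemble members, can be sketched per time step as below. The function name and input layout are assumptions, and the population standard deviation is used here because the abstract does not say which variant the authors chose.

```python
from statistics import mean, pstdev


def relevance_with_uncertainty(scores_per_model):
    """Aggregate relevance scores from several independently trained models.

    `scores_per_model` holds one list per model, each with one relevance
    score per time step. Returns (mean per step, std per step), where the
    standard deviation acts as the uncertainty of the explanation.
    """
    per_step = list(zip(*scores_per_model))  # regroup scores by time step
    means = [mean(step) for step in per_step]
    stds = [pstdev(step) for step in per_step]
    return means, stds


# Three hypothetical models scoring a four-step series.
ensemble_scores = [
    [0.1, 0.8, 0.3, 0.0],
    [0.1, 0.6, 0.9, 0.0],
    [0.1, 0.7, 0.0, 0.0],
]
means, stds = relevance_with_uncertainty(ensemble_scores)
```

Time steps where the models agree get a low standard deviation, while steps such as the third one above would be flagged as uncertain.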

[42]  arXiv:2010.11317 [pdf, other]
Title: Full-Duplex and Dynamic-TDD: Pushing the Limits of Spectrum Reuse in Multi-Cell Communications
Comments: 15 pages, 6 figures. Accepted to IEEE Wireless Communications - Special Issue on Full Duplex Communications Theory, Standardization and Practice
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)

Although full-duplex and dynamic time-division duplexing promise increased spectrum efficiency in cellular networks, their potential is so far challenged by increased interference. While previous studies have shown that self-interference can be suppressed to a sufficient level, we show that the cross-link interference for both duplexing modes, especially from base station to base station, is the remaining challenge in multi-cell networks, restricting the uplink performance. Using low-complexity beamforming techniques, we show that this interference can be mitigated, and that full-duplex and dynamic time-division duplexing can substantially increase the capacity of multi-cell networks. Our results suggest that if we can control the cross-link interference in full-duplex, then we can almost double the multi-cell network capacity as well as user throughput. Therefore, the techniques in this paper have the potential to enable a smooth introduction of full-duplex into cellular systems.

[43]  arXiv:2010.11320 [pdf, other]
Title: Serverless Containers -- rising viable approach to Scientific Workflows
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The increasing popularity of the serverless computing approach has led to the emergence of new cloud infrastructures working in the Container-as-a-Service (CaaS) model, like AWS Fargate, Google Cloud Run, or Azure Container Instances. They introduce an innovative approach to running cloud containers where developers are freed from managing underlying resources. In this paper, we focus on evaluating the capabilities of elastic containers and their usefulness for scientific computing in the scientific workflow paradigm using AWS Fargate and Google Cloud Run infrastructures. For experimental evaluation of our approach, we extended the HyperFlow engine to support these CaaS platforms and adapted four real-world scientific workflows composed of several dozen to over a hundred tasks organized into a dependency graph. We used these workflows to create cost-performance benchmarks and flow execution plots, measuring delays, elasticity, and scalability. The experiments proved that serverless containers can be successfully applied to scientific workflows. Also, the results allow us to gain insights into specific advantages and limits of such platforms.

[44]  arXiv:2010.11321 [pdf, ps, other]
Title: MRI Image Recovery using Damped Denoising Vector AMP
Subjects: Information Theory (cs.IT)

Motivated by image recovery in magnetic resonance imaging (MRI), we propose a new approach to solving linear inverse problems based on iteratively calling a deep neural-network, sometimes referred to as plug-and-play recovery. Our approach is based on the vector approximate message passing (VAMP) algorithm, which is known for mean-squared error (MSE)-optimal recovery under certain conditions. The forward operator in MRI, however, does not satisfy these conditions, and thus we design new damping and initialization schemes to help VAMP. The resulting DD-VAMP++ algorithm is shown to outperform existing algorithms in convergence speed and accuracy when recovering images from the fastMRI database for the practical case of Cartesian sampling.

[45]  arXiv:2010.11322 [pdf, ps, other]
Title: Learning to Summarize Long Texts with Memory Compression and Transfer
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

We introduce Mem2Mem, a memory-to-memory mechanism for hierarchical recurrent neural network based encoder-decoder architectures, and we explore its use for abstractive document summarization. Mem2Mem transfers "memories" via readable/writable external memory modules that augment both the encoder and decoder. Our memory regularization compresses an encoded input article into a more compact set of sentence representations. Most importantly, the memory compression step performs implicit extraction without labels, sidestepping issues with suboptimal ground-truth data and exposure bias of hybrid extractive-abstractive summarization techniques. By allowing the decoder to read/write over the encoded input memory, the model learns to read salient information about the input article while keeping track of what has been generated. Our Mem2Mem approach yields results that are competitive with state-of-the-art transformer-based summarization methods, but with 16 times fewer parameters.

[46]  arXiv:2010.11323 [pdf, other]
Title: Learning to Plan Optimally with Flow-based Motion Planner
Authors: Tin Lai, Fabio Ramos
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Sampling-based motion planning is the predominant paradigm in many real-world robotic applications, but its performance is immensely dependent on the quality of the samples. The majority of traditional planners are inefficient as they use uninformative sampling distributions as opposed to exploiting structures and patterns in the problem to guide better sampling strategies. Moreover, most current learning-based planners are susceptible to posterior collapse or mode collapse due to the sparsity and highly varying nature of C-Space and motion plan configurations. In this work, we introduce a conditional normalising flow based distribution learned through previous experiences to improve the sampling of these methods. Our distribution can be conditioned on the current problem instance to provide an informative prior for sampling configurations within promising regions. When we train our sampler with an expert planner, the resulting distribution is often near-optimal, and the planner can find a solution faster, with fewer invalid samples and a lower initial cost. The normalising flow based distribution uses simple invertible transformations that are very computationally efficient, and our optimisation formulation explicitly avoids mode collapse in contrast to other existing learning-based planners. Finally, we provide a formulation and theoretical foundation to efficiently sample from the distribution, and demonstrate experimentally that, by using our normalising flow based distribution, a solution can be found faster, with fewer samples and better overall runtime performance.

[47]  arXiv:2010.11325 [pdf, other]
Title: Probing and Fine-tuning Reading Comprehension Models for Few-shot Event Extraction
Subjects: Computation and Language (cs.CL)

We study the problem of event extraction from text data, which requires both detecting target event types and their arguments. Typically, both the event detection and argument detection subtasks are formulated as supervised sequence labeling problems. We argue that the event extraction models so trained are inherently label-hungry, and can generalize poorly across domains and text genres. We propose a reading comprehension framework for event extraction. Specifically, we formulate event detection as a textual entailment prediction problem, and argument detection as a question answering problem. By constructing proper query templates, our approach can effectively distill rich knowledge about tasks and label semantics from pretrained reading comprehension models. Moreover, our model can be fine-tuned with a small amount of data to boost its performance. Our experiment results show that our method performs strongly for zero-shot and few-shot event extraction, and it achieves state-of-the-art performance on the ACE 2005 benchmark when trained with full supervision.

[48]  arXiv:2010.11326 [pdf, other]
Title: Fast and Robust Bio-inspired Teach and Repeat Navigation
Comments: 8 pages, 8 figures, paper is currently under review
Subjects: Robotics (cs.RO)

Fully autonomous mobile robots have a multitude of potential applications, but guaranteeing robust navigation performance remains an open research problem. For many tasks such as repeated infrastructure inspection, item delivery or inventory transport, a route repeating capability rather than a full navigation stack can be sufficient and offers potential practical advantages. Previous teach and repeat research has achieved high performance in difficult conditions generally by using sophisticated, often expensive sensors, and has often had high computational requirements. Biological systems, such as small animals and insects like seeing ants, offer a proof of concept that robust and generalisable navigation can be achieved with extremely limited visual systems and computing power. In this work we create a novel asynchronous formulation for teach and repeat navigation that fully utilises odometry information, paired with a correction signal driven by much more computationally lightweight visual processing than is typically required. This correction signal is also decoupled from the robot's motor control, allowing its rate to be modulated by the available computing capacity. We evaluate this approach with extensive experimentation on two different robotic platforms, the Consequential Robotics Miro and the Clearpath Jackal robots, across navigation trials totalling more than 6000 metres in a range of challenging indoor and outdoor environments. Our approach is more robust and requires significantly less compute than the state-of-the-art. It is also capable of intervention-free -- no parameter changes required -- cross-platform generalisation, learning to navigate a route on one robot and repeating that route on a different type of robot with a different camera.

[49]  arXiv:2010.11327 [pdf, other]
Title: Meta-Learning Guarantees for Online Receding Horizon Control
Comments: arXiv admin note: substantial text overlap with arXiv:2008.13265, arXiv:2010.07269
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)

In this paper we provide provable regret guarantees for an online meta-learning receding horizon control algorithm in an iterative control setting, where in each iteration the system to be controlled is a linear deterministic system that is different and unknown, the cost for the controller in an iteration is a general additive cost function and the control input is required to be constrained, which if violated incurs an additional cost. We prove (i) that the algorithm achieves a regret for the controller cost and constraint violation that are $O(T^{3/4})$ for an episode of duration $T$ with respect to the best policy that satisfies the control input control constraints and (ii) that the average of the regret for the controller cost and constraint violation with respect to the same policy vary as $O((1+1/\sqrt{N})T^{3/4})$ with the number of iterations $N$, showing that the worst regret for the learning within an iteration continuously improves with experience of more iterations.

[50]  arXiv:2010.11328 [pdf, other]
Title: Logic Guided Genetic Algorithms
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Symbolic Computation (cs.SC)

We present a novel Auxiliary Truth enhanced Genetic Algorithm (GA) that uses logical or mathematical constraints as a means of data augmentation as well as to compute loss (in conjunction with the traditional MSE), with the aim of increasing both data efficiency and accuracy of symbolic regression (SR) algorithms. Our method, logic-guided genetic algorithm (LGGA), takes as input a set of labelled data points and auxiliary truths (ATs) (mathematical facts known a priori about the unknown function the regressor aims to learn) and outputs a specially generated and curated dataset that can be used with any SR method. Three key insights underpin our method: first, SR users often know simple ATs about the function they are trying to learn. Second, whenever an SR system produces a candidate equation inconsistent with these ATs, we can compute a counterexample to prove the inconsistency, and further, this counterexample may be used to augment the dataset and fed back to the SR system in a corrective feedback loop. Third, the value addition of these ATs is that their use in both the loss function and the data augmentation process leads to better rates of convergence, accuracy, and data efficiency. We evaluate LGGA against state-of-the-art SR tools, namely, Eureqa and TuringBot on 16 physics equations from "The Feynman Lectures on Physics" book. We find that using these SR tools in conjunction with LGGA results in them solving up to 30.0% more equations, needing only a fraction of the amount of data compared to the same tool without LGGA, i.e., resulting in up to a 61.9% improvement in data efficiency.
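
The corrective feedback loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the auxiliary truth used here (symmetry, f(a, b) = f(b, a)), the helper names `find_counterexample` and `augment_with_symmetry`, and the toy data are all assumptions chosen for illustration.

```python
def find_counterexample(candidate, points, tol=1e-9):
    """Return the first (a, b) where the candidate violates f(a, b) == f(b, a)."""
    for a, b in points:
        if abs(candidate(a, b) - candidate(b, a)) > tol:
            return (a, b)
    return None


def augment_with_symmetry(dataset):
    """Add the mirrored point (b, a, y) for every labelled point (a, b, y)."""
    mirrored = [(b, a, y) for a, b, y in dataset]
    return dataset + [p for p in mirrored if p not in dataset]


# Hypothetical labelled data from an unknown symmetric function (here a * b).
data = [(1.0, 2.0, 2.0), (3.0, 4.0, 12.0)]

# A candidate equation that is inconsistent with the symmetry truth.
bad_candidate = lambda a, b: a + 2 * b

# The witness proves the inconsistency; the augmented set encodes the truth as data.
witness = find_counterexample(bad_candidate, [(a, b) for a, b, _ in data])
augmented = augment_with_symmetry(data)
```

In this toy setting the augmented dataset could then be handed to any symbolic regression tool, which is the role the abstract assigns to the curated output.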

[51]  arXiv:2010.11333 [pdf, other]
Title: Linking Entities to Unseen Knowledge Bases with Arbitrary Schemas
Subjects: Computation and Language (cs.CL)

In entity linking, mentions of named entities in raw text are disambiguated against a knowledge base (KB). This work focuses on linking to unseen KBs that do not have training data and whose schema is unknown during training. Our approach relies on methods to flexibly convert entities from arbitrary KBs with several attribute-value pairs into flat strings, which we use in conjunction with state-of-the-art models for zero-shot linking. To improve the generalization of our model, we use two regularization schemes based on shuffling of entity attributes and handling of unseen attributes. Experiments on English datasets, where models are trained on the CoNLL dataset and tested on the TAC-KBP 2010 dataset, show that our models outperform baseline models by over 12 points of accuracy. Unlike prior work, our approach also allows for seamlessly combining multiple training datasets. We test this ability by adding both a completely different dataset (Wikia) and an increasing amount of training data from the TAC-KBP 2010 training set. Our models perform favorably across the board.
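
As a rough sketch of what converting attribute-value pairs into a flat string might look like, with optional attribute shuffling as a regulariser: the `[key] value` serialisation format and the function name `flatten_entity` are assumptions made for illustration, not the format used in the paper.

```python
import random


def flatten_entity(attributes, shuffle=False, rng=None):
    """Serialise an entity's attribute-value pairs into one flat string."""
    items = list(attributes.items())
    if shuffle:
        # Randomising attribute order discourages reliance on a fixed schema layout.
        (rng or random).shuffle(items)
    return " ".join(f"[{key}] {value}" for key, value in items)


# Hypothetical KB entry with an arbitrary schema.
entity = {"name": "Paris", "type": "city", "country": "France"}

flat = flatten_entity(entity)
shuffled = flatten_entity(entity, shuffle=True, rng=random.Random(0))
```

The flat string can then be fed to any text-based linking model regardless of the KB's schema; shuffling changes only the order of the pairs, never their content.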

[52]  arXiv:2010.11334 [pdf, other]
Title: NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task
Comments: Accepted in WANLP 2020
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

We present the results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI). The shared task includes two subtasks: country level dialect identification (Subtask 1) and province level sub-dialect identification (Subtask 2). The data for the shared task covers a total of 100 provinces from 21 Arab countries, and are collected from the Twitter domain. As such, NADI is the first shared task to target naturally-occurring fine-grained dialectal text at the sub-country level. A total of 61 teams from 25 countries registered to participate in the tasks, thus reflecting the interest of the community in this area. We received 47 submissions for Subtask 1 from 18 teams and 9 submissions to Subtask 2 from 9 teams.

[53]  arXiv:2010.11338 [pdf, other]
Title: A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks
Subjects: Computation and Language (cs.CL)

Attention-based sequence-to-sequence modeling provides a powerful and elegant solution for applications that need to map one sequence to a different sequence. Its success heavily relies on the availability of large amounts of training data. This presents a challenge for speech applications where labelled speech data is very expensive to obtain, such as automatic speech recognition (ASR) and speech translation (ST). In this study, we propose a general multi-task learning framework to leverage text data for ASR and ST tasks. Two auxiliary tasks, a denoising autoencoder task and machine translation task, are proposed to be co-trained with ASR and ST tasks respectively. We demonstrate that representing text input as phoneme sequences can reduce the difference between speech and text inputs, and enhance the knowledge transfer from text corpora to the speech to text tasks. Our experiments show that the proposed method achieves a relative 10~15% word error rate reduction on the English Librispeech task, and improves the speech translation quality on the MuST-C tasks by 4.2~11.1 BLEU.

[54]  arXiv:2010.11339 [pdf, ps, other]
Title: Voronoi Convolutional Neural Networks
Comments: Technical report
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this technical report, we investigate extending convolutional neural networks to the setting where functions are not sampled in a grid pattern. We show that by treating the samples as the average of a function within a cell, we can find a natural equivalent of most layers used in CNN. We also present an algorithm for running inference for these models exactly using standard convex geometry algorithms.

[55]  arXiv:2010.11340 [pdf]
Title: Photovoltaic (PV) Virtual Inertia and Fast Frequency Regulation in High PV Power Grids
Authors: Shutang You
Comments: 7 pages, 14 figures
Subjects: Systems and Control (eess.SY)

This paper studies the frequency response using PV. Multiple control strategies are considered and simulated in the high PV ERCOT model, including inertia control, synthetic governor control, and AGC control. The impact of different parameters in PV inertia control and their correlation and impact on frequency response are analyzed. The simulation results show that PV farms have the potential to provide multiple types of grid service to support system frequency. This paper also proposes a distributed fast frequency control approach that can better leverage the PV headroom reserve to improve the system frequency nadir after contingencies.

[56]  arXiv:2010.11341 [pdf, other]
Title: Density of States Graph Kernels
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Numerical Analysis (math.NA)

An important problem on graph-structured data is that of quantifying similarity between graphs. Graph kernels are an established technique for such tasks; in particular, those based on random walks and return probabilities have proven to be effective in wide-ranging applications, from bioinformatics to social networks to computer vision. However, random walk kernels generally suffer from slowness and tottering, an effect which causes walks to overemphasize local graph topology, undercutting the importance of global structure. To correct for these issues, we recast return probability graph kernels under the more general framework of density of states -- a framework which uses the lens of spectral analysis to uncover graph motifs and properties hidden within the interior of the spectrum -- and use our interpretation to construct scalable, composite density of states based graph kernels which balance local and global information, leading to higher classification accuracies on a host of benchmark datasets.

[57]  arXiv:2010.11342 [pdf, ps, other]
Title: Contextual Linear Types for Differential Privacy
Comments: Journal submission
Subjects: Programming Languages (cs.PL); Logic in Computer Science (cs.LO)

Language support for differentially-private programming is both crucial and delicate. While elaborate program logics can be very expressive, type-system based approaches using linear types tend to be more lightweight and amenable to automatic checking and inference, and in particular in the presence of higher-order programming. Since the seminal design of Fuzz, which is restricted to ${\epsilon}$-differential privacy, a lot of effort has been made to support more advanced variants of differential privacy, like $({\epsilon},{\delta})$-differential privacy. However, no existing type system supports these advanced privacy variants while also supporting higher-order programming in full generality. We present Jazz, a language and type system which uses linear types and latent contextual effects to support both advanced variants of differential privacy and higher-order programming. Even when avoiding advanced variants and higher-order programming, our system achieves higher precision than prior work for a large class of programming patterns. We formalize the core of the Jazz language, prove it sound for privacy via a logical relation for metric preservation, and illustrate its expressive power through a number of case studies drawn from the recent differential privacy literature.

[58]  arXiv:2010.11344 [pdf, other]
Title: Trajectory Prediction using Equivariant Continuous Convolution
Comments: 16 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Trajectory prediction is a critical part of many AI applications, for example, the safe operation of autonomous vehicles. However, current methods are prone to making inconsistent and physically unrealistic predictions. We leverage insights from fluid dynamics to overcome this limitation by considering internal symmetry in trajectories. We propose a novel model, Equivariant Continuous COnvolution (ECCO) for improved trajectory prediction. ECCO uses rotationally-equivariant continuous convolutions to embed the symmetries of the system. On two real-world vehicle and pedestrian trajectory datasets, ECCO attains competitive accuracy with significantly fewer parameters. It is also more sample efficient, generalizing automatically from few data points in any orientation. Lastly, ECCO improves generalization with equivariance, resulting in more physically consistent predictions. Our method provides a fresh perspective towards increasing trust and transparency in deep learning models.

[59]  arXiv:2010.11349 [pdf, ps, other]
Title: LSTM-LM with Long-Term History for First-Pass Decoding in Conversational Speech Recognition
Comments: 5 pages
Subjects: Computation and Language (cs.CL)

LSTM language models (LSTM-LMs) have been proven to be powerful and have yielded significant performance improvements over count based n-gram LMs in modern speech recognition systems. Due to their infinite history states and computational load, most previous studies focus on applying LSTM-LMs in the second-pass for rescoring purposes. Recent work shows that it is feasible and computationally affordable to adopt the LSTM-LMs in the first-pass decoding within a dynamic (or tree based) decoder framework. In this work, the LSTM-LM is composed with a WFST decoder on-the-fly for the first-pass decoding. Furthermore, motivated by the long-term history nature of LSTM-LMs, the use of context beyond the current utterance is explored for the first-pass decoding in conversational speech recognition. The context information is captured by the hidden states of LSTM-LMs across utterances and can be used to guide the first-pass search effectively. The experimental results in our internal meeting transcription system show that significant performance improvements can be obtained by incorporating the contextual information with LSTM-LMs in the first-pass decoding, compared to applying the contextual information in the second-pass rescoring.

[60]  arXiv:2010.11351 [pdf, other]
Title: Latte-Mix: Measuring Sentence Semantic Similarity with Latent Categorical Mixtures
Subjects: Computation and Language (cs.CL)

Measuring sentence semantic similarity using pre-trained language models such as BERT generally yields unsatisfactory zero-shot performance, and one main reason is ineffective token aggregation methods such as mean pooling. In this paper, we demonstrate under a Bayesian framework that distances between primitive statistics such as the mean of word embeddings are fundamentally flawed for capturing sentence-level semantic similarity. To remedy this issue, we propose to learn a categorical variational autoencoder (VAE) based on off-the-shelf pre-trained language models. We theoretically prove that measuring the distance between the latent categorical mixtures, namely Latte-Mix, can better reflect the true sentence semantic similarity. In addition, our Bayesian framework provides explanations for why models finetuned on labelled sentence pairs have better zero-shot performance. We also empirically demonstrate that these finetuned models could be further improved by Latte-Mix. Our method not only yields state-of-the-art zero-shot performance on semantic similarity datasets such as STS, but also enjoys the benefits of fast training and a small memory footprint.
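
One simple intuition for why a mean of word embeddings can be a weak sentence statistic is sketched below. It is a toy example with static, hand-made vectors (the values and the helper names `mean_pool` and `cosine` are assumptions) and is not the Bayesian argument made in the paper; with contextual models such as BERT the token vectors themselves would also change with word order.

```python
import math


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one sentence vector."""
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy, hand-made 2-d "embeddings" (hypothetical values, not from any model).
emb = {"dog": [1.0, 0.0], "bites": [0.0, 1.0], "man": [1.0, 1.0]}

# Two sentences with opposite meanings but the same bag of words.
s1 = mean_pool([emb[w] for w in ["dog", "bites", "man"]])
s2 = mean_pool([emb[w] for w in ["man", "bites", "dog"]])
similarity = cosine(s1, s2)
```

Because averaging discards word order, the two sentence vectors coincide and the cosine similarity is maximal, even though the sentences mean different things.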

[61]  arXiv:2010.11352 [pdf, other]
Title: Class-Conditional Defense GAN Against End-to-End Speech Attacks
Comments: 5 pages
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

In this paper we propose a novel defense approach against end-to-end adversarial attacks developed to fool advanced speech-to-text systems such as DeepSpeech and Lingvo. Unlike conventional defense approaches, the proposed approach does not directly employ low-level transformations such as autoencoding a given input signal aiming at removing potential adversarial perturbation. Instead of that, we find an optimal input vector for a class conditional generative adversarial network through minimizing the relative chordal distance adjustment between a given test input and the generator network. Then, we reconstruct the 1D signal from the synthesized spectrogram and the original phase information derived from the given input signal. Hence, this reconstruction does not add any extra noise to the signal and according to our experimental results, our defense-GAN considerably outperforms conventional defense algorithms both in terms of word error rate and sentence level recognition accuracy.

[62]  arXiv:2010.11353 [pdf, ps, other]
Title: Bandwidth-Adaptive Feature Sharing for Cooperative LIDAR Object Detection
Comments: 8 pages, 4 figures, 2 tables, 2020 IEEE 3rd Connected and Automated Vehicles Symposium: IEEE CAVS 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Situational awareness, a necessity in the connected and autonomous vehicles (CAV) domain, has been the subject of a significant body of research in recent years. The driver's safety is directly dependent on the robustness, reliability, and scalability of such systems. Cooperative mechanisms have provided a solution to improve situational awareness by utilizing high speed wireless vehicular networks. These mechanisms mitigate problems such as occlusion and sensor range limitation. However, the network capacity is a factor determining the maximum amount of information being shared among cooperative entities. The notion of feature sharing, proposed in our previous work, aims to address these challenges by maintaining a balance between computation and communication load. In this work, we propose a mechanism to add flexibility in adapting to communication channel capacity and a novel decentralized shared data alignment method to further improve cooperative object detection performance. The performance of the proposed framework is verified through experiments on the Volony dataset. The results confirm that our proposed framework outperforms our previous cooperative object detection method (FS-COD) in terms of average precision.

[63]  arXiv:2010.11354 [pdf, other]
Title: PHEW: Paths with higher edge-weights give "winning tickets" without training data
Comments: 14 pages, 12 figures, 3 tables
Subjects: Machine Learning (cs.LG)

Sparse neural networks have generated substantial interest recently because they can be more efficient in learning and inference, without any significant drop in performance. The "lottery ticket hypothesis" has shown the existence of such sparse subnetworks at initialization. Given a fully-connected initialized architecture, our aim is to find such "winning ticket" networks, without any training data. We first show the advantages of forming input-output paths, over pruning individual connections, to avoid bottlenecks in gradient propagation. Then, we show that Paths with Higher Edge-Weights (PHEW) at initialization have higher loss gradient magnitude, resulting in more efficient training. Selecting such paths can be performed without any data. We empirically validate the effectiveness of the proposed approach against pruning-before-training methods on CIFAR10, CIFAR100 and Tiny-ImageNet for VGG-Net and ResNet. PHEW achieves significant improvements on the current state-of-the-art methods at 10%, 5% and 2% network density. We also evaluate the structural similarity relationship between PHEW networks and pruned networks constructed through Iterated Magnitude Pruning (IMP), concluding that the former belong in the family of winning-ticket networks.

[64]  arXiv:2010.11358 [pdf, other]
Title: N-ODE Transformer: A Depth-Adaptive Variant of the Transformer Using Neural Ordinary Differential Equations
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

We use neural ordinary differential equations to formulate a variant of the Transformer that is depth-adaptive in the sense that an input-dependent number of time steps is taken by the ordinary differential equation solver. Our goal in proposing the N-ODE Transformer is to investigate whether its depth-adaptivity may aid in overcoming some specific known theoretical limitations of the Transformer in handling nonlocal effects. Specifically, we consider the simple problem of determining the parity of a binary sequence, for which the standard Transformer has known limitations that can only be overcome by using a sufficiently large number of layers or attention heads. We find, however, that the depth-adaptivity of the N-ODE Transformer does not provide a remedy for the inherently nonlocal nature of the parity problem, and provide explanations for why this is so. Next, we pursue regularization of the N-ODE Transformer by penalizing the arclength of the ODE trajectories, but find that this fails to improve the accuracy or efficiency of the N-ODE Transformer on the challenging parity problem. We suggest future avenues of research for modifications and extensions of the N-ODE Transformer that may lead to improved accuracy and efficiency for sequence modelling tasks such as neural machine translation.
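
The nonlocal nature of the parity problem can be made concrete with a small sketch (not code from the paper; the function names are our own): flipping any single bit flips the label, so no fixed subset of positions determines the output.

```python
from itertools import product


def parity(bits):
    """Return 1 if the sequence contains an odd number of ones, else 0."""
    return sum(bits) % 2


def flip(bits, i):
    """Return a copy of the tuple `bits` with position i inverted."""
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]


# Check every binary sequence of length 4: flipping any one position
# always changes the parity label.
always_flips = all(
    parity(flip(bits, i)) != parity(bits)
    for bits in product((0, 1), repeat=4)
    for i in range(4)
)
```

Every input position therefore matters for every input, which is what makes the task a useful stress test for models that aggregate information through attention.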

[65]  arXiv:2010.11362 [pdf, other]
Title: NU-GAN: High resolution neural upsampling with GAN
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

In this paper, we propose NU-GAN, a new method for resampling audio from lower to higher sampling rates (upsampling). Audio upsampling is an important problem since productionizing generative speech technology requires operating at high sampling rates. Such applications use audio at a resolution of 44.1 kHz or 48 kHz, whereas current speech synthesis methods are equipped to handle a maximum of 24 kHz resolution. NU-GAN takes a leap towards solving audio upsampling as a separate component in the text-to-speech (TTS) pipeline by leveraging techniques for audio generation using GANs. ABX preference tests indicate that our NU-GAN resampler is capable of resampling 22 kHz to 44.1 kHz audio that is distinguishable from original audio only 7.4% higher than random chance for single speaker dataset, and 10.8% higher than chance for multi-speaker dataset.

[66]  arXiv:2010.11363 [pdf, other]
Title: QISTA-Net: DNN Architecture to Solve $\ell_q$-norm Minimization Problem and Image Compressed Sensing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

In this paper, we reformulate the non-convex $\ell_q$-norm minimization problem with $q\in(0,1)$ into a 2-step problem, which consists of one convex and one non-convex subproblem, and propose a novel iterative algorithm called QISTA ($\ell_q$-ISTA) to solve the $\left(\ell_q\right)$-problem. By taking advantage of deep learning in accelerating optimization algorithms, together with a speedup strategy that uses the momentum from all previous layers in the network, we propose a learning-based method, called QISTA-Net-s, to solve the sparse signal reconstruction problem. Extensive experimental comparisons demonstrate that QISTA-Net-s yields better reconstruction quality than state-of-the-art $\ell_1$-norm optimization (plus learning) algorithms even if the original sparse signal is noisy. In addition, based on the network architecture associated with QISTA and using convolution layers, we propose QISTA-Net-n for solving the image CS problem; its reconstruction performance still exceeds that of most state-of-the-art natural image reconstruction methods. QISTA-Net-n is designed by unfolding QISTA and adding the convolutional operator as the dictionary. This makes QISTA-Net-s interpretable. We provide complete experimental results showing that QISTA-Net-s and QISTA-Net-n achieve better reconstruction performance than competing methods.

[67]  arXiv:2010.11364 [pdf, other]
Title: Sample Efficient Reinforcement Learning with REINFORCE
Comments: 35 pages
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Policy gradient methods are among the most effective methods for large-scale reinforcement learning, and their empirical success has prompted several works that develop the foundation of their global convergence theory. However, prior works have either required exact gradients or state-action visitation measure based mini-batch stochastic gradients with a diverging batch size, which limit their applicability in practical scenarios. In this paper, we consider classical policy gradient methods that compute an approximate gradient with a single trajectory or a fixed size mini-batch of trajectories, along with the widely-used REINFORCE gradient estimation procedure. By controlling the number of "bad" episodes and resorting to the classical doubling trick, we establish an anytime sub-linear high probability regret bound as well as almost sure global convergence of the average regret with an asymptotically sub-linear rate. These provide the first set of global convergence and sample efficiency results for the well-known REINFORCE algorithm and contribute to a better understanding of its performance in practice.
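
For readers unfamiliar with the estimator, a single-trajectory REINFORCE gradient can be sketched as follows. This is a simplified illustration under stated assumptions: a stateless softmax policy over two actions, a hand-written trajectory, and the common return-from-step-t weighting; the paper's exact setting and estimator may differ.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_pi(theta, action):
    """Gradient of log softmax(theta)[action] w.r.t. theta: one_hot(action) - pi."""
    pi = softmax(theta)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(pi)]


def reinforce_gradient(theta, trajectory, gamma=1.0):
    """Single-trajectory estimate: sum over t of grad log pi(a_t) * G_t."""
    grad = [0.0] * len(theta)
    ret = 0.0
    # Walk backwards so `ret` is the discounted return from step t onwards.
    for action, reward in reversed(trajectory):
        ret = reward + gamma * ret
        g = grad_log_pi(theta, action)
        grad = [acc + gi * ret for acc, gi in zip(grad, g)]
    return grad


theta = [0.0, 0.0]
trajectory = [(1, 1.0), (0, 0.0), (1, 1.0)]  # hypothetical (action, reward) pairs
estimate = reinforce_gradient(theta, trajectory)
```

A gradient ascent step `theta[i] += lr * estimate[i]` would then shift probability towards action 1, the only action that earned reward in this trajectory.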

[68]  arXiv:2010.11365 [pdf, other]
Title: On a Guided Nonnegative Matrix Factorization
Comments: 6 pages, 6 tables
Subjects: Machine Learning (cs.LG)

Fully unsupervised topic models have found fantastic success in document clustering and classification. However, these models often suffer from the tendency to learn less-than-meaningful or even redundant topics when the data is biased towards a set of features. For this reason, we propose an approach based upon the nonnegative matrix factorization (NMF) model, deemed Guided NMF, that incorporates user-designed seed word supervision. Our experimental results demonstrate the promise of this model and illustrate that it is competitive with other methods of this ilk with only very little supervision information.

[69]  arXiv:2010.11367 [pdf, other]
Title: TeX-Graph: Coupled tensor-matrix knowledge-graph embedding for COVID-19 drug repurposing
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG); Signal Processing (eess.SP); Methodology (stat.ME)

Knowledge graphs (KGs) are powerful tools that codify relational behaviour between entities in knowledge bases. KGs can simultaneously model many different types of subject-predicate-object and higher-order relations. As such, they offer a flexible modeling framework that has been applied to many areas, including biology and pharmacology -- most recently, in the fight against COVID-19. The flexibility of KG modeling is both a blessing and a challenge from the learning point of view. In this paper we propose a novel coupled tensor-matrix framework for KG embedding. We leverage tensor factorization tools to learn concise representations of entities and relations in knowledge bases and employ these representations to perform drug repurposing for COVID-19. Our proposed framework is principled, elegant, and achieves 100% improvement over the best baseline in the COVID-19 drug repurposing task using a recently developed biological KG.

[70]  arXiv:2010.11369 [pdf, other]
Title: Learning Graph-Based Priors for Generalized Zero-Shot Learning
Comments: Presented at AAAI 2020 Workshop on Deep Learning on Graphs: Methodologies and Applications (DLGMA'20)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The task of zero-shot learning (ZSL) requires correctly predicting the label of samples from classes which were unseen at training time. This is achieved by leveraging side information about class labels, such as label attributes or word embeddings. Recently, attention has shifted to the more realistic task of generalized ZSL (GZSL) where test sets consist of seen and unseen samples. Recent approaches to GZSL have shown the value of generative models, which are used to generate samples from unseen classes. In this work, we incorporate an additional source of side information in the form of a relation graph over labels. We leverage this graph in order to learn a set of prior distributions, which encourage an aligned variational autoencoder (VAE) model to learn embeddings which respect the graph structure. Using this approach we are able to achieve improved performance on the CUB and SUN benchmarks over a strong baseline.

[71]  arXiv:2010.11372 [pdf, other]
Title: Symmetrical Z-Complementary Code Sets (SZCCSs) for Optimal Training in Generalized Spatial Modulation
Comments: 13 pages, 7 figures, submitted to IEEE Transactions on Signal Processing
Subjects: Information Theory (cs.IT)

This paper introduces a novel class of code sets, called "symmetrical Z-complementary code sets (SZCCSs)", whose aperiodic auto- and cross-correlation sums exhibit zero-correlation zones (ZCZs) at both the front-end and tail-end of the entire correlation window. Three constructions of (optimal) SZCCSs based on general Boolean functions are presented. As a second major contribution, we apply SZCCSs to design optimal training sequences for broadband generalized spatial modulation (GSM) systems over frequency-selective channels.
Key words: Complementary code set, channel estimation, training sequence design, generalized spatial modulation, frequency-selective channels.

[72]  arXiv:2010.11374 [pdf, other]
Title: Stronger Transformers for Neural Multi-Hop Question Generation
Comments: Code will be made available
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Prior work on automated question generation has almost exclusively focused on generating simple questions whose answers can be extracted from a single document. However, there is an increasing interest in developing systems that are capable of more complex multi-hop question generation, where answering the questions requires reasoning over multiple documents. In this work, we introduce a series of strong transformer models for multi-hop question generation, including a graph-augmented transformer that leverages relations between entities in the text. While prior work has emphasized the importance of graph-based models, we show that we can substantially outperform the state-of-the-art by 5 BLEU points using a standard transformer architecture. We further demonstrate that graph-based augmentations can provide complementary improvements on top of this foundation. Interestingly, we find that several important factors, such as the inclusion of an auxiliary contrastive objective and data filtering, could have larger impacts on performance. We hope that our stronger baselines and analysis provide a constructive foundation for future work in this area.

[73]  arXiv:2010.11376 [pdf, other]
Title: Heterogeneous Vehicle Routing and Teaming with Gaussian Distributed Energy Uncertainty
Comments: IROS 2020
Subjects: Robotics (cs.RO); Multiagent Systems (cs.MA)

For robot swarms operating on complex missions in an uncertain environment, it is important that the decision-making algorithm considers both heterogeneity and uncertainty. This paper presents a stochastic programming framework for the vehicle routing problem with stochastic travel energy costs and heterogeneous vehicles and tasks. We represent the heterogeneity as linear constraints, estimate the uncertain energy cost through Gaussian process regression, formulate this stochasticity as chance constraints or stochastic recourse costs, and then solve the stochastic programs using branch and cut algorithms to minimize the expected energy cost. The performance and practicality are demonstrated through extensive computational experiments and a practical test case.
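
The chance-constraint idea mentioned in this abstract can be illustrated with a standard textbook fact: if a route's total energy is Gaussian with mean mu and standard deviation sigma, then requiring P(energy <= budget) >= 1 - alpha is equivalent to mu + z * sigma <= budget, where z is the (1 - alpha) quantile of the standard normal. The sketch below is a minimal illustration under the added assumption that per-edge energies are independent Gaussians; the function name and the numbers are hypothetical and are not taken from the paper.

```python
import math
from statistics import NormalDist


def route_satisfies_chance_constraint(edge_means, edge_variances, budget, alpha):
    """Check P(total energy <= budget) >= 1 - alpha for a Gaussian total.

    Assumption (not from the paper): edge energies are independent Gaussians,
    so their means and variances simply add along the route.
    """
    mu = sum(edge_means)
    sigma = math.sqrt(sum(edge_variances))
    z = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile
    return mu + z * sigma <= budget


# Hypothetical route with three edges: mean 30, standard deviation 3.
means = [10.0, 12.0, 8.0]
variances = [1.0, 4.0, 4.0]
feasible_with_35 = route_satisfies_chance_constraint(means, variances, 35.0, 0.05)
feasible_with_34 = route_satisfies_chance_constraint(means, variances, 34.0, 0.05)
```

With alpha = 0.05 the quantile is roughly 1.645, so the deterministic requirement is about 34.93 units of energy: a budget of 35 passes and a budget of 34 does not.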

[74]  arXiv:2010.11377 [pdf, other]
Title: A New Block Preconditioner for Implicit Runge-Kutta Methods for Parabolic PDE
Comments: 15 pages, 10 figures
Subjects: Numerical Analysis (math.NA)

A new preconditioner based on a block LDU factorization with algebraic multigrid subsolves for scalability is introduced for the large, structured systems appearing in implicit Runge-Kutta time integration of parabolic partial differential equations. This preconditioner is compared, in terms of condition number and eigenvalue distribution and in numerical experiments, with others in the literature: block Jacobi, block Gauss-Seidel, and the optimized block Gauss-Seidel method of 10.4173/mic.2006.2.3. Experiments are run with implicit Runge-Kutta stages up to s=7, and it is found that the new preconditioner outperforms the others, with the improvement becoming more pronounced as spatial discretization is refined and as temporal order is increased.

[75]  arXiv:2010.11378 [pdf, other]
Title: Learning Occupancy Function from Point Clouds for Surface Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Implicit function based surface reconstruction has long been studied as a way to recover 3D shapes from point clouds sampled from surfaces. Recently, Signed Distance Functions (SDFs) and Occupancy Functions have been adopted in learning-based shape reconstruction methods as implicit 3D shape representations. This paper proposes a novel method for learning occupancy functions from sparse point clouds and achieves better performance on challenging surface reconstruction tasks. Unlike previous methods, which predict point occupancy with fully-connected multi-layer networks, we adapt a point cloud deep learning architecture, the Point Convolution Neural Network (PCNN), to build our learning model. Specifically, we create a sampling operator and insert it into PCNN to continuously sample the feature space at the points where occupancy states need to be predicted. This method natively captures the geometric nature of point cloud data and is invariant to point permutation. Our occupancy function learning fits easily into procedures for point cloud up-sampling and surface reconstruction. Our experiments show state-of-the-art reconstruction performance on the ShapeNet dataset and demonstrate that the method generalizes well by testing it on the McGill 3D dataset (Siddiqi et al., 2008). Moreover, we find the learned occupancy function is relatively more rotation invariant than previous shape learning methods.

[76]  arXiv:2010.11380 [pdf, ps, other]
Title: A Hybrid Approach to Coded Compressed Sensing where Coupling Takes Place via the Outer Code
Subjects: Information Theory (cs.IT)

This article seeks to advance coded compressed sensing (CCS) as a practical scheme for unsourced random access. The original CCS algorithm features a concatenated structure where an inner code is tasked with support recovery, and an outer tree code conducts message disambiguation. Recently, a link between CCS and sparse regression codes was established, leading to the application of approximate message passing (AMP) to CCS. This connection was subsequently strengthened by integrating AMP and belief propagation on the outer code through a dynamic denoiser. Along these lines, this work shows how block diagonal sensing matrices akin to those used in traditional CCS, together with the aforementioned dynamic denoiser, form an effective means to get good performance at low complexity. This novel architecture can be used to scale this scheme to dimensions that were previously impractical. Findings are supported by numerical simulations.

[77]  arXiv:2010.11381 [pdf, ps, other]
Title: Query strategies for priced information, revisited
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)

We consider the problem of designing query strategies for priced information, introduced by Charikar et al. In this problem the algorithm designer is given a function $f : \{0,1\}^n \to \{-1,1\}$ and a price associated with each of the $n$ coordinates. The goal is to design a query strategy for determining $f$'s value on unknown inputs for minimum cost.
Prior works on this problem have focused on specific classes of functions. We analyze a simple and natural strategy that applies to all functions $f$, and show that its performance relative to the optimal strategy can be expressed in terms of a basic complexity measure of $f$, its influence. For $\varepsilon \in (0,\frac1{2})$, writing $\mathsf{opt}$ to denote the expected cost of the optimal strategy that errs on at most an $\varepsilon$-fraction of inputs, our strategy has expected cost $\mathsf{opt} \cdot \mathrm{Inf}(f)/\varepsilon^2$ and also errs on at most an $O(\varepsilon)$-fraction of inputs. This connection yields new guarantees that complement existing ones for a number of function classes that have been studied in this context, as well as new guarantees for new classes.
Finally, we show that improving on the parameters that we achieve will require making progress on the longstanding open problem of properly learning decision trees.

[78]  arXiv:2010.11383 [pdf, ps, other]
Title: Exploit Multiple Reference Graphs for Semi-supervised Relation Extraction
Authors: Wanli Li, Tieyun Qian
Subjects: Computation and Language (cs.CL)

Manual annotation of the labeled data for relation extraction is time-consuming and labor-intensive. Semi-supervised methods can help with this problem and have attracted great research interest. Existing work focuses on mapping the unlabeled samples to the classes to augment the labeled dataset. However, it is hard to find an overall good mapping function, especially for the samples with complicated syntactic components in one sentence.
To tackle this limitation, we propose to build the connection between the unlabeled data and the labeled ones rather than directly mapping the unlabeled samples to the classes. Specifically, we first use three kinds of information to construct reference graphs, including entity reference, verb reference, and semantics reference. The goal is to semantically or lexically connect the unlabeled sample(s) to the labeled one(s). Then, we develop a Multiple Reference Graph (MRefG) model to exploit the reference information for better recognizing high-quality unlabeled samples. The effectiveness of our method is demonstrated by extensive comparison experiments with the state-of-the-art baselines on two public datasets.

[79]  arXiv:2010.11384 [pdf, other]
Title: A Disentangled Adversarial Neural Topic Model for Separating Opinions from Plots in User Reviews
Comments: 12 pages, 4 figures
Subjects: Computation and Language (cs.CL)

The flexibility of the inference process in Variational Autoencoders (VAEs) has recently led to revising traditional probabilistic topic models, giving rise to Neural Topic Models (NTM). Although these approaches have achieved significant results, surprisingly little work has been done on how to disentangle the latent topics. Existing topic models, when applied to reviews, may extract topics associated with writers' subjective opinions mixed with those related to factual descriptions such as plot summaries in movie and book reviews. It is thus desirable to automatically separate opinion topics from plot/neutral ones, enabling better interpretability. In this paper, we propose a neural topic model combined with adversarial training to disentangle opinion topics from plot and neutral ones. We conduct an extensive experimental assessment introducing a new collection of movie and book reviews paired with their plots, namely the MOBO dataset, showing an improved coherence and variety of topics, a consistent disentanglement rate, and sentiment classification performance superior to other supervised topic models.

[80]  arXiv:2010.11386 [pdf, other]
Title: Distilling Dense Representations for Ranking using Tightly-Coupled Teachers
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

We present an approach to ranking with dense representations that applies knowledge distillation to improve the recently proposed late-interaction ColBERT model. Specifically, we distill the knowledge from ColBERT's expressive MaxSim operator for computing relevance scores into a simple dot product, thus enabling single-step ANN search. Our key insight is that during distillation, tight coupling between the teacher model and the student model enables more flexible distillation strategies and yields better learned representations. We empirically show that our approach improves query latency and greatly reduces the onerous storage requirements of ColBERT, while only making modest sacrifices in terms of effectiveness. By combining our dense representations with sparse representations derived from document expansion, we are able to approach the effectiveness of a standard cross-encoder reranker using BERT that is orders of magnitude slower.
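
To make the two scoring functions named in this abstract concrete, the sketch below contrasts a MaxSim-style late-interaction score (for each query token vector, take the maximum dot product over the document token vectors, then sum over query tokens) with a single dot product between one vector per query and one per document. The mean pooling used to obtain the single vectors, the function names, and the toy embeddings are assumptions for illustration only and are not the authors' implementation.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction: sum over query tokens of the best-matching doc token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average token vectors into one vector (an assumed pooling choice)."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


def dot_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Single-vector scoring: one dot product, which suits ANN search."""
    return dot(mean_pool(query_tokens), mean_pool(doc_tokens))


# Toy 2-dimensional token embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]

late_interaction = maxsim_score(query, doc)
single_vector = dot_score(query, doc)
```

The single-vector form needs only one stored vector per document, which is why it reduces storage and allows one-step nearest-neighbor search, at the price of losing the token-level matching that MaxSim performs.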

[81]  arXiv:2010.11387 [pdf, other]
Title: Kwame: A Bilingual AI Teaching Assistant for Online SuaCode Courses
Authors: George Boateng
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Introductory hands-on courses such as our smartphone-based coding courses, SuaCode, require a lot of support for students to accomplish learning goals. Online environments make it even more difficult to get assistance, especially recently because of COVID-19. Given the multilingual context of our students (learners across 38 African countries), in this work, we developed an AI Teaching Assistant (Kwame) that provides answers to students' coding questions from our SuaCode courses in English and French. Kwame is a Sentence-BERT (SBERT)-based question-answering (QA) system that we trained and evaluated using question-answer pairs created from our course's quizzes and students' questions in past cohorts. It finds the paragraph most semantically similar to the question via cosine similarity. We compared the system with TF-IDF and Universal Sentence Encoder. Our results showed that SBERT performed the worst on duration (6 secs per question) but the best on accuracy, and that fine-tuning on our course data improved the result.
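
The retrieval step described in this abstract (pick the paragraph whose embedding is most similar to the question embedding under cosine similarity) can be sketched as follows. The embeddings here are hand-written toy vectors standing in for sentence-encoder outputs, and the function names are hypothetical; nothing below reproduces the actual Kwame system.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_paragraph(question: Vector, paragraphs: List[Vector]) -> Tuple[int, float]:
    """Return (index, score) of the paragraph embedding closest to the question."""
    scores = [cosine_similarity(question, p) for p in paragraphs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy embeddings (assumed values; a real system would use an encoder's output).
question_vec = [1.0, 1.0, 0.0]
paragraph_vecs = [
    [1.0, 0.0, 0.0],
    [2.0, 2.0, 0.1],
    [0.0, 0.0, 1.0],
]
best_index, best_score = most_similar_paragraph(question_vec, paragraph_vecs)
```

Because cosine similarity ignores vector length, the second paragraph wins here even though its embedding has a larger norm than the question's.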

[82]  arXiv:2010.11388 [pdf, other]
Title: Adversarial Attacks on Deep Algorithmic Trading Policies
Comments: 17 pages - under submission
Subjects: Machine Learning (cs.LG); Trading and Market Microstructure (q-fin.TR)

Deep Reinforcement Learning (DRL) has become an appealing solution to algorithmic trading such as high frequency trading of stocks and cryptocurrencies. However, DRL has been shown to be susceptible to adversarial attacks. It follows that algorithmic trading DRL agents may also be compromised by such adversarial techniques, leading to policy manipulation. In this paper, we develop a threat model for deep trading policies, and propose two attack techniques for manipulating the performance of such policies at test-time. Furthermore, we demonstrate the effectiveness of the proposed attacks against benchmark and real-world DQN trading agents.

[83]  arXiv:2010.11389 [pdf, other]
Title: UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data
Subjects: Machine Learning (cs.LG)

Successful health risk prediction demands accuracy and reliability of the model. Existing predictive models mainly depend on mining electronic health records (EHR) with advanced deep learning techniques to improve model accuracy. However, they all ignore the importance of publicly available online health data, especially socioeconomic status, environmental factors, and detailed demographic information for each location, which are all strong predictive signals and can definitely augment precision medicine. To achieve model reliability, the model needs to provide both an accurate prediction and an uncertainty score for that prediction. However, existing uncertainty estimation approaches often fail to handle high-dimensional data, which are present in multi-sourced data. To fill the gap, we propose the UNcertaInTy-based hEalth risk prediction (UNITE) model. Building upon an adaptive multimodal deep kernel and a stochastic variational inference module, UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data including EHR data, patient demographics, and public health data collected from the web. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19\%$ over the best baseline. We also show UNITE can model meaningful uncertainties and can provide evidence-based clinical support by clustering similar patients.

[84]  arXiv:2010.11395 [pdf, other]
Title: Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset
Comments: 5 pages
Subjects: Computation and Language (cs.CL)

Recently, Transformer based end-to-end models have achieved great success in many areas including speech recognition. However, compared to LSTM models, the heavy computational cost of the Transformer during inference is a key issue preventing their application. In this work, we explored the potential of Transformer Transducer (T-T) models for first-pass decoding with low latency and fast speed on a large-scale dataset. We combine the idea of Transformer-XL and chunk-wise streaming processing to design a streamable Transformer Transducer model. We demonstrate that T-T outperforms the hybrid model, RNN Transducer (RNN-T), and streamable Transformer attention-based encoder-decoder model in the streaming scenario. Furthermore, the runtime cost and latency can be optimized with a relatively small look-ahead.

[85]  arXiv:2010.11397 [pdf, other]
Title: When Machine Learning Meets Congestion Control: A Survey and Comparison
Subjects: Networking and Internet Architecture (cs.NI)

Machine learning (ML) has seen a significant surge and uptake across many diverse applications. The high flexibility, adaptability and computing capabilities it provides extend traditional approaches used in multiple fields including network operation and management. Numerous surveys have explored ML in the context of networking, such as traffic engineering, performance optimization and network security. Many ML approaches focus on clustering, classification, regression and reinforcement learning (RL). The innovation of this research and contribution of this paper lies in the detailed summary and comparison of learning-based congestion control (CC) approaches. Compared with traditional CC algorithms which are typically rule-based, capabilities to learn from historical experience are highly desirable. From the literature, it is observed that RL is a crucial trend among learning-based CC algorithms. In this paper, we explore the performance of RL-based CC algorithms and present current problems with RL-based CC algorithms. We outline challenges and trends related to learning-based CC algorithms.

[86]  arXiv:2010.11398 [pdf, other]
Title: DPD-InfoGAN: Differentially Private Distributed InfoGAN
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Generative Adversarial Networks (GANs) are deep learning architectures capable of generating synthetic datasets. Despite producing high-quality synthetic images, the default GAN has no control over the kinds of images it generates. The Information Maximizing GAN (InfoGAN) is a variant of the default GAN that introduces feature-control variables that are automatically learned by the framework, hence providing greater control over the different kinds of images produced. Due to the high model complexity of InfoGAN, the generative distribution tends to be concentrated around the training data points. This is a critical problem as the models may inadvertently expose the sensitive and private information present in the dataset. To address this problem, we propose a differentially private version of InfoGAN (DP-InfoGAN). We also extend our framework to a distributed setting (DPD-InfoGAN) to allow clients to learn different attributes present in other clients' datasets in a privacy-preserving manner. In our experiments, we show that both DP-InfoGAN and DPD-InfoGAN can synthesize high-quality images with flexible control over image attributes while preserving privacy.

[87]  arXiv:2010.11400 [pdf, ps, other]
Title: Energy-Efficient Node Deployment in Heterogeneous Two-Tier Wireless Sensor Networks with Limited Communication Range
Comments: arXiv admin note: substantial text overlap with arXiv:1901.06742
Subjects: Information Theory (cs.IT)

We study a heterogeneous two-tier wireless sensor network in which N heterogeneous access points (APs) collect sensing data from densely distributed sensors and then forward the data to M heterogeneous fusion centers (FCs). This heterogeneous node deployment problem is modeled as an optimization problem with the total power consumption of the network as its cost function. The necessary conditions of the optimal AP and FC node deployment are explored in this paper. We provide a variation of Voronoi Diagram as the optimal cell partition for this network and show that each AP should be placed between its connected FC and the geometric center of its cell partition. In addition, we propose a heterogeneous two-tier Lloyd algorithm to optimize the node deployment. Furthermore, we study the sensor deployment when the communication range is limited for sensors and APs. Simulation results show that our proposed algorithms outperform the existing clustering methods like Minimum Energy Routing, Agglomerative Clustering, Divisive Clustering, Particle Swarm Optimization, Relay Node placement in Double-tiered Wireless Sensor Networks, and Improved Relay Node Placement, on average.
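
For readers unfamiliar with Lloyd-type algorithms, the sketch below shows the classic single-tier Lloyd iteration on 2D points: assign every point to its nearest center (a Voronoi partition), then move each center to the centroid of its cell. This is only the textbook baseline under a squared-distance cost; the heterogeneous two-tier variant and the power-consumption cost function from the abstract are not reproduced, and all names and data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def squared_distance(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def centroid(cell: List[Point]) -> Point:
    n = len(cell)
    return (sum(p[0] for p in cell) / n, sum(p[1] for p in cell) / n)


def lloyd(points: List[Point], centers: List[Point], iterations: int = 10) -> List[Point]:
    """Classic Lloyd iteration: nearest-center assignment, then centroid update."""
    centers = list(centers)
    for _ in range(iterations):
        cells: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
            cells[nearest].append(p)
        # An empty cell keeps its previous center.
        centers = [centroid(cell) if cell else c for cell, c in zip(cells, centers)]
    return centers


# Two well-separated groups of sensors (toy data).
sensors = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
final_centers = lloyd(sensors, [(1.0, 1.0), (9.0, 1.0)])
```

Each center ends up at the geometric center of its group, which is the fixed point of the iteration for this toy layout.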

[88]  arXiv:2010.11401 [pdf, other]
Title: Learning Transferrable Parameters for Long-tailed Sequential User Behavior Modeling
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Sequential user behavior modeling plays a crucial role in online user-oriented services, such as product purchasing, news feed consumption, and online advertising. The performance of sequential modeling heavily depends on the scale and quality of historical behaviors. However, the number of user behaviors inherently follows a long-tailed distribution, which has been seldom explored. In this work, we argue that focusing on tail users could bring more benefits, and we address the long-tail issue by learning transferrable parameters from both optimization and feature perspectives. Specifically, we propose a gradient alignment optimizer and adopt an adversarial training scheme to facilitate knowledge transfer from the head to the tail. Such methods can also deal with the cold-start problem of new users. Moreover, they can be directly adapted to various well-established sequential models. Extensive experiments on four real-world datasets verify the superiority of our framework compared with the state-of-the-art baselines.

[89]  arXiv:2010.11411 [pdf, other]
Title: Value Cards: An Educational Toolkit for Teaching Social Impacts of Machine Learning through Deliberation
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG)

Recently, there have been increasing calls for computer science curricula to complement existing technical training with topics related to Fairness, Accountability, Transparency, and Ethics. In this paper, we present Value Cards, an educational toolkit to inform students and practitioners of the social impacts of different machine learning models via deliberation. This paper presents an early use of our approach in a college-level computer science course. Through an in-class activity, we report empirical data for the initial effectiveness of our approach. Our results suggest that the use of the Value Cards toolkit can improve students' understanding of both the technical definitions and trade-offs of performance metrics and their ability to apply them in real-world contexts, help them recognize the significance of considering diverse social values in the development and deployment of algorithmic systems, and enable them to communicate, negotiate and synthesize the perspectives of diverse stakeholders. Our study also demonstrates a number of caveats we need to consider when using the different variants of the Value Cards toolkit. Finally, we discuss the challenges as well as future applications of our approach.

[90]  arXiv:2010.11413 [pdf, other]
Title: Predicting Human Decision Making in Psychological Tasks with Recurrent Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)

Unlike traditional time series, the action sequences of human decision making usually involve many cognitive processes such as beliefs, desires, intentions and theory of mind, i.e. what others are thinking. This makes it challenging to predict human decision making in a way that is agnostic to the underlying psychological mechanisms. We propose to use a recurrent neural network architecture based on long short-term memory networks (LSTM) to predict the time series of the actions taken by the human subjects at each step of their decision making, the first application of such methods in this research domain. We trained our prediction networks on the behavioral data from several published psychological experiments of human decision making, and demonstrated a clear advantage over the state-of-the-art methods in predicting human decision making trajectories in both single-agent scenarios such as the Iowa Gambling Task and multi-agent scenarios such as the Iterated Prisoner's Dilemma.

[91]  arXiv:2010.11415 [pdf, other]
Title: Maximum Mean Discrepancy is Aware of Adversarial Attacks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The maximum mean discrepancy (MMD) test, as a representative two-sample test, could in principle detect any distributional discrepancy between two datasets. However, it has been shown that MMD is unaware of adversarial attacks: MMD failed to detect the discrepancy between natural data and adversarial data generated by adversarial attacks. Given this phenomenon, we raise a question: are natural and adversarial data really from different distributions, with previous uses of MMD for this purpose having missed some key factors? The answer is affirmative. We find that the previous use missed three factors, and accordingly we propose three components: (a) the Gaussian kernel has limited representation power, and we replace it with a novel semantic-aware deep kernel; (b) the test power of MMD was neglected, and we maximize it in order to optimize our deep kernel; (c) adversarial data may be non-independent, and to this end we apply wild bootstrap for validity of the test power. By taking care of the three factors, we validate that MMD is aware of adversarial attacks, which opens up a novel road for adversarial attack detection based on two-sample tests.
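
As background for the statistic discussed in this abstract, the sketch below computes the standard biased (V-statistic) estimate of squared MMD with a plain Gaussian kernel: the mean kernel value within the first sample, plus the mean within the second, minus twice the mean across samples. This is the baseline kernel that the abstract describes as having limited representation power; the semantic-aware deep kernel, the power maximization, and the wild bootstrap are not shown. The bandwidth, names, and toy samples are assumptions.

```python
import math
from typing import List

Vector = List[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared / (2.0 * bandwidth ** 2))


def mean_kernel(xs: List[Vector], ys: List[Vector], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2_biased(xs: List[Vector], ys: List[Vector], bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD between the samples xs and ys."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


# Toy one-dimensional samples: one near 0, one near 3.
sample_a = [[0.0], [0.1], [-0.1]]
sample_b = [[3.0], [3.1], [2.9]]

same_distribution = mmd2_biased(sample_a, sample_a)
shifted_distribution = mmd2_biased(sample_a, sample_b)
```

A two-sample test would compare such a statistic against a null distribution (for example obtained by permutation or bootstrap) rather than reading the raw value directly.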

[92]  arXiv:2010.11416 [pdf, other]
Title: Rank-structured QR for Chebyshev rootfinding
Subjects: Numerical Analysis (math.NA)

We consider the computation of roots of polynomials expressed in the Chebyshev basis. We extend the QR iteration presented in [Eidelman, Y., Gemignani, L., and Gohberg, I., Numer. Algorithms, 47.3 (2008): pp. 253-273] by introducing an aggressive early deflation strategy and showing that the rank structure allows the algorithm to be parallelized, avoiding data dependencies that would be present in the unstructured QR. The method exploits the particular structure of the colleague linearization to achieve quadratic complexity and linear storage requirements. The (unbalanced) QR iteration used for Chebyshev rootfinding does not guarantee backward stability on the polynomial coefficients, unless the vector of coefficients satisfies $\lVert p\rVert \approx 1$, a hypothesis which is almost never satisfied for polynomials approximating smooth functions. Even though the presented method is mathematically equivalent to the latter algorithm, we show that exploiting the rank structure makes it possible to guarantee a small backward error on the polynomial, up to an explicitly computable amplification factor $\hat\gamma_1(p)$, which depends on the polynomial under consideration. We show that this parameter is almost always of moderate size, making the method accurate on several numerical tests, in contrast with what happens in the unstructured unbalanced QR. We also discuss the connection between the size of this amplification factor and the existence of a good balancing. This provides some insight into why the accuracy of our method is often very close to that of balanced QR iteration.

[93]  arXiv:2010.11418 [pdf, other]
Title: Rethinking pooling in graph neural networks
Comments: Accepted to NeurIPS 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Graph pooling is a central component of a myriad of graph neural network (GNN) architectures. As an inheritance from traditional CNNs, most approaches formulate graph pooling as a cluster assignment problem, extending the idea of local patches in regular grids to graphs. Despite the wide adherence to this design choice, no work has rigorously evaluated its influence on the success of GNNs. In this paper, we build upon representative GNNs and introduce variants that challenge the need for locality-preserving representations, either using randomization or clustering on the complement graph. Strikingly, our experiments demonstrate that using these variants does not result in any decrease in performance. To understand this phenomenon, we study the interplay between convolutional layers and the subsequent pooling ones. We show that the convolutions play a leading role in the learned representations. In contrast to the common belief, local pooling is not responsible for the success of GNNs on relevant and widely-used benchmarks.

[94]  arXiv:2010.11419 [pdf, other]
Title: Basket Recommendation with Multi-Intent Translation Graph Neural Network
Comments: Accepted to IEEE Bigdata 2020. Code is available online at https://github.com/JimLiu96/MITGNN
Journal-ref: 978-1-7281-6251-5/20/$31.00 ©2020 IEEE
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)

The problem of basket recommendation (BR) is to recommend a ranked list of items for the current basket. Existing methods solve this problem by assuming the items within the same basket are correlated by one semantic relation, thus optimizing the item embeddings. However, this assumption breaks when there exist multiple intents within a basket. For example, assuming a basket contains {bread, cereal, yogurt, soap, detergent}, where {bread, cereal, yogurt} are correlated through the "breakfast" intent while {soap, detergent} share a "cleaning" intent, ignoring the multiple relations among the items impairs the ability of the model to learn the embeddings. To resolve this issue, the intents within the basket must be discovered. However, retrieving a multi-intent pattern is rather challenging, as intents are latent within the basket. Additionally, intents within the basket may also be correlated. Moreover, discovering a multi-intent pattern requires modeling high-order interactions, as the intents across different baskets are also correlated. To this end, we propose a new framework named Multi-Intent Translation Graph Neural Network (MITGNN). MITGNN models $T$ intents as tail entities translated from one corresponding basket embedding via $T$ relation vectors. The relation vectors are learned through multi-head aggregators to handle user and item information. Additionally, MITGNN propagates multiple intents across our defined basket graph to learn the embeddings of users and items by aggregating neighbors. Extensive experiments on two real-world datasets demonstrate the effectiveness of our proposed model on both transductive and inductive BR. The code is available online at https://github.com/JimLiu96/MITGNN.

[95]  arXiv:2010.11420 [pdf, other]
Title: Deterministic Approximation for Submodular Maximization over a Matroid in Nearly Linear Time
Comments: Accepted to appear in the Thirty-Fourth Conference on Neural Information Processing Systems (NeurIPS 2020)
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Optimization and Control (math.OC)

We study the problem of maximizing a non-monotone, non-negative submodular function subject to a matroid constraint. The prior best-known deterministic approximation ratio for this problem is $\frac{1}{4}-\epsilon$ under $\mathcal{O}(({n^4}/{\epsilon})\log n)$ time complexity. We show that this deterministic ratio can be improved to $\frac{1}{4}$ under $\mathcal{O}(nr)$ time complexity, and then present a more practical algorithm dubbed TwinGreedyFast which achieves $\frac{1}{4}-\epsilon$ deterministic ratio in nearly-linear running time of $\mathcal{O}(\frac{n}{\epsilon}\log\frac{r}{\epsilon})$. Our approach is based on a novel algorithmic framework of simultaneously constructing two candidate solution sets through greedy search, which enables us to get improved performance bounds by fully exploiting the properties of independence systems. As a byproduct of this framework, we also show that TwinGreedyFast achieves $\frac{1}{2p+2}-\epsilon$ deterministic ratio under a $p$-set system constraint with the same time complexity. To showcase the practicality of our approach, we empirically evaluated the performance of TwinGreedyFast on two network applications, and observed that it outperforms the state-of-the-art deterministic and randomized algorithms with efficient implementations for our problem.
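
The phrase "simultaneously constructing two candidate solution sets through greedy search" can be illustrated with a small sketch. The code below is one simplified reading of that idea, restricted to a cardinality constraint (a uniform matroid of rank r) and applied to a graph cut function, which is a standard example of a non-monotone submodular function. It is an assumption-laden illustration, not the authors' TwinGreedyFast algorithm, and it carries none of the approximation guarantees stated in the abstract; all names and the toy graph are hypothetical.

```python
from typing import Callable, Iterable, List, Set


def twin_greedy_sketch(ground: Iterable[int], f: Callable[[Set[int]], float], r: int) -> Set[int]:
    """Grow two disjoint candidate sets greedily and return the better one.

    At each step, pick the (element, set) pair with the largest marginal gain
    among unassigned elements and sets with fewer than r elements; stop when
    no pair has positive gain.
    """
    candidates: List[Set[int]] = [set(), set()]
    remaining = list(ground)
    while remaining:
        best = None  # (gain, element, index of candidate set)
        for element in remaining:
            for index in (0, 1):
                if len(candidates[index]) >= r:
                    continue  # cardinality constraint would be violated
                gain = f(candidates[index] | {element}) - f(candidates[index])
                if best is None or gain > best[0]:
                    best = (gain, element, index)
        if best is None or best[0] <= 0:
            break
        _, element, index = best
        candidates[index].add(element)
        remaining.remove(element)
    return max(candidates, key=f)


# Toy objective: cut value on a 4-cycle (edges chosen for illustration).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]


def cut_value(subset: Set[int]) -> int:
    """Number of edges with exactly one endpoint inside the subset."""
    return sum((u in subset) != (v in subset) for u, v in EDGES)


solution = twin_greedy_sketch(range(4), cut_value, r=2)
```

On this toy instance the returned set cuts all four edges. Keeping two disjoint candidates is what lets a deterministic greedy procedure cope with non-monotonicity, since an element that would hurt one candidate can still help the other.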

[96]  arXiv:2010.11421 [pdf, ps, other]
Title: Pool-based sequential active learning with multi kernels
Subjects: Machine Learning (cs.LG)

We study pool-based sequential active learning (AL), in which one sample is queried at each time step from a large pool of unlabeled data according to a selection criterion. For this framework, we propose two selection criteria, named expected-kernel-discrepancy (EKD) and expected-kernel-loss (EKL), by leveraging the particular structure of multiple kernel learning (MKL). We also show that the proposed EKD and EKL generalize the concepts of the popular query-by-committee (QBC) and expected-model-change (EMC), respectively. Through experiments on real data sets, we verify the effectiveness of the proposed criteria compared with the existing methods.

[97]  arXiv:2010.11422 [pdf, other]
Title: Learning Loss for Test-Time Augmentation
Comments: Accepted at NeurIPS 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Data augmentation has been actively studied for robust neural networks. Most of the recent data augmentation methods focus on augmenting datasets during the training phase. At the testing phase, simple transformations are still widely used for test-time augmentation. This paper proposes a novel instance-level test-time augmentation that efficiently selects suitable transformations for a test input. Our proposed method involves an auxiliary module to predict the loss of each possible transformation given the input. Then, the transformations having lower predicted losses are applied to the input. The network obtains the results by averaging the prediction results of augmented inputs. Experimental results on several image classification benchmarks show that the proposed instance-aware test-time augmentation improves the model's robustness against various corruptions.
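The selection step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the loss predictor, the classifier, and the three transformations below are hypothetical toy stand-ins, and the names `instance_aware_tta`, `toy_predict_loss`, and `toy_classify` are invented for illustration. The sketch only shows the control flow: score each candidate transformation, keep the `k` with the lowest predicted loss, and average the predictions on the transformed inputs.

```python
import math

def softmax(logits):
    """Numerically stable soft-max over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def instance_aware_tta(x, transforms, predict_loss, classify, k=2):
    """Average class probabilities over the k transforms with the lowest predicted loss."""
    # Rank transformation indices by predicted loss (sorted() is stable, so ties keep their order).
    ranked = sorted(range(len(transforms)), key=lambda i: predict_loss(x, transforms[i]))
    chosen = ranked[:k]
    outputs = [classify(transforms[i](x)) for i in chosen]
    n_classes = len(outputs[0])
    averaged = [sum(o[c] for o in outputs) / len(outputs) for c in range(n_classes)]
    return averaged, chosen

# Toy stand-ins (hypothetical): an "image" is just a list of floats.
TRANSFORMS = [
    lambda v: list(v),               # identity
    lambda v: list(reversed(v)),     # "flip"
    lambda v: [0.5 * a for a in v],  # "darken"
]

def toy_predict_loss(x, transform):
    # Hypothetical loss predictor: pretends inputs whose mean is near 1.0 are easiest.
    t = transform(x)
    return abs(sum(t) / len(t) - 1.0)

def toy_classify(x):
    # Hypothetical two-class classifier driven by the input sum.
    s = sum(x)
    return softmax([s, -s])

probs, chosen = instance_aware_tta(
    [1.0, 2.0, 3.0], TRANSFORMS, toy_predict_loss, toy_classify, k=2
)
```

In a real system the loss predictor would be a learned auxiliary module that scores all transformations from the untransformed input, so that only the selected transformations need to be run through the main network.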

[98]  arXiv:2010.11425 [pdf, other]
Title: Differentially-Private Federated Linear Bandits
Comments: 22 pages. Camera-ready for NeurIPS 2020
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Multiagent Systems (cs.MA); Machine Learning (stat.ML)

The rapid proliferation of decentralized learning systems mandates the need for differentially-private cooperative learning. In this paper, we study this in the context of the contextual linear bandit: we consider a collection of agents cooperating to solve a common contextual bandit, while ensuring that their communication remains private. For this problem, we devise \textsc{FedUCB}, a multiagent private algorithm for both centralized and decentralized (peer-to-peer) federated learning. We provide a rigorous technical analysis of its utility in terms of regret, improving several results in cooperative bandit learning, and provide rigorous privacy guarantees as well. Our algorithms provide competitive performance both in terms of pseudoregret bounds and empirical benchmark performance in various multi-agent settings.

[99]  arXiv:2010.11426 [pdf, other]
Title: Efficient Scale-Permuted Backbone with Learned Resource Distribution
Comments: ECCV2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Recently, SpineNet has demonstrated promising results on object detection and image classification over ResNet models. However, it is unclear if the improvement adds up when combining a scale-permuted backbone with advanced efficient operations and compound scaling. Furthermore, SpineNet is built with a uniform resource distribution over operations. While this strategy seems to be prevalent for scale-decreased models, it may not be an optimal design for scale-permuted models. In this work, we propose a simple technique to combine efficient operations and compound scaling with a previously learned scale-permuted architecture. We demonstrate that the efficiency of scale-permuted models can be further improved by learning a resource distribution over the entire network. The resulting efficient scale-permuted models outperform state-of-the-art EfficientNet-based models on object detection and achieve competitive performance on image classification and semantic segmentation. Code and models will be open-sourced soon.

[100]  arXiv:2010.11429 [pdf, ps, other]
Title: An efficient spectral-Galerkin method for fractional reaction-diffusion equations in unbounded domains
Authors: Huifang Yuan
Comments: 23 pages
Subjects: Numerical Analysis (math.NA)

In this work, we apply a fast and accurate numerical method for solving fractional reaction-diffusion equations in unbounded domains. By using the Fourier-like spectral approach in space, this method can effectively handle the fractional Laplace operator, leading to a fully diagonal representation of the fractional Laplacian. To fully discretize the underlying nonlinear reaction-diffusion systems, we propose to use an accurate time marching scheme based on ETDRK4. Numerical examples are presented to illustrate the effectiveness of the proposed method.

[101]  arXiv:2010.11430 [pdf, other]
Title: Self-training and Pre-training are Complementary for Speech Recognition
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data. However, it is not clear whether they learn similar patterns or if they can be effectively combined. In this paper, we show that pseudo-labeling and pre-training with wav2vec 2.0 are complementary in a variety of labeled data setups. Using just 10 minutes of labeled data from Libri-light as well as 53k hours of unlabeled data from LibriVox achieves WERs of 3.0%/5.2% on the clean and other test sets of Librispeech - rivaling the best published systems trained on 960 hours of labeled data only a year ago. Training on all labeled data of Librispeech achieves WERs of 1.5%/3.1%.

[102]  arXiv:2010.11437 [pdf, other]
Title: Task-Adaptive Feature Transformer for Few-Shot Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Few-shot learning allows machines to classify novel classes using only a few labeled samples. Recently, few-shot segmentation aiming at semantic segmentation on low sample data has also seen great interest. In this paper, we propose a learnable module for few-shot segmentation, the task-adaptive feature transformer (TAFT). TAFT linearly transforms task-specific high-level features to a set of task-agnostic features well-suited to the segmentation job. Using this task-conditioned feature transformation, the model is shown to effectively utilize the semantic information in novel classes to generate tight segmentation masks. The proposed TAFT module can be easily plugged into existing semantic segmentation algorithms to achieve few-shot segmentation capability with only a few added parameters. We combine TAFT with Deeplab V3+, a well-known segmentation architecture; experiments on the PASCAL-$5^i$ dataset confirm that this combination successfully adds few-shot learning capability to the segmentation algorithm, achieving the state-of-the-art few-shot segmentation performance in some key representative cases.

[103]  arXiv:2010.11438 [pdf, other]
Title: GAN based Unsupervised Segmentation: Should We Match the Exact Number of Objects
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)

Unsupervised segmentation is an increasingly popular topic in biomedical image analysis. The basic idea is to approach the supervised segmentation task as an unsupervised synthesis problem, where the intensity images can be transferred to the annotation domain using cycle-consistent adversarial learning. Previous studies have shown that macro-level (global distribution level) matching of the number of objects (e.g., cells, tissues, protrusions, etc.) between two domains results in better segmentation performance. However, no prior studies have explored whether the unsupervised segmentation performance would be further improved when matching the exact number of objects at the micro-level (mini-batch level). In this paper, we propose a deep learning based unsupervised segmentation method for segmenting highly overlapped and dynamic sub-cellular microvilli. With this challenging task, both micro-level and macro-level matching strategies were evaluated. To match the number of objects at the micro-level, a novel fluorescence-based micro-level matching approach is presented. In the experiments, micro-level matching did not improve the segmentation performance compared with the simpler macro-level matching.

[104]  arXiv:2010.11439 [pdf, other]
Title: Parallel Tacotron: Non-Autoregressive and Controllable TTS
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Although neural end-to-end text-to-speech models can synthesize highly natural speech, there is still room for improvements to their efficiency and naturalness. This paper proposes a non-autoregressive neural text-to-speech model augmented with a variational autoencoder-based residual encoder. This model, called \emph{Parallel Tacotron}, is highly parallelizable during both training and inference, allowing efficient synthesis on modern parallel hardware. The use of the variational autoencoder relaxes the one-to-many mapping nature of the text-to-speech problem and improves naturalness. To further improve the naturalness, we use lightweight convolutions, which can efficiently capture local contexts, and introduce an iterative spectrogram loss inspired by iterative refinement. Experimental results show that Parallel Tacotron matches a strong autoregressive baseline in subjective evaluations with significantly decreased inference time.

[105]  arXiv:2010.11440 [pdf, ps, other]
Title: Vertex deletion into bipartite permutation graphs
Comments: Extended abstract accepted to International Symposium on Parameterized and Exact Computation (IPEC'20)
Subjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM)

A permutation graph can be defined as an intersection graph of segments whose endpoints lie on two parallel lines $l_1$ and $l_2$, one on each. A bipartite permutation graph is a permutation graph which is bipartite. In this paper we study the parameterized complexity of the bipartite permutation vertex deletion problem, which asks, for a given $n$-vertex graph, whether we can remove at most $k$ vertices to obtain a bipartite permutation graph. This problem is NP-complete by the classical result of Lewis and Yannakakis. We analyze the structure of the so-called almost bipartite permutation graphs which may contain holes (large induced cycles) in contrast to bipartite permutation graphs. We exploit the structural properties of the shortest hole in such a graph. We use it to obtain an algorithm for the bipartite permutation vertex deletion problem with running time $O(9^k\cdot n^9)$, and also give a polynomial-time 9-approximation algorithm.
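The segment-intersection definition above can be made concrete with a short sketch. It assumes the standard encoding in which segment `i` joins position `i` on $l_1$ to position `perm[i]` on $l_2$, so two segments cross exactly when the permutation inverts their order. The function names are illustrative, and the code only builds the graph and tests bipartiteness; it does not implement the vertex deletion algorithm from the paper.

```python
from collections import deque

def permutation_graph(perm):
    """Adjacency sets of the permutation graph of perm (a list holding 0..n-1 in some order)."""
    n = len(perm)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # Segments i and j cross iff their endpoints appear in opposite order on l_2.
            if perm[i] > perm[j]:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def is_bipartite(adj):
    """Two-colour the graph by breadth-first search; fail on any same-coloured edge."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False
    return True

# Two disjoint crossings: a bipartite permutation graph.
crossing_pairs = permutation_graph([1, 0, 3, 2])
# Three mutually crossing segments form a triangle, which is not bipartite.
triangle = permutation_graph([2, 1, 0])
```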

[106]  arXiv:2010.11441 [pdf, other]
Title: Fusing Keys for Secret Communications: Towards Information-Theoretic Security
Comments: 7 pages, 5 figures
Subjects: Cryptography and Security (cs.CR)

Modern cryptography is essential to communication and information security for performing all kinds of security actions, such as encryption, authentication, and signature. However, the exposure possibility of keys poses a great threat to almost all modern cryptography. This article proposes a key-fusing framework, which enables a high resilience to key exposure by fusing multiple imperfect keys. The correctness of the scheme is strictly verified through a toy model that is general enough to abstract the physical-layer key generation (PLKG) mechanisms. Analysis and results demonstrate that the proposed scheme can dramatically reduce secret outage probability, so that key sources with even high exposure probability can be practically beneficial for actual secret communication. Our framework paves the way for achieving information-theoretic security by integrating various key sources, such as physical layer key generation, lattice-based cryptography, and quantum cryptography.

[107]  arXiv:2010.11443 [pdf, other]
Title: Optimal Robustness-Consistency Trade-offs for Learning-Augmented Online Algorithms
Comments: To appear at NeurIPS 2020
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)

We study the problem of improving the performance of online algorithms by incorporating machine-learned predictions. The goal is to design algorithms that are both consistent and robust, meaning that the algorithm performs well when predictions are accurate and maintains worst-case guarantees. Such algorithms have been studied in a recent line of works due to Lykouris and Vassilvitskii (ICML '18) and Purohit et al. (NeurIPS '18). They provide robustness-consistency trade-offs for a variety of online problems. However, they leave open the question of whether these trade-offs are tight, i.e., to what extent such trade-offs are necessary. In this paper, we provide the first set of non-trivial lower bounds for competitive analysis using machine-learned predictions. We focus on the classic problems of ski-rental and non-clairvoyant scheduling and provide optimal trade-offs in various settings.
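To make the robustness-consistency trade-off concrete, here is a minimal sketch of a deterministic ski-rental rule with a prediction, in the style of the algorithm commonly attributed to Purohit et al. It is an illustration of the kind of upper bound the abstract refers to, not a result of this paper. The rental price of 1 per day, the purchase price `b`, and the trust parameter `lam` in (0, 1] are assumptions of the sketch, and all function names are illustrative.

```python
import math

def buy_day(prediction, b, lam):
    """Day on which to buy, given the predicted number of ski days."""
    if prediction >= b:
        return math.ceil(lam * b)   # prediction says "long season": buy early
    return math.ceil(b / lam)       # prediction says "short season": buy late

def cost(actual_days, day_to_buy, b):
    """Total cost when renting costs 1 per day and buying costs b."""
    if actual_days < day_to_buy:
        return actual_days          # the season ended before the purchase
    return (day_to_buy - 1) + b     # rented until the purchase day, then bought

def competitive_ratio(actual_days, prediction, b, lam):
    """Algorithm cost divided by the offline optimum min(actual_days, b)."""
    optimum = min(actual_days, b)
    return cost(actual_days, buy_day(prediction, b, lam), b) / optimum

B, LAM = 10, 0.5
# Robustness: worst ratio over arbitrary (possibly wrong) predictions.
worst_any = max(competitive_ratio(x, y, B, LAM)
                for x in range(1, 61) for y in range(1, 61))
# Consistency: worst ratio when the prediction is exactly right.
worst_perfect = max(competitive_ratio(x, x, B, LAM) for x in range(1, 61))
```

With these settings the ratio stays below $1 + 1/\lambda$ for every prediction and below $1 + \lambda$ when the prediction is correct. Lowering `lam` trusts the prediction more, which improves the second bound at the expense of the first.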

[108]  arXiv:2010.11445 [pdf, other]
Title: MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation
Comments: 10 pages
Subjects: Computation and Language (cs.CL)

End-to-end Speech-to-text Translation (E2E-ST), which directly translates source language speech to target language text, is widely useful in practice, but traditional cascaded approaches (ASR+MT) often suffer from error propagation in the pipeline. On the other hand, existing end-to-end solutions heavily depend on the source language transcriptions for pre-training or multi-task training with Automatic Speech Recognition (ASR). We instead propose a simple technique to learn a robust speech encoder in a self-supervised fashion only on the speech side, which can utilize speech data without transcription. This technique, termed Masked Acoustic Modeling (MAM), can also perform pre-training, for the first time, on any acoustic signals (including non-speech ones) without annotation. Compared with current state-of-the-art models on ST, our technique achieves +1.4 BLEU improvement without using transcriptions, and +1.2 BLEU using transcriptions. The pre-training of MAM with arbitrary acoustic signals also boosts the downstream speech-related tasks.

[109]  arXiv:2010.11446 [pdf, ps, other]
Title: Probabilistic Circuits for Variational Inference in Discrete Graphical Models
Comments: In Advances in Neural Information Processing Systems 34 (NeurIPS), 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Inference in discrete graphical models with variational methods is difficult because of the inability to re-parameterize gradients of the Evidence Lower Bound (ELBO). Many sampling-based methods have been proposed for estimating these gradients, but they suffer from high bias or variance. In this paper, we propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPN), to compute ELBO gradients exactly (without sampling) for a certain class of densities. In particular, we show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial the corresponding ELBO can be computed analytically. To scale to graphical models with thousands of variables, we develop an efficient and effective construction of selective-SPNs with size $O(kn)$, where $n$ is the number of variables and $k$ is an adjustable hyperparameter. We demonstrate our approach on three types of graphical models -- Ising models, Latent Dirichlet Allocation, and factor graphs from the UAI Inference Competition. Selective-SPNs give a better lower bound than mean-field and structured mean-field, and are competitive with approximations that do not provide a lower bound, such as Loopy Belief Propagation and Tree-Reweighted Belief Propagation. Our results show that probabilistic circuits are promising tools for variational inference in discrete graphical models as they combine tractability and expressivity.

[110]  arXiv:2010.11447 [pdf, other]
Title: Krylov Subspace Recycling for Evolving Structures
Subjects: Numerical Analysis (math.NA)

Krylov subspace recycling is a powerful tool for solving long series of large, sparse linear systems that change slowly. In PDE constrained shape optimization, these appear naturally, as hundreds or more optimization steps are needed with only small changes in the geometry. In this setting, however, applying Krylov subspace recycling can be difficult. As the geometry evolves, so does the finite element mesh, especially if re-meshing is needed. As a result, the number of algebraic degrees of freedom in the system may change from one optimization step to the next, and with it the size of the finite element system matrix. Changes in the mesh also lead to structural changes in the matrices. In the case of remeshing, even if the geometry changes only a little, the corresponding mesh might differ substantially from the previous one. This prevents any straightforward mapping of the approximate invariant subspace of the linear system matrix (the focus of recycling in this paper) from one step to the next; similar problems arise for other selected subspaces. We present an algorithm for general meshes to map an approximate invariant subspace of the system matrix for the previous optimization step to an approximate invariant subspace of the system matrix for the current optimization step. We exploit the map from coefficient vectors to finite element functions on the mesh combined with function approximation on the finite element mesh. In addition, we develop a straightforward warm-start adaptation of the Krylov-Schur algorithm [G.W. Stewart, SIAM J. Matrix Anal. Appl. 23, 2001] to improve the approximate invariant subspace at the start of a new optimization step if needed. We demonstrate the effectiveness of our approach numerically with several proof of concept studies for a specific meshing technique.

[111]  arXiv:2010.11448 [pdf, other]
Title: Efficient Computation of High-Order Line Graphs of Hypergraphs
Comments: 11 pages
Subjects: Discrete Mathematics (cs.DM)

This paper considers structures of systems beyond dyadic (pairwise) interactions and investigates mathematical modeling of multi-way interactions and connections as hypergraphs, where captured relationships among system entities are set-valued. To date, in most situations, entities in a hypergraph are considered connected as long as there is at least one common "neighbor". However, minimal commonality sometimes discards the "strength" of connections and interactions among groups. To this end, considering the "width" of a connection, referred to as the \emph{$s$-overlap} of neighbors, provides more meaningful insights into how closely the communities or entities interact with each other. In addition, $s$-overlap computation is the fundamental kernel to construct the line graph of a hypergraph, a low-order approximation which can carry significant information about the original hypergraph. Subsequent stages of a data analytics pipeline then can apply highly-tuned graph algorithms on the line graph to reveal important features. Given a hypergraph, computing the $s$-overlaps by exhaustively considering all pairwise entities can be computationally prohibitive. To tackle this challenge, we develop efficient algorithms to compute $s$-overlaps and the corresponding line graph of a hypergraph. We propose several heuristics to avoid execution of redundant work and improve performance of the $s$-overlap computation. Our parallel algorithm, combined with these heuristics, demonstrates better performance.
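For reference, the exhaustive pairwise computation that the abstract describes as potentially prohibitive can be sketched in a few lines. This is only the naive baseline, not the paper's algorithm or its heuristics. The hypergraph is assumed to be given as a dictionary from hyperedge names to vertex sets, and `s_line_graph` is an illustrative name.

```python
from itertools import combinations

def s_line_graph(hyperedges, s):
    """Edges of the s-line graph: pairs of hyperedges sharing at least s vertices.

    hyperedges maps a hyperedge name to its set of vertices.
    This checks every pair, so it takes quadratic time in the number of hyperedges.
    """
    names = sorted(hyperedges)
    edges = []
    for a, b in combinations(names, 2):
        overlap = len(hyperedges[a] & hyperedges[b])
        if overlap >= s:
            edges.append((a, b))
    return edges

H = {"e1": {1, 2, 3}, "e2": {2, 3, 4}, "e3": {4, 5}}
line_1 = s_line_graph(H, 1)   # any common vertex counts
line_2 = s_line_graph(H, 2)   # only "wide" connections survive
```

Raising `s` from 1 to 2 drops the weak connection between `e2` and `e3`, which share only vertex 4, and keeps the stronger one between `e1` and `e2`.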

[112]  arXiv:2010.11450 [pdf, other]
Title: Optimal Approximation -- Smoothness Tradeoffs for Soft-Max Functions
Comments: Accepted for spotlight presentation at NeurIPS 2020
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)

A soft-max function has two main efficiency measures: (1) approximation - which corresponds to how well it approximates the maximum function, (2) smoothness - which shows how sensitive it is to changes of its input. Our goal is to identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness. This leads to novel soft-max functions, each of which is optimal for a different application. The most commonly used soft-max function, called exponential mechanism, has optimal tradeoff between approximation measured in terms of expected additive approximation and smoothness measured with respect to R\'enyi Divergence. We introduce a soft-max function, called "piecewise linear soft-max", with optimal tradeoff between approximation, measured in terms of worst-case additive approximation and smoothness, measured with respect to $\ell_q$-norm. The worst-case approximation guarantee of the piecewise linear mechanism enforces sparsity in the output of our soft-max function, a property that is known to be important in Machine Learning applications [Martins et al. '16, Laha et al. '18] and is not satisfied by the exponential mechanism. Moreover, the $\ell_q$-smoothness is suitable for applications in Mechanism Design and Game Theory where the piecewise linear mechanism outperforms the exponential mechanism. Finally, we investigate another soft-max function, called power mechanism, with optimal tradeoff between expected \textit{multiplicative} approximation and smoothness with respect to the R\'enyi Divergence, which provides improved theoretical and practical results in differentially private submodular optimization.
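As a small illustration of the approximation side of this trade-off, the sketch below implements the exponential mechanism as a probability distribution over indices and measures how far its expected value falls below the true maximum. The parameter name `lam` and the helper names are assumptions of the sketch. The bound checked here, an expected additive error of at most $\ln(n)/\lambda$, is the standard log-sum-exp bound and is not taken from the paper.

```python
import math

def exponential_softmax(xs, lam):
    """Probability of picking each index, proportional to exp(lam * x_i)."""
    m = max(xs)                                  # subtract the max for numerical stability
    weights = [math.exp(lam * (x - m)) for x in xs]
    total = sum(weights)
    return [w / total for w in weights]

def expected_value(xs, probs):
    return sum(p * x for p, x in zip(probs, xs))

xs = [1.0, 2.0, 3.0, 2.5]
gap_small_lam = max(xs) - expected_value(xs, exponential_softmax(xs, 1.0))
gap_large_lam = max(xs) - expected_value(xs, exponential_softmax(xs, 10.0))
```

A larger `lam` approximates the maximum better but makes the output distribution react more sharply to changes in the input, which is the tension the abstract formalizes. Note that the exponential mechanism assigns nonzero probability to every index, in line with the abstract's remark that it does not produce sparse outputs.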

[113]  arXiv:2010.11453 [pdf, other]
Title: Machine Learning-Based Early Detection of IoT Botnets Using Network-Edge Traffic
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)

In this work, we present a lightweight IoT botnet detection solution, EDIMA, which is designed to be deployed at the edge gateway installed in home networks and targets early detection of botnets prior to the launch of an attack. EDIMA includes a novel two-stage Machine Learning (ML)-based detector developed specifically for IoT bot detection at the edge gateway. The ML-based bot detector first employs ML algorithms for aggregate traffic classification and subsequently Autocorrelation Function (ACF)-based tests to detect individual bots. The EDIMA architecture also comprises a malware traffic database, a policy engine, a feature extractor and a traffic parser. Performance evaluation results show that EDIMA achieves high bot scanning and bot-CnC traffic detection accuracies with very low false positive rates. The detection performance is also shown to be robust to an increase in the number of IoT devices connected to the edge gateway where EDIMA is deployed. Further, the runtime performance analysis of a Python implementation of EDIMA deployed on a Raspberry Pi reveals low bot detection delays and low RAM consumption. EDIMA is also shown to outperform existing detection techniques for bot scanning traffic and bot-CnC server communication.

[114]  arXiv:2010.11454 [pdf, other]
Title: Fast-HotStuff: A Fast and Resilient HotStuff Protocol
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The HotStuff protocol is a recent breakthrough in Byzantine Fault Tolerant (BFT) consensus that enjoys responsiveness and linear view change. It uses a clever three-chain commit rule to achieve responsiveness while the vast majority of BFT protocols are using the standard two-chain commit rule. This brings us to a fundamental question: Is a three-chain commit rule really necessary to achieve responsiveness? In this paper, we answer this question by designing a two-chain variant of HotStuff called Fast-HotStuff that still enjoys responsiveness with simplified view change. Compared to the three-chain HotStuff, Fast-HotStuff has lower latency and is more resilient against forking attacks. Moreover, Fast-HotStuff can be combined with Proof-of-Stake (PoS) while still maintaining safety and liveness. In sharp contrast, HotStuff and its variant LibraBFT (which plans to use PoS) fail to have this property as malicious nodes can take over the network. In order to achieve all of these advantages, Fast-HotStuff adds a small amount of overhead information during the block proposal phase, which is only needed if the previous primary node fails. The correctness of Fast-HotStuff is established in terms of safety and liveness. The effectiveness of Fast-HotStuff is demonstrated through experimental results.

[115]  arXiv:2010.11456 [pdf, other]
Title: An Investigation of the Recoverable Robust Assignment Problem
Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)

We investigate the so-called recoverable robust assignment problem on balanced bipartite graphs with $2n$ vertices, a mainstream problem in robust optimization: For two given linear cost functions $c_1$ and $c_2$ on the edges and a given integer $k$, the goal is to find two perfect matchings $M_1$ and $M_2$ that minimize the objective value $c_1(M_1)+c_2(M_2)$, subject to the constraint that $M_1$ and $M_2$ have at least $k$ edges in common.
We derive a variety of results on this problem. First, we show that the problem is W[1]-hard with respect to the parameter $k$, and also with respect to the recoverability parameter $k'=n-k$. This hardness result holds even in the highly restricted special case where both cost functions $c_1$ and $c_2$ only take the values $0$ and $1$. (On the other hand, containment of the problem in XP is straightforward to see.) Next, as a positive result we construct a polynomial time algorithm for the special case where one cost function is Monge, whereas the other one is Anti-Monge. Finally, we study the variant where matching $M_1$ is frozen, and where the optimization goal is to compute the best corresponding matching $M_2$, the second stage recoverable assignment problem. We show that this problem variant is contained in the randomized parallel complexity class $\text{RNC}_2$, and that it is at least as hard as the infamous problem Exact Matching in Red-Blue Bipartite Graphs, whose computational complexity is a long-standing open problem.
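The problem statement can be made concrete with an exhaustive reference solver for tiny instances. This is a sketch for checking small examples only: it enumerates all pairs of perfect matchings and so runs in time exponential in $n$, and it is unrelated to the algorithms in the paper. Costs are assumed to be given as $n \times n$ matrices, a perfect matching is encoded as a permutation mapping left vertex `i` to right vertex `perm[i]`, and the function names are illustrative.

```python
from itertools import permutations

def matching_cost(cost, perm):
    """Cost of the perfect matching that pairs left vertex i with right vertex perm[i]."""
    return sum(cost[i][perm[i]] for i in range(len(perm)))

def recoverable_assignment_bruteforce(c1, c2, k):
    """Minimise c1(M1) + c2(M2) over perfect matchings sharing at least k edges.

    Returns (total_cost, M1, M2), or None if k exceeds the number of left vertices.
    """
    n = len(c1)
    best = None
    for m1 in permutations(range(n)):
        for m2 in permutations(range(n)):
            common = sum(1 for i in range(n) if m1[i] == m2[i])
            if common < k:
                continue  # the recoverability constraint is violated
            total = matching_cost(c1, m1) + matching_cost(c2, m2)
            if best is None or total < best[0]:
                best = (total, m1, m2)
    return best

# The two cost functions prefer opposite matchings.
c1 = [[0, 1], [1, 0]]
c2 = [[1, 0], [0, 1]]
unconstrained = recoverable_assignment_bruteforce(c1, c2, 0)
one_shared = recoverable_assignment_bruteforce(c1, c2, 1)
```

Without the constraint each stage takes its own preferred matching at total cost 0. Requiring even one common edge forces the two matchings to coincide when $n = 2$, which raises the cost to 2.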

[116]  arXiv:2010.11459 [pdf, other]
Title: A Framework for Contrastive and Generative Learning of Audio Representations
Comments: 6 pages, 2 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

In this paper, we present a framework for contrastive learning of audio representations in a self-supervised setting, without access to any ground truth labels. The core idea in self-supervised contrastive learning is to map an audio signal and its various augmented versions (representative of salient aspects of audio like pitch, timbre, etc.) to a space where they are close together and are separated from other, different signals. In addition, we explore generative models based on state-of-the-art transformer-based architectures for learning latent spaces for audio signals, without access to any labels. Here, we map audio signals on a smaller scale to discrete dictionary elements and train transformers to predict the next dictionary element. We use only the data itself as supervision, bypassing the need for labels to supervise the training of the deep neural networks. We then use a linear classifier head to evaluate the performance of our models, for both the self-supervised contrastive and the generative transformer-based representations that are learned. Our system achieves considerable performance compared to a fully supervised method with access to ground truth labels to train the neural network model. With the availability of large-scale audio data, these representations show promise in various audio understanding tasks.

[117]  arXiv:2010.11461 [pdf]
Title: Selection of the optimal embedding positions of digital audio watermarking in wavelet domain
Subjects: Cryptography and Security (cs.CR)

This work studied embedding positions of digital audio watermarking in the wavelet domain, to help beginners understand the nature of watermarking in a short time. Based on the theory of the wavelet transform, this paper analyzed the statistical distributions of each level after transformation and the features of watermarks embedded in different transform levels. Through comparison and analysis, we found that the watermark was suitable for embedding into the coefficients of the first four levels of the wavelet transform. In current state-of-the-art approaches, the embedding algorithms always replace the coefficient values at the embedding positions. In contrast, this paper proposed an embedding algorithm based on self-adaptive interpolation to achieve better imperceptibility. In order to reduce the computational complexity, we took a pseudo random sequence with a length of 31 bits as the watermark. In the experiments, the watermark was embedded in different locations, including different transform levels, high-frequency coefficients and low-frequency coefficients, high-energy regions and low-frequency regions. Results showed that the imperceptibility was better than that of traditional embedding algorithms. The bit error rates of the extracted watermark were calculated and we analyzed the robustness and fragility of each embedded signal. Finally, we summarized the best embedding positions of the watermark for different applications and outlined our future work.

[118]  arXiv:2010.11462 [pdf, other]
Title: Polynomial Delay Enumeration for Minimal Steiner Problems
Subjects: Data Structures and Algorithms (cs.DS)

Let $G = (V, E)$ be an undirected graph and let $W \subseteq V$ be a set of terminals. A \emph{Steiner subgraph} of $(G, W)$ is a subgraph of $G$ that contains all vertices of $W$ and there is a path between every pair of vertices of $W$ in the subgraph. We say that a Steiner subgraph is minimal if it has no proper Steiner subgraph. It is easy to observe that every minimal Steiner subgraph forms a tree, which is called a minimal Steiner tree. We propose a linear delay and polynomial space algorithm for enumerating all minimal Steiner trees of $(G, W)$, which improves a previously known polynomial delay enumeration algorithm in [Kimelfeld and Sagiv, Inf. Syst., 2008]. Our enumeration algorithm can be extended to other Steiner problems: minimal Steiner forests, minimal terminal Steiner trees, minimal directed Steiner trees. As another variant of the minimal Steiner subgraph problem, we study the problem of enumerating minimal induced Steiner subgraphs. We propose a polynomial delay and exponential space enumeration algorithm of minimal induced Steiner subgraphs for claw-free graphs, whereas the problem on general graphs is shown to be at least as hard as the problem of enumerating minimal transversals in hypergraphs. Contrary to these tractable results, we show that the problem of enumerating minimal group Steiner trees is at least as hard as the minimal transversal enumeration problem on hypergraphs.
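The definition of a minimal Steiner tree can be checked directly with a small sketch. It relies on the standard characterization that, for at least two terminals, a Steiner subgraph is minimal exactly when it is a tree whose leaves are all terminals. The code assumes a simple undirected subgraph given as a list of edges with no isolated vertices, and it only verifies a candidate; it is not the enumeration algorithm from the paper. The function name is illustrative.

```python
def is_minimal_steiner_tree(edges, terminals):
    """Check whether the subgraph formed by edges is a minimal Steiner tree for terminals."""
    edge_set = {frozenset(e) for e in edges}        # ignore duplicates and orientation
    if not edge_set:
        return False                                # assumes at least two terminals
    adj = {}
    for edge in edge_set:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    vertices = set(adj)
    if not set(terminals) <= vertices:
        return False                                # some terminal is missing
    # Depth-first search to test connectivity.
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    if seen != vertices:
        return False
    if len(edge_set) != len(vertices) - 1:
        return False                                # connected but contains a cycle
    # Every leaf must be a terminal, otherwise it could be pruned.
    return all(v in terminals for v in vertices if len(adj[v]) == 1)

path_ok = is_minimal_steiner_tree([("a", "b"), ("b", "c")], {"a", "c"})
dangling_leaf = is_minimal_steiner_tree([("a", "b"), ("b", "c"), ("c", "d")], {"a", "c"})
```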

[119]  arXiv:2010.11463 [pdf, other]
Title: MixCon: Adjusting the Separability of Data Representations for Harder Data Recovery
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

To address the issue that deep neural networks (DNNs) are vulnerable to model inversion attacks, we design an objective function, which adjusts the separability of the hidden data representations, as a way to control the trade-off between data utility and vulnerability to inversion attacks. Our method is motivated by the theoretical insights of data separability in neural network training and results on the hardness of model inversion. Empirically, by adjusting the separability of data representation, we show that there exist sweet-spots for data separability such that it is difficult to recover data during inference while maintaining data utility.

[120]  arXiv:2010.11464 [pdf]
Title: Fine-tuned Pre-trained Mask R-CNN Models for Surface Object Detection
Comments: 12 pages, 12 tables, 15 figures, to be published in a professional journal
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This study evaluates road surface object detection tasks using four Mask R-CNN models as a pre-study of surface deterioration detection of stone-made archaeological objects. The models were pre-trained and fine-tuned on COCO datasets and 15,188 segmented road surface annotation tags. The quality of the models was measured using Average Precision and Average Recall. The results indicate a substantial number of false negatives, i.e. left detections and unclassified detections. A modified confusion matrix model that avoids prioritizing IoU is tested; it yields notable true positive increases in bounding box detection, but almost no changes in segmentation masks.

[121]  arXiv:2010.11465 [pdf, other]
Title: Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs
Comments: NeurIPS 2020
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (cs.LG)

One of the fundamental problems in Artificial Intelligence is to perform complex multi-hop logical reasoning over the facts captured by a knowledge graph (KG). This problem is challenging, because KGs can be massive and incomplete. Recent approaches embed KG entities in a low dimensional space and then use these embeddings to find the answer entities. However, how to handle arbitrary first-order logic (FOL) queries has remained an outstanding challenge, as present methods are limited to only a subset of FOL operators. In particular, the negation operator is not supported. An additional limitation of present methods is that they cannot naturally model uncertainty. Here, we present BetaE, a probabilistic embedding framework for answering arbitrary FOL queries over KGs. BetaE is the first method that can handle a complete set of first-order logical operations: conjunction ($\wedge$), disjunction ($\vee$), and negation ($\neg$). A key insight of BetaE is to use probabilistic distributions with bounded support, specifically the Beta distribution, and embed queries/entities as distributions, which as a consequence allows us to also faithfully model uncertainty. Logical operations are performed in the embedding space by neural operators over the probabilistic embeddings. We demonstrate the performance of BetaE on answering arbitrary FOL queries on three large, incomplete KGs. While being more general, BetaE also increases relative performance by up to 25.4% over the current state-of-the-art KG reasoning methods that can only handle conjunctive queries without negation.

[122]  arXiv:2010.11468 [pdf, other]
Title: Novel View Synthesis from only a 6-DoF Camera Pose by Two-stage Networks
Comments: Accepted by International Conference on Pattern Recognition (ICPR 2020)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Novel view synthesis is a challenging problem in computer vision and robotics. Unlike existing works, which need reference images or 3D models of the scene to generate images under novel views, we propose a novel paradigm for this problem: we synthesize the novel view directly from only a 6-DoF camera pose. Although this setting is the most straightforward one, few works address it. Nevertheless, our experiments demonstrate that, with a concise CNN, we can obtain a meaningful parametric model that reconstructs the correct scenery images from the 6-DoF pose alone. To this end, we propose a two-stage learning strategy, which consists of two consecutive CNNs: GenNet and RefineNet. GenNet generates a coarse image from a camera pose. RefineNet is a generative adversarial network that refines the coarse image. In this way, we decouple the geometric relationship between mapping and texture detail rendering. Extensive experiments conducted on the public datasets prove the effectiveness of our method. We believe this paradigm is of high research and application value and could be an important direction in novel view synthesis.

[123]  arXiv:2010.11472 [pdf]
Title: An explainable deep vision system for animal classification and detection in trail-camera images with automatic post-deployment retraining
Authors: Golnaz Moallem (1), Don Pathirage (1), Joel Reznick (1), James Gallagher (2), Hamed Sari-Sarraf (1) ((1) Applied Vision Lab Texas Tech University (2) Texas Parks and Wildlife Department)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper introduces an automated vision system for animal detection in trail-camera images taken from a field under the administration of the Texas Parks and Wildlife Department. As traditional wildlife counting techniques are intrusive and labor intensive to conduct, trail-camera imaging is a comparatively non-intrusive method for capturing wildlife activity. However, given the large volume of images produced from trail-cameras, manual analysis of the images remains time-consuming and inefficient. We implemented a two-stage deep convolutional neural network pipeline to find animal-containing images in the first stage and then process these images to detect birds in the second stage. The animal classification system classifies animal images with more than 87% sensitivity and 96% specificity. The bird detection system achieves better than 93% sensitivity, 92% specificity, and 68% average Intersection-over-Union rate. The entire pipeline processes an image in less than 0.5 seconds as opposed to an average 30 seconds for a human labeler. We also addressed post-deployment issues related to data drift for the animal classification system as image features vary with seasonal changes. This system utilizes an automatic retraining algorithm to detect data drift and update the system. We introduce a novel technique for detecting drifted images and triggering the retraining procedure. Two statistical experiments are also presented to explain the prediction behavior of the animal classification system. These experiments investigate the cues that steer the system towards a particular decision. Statistical hypothesis testing demonstrates that the presence of an animal in the input image significantly contributes to the system's decisions.

[124]  arXiv:2010.11473 [pdf, other]
Title: A Novel Variable Stiffness Soft Robotic Gripper
Comments: This paper has been submitted to IEEE International Conference on Robotics and Automation 2021
Subjects: Robotics (cs.RO)

We propose a novel tri-fingered soft robotic gripper with decoupled stiffness and shape control capability for performing adaptive grasping with minimum system complexity. The proposed soft fingers adaptively conform to object shapes facilitating the handling of objects of different types, shapes, and sizes. Each soft gripper finger has an inextensible articulable backbone and is actuated by pneumatic muscles. We derive a kinematic model of the gripper and use an empirical approach to map input pressures to stiffness and bending deformation of fingers. We use these mappings to achieve decoupled stiffness and shape control. We conduct tests to quantify the ability to hold objects as the gripper changes orientation, the ability to maintain the grasping status as the gripper moves, and the amount of force required to release the object from the gripped fingers, respectively. The results validate the proposed gripper's performance and show how stiffness control can improve the grasping quality.

[125]  arXiv:2010.11475 [pdf, other]
Title: High resolution weakly supervised localization architectures for medical images
Comments: submitted to ICASSP 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In medical imaging, Class-Activation Map (CAM) serves as the main explainability tool by pointing to the region of interest. Since the localization accuracy from CAM is constrained by the resolution of the model's feature map, one may expect that segmentation models, which generally have large feature maps, would produce more accurate CAMs. However, we have found that this is not the case due to task mismatch. While segmentation models are developed for datasets with pixel-level annotation, only image-level annotation is available in most medical imaging datasets. Our experiments suggest that Global Average Pooling (GAP) and Group Normalization are the main culprits that worsen the localization accuracy of CAM. To address this issue, we propose Pyramid Localization Network (PYLON), a model for high-accuracy weakly-supervised localization that achieved 0.62 average point localization accuracy on NIH's Chest X-Ray 14 dataset, compared to 0.45 for a traditional CAM model. Source code and extended results are available at https://github.com/cmb-chula/pylon.

[126]  arXiv:2010.11476 [pdf, other]
Title: Modeling and Validation of Soft Robotic Snake Locomotion
Comments: This paper has been submitted to IEEE International Conference on Robotics and Automation 2021
Subjects: Robotics (cs.RO)

Snakes are a remarkable evolutionary success story. Many snake-inspired robots have been proposed over the years. Soft robotic snakes (SRS) with their continuous and smooth bending capability better mimic their biological counterparts' unique characteristics. Prior SRSs are limited to planar operation with a limited number of planar gaits. We propose a novel SRS with spatial bending and investigate snake locomotion gaits beyond the capabilities of the state-of-the-art systems. We derive a complete floating-base kinematic model of the robot and use the model to derive joint-space trajectories for serpentine and inward/outward rolling locomotion gaits. The locomotion gaits for the proposed SRS are experimentally validated under varying frequency and amplitude of gait cycles. The results qualitatively and quantitatively validate the SRS's ability to leverage spatial bending to achieve locomotion gaits not possible with current SRSs.

[127]  arXiv:2010.11478 [pdf, other]
Title: Knowledge Distillation for BERT Unsupervised Domain Adaptation
Authors: Minho Ryu, Kichun Lee
Subjects: Computation and Language (cs.CL)

A pre-trained language model, BERT, has brought significant performance improvements across a range of natural language processing tasks. Since the model is trained on a large corpus of diverse topics, it shows robust performance for domain shift problems in which data distributions at training (source data) and testing (target data) differ while sharing similarities. Despite its great improvements compared to previous models, it still suffers from performance degradation due to domain shifts. To mitigate such problems, we propose a simple but effective unsupervised domain adaptation method, \emph{adversarial adaptation with distillation} (AAD), which combines the adversarial discriminative domain adaptation (ADDA) framework with knowledge distillation. We evaluate our approach in the task of cross-domain sentiment classification on 30 domain pairs, advancing the state-of-the-art performance for unsupervised domain adaptation in text sentiment classification.

[128]  arXiv:2010.11486 [pdf, ps, other]
Title: Computing Diverse Sets of Solutions for Monotone Submodular Optimisation Problems
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Submodular functions allow modelling many real-world optimisation problems. This paper introduces approaches for computing diverse sets of high quality solutions for submodular optimisation problems. We first present diversifying greedy sampling approaches and analyse them with respect to the diversity measured by entropy and the approximation quality of the obtained solutions. Afterwards, we introduce an evolutionary diversity optimisation approach to further improve diversity of the set of solutions. We carry out experimental investigations on popular submodular benchmark functions that show that the combined approaches achieve high quality solutions of large diversity.

[129]  arXiv:2010.11487 [pdf, other]
Title: Faithful Euclidean Distance Field from Log-Gaussian Process Implicit Surfaces
Subjects: Robotics (cs.RO)

In this letter, we introduce the Log-Gaussian Process Implicit Surface (Log-GPIS), a novel continuous and probabilistic mapping representation suitable for surface reconstruction and local navigation. To derive the proposed representation, Varadhan's formula is exploited to approximate the non-linear Eikonal partial differential equation (PDE) of the Euclidean distance field~(EDF) by the logarithm of the screened Poisson equation, which is linear. We show that members of the Matern covariance family directly satisfy this linear PDE. Thus, our key contribution is the realisation that the regularised Eikonal equation can be simply solved by applying the logarithmic function to a GPIS formulation to recover the accurate EDF and, at the same time, the implicit surface. The proposed approach does not require post-processing steps to recover the EDF. Moreover, unlike sampling-based methods, Log-GPIS does not use sample points inside and outside the surface, as the derivatives of the covariance allow direct estimation of the surface normals and distance gradients. We benchmarked the proposed method on simulated and real data against state-of-the-art mapping frameworks that also aim at recovering both the surface and a distance field. Our experiments show that Log-GPIS produces the most accurate results for the EDF and comparable results for surface reconstruction, and that its computation time still allows online operation.

[130]  arXiv:2010.11488 [pdf, other]
Title: SEG-MAT: 3D Shape Segmentation Using Medial Axis Transform
Comments: IEEE Transactions on Visualization and Computer Graphics (TVCG), to appear
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)

Segmenting arbitrary 3D objects into constituent parts that are structurally meaningful is a fundamental problem encountered in a wide range of computer graphics applications. Existing methods for 3D shape segmentation suffer from complex geometry processing and heavy computation caused by using low-level features and fragmented segmentation results due to the lack of global consideration. We present an efficient method, called SEG-MAT, based on the medial axis transform (MAT) of the input shape. Specifically, with the rich geometrical and structural information encoded in the MAT, we are able to develop a simple and principled approach to effectively identify the various types of junctions between different parts of a 3D shape. Extensive evaluations and comparisons show that our method outperforms the state-of-the-art methods in terms of segmentation quality and is also one order of magnitude faster.

[131]  arXiv:2010.11490 [pdf, other]
Title: On the Effects of Using word2vec Representations in Neural Networks for Dialogue Act Recognition
Journal-ref: Computer Speech and Language, Elsevier, 2018, 47, pp.175 - 193
Subjects: Computation and Language (cs.CL)

Dialogue act recognition is an important component of a large number of natural language processing pipelines. Many research works have been carried out in this area, but relatively few investigate deep neural networks and word embeddings. This is surprising, given that both of these techniques have proven exceptionally good in most other language-related domains. We propose in this work a new deep neural network that explores recurrent models to capture word sequences within sentences, and further study the impact of pretrained word embeddings. We validate this model on three languages: English, French and Czech. The performance of the proposed approach is consistent across these languages and it is comparable to the state-of-the-art results in English. More importantly, we confirm that deep neural networks indeed outperform a Maximum Entropy classifier, which was expected. However, and this is more surprising, we also found that standard word2vec embeddings do not seem to bring valuable information for this task and the proposed model, whatever the size of the training corpus is. We thus further analyse the resulting embeddings and conclude that a possible explanation may be related to the mismatch between the type of lexical-semantic information captured by the word2vec embeddings, and the kind of relations between words that is the most useful for the dialogue act recognition task.

[132]  arXiv:2010.11491 [pdf, other]
Title: Overview of Networked Supervisory Control with Imperfect Communication Channels
Subjects: Systems and Control (eess.SY)

This paper presents an overview of the networked supervisory control framework for discrete event systems with imperfect communication networks, which can be divided into the centralized supervisory control setup and the decentralized supervisory control setup. We review the state-of-the-art networked control frameworks with observation channel delays and control channel delays, for untimed and timed models. Data losses in communication channels are also considered. The review of the state-of-the-art networked control frameworks focuses on the following parts: 1) the construction of the networked control closed-loop system; 2) the condition to ensure the existence of a networked supervisor; 3) the synthesis procedure for a networked-delay resilient supervisor; 4) the possibility of improving the synthesis efficiency.

[133]  arXiv:2010.11497 [pdf, other]
Title: Cluster-and-Conquer: When Randomness Meets Graph Locality
Authors: George Giakkoupis (WIDE), Anne-Marie Kermarrec (EPFL), Olivier Ruas (SPIRALS), François Taïani (WIDE, IRISA)
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

K-Nearest-Neighbors (KNN) graphs are central to many emblematic data mining and machine-learning applications. Some of the most efficient KNN graph algorithms are incremental and local: they start from a random graph, which they incrementally improve by traversing neighbors-of-neighbors links. Paradoxically, this random start is also one of the key weaknesses of these algorithms: nodes are initially connected to dissimilar neighbors, which lie far away according to the similarity metric. As a result, incremental algorithms must first laboriously explore spurious potential neighbors before they can identify similar nodes, and start converging. In this paper, we remove this drawback with Cluster-and-Conquer (C$^2$ for short). Cluster-and-Conquer boosts the starting configuration of greedy algorithms thanks to a novel lightweight clustering mechanism, dubbed FastRandomHash. FastRandomHash leverages randomness and recursion to pre-cluster similar nodes at a very low cost. Our extensive evaluation on real datasets shows that Cluster-and-Conquer significantly outperforms existing approaches, including LSH, yielding speed-ups of up to $\times$4.42 while incurring only a negligible loss in terms of KNN quality.
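The incremental, local scheme described above (random start, then refinement over neighbors-of-neighbors links) can be sketched as follows. This is only the generic greedy baseline under simplifying assumptions; it is not Cluster-and-Conquer or FastRandomHash, and the function name `greedy_knn` and its parameters are hypothetical.

```python
import random


def greedy_knn(points, k, dist, seed=0, max_rounds=20):
    """Approximate KNN graph: random start, then neighbors-of-neighbors refinement.

    Requires len(points) > k. Returns a list where entry u holds the k
    neighbor indices currently selected for node u.
    """
    rng = random.Random(seed)
    n = len(points)
    # Random start: every node gets k arbitrary distinct neighbors.
    knn = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    for _ in range(max_rounds):
        changed = False
        for u in range(n):
            # Candidates: current neighbors plus the neighbors of those neighbors.
            candidates = set(knn[u])
            for v in knn[u]:
                candidates.update(knn[v])
            candidates.discard(u)
            # Keep the k closest candidates (ties broken by index).
            best = sorted(
                candidates, key=lambda j: (dist(points[u], points[j]), j)
            )[:k]
            if set(best) != set(knn[u]):
                changed = True
            knn[u] = best
        if not changed:
            break  # No node improved its neighborhood in this round.
    return knn
```

Because each update keeps the best k candidates from a set that contains the current neighbors, the summed distance from a node to its neighbors never increases. Calling the function with `max_rounds=0` returns the random starting graph, which makes it easy to see how far that start is from the refined result.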

[134]  arXiv:2010.11503 [pdf, ps, other]
Title: On Finite and Unrestricted Query Entailment beyond SQ with Number Restrictions on Transitive Roles
Comments: Full version of a paper accepted at IJCAI'19
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)

We study the description logic SQ with number restrictions applicable to transitive roles, extended with either nominals or inverse roles. We show tight 2EXPTIME upper bounds for unrestricted entailment of regular path queries for both extensions and finite entailment of positive existential queries for nominals. For inverses, we establish 2EXPTIME-completeness for unrestricted and finite entailment of instance queries (the latter under restriction to a single, transitive role).

[135]  arXiv:2010.11504 [pdf, other]
Title: 3D Meta-Registration: Learning to Learn Registration of 3D Point Clouds
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning-based point cloud registration models are often generalized from extensive training over a large volume of data to learn the ability to predict the desired geometric transformation to register 3D point clouds. In this paper, we propose a meta-learning based 3D registration model, named 3D Meta-Registration, that is capable of rapidly adapting and generalizing well to new 3D registration tasks for unseen 3D point clouds. Our 3D Meta-Registration gains a competitive advantage by training over a variety of 3D registration tasks, which leads to an optimized model for the best performance on the distribution of registration tasks including potentially unseen tasks. Specifically, the proposed 3D Meta-Registration model consists of two modules: 3D registration learner and 3D registration meta-learner. During the training, the 3D registration learner is trained to complete a specific registration task aiming to determine the desired geometric transformation that aligns the source point cloud with the target one. In the meantime, the 3D registration meta-learner is trained to provide the optimal parameters to update the 3D registration learner based on the learned task distribution. After training, the 3D registration meta-learner, which is learned with the optimized coverage of distribution of 3D registration tasks, is able to dynamically update 3D registration learners with desired parameters to rapidly adapt to new registration tasks. We tested our model on the synthesized datasets ModelNet and FlyingThings3D, as well as the real-world dataset KITTI. Experimental results demonstrate that 3D Meta-Registration achieves superior performance over previous techniques (e.g. FlowNet3D).

[136]  arXiv:2010.11505 [pdf, other]
Title: NightOwl: Robotic Platform for Wheeled Service Robot
Comments: 13 pages, 14 figures
Subjects: Robotics (cs.RO)

NightOwl is a robotic platform designed exclusively for a wheeled service robot. The robot navigates autonomously with omnidirectional movement and is equipped with LIDAR to sense the surrounding area. The platform itself was built using the Robot Operating System (ROS) and written in two different programming languages (C++ and Python). NightOwl is composed of several modular programs, namely hardware controller, light detection and ranging (LIDAR), simultaneous localization and mapping (SLAM), world model, path planning, robot control, communication, and behaviour. The programs run in parallel and communicate reciprocally to share various information. This paper explains the role of the modular programs in terms of input, process, and output. In addition, NightOwl provides simulation visualized in both Gazebo and RViz. The robot in its environment is visualized by Gazebo. Sensor data from LIDAR and results from SLAM are visualized by RViz.

[137]  arXiv:2010.11506 [pdf, other]
Title: Calibrated Language Model Fine-Tuning for In- and Out-of-Distribution Data
Comments: EMNLP2020 long paper
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Fine-tuned pre-trained language models can suffer from severe miscalibration for both in-distribution and out-of-distribution (OOD) data due to over-parameterization. To mitigate this issue, we propose a regularized fine-tuning method. Our method introduces two types of regularization for better calibration: (1) On-manifold regularization, which generates pseudo on-manifold samples through interpolation within the data manifold. Augmented training with these pseudo samples imposes a smoothness regularization to improve in-distribution calibration. (2) Off-manifold regularization, which encourages the model to output uniform distributions for pseudo off-manifold samples to address the over-confidence issue for OOD data. Our experiments demonstrate that the proposed method outperforms existing calibration methods for text classification in terms of expectation calibration error, misclassification detection, and OOD detection on six datasets. Our code can be found at https://github.com/Lingkai-Kong/Calibrated-BERT-Fine-Tuning.

[138]  arXiv:2010.11510 [pdf, other]
Title: F-Siamese Tracker: A Frustum-based Double Siamese Network for 3D Single Object Tracking
Comments: 7 pages, 5 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper presents F-Siamese Tracker, a novel approach for single object tracking prominently characterized by more robustly integrating 2D and 3D information to reduce redundant search space. A main challenge in 3D single object tracking is how to reduce search space for generating appropriate 3D candidates. Instead of relying solely on 3D proposals, our method first leverages a Siamese network applied to RGB images to produce 2D region proposals, which are then extruded into 3D viewing frustums. In addition, we perform an online accuracy validation on the 3D frustum to generate a refined point cloud search space, which can be embedded directly into the existing 3D tracking backbone. For efficiency, our approach gains better performance with fewer candidates by reducing search space. Moreover, thanks to the online accuracy validation, our approach can still achieve high precision in occasional cases with strong occlusions or very sparse points, even when the 2D Siamese tracker loses the target. This approach allows us to set a new state-of-the-art in 3D single object tracking by a significant margin on a sparse outdoor dataset (KITTI tracking). Moreover, experiments on 2D single object tracking show that our framework boosts 2D tracking performance as well.

[139]  arXiv:2010.11512 [pdf, other]
Title: Mood Classification Using Listening Data
Comments: Appears in Proc. of the International Society for Music Information Retrieval Conference 2020 (ISMIR 2020)
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)

The mood of a song is a highly relevant feature for exploration and recommendation in large collections of music. These collections tend to require automatic methods for predicting such moods. In this work, we show that listening-based features outperform content-based ones when classifying moods: embeddings obtained through matrix factorization of listening data appear to be more informative of a track's mood than embeddings based on its audio content. To demonstrate this, we compile a subset of the Million Song Dataset, totalling 67k tracks, with expert annotations of 188 different moods collected from AllMusic. Our results on this novel dataset not only expose the limitations of current audio-based models, but also aim to foster further reproducible research on this timely topic.

[140]  arXiv:2010.11522 [pdf, other]
Title: An Industry Evaluation of Embedding-based Entity Alignment
Journal-ref: Coling'2020
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Embedding-based entity alignment has been widely investigated in recent years, but most proposed methods still rely on an ideal supervised learning setting with a large number of unbiased seed mappings for training and validation, which significantly limits their usage. In this study, we evaluate those state-of-the-art methods in an industrial context, where the impact of seed mappings with different sizes and different biases is explored. Besides the popular benchmarks from DBpedia and Wikidata, we contribute and evaluate a new industrial benchmark that is extracted from two heterogeneous knowledge graphs (KGs) under deployment for medical applications. The experimental results enable the analysis of the advantages and disadvantages of these alignment methods and the further discussion of suitable strategies for their industrial deployment.

[141]  arXiv:2010.11523 [pdf, other]
Title: Exploring search space trees using an adapted version of Monte Carlo tree search for a combinatorial optimization problem
Subjects: Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Optimization and Control (math.OC)

In this article, a novel approach to solve combinatorial optimization problems is proposed. This approach makes use of a heuristic algorithm to explore the search space tree of a problem instance. The algorithm is based on Monte Carlo tree search, a popular algorithm in game playing that is used to explore game trees. By leveraging the combinatorial structure of a problem, several enhancements to the algorithm are proposed. These enhancements aim to efficiently explore the search space tree by pruning subtrees, using a heuristic simulation policy, reducing the domain of variables by eliminating dominated solutions and using a beam width. They are demonstrated for a specific combinatorial optimization problem: the quay crane scheduling problem with non-crossing constraints. Computational results show that the proposed algorithm is competitive with the state-of-the-art for this problem and eight new best solutions for a benchmark set of instances are found. Apart from this, the results also show evidence that the algorithm is able to learn to correct the incorrect choices of a standard heuristic, yielding an average improvement of 10.0 % with respect to the objective function value of the solution.

[142]  arXiv:2010.11524 [pdf, other]
Title: slimIPL: Language-Model-Free Iterative Pseudo-Labeling
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Recent results in end-to-end ASR have demonstrated the efficacy of simple pseudo-labeling for semi-supervised models trained both with Connectionist Temporal Classification (CTC) and Sequence-to-Sequence (seq2seq) losses. Iterative Pseudo-Labeling (IPL), which continuously trains a single model using pseudo-labels iteratively re-generated as the model learns, has been shown to further increase performance in ASR. We improve upon the IPL algorithm: as the model learns, we propose to iteratively re-generate transcriptions with hard-label (most probable token) assignments, that is, without a language model. We call this approach Language-Model-Free IPL (slimIPL) and we give a resultant training setup for CTC and seq2seq models. At inference, our experiments show that decoding with a strong language model is more beneficial with slimIPL than IPL, as IPL exhibits some language model over-fitting issues. Compared to prior work on semi-supervised and unsupervised approaches, slimIPL not only simplifies the training process, but also achieves competitive and state-of-the-art results on LibriSpeech test sets in both standard and low-resource settings.

[143]  arXiv:2010.11525 [pdf, other]
Title: Quiver Signal Processing (QSP)
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

In this paper we state the basics for a signal processing framework on quiver representations. A quiver is a directed graph and a quiver representation is an assignment of vector spaces to the nodes of the graph and of linear maps between the vector spaces associated to the nodes. Leveraging the tools from representation theory, we propose a signal processing framework that allows us to handle heterogeneous multidimensional information in networks. We provide a set of examples where this framework provides a natural set of tools to understand apparently hidden structure in information. We remark that the proposed framework states the basis for building graph neural networks where information can be processed and handled in alternative ways.
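The definition of a quiver representation given above can be made concrete with a small sketch: nodes carry vector-space dimensions, arrows carry matrices, and a signal at one node is transported along a path of arrows. This only illustrates the definition, not the signal processing framework of the paper; the class and method names are hypothetical, and real vector spaces with plain Python lists are assumed.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


class QuiverRepresentation:
    """Vector spaces R^d on nodes and linear maps on named arrows."""

    def __init__(self, dims, arrows):
        # dims: node -> dimension of its vector space.
        # arrows: arrow name -> (source node, target node, matrix as list of rows).
        for name, (src, dst, matrix) in arrows.items():
            rows_ok = len(matrix) == dims[dst]
            cols_ok = all(len(row) == dims[src] for row in matrix)
            if not (rows_ok and cols_ok):
                raise ValueError(f"arrow {name!r} has the wrong matrix shape")
        self.dims = dims
        self.arrows = arrows

    def push(self, node, vector, path):
        """Transport `vector`, living at `node`, along the arrow names in `path`."""
        if len(vector) != self.dims[node]:
            raise ValueError("vector does not live in the space at the start node")
        for name in path:
            src, dst, matrix = self.arrows[name]
            if src != node:
                raise ValueError(f"arrow {name!r} does not start at {node!r}")
            vector = mat_vec(matrix, vector)
            node = dst
        return node, vector
```

Arrows are keyed by name rather than by endpoint pair so that parallel arrows and loops, which a quiver permits, can be represented.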

[144]  arXiv:2010.11526 [pdf, other]
Title: Fault diagnosis for linear heterodirectional hyperbolic ODE-PDE systems using backstepping-based trajectory planning
Comments: 14 pages, 6 figures, submitted to Automatica
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)

This paper is concerned with the fault diagnosis problem for general linear heterodirectional hyperbolic ODE-PDE systems. A systematic solution is presented for additive time-varying actuator, process and sensor faults in the presence of disturbances. The faults and disturbances are represented by the solutions of finite-dimensional signal models, which allows a large class of signals to be taken into account. For disturbances that are only bounded, a threshold for secured fault diagnosis is derived. By applying integral transformations to the system, an algebraic fault detection equation to detect faults in finite time is obtained. The corresponding integral kernels result from the realization of a finite-time transition between a non-equilibrium initial state and a vanishing final state of a hyperbolic ODE-PDE system. For this new challenging problem, a systematic trajectory planning approach is presented. In particular, this problem is facilitated by mapping the kernel equations into backstepping coordinates and tracing the solution of the transition problem back to a simple trajectory planning. The fault diagnosis for a $4\times 4$ heterodirectional hyperbolic system coupled with a second order ODE demonstrates the results of the paper.

[145]  arXiv:2010.11531 [pdf, other]
Title: Convolutional Autoencoders for Human Motion Infilling
Comments: Accepted to 3DV 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper we propose a convolutional autoencoder to address the problem of motion infilling for 3D human motion data. Given a start and end sequence, motion infilling aims to complete the missing gap in between, such that the filled in poses plausibly forecast the start sequence and naturally transition into the end sequence. To this end, we propose a single, end-to-end trainable convolutional autoencoder. We show that a single model can be used to create natural transitions between different types of activities. Furthermore, our method is not only able to fill in entire missing frames, but it can also be used to complete gaps where partial poses are available (e.g. from end effectors), or to clean up other forms of noise (e.g. Gaussian). Also, the model can fill in an arbitrary number of gaps that potentially vary in length. In addition, no further post-processing on the model's outputs is necessary such as smoothing or closing discontinuities at the end of the gap. At the heart of our approach lies the idea to cast motion infilling as an inpainting problem and to train a convolutional de-noising autoencoder on image-like representations of motion sequences. At training time, blocks of columns are removed from such images and we ask the model to fill in the gaps. We demonstrate the versatility of the approach via a number of complex motion sequences and report on thorough evaluations performed to better understand the capabilities and limitations of the proposed approach.

[146]  arXiv:2010.11533 [pdf, other]
Title: Exponential Negation of a Probability Distribution
Authors: Qinyuan Wu, Yong Deng
Comments: 6 pages, 6 figures
Subjects: Artificial Intelligence (cs.AI); Information Theory (cs.IT)

Negation operation is important in intelligent information processing. Unlike the existing arithmetic negation, an exponential negation is presented in this paper. The new negation can be seen as a kind of geometric negation. Some basic properties of the proposed negation are investigated, and we find that the fixed point is the uniform probability distribution. The negation is an entropy-increasing operation, and all probability distributions converge to the uniform distribution after multiple negation iterations. The number of iterations of convergence is inversely proportional to the number of elements in the distribution. Some numerical examples are used to illustrate the efficiency of the proposed negation.
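
The fixed-point and entropy claims in this abstract can be checked numerically with a small sketch. The abstract does not state the formula, so the code assumes the negation maps each p_i to exp(-p_i) and renormalizes; the function names `exponential_negation` and `entropy` are illustrative and not taken from the paper.

```python
import math

def exponential_negation(p):
    """Assumed form of the negation: p_i -> exp(-p_i), then renormalize to sum to 1."""
    weights = [math.exp(-x) for x in p]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

# Iterate the negation and watch the distribution flatten toward uniform.
p = [0.7, 0.2, 0.1]
for step in range(5):
    print(step, [round(x, 4) for x in p], round(entropy(p), 4))
    p = exponential_negation(p)
```

Under this assumed form, a uniform input is returned unchanged, because every entry receives the same weight before renormalization.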

[147]  arXiv:2010.11535 [pdf]
Title: Defense-guided Transferable Adversarial Attacks
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)

Though deep neural networks perform challenging tasks excellently, they are susceptible to adversarial examples, which mislead classifiers by applying human-imperceptible perturbations on clean inputs. Under the query-free black-box scenario, adversarial examples are hard to transfer to unknown models, and several methods have been proposed with low transferability. To address this issue, we design a max-min framework inspired by input transformations, which are beneficial to both the adversarial attack and defense. Explicitly, we decrease loss values with affine transformations as a defense in the minimum procedure, and then increase loss values with the momentum iterative algorithm as an attack in the maximum procedure. To further promote transferability, we determine transformed values with the max-min theory. Extensive experiments on ImageNet demonstrate that our defense-guided transferable attacks achieve an impressive increase in transferability. Experimentally, our best black-box attack fools normally trained models at an 85.3% attack success rate and adversarially trained models at a 40.43% attack success rate on average, respectively. Additionally, we provide elucidative insights on the improvement of transferability, and our method is expected to be a benchmark for assessing the robustness of deep models.

[148]  arXiv:2010.11536 [pdf, other]
Title: Joint Use of Node Attributes and Proximity for Semi-Supervised Classification on Graphs
Comments: 9 pages, 7 figures
Subjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)

The node classification problem is to infer unknown node labels in a graph given its structure and node attributes along with labels for some of the nodes. Approaches for this task typically assume that adjacent nodes have similar attributes and thus, that a node's label can be predicted from the labels of its neighbors. While such homophily is often observed (e.g., for political affiliation in social networks), the assumption may not hold for arbitrary graph datasets and classification tasks. In fact, nodes that share the same label may be adjacent but differ in their attributes; or may not be adjacent but have similar attributes. We aim to develop a node classification approach that can flexibly adapt to a range of settings wherein labels are correlated with graph structure, or node attributes, or both. To this end, we propose JANE (Jointly using Attributes and Node Embeddings): a novel and principled approach based on a generative probabilistic model that weighs the role of node proximity and attribute similarity in predicting labels. Our experiments on a variety of graph datasets and comparison with standard baselines demonstrate that JANE exhibits a superior combination of versatility and competitive performance.

[149]  arXiv:2010.11538 [pdf, other]
Title: Efficient RDF Graph Storage based on Reinforcement Learning
Subjects: Databases (cs.DB)

Knowledge graphs are an important cornerstone of artificial intelligence. The construction and release of large-scale knowledge graphs in various fields pose new challenges to knowledge graph data management. Owing to their maturity and stability, relational databases are also suitable for RDF data storage. However, the complex structure of RDF graphs brings challenges to storage structure design for RDF graphs in a relational database. To address this difficult problem, this paper adopts reinforcement learning (RL) to optimize the storage partition method of RDF graphs based on the relational database. We transform the graph storage into a Markov decision process, and develop a reinforcement learning algorithm for graph storage design. For effective RL-based storage design, we propose a data feature extraction method for RDF tables and a query rewriting priority policy during model training. The extensive experimental results demonstrate that our approach outperforms existing RDF storage design methods.

[150]  arXiv:2010.11539 [pdf, other]
Title: Cross Copy Network for Dialogue Generation
Comments: 11 pages, 4 figures
Subjects: Computation and Language (cs.CL)

In the past few years, audiences from different fields have witnessed the achievements of sequence-to-sequence models (e.g., LSTM+attention, Pointer Generator Networks, and Transformer) in enhancing dialogue content generation. While content fluency and accuracy often serve as the major indicators for model training, dialogue logics, carrying critical information for some particular domains, are often ignored. Taking customer service and court debate dialogue as examples, compatible logics can be observed across different dialogue instances, and this information can provide vital evidence for utterance generation. In this paper, we propose a novel network architecture - Cross Copy Networks (CCN) - to explore the current dialog context and similar dialogue instances' logical structure simultaneously. Experiments with two tasks, court debate and customer service content generation, showed that the proposed algorithm is superior to existing state-of-the-art content generation models.

[151]  arXiv:2010.11541 [pdf]
Title: Understanding the drivers of sustainable land expansion using a patch-level simulation model: A case study in Wuhan, China
Subjects: Computers and Society (cs.CY)

Cellular Automata (CA) are widely used to model the dynamics within complex land use and land cover (LULC) systems. Past CA model research has focused on improving the technical modeling procedures, and only a few studies have sought to improve our understanding of the nonlinear relationships that underlie LULC change. The change complexity lies in the detailed scale of high granularity data, and in the geometric units used to simulate the change. Many CA models lack the ability to simulate the detailed patch-level evolution of multiple land use types. This study introduces a patch-level land use simulation model that integrates a land expansion analysis strategy and a CA model based on multi-type random patch seeds. These were used to understand the drivers of land expansion and to investigate the landscape dynamics in Wuhan, China. The proposed model achieved a higher simulation accuracy and more similar landscape pattern metrics to the true landscape than other CA models tested. The land expansion analysis strategy also uncovered some underlying transition rules, such as that grassland is most likely to be found where it is not strongly impacted by human activities, and that deciduous forest areas tend to grow adjacent to arterial roads. We also projected the structure of land use under different optimizing scenarios for 2035 by combining the proposed model with multi-objective programming. The results indicate that the proposed model can help policymakers to manage future land use dynamics and so to realize more sustainable land use patterns for future development.

[152]  arXiv:2010.11544 [pdf, ps, other]
Title: Stability of Algebraic Neural Networks to Small Perturbations
Subjects: Machine Learning (cs.LG)

Algebraic neural networks (AlgNNs) are composed of a cascade of layers, each one associated with an algebraic signal model, and information is mapped between layers by means of a nonlinearity function. AlgNNs provide a generalization of neural network architectures where formal convolution operators are used, such as traditional convolutional neural networks (CNNs) and graph neural networks (GNNs). In this paper we study the stability of AlgNNs in the framework of algebraic signal processing. We show how any architecture that uses a formal notion of convolution can be stable beyond particular choices of the shift operator, and this stability depends on the structure of subsets of the algebra involved in the model. We focus our attention on the case of algebras with a single generator.

[153]  arXiv:2010.11545 [pdf, other]
Title: Online Structured Meta-learning
Comments: Accepted by NeurIPS 2020
Subjects: Machine Learning (cs.LG)

Learning quickly is of great importance for machine intelligence deployed in online platforms. With the capability of transferring knowledge from learned tasks, meta-learning has shown its effectiveness in online scenarios by continuously updating the model with the learned prior. However, current online meta-learning algorithms are limited to learning a globally-shared meta-learner, which may lead to sub-optimal results when the tasks contain heterogeneous information that is distinct by nature and difficult to share. We overcome this limitation by proposing an online structured meta-learning (OSML) framework. Inspired by the knowledge organization of humans and hierarchical feature representation, OSML explicitly disentangles the meta-learner as a meta-hierarchical graph with different knowledge blocks. When a new task is encountered, it constructs a meta-knowledge pathway by either utilizing the most relevant knowledge blocks or exploring new blocks. Through the meta-knowledge pathway, the model is able to quickly adapt to the new task. In addition, new knowledge is further incorporated into the selected blocks. Experiments on three datasets demonstrate the effectiveness and interpretability of our proposed framework in the context of both homogeneous and heterogeneous tasks.

[154]  arXiv:2010.11546 [pdf, other]
Title: Systematic edge uncertainty in attributed social networks and its effects on rankings of minority nodes
Subjects: Social and Information Networks (cs.SI)

Network analysis provides powerful tools to learn about a variety of social systems. However, most analyses implicitly assume that the considered data is error-free and reliable. Especially if the network consists of multiple groups, this assumption conflicts with the range of systematic reporting biases, measurement errors and other inaccuracies that are well documented in our community. In this paper, we model how such systematic uncertainty on edges of an attributed network can impact network analysis, in particular the ranking of nodes. We discuss how erroneous edge observations can be driven by external node attributes and the relative edge positions in the network, thereby opening a path towards a systematic study of the effects of edge-uncertainty for various network analysis tasks. To show how conclusions drawn from network analyses can get distorted due to such inaccuracies, we focus on the effects of edge-uncertainty on minority group representations in degree-based rankings. For that purpose, we analyze synthetic and real networks with varying homophily and group sizes. We find that introducing edge uncertainty can significantly alter the relative density of networks and result in either a strongly increased or a strongly decreased ranking of the minority, depending on the type of edge error and homophily. Our model enables researchers to include systematic edge-uncertainty in their analyses and thereby better account for the role of minorities in social networks.

[155]  arXiv:2010.11547 [pdf, other]
Title: TLGAN: document Text Localization using Generative Adversarial Nets
Comments: 17 pages, three figures, 4 tables, methods for IEEE ICDAR RRC SROIE task1 leader board
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Text localization from a digital image is the first step of the optical character recognition task. Conventional image processing based text localization performs adequately for specific examples. Yet, general text localization has only been achieved by recent deep-learning based modalities. Here we present document Text Localization Generative Adversarial Nets (TLGAN), which are deep neural networks that perform text localization from digital images. TLGAN is a versatile and easy-to-train text localization model requiring a small amount of data. Trained on only ten labeled receipt images from the Robust Reading Challenge on Scanned Receipts OCR and Information Extraction (SROIE), TLGAN achieved 99.83% precision and 99.64% recall on SROIE test data. Our TLGAN is a practical text localization solution requiring minimal effort for data labeling and model training and producing state-of-the-art performance.

[156]  arXiv:2010.11548 [pdf]
Title: Method of noun phrase detection in Ukrainian texts
Comments: 25 pages, in Ukrainian, 5 figures, 2 tables
Journal-ref: Control Systems and Computers. 2019. Issue 5. P. 48-59
Subjects: Computation and Language (cs.CL)

Introduction. The area of natural language processing considers AI-complete tasks that cannot be solved using traditional algorithmic actions. Such tasks are commonly implemented with the usage of machine learning methodology and means of computer linguistics. One of the preprocessing tasks of a text is the search for noun phrases. The accuracy of this task has implications for the effectiveness of many other tasks in the area of natural language processing. In spite of the active development of research in the area of natural language processing, the investigation of the search for noun phrases within Ukrainian texts is still at an early stage. Results. The different methods of noun phrase detection have been analyzed. The expediency of the representation of sentences as a tree structure has been justified. The key disadvantage of many methods of noun phrase detection is the severe dependence of their effectiveness on the features of a certain language. Taking into account the unified format of sentence processing and the availability of the trained model for the building of sentence trees for Ukrainian texts, the Universal Dependency model has been chosen. A complex method of noun phrase detection in Ukrainian texts utilizing Universal Dependencies means and a named-entity recognition model has been suggested. Experimental verification of the effectiveness of the suggested method on a corpus of Ukrainian news has been performed. Different metrics of method accuracy have been calculated. Conclusions. The results obtained indicate that the suggested method can be used to find noun phrases in Ukrainian texts. The accuracy of the method can be increased by using appropriate named-entity recognition models according to the subject area.

[157]  arXiv:2010.11550 [pdf, other]
Title: Learning Dual Semantic Relations with Graph Attention for Image-Text Matching
Comments: 14 pages, 9 figures. Accepted at: IEEE Transactions on Circuits and Systems for Video Technology (Early Access Print). Codes available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)

Image-Text Matching is one major task in cross-modal information processing. The main challenge is to learn the unified visual and textual representations. Previous methods that perform well on this task primarily focus on not only the alignment between region features in images and the corresponding words in sentences, but also the alignment between relations of regions and relational words. However, the lack of joint learning of regional features and global features will cause the regional features to lose contact with the global context, leading to a mismatch with those non-object words which have global meanings in some sentences. To alleviate this issue, it is necessary to enhance the relations between regions and the relations between regional and global concepts to obtain a more accurate visual representation that is better correlated to the corresponding text. Thus, a novel multi-level semantic relations enhancement approach named Dual Semantic Relations Attention Network (DSRAN) is proposed, which mainly consists of two modules, the separate semantic relations module and the joint semantic relations module. DSRAN performs graph attention in both modules respectively for region-level relations enhancement and regional-global relations enhancement at the same time. With these two modules, different hierarchies of semantic relations are learned simultaneously, thus promoting the image-text matching process by providing more information for the final visual representation. Quantitative experiments have been performed on MS-COCO and Flickr30K, and our method outperforms previous approaches by a large margin due to the effectiveness of the dual semantic relations learning scheme. Codes are available at https://github.com/kywen1119/DSRAN.

[158]  arXiv:2010.11552 [pdf, ps, other]
Title: Nonvacuous Loss Bounds with Fast Rates for Neural Networks via Conditional Information Measures
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)

We present a framework to derive bounds on the test loss of randomized learning algorithms for the case of bounded loss functions. This framework leads to bounds that depend on the conditional information density between the output hypothesis and the choice of the training set, given a larger set of data samples from which the training set is formed. Furthermore, the bounds pertain to the average test loss as well as to its tail probability, both for the PAC-Bayesian and the single-draw settings. If the conditional information density is bounded uniformly in the size $n$ of the training set, our bounds decay as $1/n$, which is referred to as a fast rate. This is in contrast with the tail bounds involving conditional information measures available in the literature, which have a less benign $1/\sqrt{n}$ dependence. We demonstrate the usefulness of our tail bounds by showing that they lead to estimates of the test loss achievable with several neural network architectures trained on MNIST and Fashion-MNIST that match the state-of-the-art bounds available in the literature.

[159]  arXiv:2010.11553 [pdf, other]
Title: Incorporating Stylistic Lexical Preferences in Generative Language Models
Comments: To Appear in Findings of EMNLP 2020
Subjects: Computation and Language (cs.CL)

While recent advances in language modeling have resulted in powerful generation models, their generation style remains implicitly dependent on the training data and cannot emulate a specific target style. Leveraging the generative capabilities of transformer-based language models, we present an approach to induce certain target-author attributes by incorporating continuous multi-dimensional lexical preferences of an author into generative language models. We introduce rewarding strategies in a reinforcement learning framework that encourage the use of words across multiple categorical dimensions, to varying extents. Our experiments demonstrate that the proposed approach can generate text that distinctively aligns with a given target author's lexical style. We conduct quantitative and qualitative comparisons with competitive and relevant baselines to illustrate the benefits of the proposed approach.

[160]  arXiv:2010.11559 [pdf, other]
Title: Learning Graph Laplacian with MCP
Comments: 26 pages, 15 figures
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

Motivated by the observation that the ability of the $\ell_1$ norm in promoting sparsity in graphical models with Laplacian constraints is much weakened, this paper proposes to learn graph Laplacian with a non-convex penalty: minimax concave penalty (MCP). For solving the MCP penalized graphical model, we design an inexact proximal difference-of-convex algorithm (DCA) and prove its convergence to critical points. We note that each subproblem of the proximal DCA enjoys the nice property that the objective function in its dual problem is continuously differentiable with a semismooth gradient. Therefore, we apply an efficient semismooth Newton method to subproblems of the proximal DCA. Numerical experiments on various synthetic and real data sets demonstrate the effectiveness of the non-convex penalty MCP in promoting sparsity. Compared with the state-of-the-art method \cite[Algorithm~1]{ying2020does}, our method is demonstrated to be more efficient and reliable for learning graph Laplacian with MCP.
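
For readers unfamiliar with the penalty named in this abstract, the sketch below evaluates the scalar minimax concave penalty in its commonly used textbook form and compares it with the $\ell_1$ penalty. The parameter names `lam` and `gamma` are assumptions, and the paper's own parameterization may differ.

```python
def mcp(x, lam, gamma):
    """Scalar minimax concave penalty (standard form, lam > 0, gamma > 0).

    Quadratic-corrected l1 near zero, constant once |x| exceeds gamma * lam.
    """
    a = abs(x)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam

def l1(x, lam):
    """The l1 penalty, shown for comparison."""
    return lam * abs(x)

# The l1 penalty keeps growing with |x|, while MCP levels off.
for x in [0.0, 0.5, 1.0, 2.0, 5.0]:
    print(x, l1(x, lam=1.0), mcp(x, lam=1.0, gamma=2.0))
```

Because the penalty stops growing for large entries, large values are not shrunk further, which is the usual motivation for preferring it over $\ell_1$ when sparsity is the goal.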

[161]  arXiv:2010.11560 [pdf, other]
Title: Deep Learning is Singular, and That's Good
Subjects: Machine Learning (cs.LG)

In singular models, the optimal set of parameters forms an analytic set with singularities and classical statistical inference cannot be applied to such models. This is significant for deep learning as neural networks are singular and thus "dividing" by the determinant of the Hessian or employing the Laplace approximation are not appropriate. Despite its potential for addressing fundamental issues in deep learning, singular learning theory appears to have made little inroads into the developing canon of deep learning theory. Via a mix of theory and experiment, we present an invitation to singular learning theory as a vehicle for understanding deep learning and suggest important future work to make singular learning theory directly applicable to how deep learning is performed in practice.

[162]  arXiv:2010.11561 [pdf, other]
Title: FUEL: Fast UAV Exploration using Incremental Frontier Structure and Hierarchical Planning
Comments: Video: this https URL; Demo: this https URL
Subjects: Robotics (cs.RO)

Autonomous exploration is a fundamental problem for various applications of unmanned aerial vehicles. Existing methods, however, were demonstrated to have low efficiency, due to the lack of optimality consideration, conservative motion plans and low decision frequencies. In this paper, we propose FUEL, a hierarchical framework that can support Fast UAV Exploration in complex unknown environments. We maintain crucial information in the entire space required by exploration planning by a frontier information structure (FIS), which can be updated incrementally when the space is explored. Supported by the FIS, a hierarchical planner plans exploration motions in three steps, which find efficient global coverage paths, refine a local set of viewpoints and generate minimum-time trajectories in sequence. We present extensive benchmark and real-world tests, in which our method completes the exploration tasks with unprecedented efficiency (3-8 times faster) compared to state-of-the-art approaches. Our method will be made open source to benefit the community.

[163]  arXiv:2010.11562 [pdf, other]
Title: Bilinear Fusion of Commonsense Knowledge with Attention-Based NLI Models
Comments: Published in Lecture Notes in Computer Science, Springer International Publishing
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

We consider the task of incorporating real-world commonsense knowledge into deep Natural Language Inference (NLI) models. Existing external knowledge incorporation methods are limited to lexical level knowledge and lack generalization across NLI models, datasets, and commonsense knowledge sources. To address these issues, we propose a novel NLI model-independent neural framework, BiCAM. BiCAM incorporates real-world commonsense knowledge into NLI models. Combined with convolutional feature detectors and bilinear feature fusion, BiCAM provides a conceptually simple mechanism that generalizes well. Quantitative evaluations with two state-of-the-art NLI baselines on SNLI and SciTail datasets in conjunction with ConceptNet and Aristo Tuple KGs show that BiCAM considerably improves the accuracy of the incorporated NLI baselines. For example, our BiECAM model, an instance of BiCAM, on the challenging SciTail dataset, improves the accuracy of incorporated baselines by 7.0% with ConceptNet, and 8.0% with Aristo Tuple KG.

[164]  arXiv:2010.11563 [pdf, other]
Title: Fingerprint Orientation Estimation: Challenges and Opportunities
Subjects: Computer Vision and Pattern Recognition (cs.CV)

There is an exponential increase in portable electronic devices with biometric security mechanisms, in particular fingerprint biometrics. A person has a limited number of fingerprints, which remain unchanged throughout their lifetime; once leaked to an adversary, a fingerprint is leaked for a lifetime. So, there is a need to secure the biometric template itself. In this survey paper, we review the different security models and fingerprint template protection techniques. The research challenges in different fingerprint template protection techniques are also highlighted in the respective sections of the paper. This survey provides a comprehensive study of template protection techniques for fingerprint biometric systems and highlights the challenges and future opportunities.

[165]  arXiv:2010.11567 [pdf, other]
Title: AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

In this paper, we present AISHELL-3, a large-scale and high-fidelity multi-speaker Mandarin speech corpus which could be used to train multi-speaker Text-to-Speech (TTS) systems. The corpus contains roughly 85 hours of emotion-neutral recordings spoken by 218 native Chinese Mandarin speakers. Their auxiliary attributes such as gender, age group and native accents are explicitly marked and provided in the corpus. Accordingly, transcripts in Chinese character-level and pinyin-level are provided along with the recordings. We present a baseline system that uses AISHELL-3 for multi-speaker Mandarin speech synthesis. The multi-speaker speech synthesis system is an extension of Tacotron-2 where a speaker verification model and a corresponding loss regarding voice similarity are incorporated as the feedback constraint. We aim to use the presented corpus to build a robust synthesis model that is able to achieve zero-shot voice cloning. The system trained on this dataset also generalizes well on speakers that are never seen in the training process. Objective evaluation results from our experiments show that the proposed multi-speaker synthesis system achieves high voice similarity concerning both speaker embedding similarity and equal error rate measurement. The dataset, baseline system code and generated samples are available online.

[166]  arXiv:2010.11568 [pdf, other]
Title: Quantile Bandits for Best Arms Identification with Concentration Inequalities
Subjects: Machine Learning (cs.LG)

We consider a variant of the best arm identification task in stochastic multi-armed bandits. Motivated by risk-averse decision-making problems in fields like medicine, biology and finance, our goal is to identify a set of $m$ arms with the highest $\tau$-quantile values under a fixed budget. We propose Quantile Successive Accepts and Rejects algorithm (Q-SAR), the first quantile based algorithm for fixed budget multiple arms identification. We prove two-sided asymmetric concentration inequalities for order statistics and quantiles of random variables that have non-decreasing hazard rate, which may be of independent interest. With the proposed concentration inequalities, we upper bound the probability of arm misidentification for the bandit task. We show illustrative experiments for best arm identification.
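
The selection criterion in this abstract (the $m$ arms with the highest $\tau$-quantile) can be illustrated with a plain order-statistic estimate. This is not the Q-SAR algorithm, whose accept/reject schedule is not given in the abstract; the helper names and the specific quantile convention (the ceil(tau * n)-th smallest sample) are assumptions for illustration.

```python
import math

def empirical_quantile(samples, tau):
    """Order-statistic estimate: the ceil(tau * n)-th smallest sample, 0 < tau <= 1."""
    ordered = sorted(samples)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]

def top_m_by_quantile(rewards_per_arm, m, tau):
    """Rank arms by their empirical tau-quantile and keep the m best."""
    scores = {arm: empirical_quantile(r, tau) for arm, r in rewards_per_arm.items()}
    return sorted(scores, key=scores.get, reverse=True)[:m]

# Hypothetical reward samples for three arms.
rewards = {
    "a": [1, 2, 3, 4],
    "b": [0, 0, 10, 10],
    "c": [5, 5, 5, 5],
}
print(top_m_by_quantile(rewards, m=2, tau=0.5))
```

In this toy data, arm "b" has the highest mean among "a" and "b" but a low median, which is the kind of case where a quantile criterion and a mean criterion disagree.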

[167]  arXiv:2010.11571 [pdf, other]
Title: A 4-Approximation of the $\frac{2\pi}{3}$-MST
Subjects: Computational Geometry (cs.CG)

Bounded-angle (minimum) spanning trees were first introduced in the context of wireless networks with directional antennas. They are reminiscent of bounded-degree spanning trees, which have received significant attention. Let $P = \{p_1,\ldots,p_n\}$ be a set of $n$ points in the plane, let $\Pi$ be the polygonal path $(p_1,\ldots,p_n)$, and let $0 < \alpha < 2\pi$ be an angle. An $\alpha$-spanning tree ($\alpha$-ST) of $P$ is a spanning tree of the complete Euclidean graph over $P$, with the following property: For each vertex $p_i \in P$, the (smallest) angle that is spanned by all the edges incident to $p_i$ is at most $\alpha$. An $\alpha$-minimum spanning tree ($\alpha$-MST) is an $\alpha$-ST of $P$ of minimum weight, where the weight of an $\alpha$-ST is the sum of the lengths of its edges. In this paper, we consider the problem of computing an $\alpha$-MST, for the important case where $\alpha = \frac{2\pi}{3}$. We present a simple 4-approximation algorithm, thus improving upon the previous results of Aschner and Katz and Biniaz et al., who presented algorithms with approximation ratios 6 and $\frac{16}{3}$, respectively.
In order to obtain this result, we devise a simple $O(n)$-time algorithm for constructing a $\frac{2\pi}{3}$-ST ${\cal T}$ of $P$, such that ${\cal T}$'s weight is at most twice that of $\Pi$ and, moreover, ${\cal T}$ is a 3-hop spanner of $\Pi$. This latter result is optimal in the sense that for any $\varepsilon > 0$ there exists a polygonal path for which every $\frac{2\pi}{3}$-ST has weight greater than $2-\varepsilon$ times the weight of the path.
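
The angle condition in the definition of an $\alpha$-ST can be checked directly: at each vertex, sort the directions of the incident edges, find the largest angular gap between consecutive directions, and subtract it from $2\pi$. The sketch below does only this check and is not the paper's approximation algorithm; it assumes the given edges already form a spanning tree, and the function names are illustrative.

```python
import math

def spanned_angle(center, neighbors):
    """Smallest angle of a cone at `center` that contains all edges to `neighbors`."""
    if len(neighbors) <= 1:
        return 0.0
    two_pi = 2.0 * math.pi
    angles = sorted(
        math.atan2(q[1] - center[1], q[0] - center[0]) % two_pi for q in neighbors
    )
    # Gaps between consecutive directions, including the wrap-around gap.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + two_pi - angles[-1])
    return two_pi - max(gaps)

def satisfies_angle_bound(points, edges, alpha, eps=1e-9):
    """True if every vertex's incident edges fit in a cone of angle at most alpha."""
    adj = {i: [] for i in range(len(points))}
    for u, v in edges:
        adj[u].append(points[v])
        adj[v].append(points[u])
    return all(spanned_angle(points[i], adj[i]) <= alpha + eps for i in adj)

alpha = 2.0 * math.pi / 3.0
straight = [(0, 0), (1, 0), (2, 0)]   # middle vertex spans an angle of pi
bent = [(0, 0), (1, 0), (1, 1)]       # middle vertex spans an angle of pi/2
path_edges = [(0, 1), (1, 2)]
print(satisfies_angle_bound(straight, path_edges, alpha))
print(satisfies_angle_bound(bent, path_edges, alpha))
```

The straight path shows why a polygonal path is not automatically a $\frac{2\pi}{3}$-ST: an interior vertex on a straight segment spans an angle of $\pi$, which exceeds the bound.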

[168]  arXiv:2010.11574 [pdf, other]
Title: Investigating the True Performance of Transformers in Low-Resource Languages: A Case Study in Automatic Corpus Creation
Comments: Code and data available at this https URL
Subjects: Computation and Language (cs.CL)

Transformers represent the state-of-the-art in Natural Language Processing (NLP) in recent years, proving effective even in tasks done in low-resource languages. While pretrained transformers for these languages can be made, it is challenging to measure their true performance and capacity due to the lack of hard benchmark datasets, as well as the difficulty and cost of producing them. In this paper, we present three contributions: First, we propose a methodology for automatically producing Natural Language Inference (NLI) benchmark datasets for low-resource languages using published news articles. Through this, we create and release NewsPH-NLI, the first sentence entailment benchmark dataset in the low-resource Filipino language. Second, we produce new pretrained transformers based on the ELECTRA technique to further alleviate the resource scarcity in Filipino, benchmarking them on our dataset against other commonly-used transfer learning techniques. Lastly, we perform analyses on transfer learning techniques to shed light on their true performance when operating in low-data domains through the use of degradation tests.

[169]  arXiv:2010.11575 [pdf, other]
Title: Face Hallucination Using Split-Attention in Split-Attention Network
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Face hallucination is a domain-specific super-resolution (SR) task that generates high-resolution (HR) facial images from one or more observed low-resolution (LR) inputs. Recently, convolutional neural networks (CNNs) have been successfully applied to face hallucination to model the complex nonlinear mapping between HR and LR images. Although a global attention mechanism built into CNNs naturally focuses on facial structure information, it always ignores the local and cross-feature structure information, resulting in limited reconstruction performance. To solve this problem, we propose a global-local split-attention mechanism and design a Split-Attention in Split-Attention (SIS) network that enables local attention across feature-map groups to attain global attention and improves the ability of feature representations. SIS can generate local attention and focus it on the interaction of key facial structure information at the channel level, thereby improving the performance of face image reconstruction. Experimental results show that the proposed approach consistently and significantly improves reconstruction performance for face hallucination.

[170]  arXiv:2010.11578 [pdf, other]
Title: Multi-dimensional Style Transfer for Partially Annotated Data using Language Models as Discriminators
Subjects: Computation and Language (cs.CL)

Style transfer has been widely explored in natural language generation with non-parallel corpora by directly or indirectly extracting a notion of style from source and target domain corpora. A common aspect among the existing approaches is the prerequisite of joint annotations across all the stylistic dimensions under consideration. The availability of such a dataset across a combination of styles is a limiting factor in extending state-of-the-art style transfer setups to multiple style dimensions. While cascading single-dimensional models across multiple styles is a possibility, it suffers from content loss, especially when the style dimensions are not completely independent of each other. In our work, we attempt to relax the requirement of jointly annotated data across the multiple styles being inspected and make use of independently acquired data across different style dimensions without any additional annotations. We initialize an encoder-decoder setup with large transformer-based language models pre-trained on a generic corpus and enhance its rewriting capability to multiple styles by employing multiple language models as discriminators. Through quantitative and qualitative evaluation, we show the ability of our model to control for styles across multiple style dimensions while preserving the content of the input text, and compare it against baselines which involve cascaded state-of-the-art uni-dimensional style transfer models.

[171]  arXiv:2010.11584 [pdf, other]
Title: Flexibility management with virtual batteries of thermostatically controlled loads: real-time control system and potential in Spain
Comments: 15 pages, 21 figures
Subjects: Systems and Control (eess.SY)

Virtual batteries composed of aggregated thermostatically controlled loads are able to provide real-time frequency regulation to electrical grids. Load flexibility management can be helpful in solving the problem of balancing generation and demand, which is becoming more complex due to the variability of renewable energies. A real-time virtual battery control system is presented in this paper. As an example, a virtual battery of 1000 thermostatically controlled loads is operated. In order to quantify the potential of virtual batteries, a study focused on residential thermostatically controlled loads in Spain is reported.

[172]  arXiv:2010.11585 [pdf]
Title: A simulation-based evaluation of a Cargo-Hitching service for E-commerce using mobility-on-demand vehicles
Comments: 19 pages, 4 tables, 7 figures. Submitted to Transportation (Springer)
Subjects: Multiagent Systems (cs.MA); Systems and Control (eess.SY)

Time-sensitive parcel deliveries, shipments requested for delivery in a day or less, are an increasingly important research subject. It is challenging to deal with these deliveries from a carrier perspective since they entail additional planning constraints, preventing the efficient consolidation of deliveries that is possible when demand is well known in advance. Furthermore, such time-sensitive deliveries are requested across a wider spatial scope than retail centers, including homes and offices. Therefore, an increase in such deliveries is considered to exacerbate negative externalities such as congestion and emissions. One of the solutions is to leverage spare capacity in passenger transport modes. This concept is often called cargo-hitching. While there are various possible system designs, it is crucial that such a solution does not deteriorate the quality of service of passenger trips. This research aims to evaluate the use of Mobility-On-Demand services to perform same-day parcel deliveries. For this purpose, we use SimMobility, a high-resolution agent-based simulation platform of passenger and freight flows, applied in Singapore. E-commerce demand carrier data are used to characterize simulated parcel delivery demand. Operational scenarios that aim to minimize the adverse effect of fulfilling deliveries with Mobility-On-Demand vehicles on Mobility-On-Demand passenger flows (fulfillment, wait and travel times) are explored. Results indicate that Mobility-On-Demand services have the potential to fulfill a considerable amount of parcel deliveries and decrease freight vehicle traffic and total vehicle-kilometers-travelled without compromising the quality of Mobility-On-Demand for passenger travel.

[173]  arXiv:2010.11593 [pdf, other]
Title: A Technical Report: BUT Speech Translation Systems
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

The paper describes BUT's speech translation systems. The systems are English$\longrightarrow$German offline speech translation systems. The systems are based on our previous works \cite{Jointly_trained_transformers}. Though End-to-End and cascade (ASR-MT) spoken language translation (SLT) systems are reaching comparable performances, a large degradation is observed when translating ASR hypotheses compared to the oracle input text. To reduce this performance degradation, we have jointly trained ASR and MT modules with the ASR objective as an auxiliary loss. Both networks are connected through the neural hidden representations. This model has an End-to-End differentiable path with respect to the final objective function and also utilizes the ASR objective for better optimization. During inference, both modules (i.e., ASR and MT) are connected through the hidden representations corresponding to the n-best hypotheses. Ensembling with independently trained ASR and MT models has further improved the performance of the system.

[174]  arXiv:2010.11594 [pdf, other]
Title: Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization
Comments: ECCV 2020 spotlight
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Weakly-supervised Temporal Action Localization (W-TAL) aims to classify and localize all action instances in an untrimmed video under only video-level supervision. However, without frame-level annotations, it is challenging for W-TAL methods to identify false positive action proposals and generate action proposals with precise temporal boundaries. In this paper, we present a Two-Stream Consensus Network (TSCN) to simultaneously address these challenges. The proposed TSCN features an iterative refinement training method, where a frame-level pseudo ground truth is iteratively updated, and used to provide frame-level supervision for improved model training and false positive action proposal elimination. Furthermore, we propose a new attention normalization loss to encourage the predicted attention to act like a binary selection, and promote the precise localization of action instance boundaries. Experiments conducted on the THUMOS14 and ActivityNet datasets show that the proposed TSCN outperforms current state-of-the-art methods, and even achieves comparable results with some recent fully-supervised methods.

[175]  arXiv:2010.11598 [pdf, other]
Title: An Efficient Adversarial Attack for Tree Ensembles
Comments: NeurIPS 2020
Subjects: Machine Learning (cs.LG)

We study the problem of efficient adversarial attacks on tree-based ensembles such as gradient boosting decision trees (GBDTs) and random forests (RFs). Since these models are non-continuous step functions and gradients do not exist, most existing efficient adversarial attacks are not applicable. Although decision-based black-box attacks can be applied, they cannot utilize the special structure of trees. In our work, we transform the attack problem into a discrete search problem specially designed for tree ensembles, where the goal is to find a valid "leaf tuple" that leads to misclassification while having the shortest distance to the original input. With this formulation, we show that a simple yet effective greedy algorithm can be applied to iteratively optimize the adversarial example by moving the leaf tuple to its neighborhood within Hamming distance 1. Experimental results on several large GBDT and RF models with up to hundreds of trees demonstrate that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach, while also providing smaller (better) adversarial examples than decision-based black-box attacks on general $\ell_p$ ($p=1, 2, \infty$) norm perturbations. Our code is available at https://github.com/chong-z/tree-ensemble-attack.

[176]  arXiv:2010.11600 [pdf, other]
Title: On the Power of Deep but Naive Partial Label Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)

Partial label learning (PLL) is a class of weakly supervised learning where each training instance consists of a data point and a set of candidate labels containing a unique ground truth label. To tackle this problem, a majority of current state-of-the-art methods employ either label disambiguation or averaging strategies. So far, PLL methods without such techniques have been considered impractical. In this paper, we challenge this view by revealing the hidden power of the oldest and naivest PLL method when it is instantiated with deep neural networks. Specifically, we show that, with deep neural networks, the naive model can achieve competitive performance against the other state-of-the-art methods, suggesting it as a strong baseline for PLL. We also address the question of how and why such a naive model works well with deep neural networks. Our empirical results indicate that deep neural networks trained on partially labeled examples generalize very well even in the over-parametrized regime and without label disambiguation or regularization. We point out that existing learning theories on PLL are vacuous in the over-parametrized regime. Hence they cannot explain why the deep naive method works. We propose an alternative theory on how deep learning generalizes in PLL problems.

[177]  arXiv:2010.11601 [pdf]
Title: Revisiting Wireless Internet Connectivity: 5G vs Wi-Fi 6
Subjects: Networking and Internet Architecture (cs.NI)

In recent years, significant attention has been directed toward the fifth generation of wireless broadband connectivity known as '5G', currently being deployed by Mobile Network Operators. Surprisingly, there has been considerably less attention paid to 'Wi-Fi 6', the new IEEE 802.11ax standard in the family of Wireless Local Area Network technologies with features targeting private, edge-networks. This paper revisits the suitability of cellular and Wi-Fi in delivering high-speed wireless Internet connectivity. Both technologies aspire to deliver significantly enhanced performance, enabling each to deliver much faster wireless broadband connectivity, and provide further support for the Internet of Things and Machine-to-Machine communications, positioning the two technologies as technical substitutes in many usage scenarios. We conclude that both are likely to play important roles in the future, and simultaneously serve as competitors and complements. We anticipate that 5G will remain the preferred technology for wide-area coverage, while Wi-Fi 6 will remain the preferred technology for indoor use, thanks to its much lower deployment costs. However, the trend towards providing seamless wireless broadband connectivity, as well as smaller-cell network architectures with increasingly flexible and spectrum-agile technologies, is blurring the traditional boundaries that differentiated earlier generations of cellular and Wi-Fi.

[178]  arXiv:2010.11604 [pdf, other]
Title: AI-lead Court Debate Case Investigation
Comments: 4 pages, 2 figures
Subjects: Computation and Language (cs.CL)

The multi-role judicial debate among the plaintiff, defendant, and judge is an important part of a judicial trial. Unlike in other types of dialogue, questions are raised by the judge, and the plaintiff, the plaintiff's agent, the defendant, and the defendant's agent then debate so that the trial can proceed in an orderly manner. Question generation is an important task in Natural Language Generation. In a judicial trial, it can help the judge raise efficient questions so that the judge gains a clearer understanding of the case. In this work, we propose an innovative end-to-end question generation model, the Trial Brain Model (TBM), to build a Trial Brain that can generate the questions the judge wants to ask from the historical dialogue between the plaintiff and the defendant. Unlike prior efforts in natural language generation, our model can learn the judge's questioning intention through predefined knowledge. Experiments on real-world datasets show that our model can provide more accurate questions in the multi-role court debate scene.

[179]  arXiv:2010.11605 [pdf, ps, other]
Title: Automata and Fixpoints for Asynchronous Hyperproperties
Subjects: Logic in Computer Science (cs.LO); Formal Languages and Automata Theory (cs.FL); Programming Languages (cs.PL)

Hyperproperties have received increasing attention in the last decade due to their importance e.g. for security analyses. Past approaches have focussed on synchronous analyses, i.e. techniques in which different paths are compared lockstepwise. In this paper, we systematically study asynchronous analyses for hyperproperties by introducing both a novel automata model (Alternating Asynchronous Parity Automata) and the temporal fixpoint calculus $H_\mu$, the first fixpoint calculus that can systematically express hyperproperties in an asynchronous manner and at the same time subsumes the existing logic HyperLTL. We show that the expressive power of both models coincides over fixed path assignments. The high expressive power of both models is evidenced by the fact that decision problems of interest are highly undecidable, i.e. not even arithmetical. As a remedy, we propose approximative analyses for both models that also induce natural decidable fragments.

[180]  arXiv:2010.11607 [pdf, other]
Title: Backdoor Attack against Speaker Verification
Comments: The first two authors contributed equally to this work. 5 pages
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Speaker verification has been widely and successfully adopted in many mission-critical areas for user identification. Training speaker verification models requires a large amount of data, so users usually need to adopt third-party data (e.g., data from the Internet or from third-party data companies). This raises the question of whether adopting untrusted third-party data can pose a security threat. In this paper, we demonstrate that it is possible to inject a hidden backdoor into speaker verification models by poisoning the training data. Specifically, we design a clustering-based attack scheme where poisoned samples from different clusters will contain different triggers (i.e., pre-defined utterances), based on our understanding of verification tasks. The infected models behave normally on benign samples, while attacker-specified unenrolled triggers will successfully pass the verification even if the attacker has no information about the enrolled speaker. We also demonstrate that existing backdoor attacks cannot be directly adopted in attacking speaker verification. Our attack not only provides a new perspective for designing novel attacks, but also serves as a strong baseline for improving the robustness of verification methods.

[181]  arXiv:2010.11611 [pdf]
Title: A Simple Methodology for Model-Driven Business Innovation and Low Code Implementation
Comments: 10 pages, 5 figures, 6 tables
Subjects: Software Engineering (cs.SE)

Low Code platforms, according to Gartner Group, represent one of the more disruptive technologies in the development and maintenance of enterprise applications. The key factor is the central involvement of business people and domain experts, with substantial disintermediation with respect to technical people. In this paper we propose a methodology conceived to support non-technical people in addressing business process innovation and developing enterprise software applications. The proposed methodology, called EasInnova, is solidly rooted in Model-Driven Engineering and adopts a three-stage model of an innovation undertaking. The three stages are: AsIs, which models the existing business scenario; Transformation, which consists of the elaboration of the actual innovation; and ToBe, which concerns the modeling of the new business scenario. The core of EasInnova is a matrix whose columns are the three innovation stages and whose rows are the three Model-Driven Architecture layers: CIM, PIM, PSM. The cells indicate the steps to be followed in achieving the sought innovation. Finally, the produced models are transferred onto BonitaSoft, the Low Code platform selected in our work. The methodology is described by means of a simple example in the domain of home food delivery.

[182]  arXiv:2010.11612 [pdf, other]
Title: Hierarchical Federated Learning through LAN-WAN Orchestration
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Federated learning (FL) was designed to enable mobile phones to collaboratively learn a global model without uploading their private data to a cloud server. However, existing FL protocols have a critical communication bottleneck in a federated network, which is usually powered by a wide-area network (WAN), coupled with privacy concerns. Such a WAN-driven FL design leads to significantly high cost and much slower model convergence. In this work, we propose an efficient FL protocol, which involves a hierarchical aggregation mechanism in the local-area network (LAN), chosen for its abundant bandwidth and almost negligible monetary cost compared with WAN. Our proposed FL can accelerate the learning process and reduce the monetary cost with frequent local aggregation in the same LAN and infrequent global aggregation on a cloud across WAN. We further design a concrete FL platform, namely LanFL, that incorporates several key techniques to handle the challenges introduced by LAN: cloud-device aggregation architecture, intra-LAN peer-to-peer (p2p) topology generation, and inter-LAN bandwidth capacity heterogeneity. We evaluate LanFL on two typical non-IID datasets, which reveals that LanFL can significantly accelerate FL training (1.5x-6.0x), save WAN traffic (18.3x-75.6x), and reduce monetary cost (3.8x-27.2x) while preserving the model accuracy.
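The two-level aggregation described above (local aggregation inside each LAN, then global aggregation across the WAN) can be illustrated with a short sketch. It assumes plain sample-weighted averaging of parameter vectors, in the style of federated averaging; the abstract does not state LanFL's actual aggregation rule, schedule, or p2p topology, and the names `weighted_average` and `hierarchical_aggregate` are hypothetical.

```python
def weighted_average(models, weights):
    """Weighted average of equally sized parameter vectors (lists of floats)."""
    total = float(sum(weights))
    dim = len(models[0])
    return [
        sum(w * m[k] for m, w in zip(models, weights)) / total
        for k in range(dim)
    ]


def hierarchical_aggregate(lans):
    """Aggregate within each LAN first, then across LANs.

    `lans` is a list of LANs; each LAN is a list of (params, num_samples)
    pairs, one per device. Weighting by sample count is an assumption.
    """
    lan_models, lan_sizes = [], []
    for devices in lans:
        params = [p for p, _ in devices]
        sizes = [n for _, n in devices]
        # Local step: cheap, high-bandwidth aggregation inside one LAN.
        lan_models.append(weighted_average(params, sizes))
        lan_sizes.append(sum(sizes))
    # Global step: only one model per LAN has to cross the WAN.
    return weighted_average(lan_models, lan_sizes)
```

With sample-count weights, a single round of this two-level average gives the same result as averaging all devices at once; the saving is in WAN traffic, since each LAN sends one model instead of one per device.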

[183]  arXiv:2010.11614 [pdf, ps, other]
Title: Quantitative analysis of robot gesticulation behavior
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Social robot capabilities, such as talking gestures, are best produced using data-driven approaches to avoid being repetitive and to show trustworthiness. However, there is a lack of robust quantitative methods that allow such methods to be compared beyond visual evaluation. In this paper a quantitative analysis is performed that compares two Generative Adversarial Network based gesture generation approaches. The aim is to measure characteristics such as fidelity to the original training data, but at the same time keep track of the degree of originality of the produced gestures. Principal Coordinate Analysis and Procrustes statistics are performed and a new Fréchet Gesture Distance is proposed by adapting the Fréchet Inception Distance to gestures. These three techniques are taken together to assess the fidelity/originality of the generated gestures.
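The Fréchet Inception Distance that the abstract adapts compares two Gaussians fitted to feature vectors: $d^2 = \lVert\mu_1-\mu_2\rVert^2 + \mathrm{Tr}\big(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2}\big)$. The sketch below is a simplified illustration, not the paper's Fréchet Gesture Distance: it assumes diagonal covariances, so the trace term reduces to a per-dimension sum, and the function names and the choice of gesture features are hypothetical.

```python
import math


def fit_diagonal_gaussian(vectors):
    """Per-dimension mean and (population) variance of a list of feature vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    var = [sum((v[k] - mean[k]) ** 2 for v in vectors) / n for k in range(dim)]
    return mean, var


def frechet_distance_sq_diag(mean1, var1, mean2, var2):
    """Squared Fréchet distance between two Gaussians with diagonal covariances."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mean1, mean2))
    # For diagonal covariances, Tr(S1 + S2 - 2 (S1 S2)^(1/2)) is a simple sum.
    cov_term = sum(
        v1 + v2 - 2.0 * math.sqrt(v1 * v2) for v1, v2 in zip(var1, var2)
    )
    return mean_term + cov_term
```

A value of zero means the two fitted Gaussians coincide; with full covariance matrices the square root in the trace term becomes a matrix square root.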

[184]  arXiv:2010.11619 [pdf, other]
Title: Self-Supervised Shadow Removal
Comments: 10 pages, 4 figures, 6 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Shadow removal is an important computer vision task aiming at the detection and successful removal of the shadow produced by an occluded light source and a photo-realistic restoration of the image contents. Decades of research produced a multitude of hand-crafted restoration techniques and, more recently, learned solutions from shadowed and shadow-free training image pairs. In this work, we propose an unsupervised single image shadow removal solution via self-supervised learning by using a conditioned mask. In contrast to existing literature, we do not require paired shadowed and shadow-free images; instead, we rely on self-supervision and jointly learn deep models to remove and add shadows to images. We validate our approach on the recently introduced ISTD and USR datasets. We largely improve quantitatively and qualitatively over the compared methods and set a new state-of-the-art performance in single image shadow removal.

[185]  arXiv:2010.11623 [pdf, ps, other]
Title: Performance Analysis and Optimization for the MAC Protocol in UAV-based IoT Network
Authors: Bin Li, Xianzhen Guo, Ruonan Zhang, Xiaojiang Du (Fellow, IEEE), Mohsen Guizani (Fellow, IEEE)
Subjects: Information Theory (cs.IT)

Unmanned aerial vehicles (UAVs) play an important role in air-ground integrated networks. Especially in Internet of Things (IoT) services, a UAV equipped with communication equipment is widely adopted as a mobile base station (BS) for data collection from IoT devices on the ground. In this paper, we consider an air-ground network in which the UAV flies in a straight line to collect information from the IoT devices in a 2-D plane based on the CSMA/CA protocol. Due to the UAV's continuous mobility, the communication durations of devices in different locations with the UAV are not only time-limited, but also differ from one another. To analyze the throughput performance of the uplink multiple access control (MAC) protocol, we propose a new analysis model to deal with the communication heterogeneity in the network. Firstly, we divide the devices in the coverage area into different clusters according to their communication durations. Then, a quitting probability, indicating the probability that a device quits the UAV's coverage at each time slot, is defined. A modified three-dimensional Markov chain model adopting the quitting probability and cluster division is developed for the performance analysis. In addition, we propose a modified CSMA/CA protocol which fully considers the heterogeneity of the access time and adaptively allocates the time resource among the devices in different clusters. Finally, the effects of the retry limit, initial contention window size, device density, UAV speed and coverage area are discussed in the simulation section.

[186]  arXiv:2010.11625 [pdf, other]
Title: One-shot Distributed Algorithm for Generalized Eigenvalue Problem
Subjects: Machine Learning (cs.LG)

Nowadays, more and more datasets are stored in a distributed way for the sake of memory storage or data privacy. The generalized eigenvalue problem (GEP) plays a vital role in a large family of high-dimensional statistical models. However, the existing distributed method for eigenvalue decomposition cannot be applied to GEP because of the divergence of the empirical covariance matrix. Here we propose a general distributed framework for GEP with one-shot communication. If the symmetric data covariance has repeated eigenvalues, e.g., in canonical component analysis, we further modify the method for better convergence. A theoretical analysis of the approximation error is conducted, and its relation to the divergence of the data covariance, the eigenvalues of the empirical data covariance, and the number of local servers is analyzed. Numerical experiments also show the effectiveness of the proposed algorithms.

[187]  arXiv:2010.11627 [pdf, ps, other]
Title: Malware Traffic Classification: Evaluation of Algorithms and an Automated Ground-truth Generation Pipeline
Comments: 8 pages, 5 tables, 2 Algorithms. IMDEA Software Institute, Madrid, Spain. Universidad Politécnica de Madrid (UPM)
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Identifying threats in an encrypted network traffic flow is uniquely challenging. On one hand, it is extremely difficult to simply decrypt the traffic due to modern encryption algorithms. On the other hand, passing such an encrypted stream through pattern matching algorithms is useless because encryption ensures there are no patterns to match. Moreover, evaluating such models is also difficult due to the lack of labeled benign and malware datasets. Other approaches have tried to tackle this problem by employing observable meta-data gathered from the flow. We try to augment this approach by extending it to a semi-supervised malware classification pipeline using these observable meta-data. To this end, we explore and test different kinds of clustering approaches which make use of a unique and diverse set of features extracted from this observable meta-data. We also propose an automated packet data-labeling pipeline to generate ground-truth data which can serve as a baseline to evaluate the classifiers mentioned above in particular, or any other detection model in general.

[188]  arXiv:2010.11629 [pdf, other]
Title: Learning Augmented Energy Minimization via Speed Scaling
Comments: 30 pages, 4 figures. To appear in NeurIPS 2020 (spotlight)
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)

As power management has become a primary concern in modern data centers, computing resources are being scaled dynamically to minimize energy consumption. We initiate the study of a variant of the classic online speed scaling problem, in which machine learning predictions about the future can be integrated naturally. Inspired by recent work on learning-augmented online algorithms, we propose an algorithm which incorporates predictions in a black-box manner and outperforms any online algorithm if the accuracy is high, yet maintains provable guarantees if the prediction is very inaccurate. We provide both theoretical and experimental evidence to support our claims.

[189]  arXiv:2010.11631 [pdf, other]
Title: LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation
Comments: 5 pages, 3 figures, 2 tables. Under review at ICASSP 2020
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Recent deep-learning approaches have shown that Frequency Transformation (FT) blocks can significantly improve spectrogram-based single-source separation models by capturing frequency patterns. The goal of this paper is to extend the FT block to fit the multi-source task. We propose the Latent Source Attentive Frequency Transformation (LaSAFT) block to capture source-dependent frequency patterns. We also propose the Gated Point-wise Convolutional Modulation (GPoCM), an extension of Feature-wise Linear Modulation (FiLM), to modulate internal features. By employing these two novel methods, we extend the Conditioned-U-Net (CUNet) for multi-source separation, and the experimental results indicate that our LaSAFT and GPoCM can improve the CUNet's performance, achieving state-of-the-art SDR performance on several MUSDB18 source separation tasks.
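For context, Feature-wise Linear Modulation (FiLM), which GPoCM is said to extend, scales and shifts each feature channel with a conditioning-dependent pair $(\gamma_c, \beta_c)$. The sketch below shows only this baseline FiLM operation on nested Python lists; it does not implement GPoCM, whose details are not given in the abstract, and the function name `film` is a hypothetical choice.

```python
def film(features, gamma, beta):
    """Apply FiLM: scale channel c by gamma[c] and shift it by beta[c].

    `features` is a list of channels, each a list of numbers; in a real model
    gamma and beta would be produced by a network from the condition
    (here they are passed in directly, as an assumption).
    """
    return [
        [g * value + b for value in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]
```

For instance, `film([[1, 2], [3, 4]], [2, 0.5], [1, 0])` scales the first channel by 2 and shifts it by 1, and halves the second channel.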

[190]  arXiv:2010.11632 [pdf, other]
Title: The Primal-Dual method for Learning Augmented Algorithms
Comments: 30 pages, 11 figures. To appear in NeurIPS 2020 (oral)
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)

The extension of classical online algorithms when provided with predictions is a new and active research area. In this paper, we extend the primal-dual method for online algorithms in order to incorporate predictions that advise the online algorithm about the next action to take. We use this framework to obtain novel algorithms for a variety of online covering problems. We compare our algorithms to the cost of the true and predicted offline optimal solutions and show that these algorithms outperform any online algorithm when the prediction is accurate while maintaining good guarantees when the prediction is misleading.

[191]  arXiv:2010.11635 [pdf, other]
Title: Continual Learning in Low-rank Orthogonal Subspaces
Comments: The paper is accepted at NeurIPS'20
Journal-ref: NeurIPS, 2020
Subjects: Machine Learning (cs.LG)

In continual learning (CL), a learner is faced with a sequence of tasks, arriving one after the other, and the goal is to remember all the tasks once the continual learning experience is finished. The prior art in CL uses episodic memory, parameter regularization or extensible network structures to reduce interference among tasks, but in the end, all the approaches learn different tasks in a joint vector space. We believe this invariably leads to interference among different tasks. We propose to learn tasks in different (low-rank) vector subspaces that are kept orthogonal to each other in order to minimize interference. Further, to keep the gradients of different tasks coming from these subspaces orthogonal to each other, we learn isometric mappings by posing network training as an optimization problem over the Stiefel manifold. To the best of our understanding, we report, for the first time, strong results over experience-replay baseline with and without memory on standard classification benchmarks in continual learning. The code is made publicly available.

[192]  arXiv:2010.11638 [pdf, other]
Title: Pseudoscientific Content on YouTube: Assessing the Effects of Watch History on the Recommendation Algorithm
Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)

YouTube has revolutionized the way people discover and consume videos, becoming one of the primary news sources for Internet users. Since content on YouTube is generated by its users, the platform is particularly vulnerable to misinformative and conspiratorial videos. Even worse, the role played by YouTube's recommendation algorithm in unwittingly promoting questionable content is not well understood, and could potentially exacerbate the problem. This can have dire real-world consequences, especially when pseudoscientific content is promoted to users at critical times, e.g., during the COVID-19 pandemic.
In this paper, we set out to characterize and detect pseudoscientific misinformation on YouTube. We collect 6.6K videos related to COVID-19, the flat earth theory, and the anti-vaccination and anti-mask movements; using crowdsourcing, we annotate them as pseudoscience, legitimate science, or irrelevant. We then train a deep learning classifier to detect pseudoscientific videos with an accuracy of 76.1%. Next, we quantify user exposure to this content on various parts of the platform (i.e., a user's homepage, recommended videos while watching a specific video, or search results) and how this exposure changes based on the user's watch history. We find that YouTube's recommendation algorithm is more aggressive in suggesting pseudoscientific content when users are searching for specific topics, while these recommendations are less common on a user's homepage or when actively watching pseudoscientific videos. Finally, we shed light on how a user's watch history substantially affects the type of recommended videos.

[193]  arXiv:2010.11639 [pdf, ps, other]
Title: Towards Fully Bilingual Deep Language Modeling
Subjects: Computation and Language (cs.CL)

Language models based on deep neural networks have facilitated great advances in natural language processing and understanding tasks in recent years. While models covering a large number of languages have been introduced, their multilinguality has come at a cost in terms of monolingual performance, and the best-performing models at most tasks not involving cross-lingual transfer remain monolingual. In this paper, we consider the question of whether it is possible to pre-train a bilingual model for two remotely related languages without compromising performance at either language. We collect pre-training data, create a Finnish-English bilingual BERT model and evaluate its performance on datasets used to evaluate the corresponding monolingual models. Our bilingual model performs on par with Google's original English BERT on GLUE and nearly matches the performance of monolingual Finnish BERT on a range of Finnish NLP tasks, clearly outperforming multilingual BERT. We find that when the model vocabulary size is increased, the BERT-Base architecture has sufficient capacity to learn two remotely related languages to a level where it achieves comparable performance with monolingual models, demonstrating the feasibility of training fully bilingual deep language models. The model and all tools involved in its creation are freely available at https://github.com/TurkuNLP/biBERT

[194]  arXiv:2010.11641 [pdf, other]
Title: Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data
Comments: Accepted to AIST 2020
Journal-ref: AIST 2020 (CCIS series)
Subjects: Genomics (q-bio.GN); Machine Learning (cs.LG); Applications (stat.AP)

Missing genotypes can affect the efficacy of machine learning approaches to identify the risk genetic variants of common diseases and traits. The problem occurs when genotypic data are collected from different experiments with different DNA microarrays, each being characterised by its pattern of uncalled (missing) genotypes. This can prevent the machine learning classifier from assigning the classes correctly. To tackle this issue, we used well-developed notions of object-attribute biclusters and formal concepts that correspond to dense subrelations in the binary relation $\textit{patients} \times \textit{SNPs}$. The paper contains experimental results on applying a biclustering algorithm to a large real-world dataset collected for studying the genetic bases of ischemic stroke. The algorithm could identify large dense biclusters in the genotypic matrix for further processing, which in turn significantly improved the quality of machine learning classifiers. The proposed algorithm was also able to generate biclusters for the whole dataset without size constraints, in contrast to the In-Close4 algorithm for generation of formal concepts.

[195]  arXiv:2010.11644 [pdf, other]
Title: Theory-based residual neural networks: A synergy of discrete choice models and deep neural networks
Subjects: Machine Learning (cs.LG); Econometrics (econ.EM)

Researchers often treat data-driven and theory-driven models as two disparate or even conflicting methods in travel behavior analysis. However, the two methods are highly complementary because data-driven methods are more predictive but less interpretable and robust, while theory-driven methods are more interpretable and robust but less predictive. Using their complementary nature, this study designs a theory-based residual neural network (TB-ResNet) framework, which synergizes discrete choice models (DCMs) and deep neural networks (DNNs) based on their shared utility interpretation. The TB-ResNet framework is simple, as it uses a ($\delta$, 1-$\delta$) weighting to take advantage of DCMs' simplicity and DNNs' richness, and to prevent underfitting from the DCMs and overfitting from the DNNs. This framework is also flexible: three instances of TB-ResNets are designed based on multinomial logit model (MNL-ResNets), prospect theory (PT-ResNets), and hyperbolic discounting (HD-ResNets), which are tested on three data sets. Compared to pure DCMs, the TB-ResNets provide greater prediction accuracy and reveal a richer set of behavioral mechanisms owing to the utility function augmented by the DNN component in the TB-ResNets. Compared to pure DNNs, the TB-ResNets can modestly improve prediction and significantly improve interpretation and robustness, because the DCM component in the TB-ResNets stabilizes the utility functions and input gradients. Overall, this study demonstrates that it is both feasible and desirable to synergize DCMs and DNNs by combining their utility specifications under a TB-ResNet framework. Although some limitations remain, this TB-ResNet framework is an important first step to create mutual benefits between DCMs and DNNs for travel behavior modeling, with joint improvement in prediction, interpretation, and robustness.
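
The ($\delta$, 1-$\delta$) weighting described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the DNN utility receives weight $\delta$ and the DCM utility receives weight 1-$\delta$, and that choice probabilities come from a softmax over the blended utilities; the abstract does not spell out these details, and the function names here are hypothetical.

```python
import math


def softmax(utilities):
    """Numerically stable softmax over a list of utilities."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def tb_resnet_probs(dcm_utils, dnn_utils, delta):
    """Blend per-alternative utilities and return choice probabilities.

    Assumption (not stated in the abstract): the DNN term is weighted by
    delta and the DCM term by (1 - delta).
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    if len(dcm_utils) != len(dnn_utils):
        raise ValueError("utility vectors must have the same length")
    blended = [
        (1.0 - delta) * v_dcm + delta * v_dnn
        for v_dcm, v_dnn in zip(dcm_utils, dnn_utils)
    ]
    return softmax(blended)


# Hypothetical utilities for two travel alternatives.
probs = tb_resnet_probs([1.0, 0.0], [0.0, 5.0], delta=0.3)
```

Under this assumed convention, setting delta to 0 recovers the pure DCM and setting it to 1 recovers the pure DNN, which is one way to read the claim that the weighting guards against both underfitting and overfitting.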

[196]  arXiv:2010.11645 [pdf, other]
Title: Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Convex relaxations have emerged as a promising approach for verifying desirable properties of neural networks like robustness to adversarial perturbations. Widely used Linear Programming (LP) relaxations only work well when networks are trained to facilitate verification. This precludes applications that involve verification-agnostic networks, i.e., networks not specially trained for verification. On the other hand, semidefinite programming (SDP) relaxations have successfully been applied to verification-agnostic networks, but do not currently scale beyond small networks due to poor time and space asymptotics. In this work, we propose a first-order dual SDP algorithm that (1) requires memory only linear in the total number of network activations, and (2) only requires a fixed number of forward/backward passes through the network per iteration. By exploiting iterative eigenvector methods, we express all solver operations in terms of forward and backward passes through the network, enabling efficient use of hardware like GPUs/TPUs. For two verification-agnostic networks on MNIST and CIFAR-10, we significantly improve L-inf verified robust accuracy from 1% to 88% and 6% to 40% respectively. We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.

[197]  arXiv:2010.11646 [pdf, other]
Title: Towards Low-Resource StarGAN Voice Conversion using Weight Adaptive Instance Normalization
Comments: submitted to ICASSP2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Many-to-many voice conversion with non-parallel training data has seen significant progress in recent years. StarGAN-based models have attracted interest for voice conversion. However, most StarGAN-based methods have only focused on voice conversion experiments in situations where the number of speakers was small and the amount of training data was large. In this work, we aim at improving the data efficiency of the model and achieving many-to-many non-parallel StarGAN-based voice conversion for a relatively large number of speakers with limited training samples. In order to improve data efficiency, the proposed model uses a speaker encoder for extracting speaker embeddings and conducts adaptive instance normalization (AdaIN) on convolutional weights. Experiments are conducted with 109 speakers under two low-resource situations, where the number of training samples is 20 and 5 per speaker. An objective evaluation shows the proposed model is better than the baseline methods. Furthermore, a subjective evaluation shows that, for both naturalness and similarity, the proposed model outperforms the baseline method.

[198]  arXiv:2010.11647 [pdf, ps, other]
Title: Quaternion-Valued Variational Autoencoder
Comments: Submitted to 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Deep probabilistic generative models have achieved incredible success in many fields of application. Among such models, variational autoencoders (VAEs) have proved their ability in modeling a generative process by learning a latent representation of the input. In this paper, we propose a novel VAE defined in the quaternion domain, which exploits the properties of quaternion algebra to improve performance while significantly reducing the number of parameters required by the network. The success of the proposed quaternion VAE with respect to traditional VAEs relies on the ability to leverage the internal relations between quaternion-valued input features and on the properties of second-order statistics, which allow the latent variables to be defined in the augmented quaternion domain. In order to show the advantages due to such properties, we define a plain convolutional VAE in the quaternion domain and we evaluate it in comparison with its real-valued counterpart on the CelebA face dataset.

[199]  arXiv:2010.11649 [pdf, other]
Title: Learning to Sort Image Sequences via Accumulated Temporal Differences
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Consider a set of n images of a scene with dynamic objects captured with a static or a handheld camera. Let the temporal order in which these images are captured be unknown. There can be n! possibilities for the temporal order in which these images could have been captured. In this work, we tackle the problem of temporally sequencing the unordered set of images of a dynamic scene captured with a handheld camera. We propose a convolutional block which captures the spatial information through a 2D convolution kernel and captures the temporal information by utilizing the differences present among the feature maps extracted from the input images. We evaluate the performance of the proposed approach on a dataset extracted from a standard action recognition dataset, UCF101. We show that the proposed approach outperforms the state-of-the-art methods by a significant margin. We show that the network generalizes well by evaluating it on a dataset extracted from the DAVIS dataset, a dataset meant for video object segmentation, when the same network was trained with a dataset extracted from UCF101, a dataset meant for action recognition.

[200]  arXiv:2010.11652 [pdf, other]
Title: CoinDICE: Off-Policy Confidence Interval Estimation
Comments: To appear at NeurIPS 2020 as spotlight
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies. Starting from a function space embedding of the linear program formulation of the $Q$-function, we obtain an optimization problem with generalized estimating equation constraints. By applying the generalized empirical likelihood method to the resulting Lagrangian, we propose CoinDICE, a novel and efficient algorithm for computing confidence intervals. Theoretically, we prove the obtained confidence intervals are valid, in both asymptotic and finite-sample regimes. Empirically, we show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.

[201]  arXiv:2010.11653 [pdf, other]
Title: Graph Neural Network for Large-Scale Network Localization
Comments: Submitted in ICASSP 2021, Code available at this https URL
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)

Graph neural networks (GNNs) are popular for classifying structured data in the context of machine learning. But surprisingly, they are rarely applied to regression problems. In this work, we adopt a GNN for a classic but challenging nonlinear regression problem, namely network localization. Our main findings are as follows. First, the GNN is potentially the best solution to large-scale network localization in terms of accuracy, robustness and computational time. Second, thresholding of the communication range is essential to its superior performance. Simulation results corroborate that the proposed GNN-based method outperforms all benchmarks by far. Such inspiring results are further justified theoretically in terms of data aggregation, non-line-of-sight (NLOS) noise removal and lowpass filtering effect, all affected by the threshold for neighbor selection. Code is available at https://github.com/Yanzongzi/GNN-For-localization.

[202]  arXiv:2010.11655 [pdf, other]
Title: Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
Comments: Accepted by NeurIPS2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

We study reinforcement learning (RL) for text-based games, which are interactive simulations in the context of natural language. While different methods have been developed to represent the environment information and language actions, existing RL agents are not empowered with any reasoning capabilities to deal with textual games. In this work, we aim to conduct explicit reasoning with knowledge graphs for decision making, so that the actions of an agent are generated and supported by an interpretable inference procedure. We propose a stacked hierarchical attention mechanism to construct an explicit representation of the reasoning process by exploiting the structure of the knowledge graph. We extensively evaluate our method on a number of man-made benchmark games, and the experimental results demonstrate that our method performs better than existing text-based agents.

[203]  arXiv:2010.11657 [pdf, other]
Title: The HUAWEI Speaker Diarisation System for the VoxCeleb Speaker Diarisation Challenge
Comments: 5 pages, 2 figures, A report about our diarisation system for VoxCeleb Challenge, Interspeech conference workshop
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

This paper describes the development of our system for the VoxCeleb Speaker Diarisation Challenge 2020. A well-trained neural network based speech enhancement model is used for pre-processing, followed by a neural network based voice activity detection (VAD) system to remove background music and noise, which are harmful to the speaker diarisation system. The subsequent diarisation system is built on agglomerative hierarchical clustering (AHC) of x-vectors and a variational Bayesian hidden Markov Model (VB-HMM) based iterative clustering. Experimental results demonstrate that the proposed system yields substantial improvements compared with the baseline method for the diarisation task of the VoxCeleb Speaker Recognition Challenge 2020.

[204]  arXiv:2010.11659 [pdf, other]
Title: Neural Network-based Acoustic Vehicle Counting
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

This paper addresses acoustic vehicle counting using one-channel audio. We predict the pass-by instants of vehicles from local minima of a vehicle-to-microphone distance predicted from audio. The distance is predicted via a two-stage (coarse-fine) regression, both realised using neural networks (NNs). Experiments show that the NN-based distance regression outperforms by far the previously proposed support vector regression. The $ 95\% $ confidence interval for the mean of vehicle counting error is within $[0.28\%, -0.55\%]$. Besides the minima-based counting, we propose a deep learning counting which operates on the predicted distance without detecting local minima. Results also show that removing low frequencies in features improves the counting performance.
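
The minima-based counting step can be sketched as follows. This is only an illustration under stated assumptions: the distance sequence is taken as already predicted (the regression networks are not modelled), and the `max_distance` threshold used to reject shallow minima is a hypothetical parameter that the abstract does not mention.

```python
def count_pass_bys(distances, max_distance):
    """Count local minima of a predicted vehicle-to-microphone distance.

    distances: per-frame predicted distances (hypothetical input).
    max_distance: hypothetical threshold; minima above it are ignored.
    A flat-bottomed minimum is counted once, at its first frame.
    """
    count = 0
    for i in range(1, len(distances) - 1):
        prev, cur, nxt = distances[i - 1], distances[i], distances[i + 1]
        is_local_min = cur < prev and cur <= nxt
        if is_local_min and cur <= max_distance:
            count += 1
    return count


# Hypothetical distance track with two dips.
track = [5.0, 3.0, 1.0, 3.0, 5.0, 4.0, 2.0, 4.0, 5.0]
vehicles = count_pass_bys(track, max_distance=2.5)
```

Each counted minimum corresponds to one pass-by instant, so the count is an estimate of the number of vehicles in the recording.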

[205]  arXiv:2010.11660 [pdf, other]
Title: Drift Detection in Episodic Data: Detect When Your Agent Starts Faltering
Subjects: Machine Learning (cs.LG)

Detection of deterioration of agent performance in dynamic environments is challenging due to the non-i.i.d. nature of the observed performance. We consider an episodic framework, where the objective is to detect when an agent begins to falter. We devise a hypothesis testing procedure for non-i.i.d. rewards, which is optimal under certain conditions. To apply the procedure sequentially in an online manner, we also suggest a novel Bootstrap mechanism for False Alarm Rate control (BFAR). We demonstrate our procedure in problems where the rewards are neither independent, nor identically distributed, nor normally distributed. The statistical power of the new testing procedure is shown to outperform alternative tests - often by orders of magnitude - for a variety of environment modifications (which cause deterioration in agent performance). Our detection method is entirely external to the agent, and in particular does not require model-based learning. Furthermore, it can be applied to detect changes or drifts in any episodic signal.

[206]  arXiv:2010.11661 [pdf, other]
Title: Efficient Generalized Spherical CNNs
Comments: 18 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)

Many problems across computer vision and the natural sciences require the analysis of spherical data, for which representations may be learned efficiently by encoding equivariance to rotational symmetries. We present a generalized spherical CNN framework that encompasses various existing approaches and allows them to be leveraged alongside each other. The only existing non-linear spherical CNN layer that is strictly equivariant has complexity $\mathcal{O}(C^2L^5)$, where $C$ is a measure of representational capacity and $L$ the spherical harmonic bandlimit. Such a high computational cost often prohibits the use of strictly equivariant spherical CNNs. We develop two new strictly equivariant layers with reduced complexity $\mathcal{O}(CL^4)$ and $\mathcal{O}(CL^3 \log L)$, making larger, more expressive models computationally feasible. Moreover, we adopt efficient sampling theory to achieve further computational savings. We show that these developments allow the construction of more expressive hybrid models that achieve state-of-the-art accuracy and parameter efficiency on spherical benchmark problems.

[207]  arXiv:2010.11663 [pdf, other]
Title: Symbolic Self-triggered Control of Continuous-time Non-deterministic Systems without Stability Assumptions for 2-LTL Specifications
Comments: 16th International Conference on Control, Automation, Robotics and Vision (ICARCV 2020)
Subjects: Systems and Control (eess.SY)

We propose a symbolic self-triggered controller synthesis procedure for non-deterministic continuous-time nonlinear systems without stability assumptions. The goal is to compute a controller that satisfies two objectives. The first objective is represented as a specification in a fragment of LTL, which we call 2-LTL. The second one is an energy objective, in the sense that control inputs are issued only when necessary, which saves energy. To this end, we first quantise the state and input spaces, and then translate the controller synthesis problem to the computation of a winning strategy in a mean-payoff parity game. We illustrate the feasibility of our method on the example of a navigating nonholonomic robot.

[208]  arXiv:2010.11666 [pdf, ps, other]
Title: Reducing Unintended Identity Bias in Russian Hate Speech Detection
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Toxicity has become a grave problem for many online communities and has been growing across many languages, including Russian. Hate speech creates an environment of intimidation and discrimination, and may even incite real-world violence. Both researchers and social platforms have been focused on developing models to detect toxicity in online communication for a while now. A common problem of these models is the presence of bias towards some words (e.g. woman, black, jew) that are not toxic, but serve as triggers for the classifier due to model caveats. In this paper, we describe our efforts towards classifying hate speech in Russian, and propose simple techniques for reducing unintended bias, such as generating training data with language models using terms and words related to protected identities as context and applying word dropout to such words.
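
Word dropout restricted to identity terms can be sketched as below. This is a minimal illustration, not the authors' implementation: replacing a dropped word with a mask token (rather than deleting it), the lower-cased lookup, and the names `identity_word_dropout`, `identity_terms` and `mask_token` are all assumptions.

```python
import random


def identity_word_dropout(tokens, identity_terms, drop_prob,
                          rng=None, mask_token="<unk>"):
    """Randomly mask tokens that belong to a set of identity terms.

    Assumption: dropped words are replaced by mask_token; the abstract
    does not say whether words are masked or removed.
    """
    rng = rng or random.Random()
    output = []
    for token in tokens:
        # Only identity terms are candidates for dropout.
        if token.lower() in identity_terms and rng.random() < drop_prob:
            output.append(mask_token)
        else:
            output.append(token)
    return output


# Hypothetical training sentence and identity vocabulary.
sentence = ["every", "woman", "deserves", "respect"]
augmented = identity_word_dropout(sentence, {"woman"}, drop_prob=0.5,
                                  rng=random.Random(0))
```

Applied during training, this kind of masking is intended to stop the classifier from relying on the identity term alone when it predicts toxicity.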

[209]  arXiv:2010.11671 [pdf, other]
Title: Motion Planning Combines Psychological Safety and Motion Prediction for a Sense Motive Robot
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Human safety is the most important demand for human-robot interaction and collaboration (HRIC), which not only refers to physical safety, but also includes psychological safety. Although many robots with different configurations have entered our living and working environments, the human safety problem is still an ongoing research problem in human-robot coexistence scenarios. This paper addresses the human safety issue by covering both the physical safety and psychological safety aspects. First, we introduce an adaptive robot velocity control and step size adjustment method according to human facial expressions, such that the robot can adjust its movement to maintain safety when the human emotion is unusual. Second, we predict the human motion by detecting sudden changes of human head pose and gaze direction, such that the robot can infer whether the human attention is distracted, predict the next move of the human and rebuild a repulsive force to avoid potential collision. Finally, we demonstrate our idea on a 7 DOF TIAGo robot in the 3D Gazebo environment, which shows that the robot becomes sense motive, and responds to human action and emotion changes quickly and efficiently.

[210]  arXiv:2010.11672 [pdf, other]
Title: CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion
Comments: Accepted to Interspeech 2020. Project page: this http URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)

Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speeches without using a parallel corpus. Recently, cycle-consistent adversarial network (CycleGAN)-VC and CycleGAN-VC2 have shown promising results regarding this problem and have been widely used as benchmark methods. However, owing to the ambiguity of the effectiveness of CycleGAN-VC/VC2 for mel-spectrogram conversion, they are typically used for mel-cepstrum conversion even when comparative methods employ mel-spectrogram as a conversion target. To address this, we examined the applicability of CycleGAN-VC/VC2 to mel-spectrogram conversion. Through initial experiments, we discovered that their direct applications compromised the time-frequency structure that should be preserved during conversion. To remedy this, we propose CycleGAN-VC3, an improvement of CycleGAN-VC2 that incorporates time-frequency adaptive normalization (TFAN). Using TFAN, we can adjust the scale and bias of the converted features while reflecting the time-frequency structure of the source mel-spectrogram. We evaluated CycleGAN-VC3 on inter-gender and intra-gender non-parallel VC. A subjective evaluation of naturalness and similarity showed that for every VC pair, CycleGAN-VC3 outperforms or is competitive with the two types of CycleGAN-VC2, one of which was applied to mel-cepstrum and the other to mel-spectrogram. Audio samples are available at this http URL

[211]  arXiv:2010.11675 [pdf, other]
Title: Optimization-Based Visual-Inertial SLAM Tightly Coupled with Raw GNSS Measurements
Comments: 7 pages, 7 figures, submitted to IEEE Robotics and Automation Letters with ICRA 2021 Option. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Robotics (cs.RO)

Fusing vision, Inertial Measurement Unit (IMU) and Global Navigation Satellite System (GNSS) information is a promising solution for accurate global positioning in complex urban scenes, because of the complementarity of the different sensors. Unlike the loose coupling approaches and the EKF-based approaches in the literature, we propose an optimization-based visual-inertial SLAM tightly coupled with raw GNSS measurements, including pseudoranges and Doppler shift, which is the first of such approaches to our knowledge. Reprojection error, IMU pre-integration error and raw GNSS measurement error are jointly optimized using bundle adjustment in a sliding window, and the asynchronism between images and raw GNSS measurements is considered. Marginalization is performed in the sliding window, and some methods dealing with noisy measurements and vulnerable situations are employed. Experimental results on public dataset in complex urban scenes prove that our proposed approach outperforms state-of-the-art visual-inertial SLAM, GNSS single point positioning, as well as a loose coupling approach, both in the scenes that mainly contain low-rise buildings and the scenes that contain urban canyons.

[212]  arXiv:2010.11676 [pdf, other]
Title: Input-Shaping for Feed-Forward Control of Cable-Driven Parallel Robots
Authors: Sana Baklouti (RoMas, LS2N), Eric Courteille (LGCGM), Philippe Lemoine (LS2N, ECN), Centrale Nantes, Stéphane Caro (LS2N, CNRS, RoMas)
Comments: Journal of Dynamic Systems, Measurement, and Control, American Society of Mechanical Engineers, 2020
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Classical Physics (physics.class-ph)

This paper deals with the use of input-shaping filters in conjunction with a feed-forward control of Cable-Driven Parallel Robots (CDPRs), while integrating cable tension calculation to satisfy positive cable tensions along the prescribed trajectory of the moving-platform. This method aims to attenuate the oscillatory motions of the moving-platform. Thus, the input signal is modified to make it self-cancel residual vibrations. The effectiveness, in terms of moving-platform oscillation attenuation, of the proposed closed-loop control method combined with shaping inputs is experimentally studied on a suspended and non-redundant CDPR prototype. This confirms residual vibration reduction improvement with respect to the unshaped control in terms of Peak-to-Peak amplitude of velocity error, which can achieve 72% while using input-shaping filters.
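
As an illustration of how a shaped input can self-cancel residual vibration, the sketch below implements the textbook two-impulse Zero-Vibration (ZV) shaper. The abstract does not say which shaper the authors use, so this is a stand-in technique; the natural frequency, damping ratio and sample time are hypothetical values.

```python
import math


def zv_shaper(natural_freq_hz, damping_ratio):
    """Return [(time_s, amplitude), ...] for a two-impulse ZV shaper."""
    if not 0.0 <= damping_ratio < 1.0:
        raise ValueError("damping_ratio must lie in [0, 1)")
    omega_n = 2.0 * math.pi * natural_freq_hz
    root = math.sqrt(1.0 - damping_ratio ** 2)
    omega_d = omega_n * root                      # damped frequency
    k = math.exp(-damping_ratio * math.pi / root)
    a1 = 1.0 / (1.0 + k)
    a2 = k / (1.0 + k)
    t2 = math.pi / omega_d                        # half the damped period
    return [(0.0, a1), (t2, a2)]


def shape_command(command, dt, impulses):
    """Convolve a sampled command with the shaper impulses."""
    shaped = [0.0] * len(command)
    for t_imp, amp in impulses:
        shift = int(round(t_imp / dt))            # impulse delay in samples
        for i in range(shift, len(command)):
            shaped[i] += amp * command[i - shift]
    return shaped


# Hypothetical 1 Hz undamped mode, unit step command sampled at 10 Hz.
impulses = zv_shaper(natural_freq_hz=1.0, damping_ratio=0.0)
shaped_step = shape_command([1.0] * 10, dt=0.1, impulses=impulses)
```

The second impulse is delayed by half a damped period, so the vibration it excites is in antiphase with that of the first impulse; the price is a command that takes half a period longer to complete.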

[213]  arXiv:2010.11677 [pdf, other]
Title: Second layer data governance for permissioned blockchains: the privacy management challenge
Subjects: Computers and Society (cs.CY)

Data privacy is a trending topic in the internet era. Given such importance, many challenges have emerged in collecting, managing, processing, and publishing data. In this sense, personal data have received attention, and many regulations have emerged, such as GDPR in the European Union and LGPD in Brazil. This regulation model aims to protect users' data from misuse and leakage and allow users to request an explanation from companies when needed. In pandemic situations, such as the COVID-19 and Ebola outbreaks, sharing health data between different organizations is/was crucial to develop a significant movement to avoid massive infection and decrease the number of deaths. However, the data subjects, i.e., the users, should have the right to request the purpose of data use, anonymization, and data deletion. In this sense, permissioned blockchain technology emerges to empower users to exercise their rights, providing data ownership, transparency, and security through an immutable, unified, and distributed database ruled by smart contracts. The governance model discussed in blockchain applications usually concerns the first layer governance, i.e., public and permissioned models. However, this discussion is too superficial and does not cover compliance with the data regulations. Therefore, in order to organize the relationship between data owners and the stakeholders, i.e., companies and governmental entities, we developed a second layer data governance model for permissioned blockchains based on the Governance Analytical Framework principles applied in pandemic situations, preserving the users' privacy and their duties. From the law perspective, we based our model on the EU GDPR in regard to data privacy concerns.

[214]  arXiv:2010.11679 [pdf, ps, other]
Title: DPAttack: Diffused Patch Attacks against Universal Object Detection
Comments: 4 pages, 2 figures, CIKM Workshop
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Recently, deep neural networks (DNNs) have been widely and successfully used in Object Detection, e.g. Faster RCNN, YOLO, CenterNet. However, recent studies have shown that DNNs are vulnerable to adversarial attacks. Adversarial attacks against object detection can be divided into two categories, whole-pixel attacks and patch attacks. While these attacks add perturbations to a large number of pixels in images, we propose a diffused patch attack (DPAttack) to successfully fool object detectors with diffused patches of asteroid or grid shape, which only change a small number of pixels. Experiments show that our DPAttack can successfully fool most object detectors with diffused patches, and we took second place in the Alibaba Tianchi competition: Alibaba-Tsinghua Adversarial Challenge on Object Detection. Our code can be obtained from https://github.com/Wu-Shudeng/DPAttack.

[215]  arXiv:2010.11681 [pdf, other]
Title: Learning Panoptic Segmentation from Instance Contours
Comments: Overview Video: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Panoptic Segmentation aims to provide an understanding of background (stuff) and instances of objects (things) at a pixel level. It combines the separate tasks of semantic segmentation (pixel-level classification) and instance segmentation to build a single unified scene understanding task. Typically, panoptic segmentation is derived by combining semantic and instance segmentation tasks that are learned separately or jointly (multi-task networks). In general, instance segmentation networks are built by adding a foreground mask estimation layer on top of object detectors or using instance clustering methods that assign a pixel to an instance center. In this work, we present a fully convolutional neural network that learns instance segmentation from semantic segmentation and instance contours (boundaries of things). Instance contours along with semantic segmentation yield a boundary-aware semantic segmentation of things. Connected component labeling on these results produces instance segmentation. We merge semantic and instance segmentation results to output panoptic segmentation. We evaluate our proposed method on the CityScapes dataset to demonstrate qualitative and quantitative performances along with several ablation studies.
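
The step from a boundary-aware semantic map to instances via connected component labeling can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes small grids given as nested lists, 4-connectivity, and that contour pixels are left unlabeled (id 0); the function name and inputs are hypothetical.

```python
from collections import deque


def instances_from_contours(semantic, contour, thing_classes):
    """Label instances by flood-filling thing pixels split by contours.

    semantic: 2D list of class ids; contour: 2D list of 0/1 flags;
    thing_classes: set of class ids treated as countable objects.
    Returns (labels, count), where labels holds instance ids from 1.
    """
    height = len(semantic)
    width = len(semantic[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if (labels[r][c] or contour[r][c]
                    or semantic[r][c] not in thing_classes):
                continue
            count += 1
            cls = semantic[r][c]
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill, 4-connected
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not labels[ny][nx] and not contour[ny][nx]
                            and semantic[ny][nx] == cls):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical row of "car" pixels (class 1) split by one contour pixel.
labels, count = instances_from_contours([[1, 1, 1, 1, 1]],
                                        [[0, 0, 1, 0, 0]], {1})
```

Without the contour, the five pixels would form a single component; the contour pixel is what separates two touching objects of the same class into distinct instances.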

[216]  arXiv:2010.11683 [pdf, ps, other]
Title: An Analysis of Simple Data Augmentation for Named Entity Recognition
Authors: Xiang Dai, Heike Adel
Comments: COLING 2020
Subjects: Computation and Language (cs.CL)

Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.

[217]  arXiv:2010.11684 [pdf, other]
Title: Disentangling Action Sequences: Discovering Correlated Samples
Authors: Jiantao Wu, Lin Wang
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Disentanglement is a highly desirable property of representation due to its similarity with human understanding and reasoning. This improves interpretability, enables the performance of down-stream tasks, and enables controllable generative models. However, this domain is challenged by the abstract notion and incomplete theories to support unsupervised disentanglement learning. We demonstrate that the data itself, such as the orientation of images, rather than the factors, plays a crucial role in disentanglement, and that the disentangled representations align the latent variables with the action sequences. We further introduce the concept of disentangling action sequences, which facilitates the description of the behaviours of the existing disentangling approaches. An analogy for this process is discovering the commonality between things and categorizing them. Furthermore, we analyze the inductive biases on the data and find that the latent information thresholds are correlated with the significance of the actions. For the supervised and unsupervised settings, we respectively introduce two methods to measure the thresholds. We further propose a novel framework, fractional variational autoencoder (FVAE), to disentangle the action sequences with different significance step-by-step. Experimental results on dSprites and 3D Chairs show that FVAE improves the stability of disentanglement.

[218]  arXiv:2010.11685 [pdf, other]
Title: DocStruct: A Multimodal Method to Extract Hierarchy Structure in Document for General Form Understanding
Comments: Accepted to EMNLP 2020 Findings
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Form understanding depends on both textual contents and organizational structure. Although modern OCR performs well, it is still challenging to realize general form understanding because forms are commonly used and of various formats. The table detection and handcrafted features in previous works cannot apply to all forms because of their requirements on formats. Therefore, we concentrate on the most elementary components, the key-value pairs, and adopt multimodal methods to extract features. We consider the form structure as a tree-like or graph-like hierarchy of text fragments. The parent-child relation corresponds to the key-value pairs in forms. We utilize the state-of-the-art models and design targeted extraction modules to extract multimodal features from semantic contents, layout information, and visual images. A hybrid fusion method of concatenation and feature shifting is designed to fuse the heterogeneous features and provide an informative joint representation. We adopt an asymmetric algorithm and negative sampling in our model as well. We validate our method on two benchmarks, MedForm and FUNSD, and extensive experiments demonstrate the effectiveness of our method.

[219]  arXiv:2010.11686 [pdf]
Title: A Very Compact Embedded CNN Processor Design Based on Logarithmic Computing
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Computer Vision and Pattern Recognition (cs.CV)

In this paper, we propose a very compact embedded CNN processor design based on a modified logarithmic computing method using very low bit-width representation. Our high-quality CNN processor can easily fit into edge devices. For Yolov2, our processing circuit takes only 0.15 mm^2 using the TSMC 40 nm cell library. The key idea is to constrain the activation and weight values of all layers uniformly to be within the range [-1, 1] and produce low bit-width logarithmic representation. With the uniform representations, we devise a unified, reusable CNN computing kernel and significantly reduce computing resources. The proposed approach has been extensively evaluated on many popular image classification CNN models (AlexNet, VGG16, and ResNet-18/34) and object detection models (Yolov2). The hardware-implemented results show that our design consumes only minimal computing and storage resources, yet attains very high accuracy. The design is thoroughly verified on FPGAs, and the SoC integration is underway with promising results. With extremely efficient resource and energy usage, our design is excellent for edge computing purposes.
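
As a rough illustration of why a logarithmic representation of values in [-1, 1] is cheap to compute with, the sketch below uses a generic power-of-two quantization. The abstract does not specify the modified method, so the function names, the 4-bit default, and the rounding rule are all assumptions made for illustration only.

```python
import math


def log_quantize(x, bits=4):
    """Approximate x in [-1, 1] as sign * 2**(-exponent).

    Returns (sign, exponent) with exponent in [0, 2**bits - 1].
    Generic power-of-two quantization, not the paper's scheme.
    """
    x = max(-1.0, min(1.0, x))  # clamp into the assumed value range
    if x == 0.0:
        return 0, 0
    sign = 1 if x > 0 else -1
    max_exp = 2 ** bits - 1
    exponent = min(max_exp, max(0, round(-math.log2(abs(x)))))
    return sign, exponent


def log_multiply(a, b):
    """Multiply two quantized values: signs multiply, exponents add."""
    (sign_a, exp_a), (sign_b, exp_b) = a, b
    return sign_a * sign_b, exp_a + exp_b


def dequantize(q):
    """Convert (sign, exponent) back to a float."""
    sign, exponent = q
    return sign * 2.0 ** (-exponent)
```

Under these assumptions a weight-activation product reduces to an integer addition of exponents plus a sign flip, which hints at how a single reusable kernel could serve all layers.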

[220]  arXiv:2010.11689 [pdf, other]
Title: Cross-Spectral Iris Matching Using Conditional Coupled GAN
Comments: International Joint Conference on Biometrics (IJCB-2020)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cross-spectral iris recognition is emerging as a promising biometric approach to authenticating the identity of individuals. However, matching iris images acquired at different spectral bands shows significant performance degradation when compared to single-band near-infrared (NIR) matching due to the spectral gap between iris images obtained in the NIR and visual-light (VIS) spectra. Although researchers have recently focused on deep-learning-based approaches to recover invariant representative features for more accurate recognition performance, the existing methods cannot achieve the expected accuracy required for commercial applications. Hence, in this paper, we propose a conditional coupled generative adversarial network (CpGAN) architecture for cross-spectral iris recognition by projecting the VIS and NIR iris images into a low-dimensional embedding domain to explore the hidden relationship between them. The conditional CpGAN framework consists of a pair of GAN-based networks, one responsible for retrieving images in the visible domain and the other responsible for retrieving images in the NIR domain. Both networks try to map the data into a common embedding subspace to ensure maximum pair-wise similarity between the feature vectors from the two iris modalities of the same subject. To prove the usefulness of our proposed approach, extensive experimental results obtained on the PolyU dataset are compared to existing state-of-the-art cross-spectral recognition methods.

[221]  arXiv:2010.11691 [pdf]
Title: Tackling problems of marker-based augmented reality under water
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)

Underwater sites are a harsh environment for augmented reality applications. Obstacles that must be overcome include poor visibility conditions, difficult navigation, and the difficulty of manipulating devices under water. This chapter focuses on the problem of localizing a device under water using markers. It discusses various filters that enhance and improve images recorded under water, and their impact on marker-based tracking. It presents various combinations of 10 image improving algorithms and 4 marker detecting algorithms, and tests their performance in real situations. All solutions are designed to run in real time on mobile devices to provide a solid basis for augmented reality. Usability of this solution is evaluated at locations in the Mediterranean Sea. It is shown that image improving algorithms with carefully chosen parameters can reduce the problems with visibility under water and improve the detection of markers. The best results are obtained with marker detecting algorithms that are specifically designed for underwater environments.

[222]  arXiv:2010.11692 [pdf, other]
Title: Conversion and Implementation of State-of-the-Art Deep Learning Algorithms for the Classification of Diabetic Retinopathy
Comments: Pre-print version (in-review)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Diabetic retinopathy (DR) is a retinal microvascular condition that emerges in diabetic patients. DR will continue to be a leading cause of blindness worldwide, with a predicted 191.0 million globally diagnosed patients in 2030. Microaneurysms, hemorrhages, exudates, and cotton wool spots are common signs of DR. However, they can be small and hard for human eyes to detect. Early detection of DR is crucial for effective clinical treatment. Existing methods to classify images require much time for feature extraction and selection, and are limited in their performance. Convolutional Neural Networks (CNNs), as an emerging deep learning (DL) method, have proven their potential in image classification tasks. In this paper, comprehensive experimental studies of implementing state-of-the-art CNNs for the detection and classification of DR are conducted in order to determine the top performing classifiers for the task. Five CNN classifiers, namely Inception-V3, VGG19, VGG16, ResNet50, and InceptionResNetV2, are evaluated through experiments. They categorize medical images into five different classes based on DR severity. Data augmentation and transfer learning techniques are applied since annotated medical images are limited and imbalanced. Experimental results indicate that the ResNet50 classifier has top performance for binary classification and that the InceptionResNetV2 classifier has top performance for multi-class DR classification.

[223]  arXiv:2010.11694 [pdf, other]
Title: Unsupervised Representation Learning by Invariance Propagation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Unsupervised learning methods based on contrastive learning have drawn increasing attention and achieved promising results. Most of them aim to learn representations invariant to instance-level variations, which are provided by different views of the same instance. In this paper, we propose Invariance Propagation to focus on learning representations invariant to category-level variations, which are provided by different instances from the same category. Our method recursively discovers semantically consistent samples residing in the same high-density regions in representation space. We demonstrate a hard sampling strategy to concentrate on maximizing the agreement between the anchor sample and its hard positive samples, which provide more intra-class variations to help capture more abstract invariance. As a result, with a ResNet-50 as the backbone, our method achieves 71.3% top-1 accuracy on ImageNet linear classification and 78.2% top-5 accuracy fine-tuning on only 1% labels, surpassing previous results. We also achieve state-of-the-art performance on other downstream tasks, including linear classification on Places205 and Pascal VOC, and transfer learning on small scale datasets.

[224]  arXiv:2010.11696 [pdf, other]
Title: BlendTorch: A Real-Time, Adaptive Domain Randomization Library
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Solving complex computer vision tasks by deep learning techniques relies on large amounts of (supervised) image data, typically unavailable in industrial environments. The lack of training data starts to impede the successful transfer of state-of-the-art methods in computer vision to industrial applications. We introduce BlendTorch, an adaptive Domain Randomization (DR) library, to help create infinite streams of synthetic training data. BlendTorch generates data by massively randomizing low-fidelity simulations and takes care of distributing artificial training data for model learning in real time. We show that models trained with BlendTorch repeatedly perform better in an industrial object detection task than those trained on real or photo-realistic datasets.

[225]  arXiv:2010.11697 [pdf, other]
Title: A Data Set and a Convolutional Model for Iconography Classification in Paintings
Comments: Submitted for review at ACM Journal on Computing and Cultural Heritage (JOCCH)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Iconography in art is the discipline that studies the visual content of artworks to determine their motifs and themes and to characterize the way these are represented. It is a subject of active research for a variety of purposes, including the interpretation of meaning, the investigation of the origin and diffusion in time and space of representations, and the study of influences across artists and art works. With the proliferation of digital archives of art images, the possibility arises of applying Computer Vision techniques to the analysis of art images at an unprecedented scale, which may support iconography research and education. In this paper we introduce a novel paintings data set for iconography classification and present the quantitative and qualitative results of applying a Convolutional Neural Network (CNN) classifier to the recognition of the iconography of artworks. The proposed classifier achieves good performances (71.17% Precision, 70.89% Recall, 70.25% F1-Score and 72.73% Average Precision) in the task of identifying saints in Christian religious paintings, a task made difficult by the presence of classes with very similar visual features. Qualitative analysis of the results shows that the CNN focuses on the traditional iconic motifs that characterize the representation of each saint and exploits such hints to attain correct identification. The ultimate goal of our work is to enable the automatic extraction, decomposition, and comparison of iconography elements to support iconographic studies and automatic art work annotation.

[226]  arXiv:2010.11699 [pdf, other]
Title: Generative Model-Enhanced Human Motion Prediction
Comments: 8 pages + 5 pages supplementary materials, under review at ICLR
Subjects: Computer Vision and Pattern Recognition (cs.CV)

The task of predicting human motion is complicated by the natural heterogeneity and compositionality of actions, necessitating robustness to distributional shifts as far as out-of-distribution (OoD). Here we formulate a new OoD benchmark based on the Human3.6M and CMU motion capture datasets, and introduce a hybrid framework for hardening discriminative architectures to OoD failure by augmenting them with a generative model. When applied to current state-of-the-art discriminative models, we show that the proposed approach improves OoD robustness without sacrificing in-distribution performance, and can facilitate model interpretability. We suggest human motion predictors ought to be constructed with OoD challenges in mind, and provide an extensible general framework for hardening diverse discriminative architectures to extreme distributional shift. The code is available at https://github.com/bouracha/OoDMotion.

[227]  arXiv:2010.11700 [pdf, other]
Title: On Benchmarking Iris Recognition within a Head-mounted Display for AR/VR Application
Comments: Accepted at International Joint Conference on Biometrics (IJCB 2020)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Augmented and virtual reality is being deployed in different fields of applications. Such applications might involve accessing or processing critical and sensitive information, which requires strict and continuous access control. Given that Head-Mounted Displays (HMD) developed for such applications commonly contain internal cameras for gaze tracking purposes, we evaluate the suitability of such a setup for verifying the users through iris recognition. In this work, we first evaluate a set of iris recognition algorithms suitable for HMD devices by investigating three well-established handcrafted feature extraction approaches, and to complement it, we also present the analysis using four deep learning models. While taking into consideration the minimalistic hardware requirements of stand-alone HMD, we employ and adapt a recently developed miniature segmentation model (EyeMMS) for segmenting the iris. Further, to account for non-ideal and non-collaborative capture of iris, we define a new iris quality metric that we term Iris Mask Ratio (IMR) to quantify the iris recognition performance. Motivated by the performance of iris recognition, we also propose the continuous authentication of users in a non-collaborative capture setting in HMD. Through the experiments on a publicly available OpenEDS dataset, we show that performance with EER = 5% can be achieved using deep learning methods in a general setting, along with high accuracy for continuous user authentication.

[228]  arXiv:2010.11701 [pdf, other]
Title: Spatial Attention as an Interface for Image Captioning Models
Authors: Philipp Sadler
Comments: A thesis submitted in fulfillment of the requirements for the degree Master of Science in Cognitive Systems
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)

The internal workings of modern deep learning models often remain unclear to an external observer, although spatial attention mechanisms are involved. The idea of this work is to translate these spatial attentions into natural language to provide simpler access to the model's function. Thus, I took a neural image captioning model and measured the reactions to external modification in its spatial attention for three different interface methods: a fixation over the whole generation process, a fixation for the first time-steps and an addition to the generator's attention. The experimental results for bounding box based spatial attention vectors have shown that the captioning model reacts to method-dependent changes in up to 52.65% and, in 9.00% of the cases, includes object categories that were otherwise unmentioned. Afterwards, I established such a link to a hierarchical co-attention network for visual question answering by extraction of its word, phrase and question level spatial attentions. Here, generated captions for the word level included details of the question-answer pairs in up to 55.20% of the cases. This work indicates that spatial attention seen as an external interface for image caption generators is a useful method to access visual functions in natural language.

[229]  arXiv:2010.11702 [pdf, other]
Title: MLOD: Awareness of Extrinsic Perturbation in Multi-LiDAR 3D Object Detection for Autonomous Driving
Comments: 8 pages, 6 figures
Journal-ref: IROS 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Extrinsic perturbation always exists in multiple sensors. In this paper, we focus on the extrinsic uncertainty in multi-LiDAR systems for 3D object detection. We first analyze the influence of extrinsic perturbation on geometric tasks with two basic examples. To minimize the detrimental effect of extrinsic perturbation, we propagate an uncertainty prior on each point of input point clouds, and use this information to boost an approach for 3D geometric tasks. Then we extend our findings to propose a multi-LiDAR 3D object detector called MLOD. MLOD is a two-stage network where the multi-LiDAR information is fused through various schemes in stage one, and the extrinsic perturbation is handled in stage two. We conduct extensive experiments on a real-world dataset, and demonstrate both the accuracy and robustness improvement of MLOD. The code, data and supplementary materials are available at: https://ram-lab.com/file/site/mlod

[230]  arXiv:2010.11703 [pdf, other]
Title: Fast and Incremental Loop Closure Detection with Deep Features and Proximity Graphs
Comments: submitted to Transactions on Robotics
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

In recent years, methods concerning the place recognition task have been extensively examined by the robotics community within the scope of simultaneous localization and mapping applications. In this article, an appearance-based loop closure detection pipeline is proposed, entitled "FILD++" (Fast and Incremental Loop closure Detection). When the incoming camera observation arrives, global and local visual features are extracted through two passes of a single convolutional neural network. Subsequently, a modified hierarchical-navigable small-world graph incrementally generates a visual database that represents the robot's traversed path based on global features. Given the query sensor measurement, similar locations from the trajectory are retrieved using these representations, while an image-to-image pairing is further evaluated thanks to the spatial information provided by the local features. Exhaustive experiments on several publicly-available datasets exhibit the system's high performance and low execution time compared to other contemporary state-of-the-art pipelines.

[231]  arXiv:2010.11704 [pdf, other]
Title: Using Conditional Generative Adversarial Networks to Reduce the Effects of Latency in Robotic Telesurgery
Comments: 6 pages with 5 figures and 1 table. J Robotic Surg (2020)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO); Image and Video Processing (eess.IV)

The introduction of surgical robots brought about advancements in surgical procedures. The applications of remote telesurgery range from building medical clinics in underprivileged areas, to placing robots abroad in military hot-spots where accessibility and diversity of medical experience may be limited. Poor wireless connectivity may result in a prolonged delay, referred to as latency, between a surgeon's input and the action a robot takes. In surgery, any micro-delay can injure a patient severely and in some cases, result in fatality. One way to increase safety is to mitigate the effects of latency using deep learning aided computer vision. While the current surgical robots use calibrated sensors to measure the position of the arms and tools, in this work we present a purely optical approach that provides a measurement of the tool position in relation to the patient's tissues. This research aimed to produce a neural network that allowed a robot to detect its own mechanical manipulator arms. A conditional generative adversarial network (cGAN) was trained on 1107 frames of mock gastrointestinal robotic surgery data from the 2015 EndoVis Instrument Challenge and corresponding hand-drawn labels for each frame. When run on new testing data, the network generated near-perfect labels of the input images which were visually consistent with the hand-drawn labels and was able to do this in 299 milliseconds. These accurately generated labels can then be used as simplified identifiers for the robot to track its own controlled tools. These results show potential for conditional GANs as a reaction mechanism such that the robot can detect when its arms move outside the operating area within a patient. This system allows for more accurate monitoring of the position of surgical instruments in relation to the patient's tissue, increasing safety measures that are integral to successful telesurgery systems.

[232]  arXiv:2010.11706 [pdf, ps, other]
Title: Approximating the Minimal Lookahead Needed to Win Infinite Games
Subjects: Formal Languages and Automata Theory (cs.FL); Computer Science and Game Theory (cs.GT)

We present an exponential-time algorithm approximating the minimal lookahead necessary to win an $\omega$-regular delay game.

[233]  arXiv:2010.11708 [pdf, ps, other]
Title: Context-aware surrogate modeling for balancing approximation and sampling costs in multi-fidelity importance sampling and Bayesian inverse problems
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Machine Learning (stat.ML)

Multi-fidelity methods leverage low-cost surrogate models to speed up computations and make occasional recourse to expensive high-fidelity models to establish accuracy guarantees. Because surrogate and high-fidelity models are used together, poor predictions by the surrogate models can be compensated with frequent recourse to high-fidelity models. Thus, there is a trade-off between investing computational resources to improve surrogate models and the frequency of making recourse to expensive high-fidelity models; however, this trade-off is ignored by traditional modeling methods that construct surrogate models that are meant to replace high-fidelity models rather than being used together with high-fidelity models. This work considers multi-fidelity importance sampling and theoretically and computationally derives the optimal trade-off between improving the fidelity of surrogate models for constructing more accurate biasing densities and the number of samples that is required from the high-fidelity model to compensate for poor biasing densities. Numerical examples demonstrate that such optimal (context-aware) surrogate models for multi-fidelity importance sampling have lower fidelity than what typically is set as tolerance in traditional model reduction, leading to runtime speedups of up to one order of magnitude in the presented examples.

[234]  arXiv:2010.11711 [pdf, other]
Title: Multi-view Graph Contrastive Representation Learning for Drug-Drug Interaction Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Potential Drug-Drug Interaction (DDI) occurring while treating complex or co-existing diseases with drug combinations may cause changes in drugs' pharmacological activity. Therefore, DDI prediction has been an important task in the medical healthy machine learning community. Graph-based learning methods have recently aroused widespread interest and are proved to be a priority for this task. However, these methods are often limited to exploiting the inter-view drug molecular structure and ignore the drug's intra-view interaction relationship, which is vital to capturing the complex DDI patterns. This study presents a new method, multi-view graph contrastive representation learning for drug-drug interaction prediction, MIRACLE for brevity, to capture inter-view molecule structure and intra-view interactions between molecules simultaneously. MIRACLE treats a DDI network as a multi-view graph where each node in the interaction graph itself is a drug molecular graph instance. We use GCN to encode DDI relationships and a bond-aware attentive message propagating method to capture drug molecular structure information in the MIRACLE learning stage. Also, we propose a novel unsupervised contrastive learning component to balance and integrate the multi-view information. Comprehensive experiments on multiple real datasets show that MIRACLE outperforms the state-of-the-art DDI prediction models consistently.

[235]  arXiv:2010.11712 [pdf, other]
Title: Trajectory Tracking for Robotic Arms with Input Saturation and Only Position Measurements
Comments: 16 pages, 5 figures. It will be submitted to the European Control Conference 2021
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

In this work, we propose a passivity-based control approach that addresses the trajectory tracking problem for a class of mechanical systems that comprises a broad range of robotic arms. The resulting controllers can be naturally saturated and do not require velocity measurements. Moreover, the proposed methodology does not require the implementation of observers, and the structure of the closed-loop system permits the identification of a Lyapunov function, which eases the convergence analysis. To corroborate the effectiveness of the methodology, we perform experiments with the Philips Experimental Robot Arm.

[236]  arXiv:2010.11713 [pdf, other]
Title: Joint Power Allocation and User Association Optimization for IRS-Assisted mmWave Systems
Comments: 30 pages, 9 figures
Subjects: Information Theory (cs.IT); Systems and Control (eess.SY)

Intelligent reflecting surface (IRS) is a potential technology for building programmable wireless environments in future communication systems. In this paper, we consider an IRS-assisted multi-base station (multi-BS) multi-user millimeter wave (mmWave) downlink communication system, exploiting IRS to extend mmWave signal coverage to blind spots. Considering the impact of IRS on user association in multi-BS mmWave systems, we formulate a sum rate maximization problem by jointly optimizing passive beamforming at IRS, power allocation and user association. This leads to an intractable nonconvex problem, which we tackle with a computationally affordable iterative algorithm capitalizing on alternating optimization, sequential fractional programming (SFP) and forward-reverse auction (FRA). In particular, passive beamforming at IRS is optimized by utilizing the SFP method, power allocation is solved by means of standard convex optimization methods, and user association is handled by the network optimization based FRA algorithm. Simulation results demonstrate that the proposed algorithm can achieve significant performance gains, e.g., it can provide up to 175% higher sum rate compared with the benchmark and 140% higher energy efficiency compared with amplify-and-forward relay.

[237]  arXiv:2010.11714 [pdf, other]
Title: Restoring Negative Information in Few-Shot Object Detection
Comments: To appear in NeurIPS 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Few-shot learning has recently emerged as a new challenge in the deep learning field: unlike conventional methods that train the deep neural networks (DNNs) with a large number of labeled data, it asks for the generalization of DNNs on new classes with few annotated samples. Recent advances in few-shot learning mainly focus on image classification while in this paper we focus on object detection. The initial explorations in few-shot object detection tend to simulate a classification scenario by using the positive proposals in images with respect to a certain object class while discarding the negative proposals of that class. Negatives, especially hard negatives, however, are essential to the embedding space learning in few-shot object detection. In this paper, we restore the negative information in few-shot object detection by introducing a new negative- and positive-representative based metric learning framework and a new inference scheme with negative and positive representatives. We build our work on a recent few-shot pipeline RepMet with several new modules to encode negative information for both training and testing. Extensive experiments on ImageNet-LOC and PASCAL VOC show our method substantially improves the state-of-the-art few-shot object detection solutions. Our code is available at https://github.com/yang-yk/NP-RepMet.

[238]  arXiv:2010.11716 [pdf, other]
Title: Robust Audio-Based Vehicle Counting in Low-to-Moderate Traffic Flow
Comments: The paper has been accepted for the IV2020 conference
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

The paper presents a method for audio-based vehicle counting (VC) in low-to-moderate traffic using one-channel sound. We formulate VC as a regression problem, i.e., we predict the distance between a vehicle and the microphone. Minima of the proposed distance function correspond to vehicles passing by the microphone. VC is carried out via local minima detection in the predicted distance. We propose to set the minima detection threshold at a point where the probabilities of false positives and false negatives coincide so they statistically cancel each other in total vehicle number. The method is trained and tested on a traffic-monitoring dataset comprising 422 short, 20-second one-channel sound files with a total of 1421 vehicles passing by the microphone. Relative VC error in a traffic location not used in the training is below 2% within a wide range of detection threshold values. Experimental results show that the regression accuracy in noisy environments is improved by introducing a novel high-frequency power feature.
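
The counting step described above, local minima detection in a predicted distance curve gated by a threshold, can be sketched as follows. The function name `count_vehicles`, the plateau handling, and the example values are illustrative assumptions; the regression model that produces the distance curve is not shown.

```python
def count_vehicles(distance, threshold):
    """Count local minima of a predicted distance curve below a threshold.

    distance: sequence of predicted vehicle-to-microphone distances over time.
    threshold: only minima strictly below this value count as a pass-by.
    A flat minimum is counted once, at its first sample.
    """
    count = 0
    for i in range(1, len(distance) - 1):
        is_local_min = distance[i] < distance[i - 1] and distance[i] <= distance[i + 1]
        if is_local_min and distance[i] < threshold:
            count += 1
    return count


# Hypothetical predicted distances with two dips.
predicted = [5, 3, 1, 3, 5, 4, 2, 4, 5]
print(count_vehicles(predicted, threshold=2.5))
```

Raising the threshold admits shallower dips (more false positives), while lowering it drops real pass-bys (more false negatives), which is the trade-off the abstract balances when choosing the threshold.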

[239]  arXiv:2010.11719 [pdf, other]
Title: Conformance Checking for a Medical Training Process Using Petri net Simulation and Sequence Alignment
Subjects: Artificial Intelligence (cs.AI)

Process Mining has recently gained popularity in healthcare due to its potential to provide a transparent, objective and data-based view on processes. Conformance checking is a sub-discipline of process mining that has the potential to answer how the actual process executions deviate from existing guidelines. In this work, we analyze a medical training process for a surgical procedure. Ten students were trained to install a Central Venous Catheter (CVC) with ultrasound. Event log data was collected directly after instruction by the supervisors during a first test run and additionally after a subsequent individual training phase. In order to provide objective performance measures, we formulate an optimal, global sequence alignment problem inspired by approaches in bioinformatics. Therefore, we use the Petri net model representation of the medical process guideline to simulate a representative set of guideline-conforming sequences. Next, we calculate the optimal, global sequence alignment of the recorded and simulated event logs. Finally, the output measures and visualization of aligned sequences are provided for objective feedback.
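
Global sequence alignment of a recorded event trace against simulated guideline traces can be sketched with the classic Needleman-Wunsch dynamic program from bioinformatics. The scoring values, the function names, and the activity labels below are illustrative assumptions; the abstract does not state the scoring scheme actually used.

```python
def global_alignment_score(recorded, reference, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score for aligning two sequences of activity labels."""
    n, m = len(recorded), len(reference)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against the empty sequence costs one gap per event.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = recorded[i - 1] == reference[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]


def best_matching_reference(recorded, simulated):
    """Pick the simulated guideline trace with the highest alignment score."""
    return max(simulated, key=lambda ref: global_alignment_score(recorded, ref))


# Hypothetical activity labels, for illustration only.
guideline_traces = [
    ["prepare", "ultrasound", "puncture", "insert"],
    ["prepare", "puncture", "ultrasound", "insert"],
]
student_trace = ["prepare", "puncture", "insert"]
print(global_alignment_score(student_trace, guideline_traces[0]))
```

A skipped guideline step shows up as a gap and a wrong step as a mismatch, so the score gives a simple numeric measure of how far a recorded trace deviates from its closest conforming trace.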

[240]  arXiv:2010.11720 [pdf, ps, other]
Title: A study of the Multicriteria decision analysis based on the time-series features and a TOPSIS method proposal for a tensorial approach
Subjects: Artificial Intelligence (cs.AI)

A number of Multiple Criteria Decision Analysis (MCDA) methods have been developed to rank alternatives based on several decision criteria. Usually, MCDA methods deal with the criteria values at the time the decision is made, without considering their evolution over time. However, it may be relevant to consider the criteria's time series, since they provide essential information for decision-making (e.g., an improvement of the criteria). To deal with this issue, we propose a new approach to rank the alternatives based on the criteria time-series features (tendency, variance, etc.). In this novel approach, the data is structured in three dimensions, which requires a more complex data structure, such as tensors, instead of the classical matrix representation used in MCDA. Consequently, we propose an extension of the TOPSIS method to handle a tensor rather than a matrix. Computational results reveal that it is possible to rank the alternatives from a new perspective by considering meaningful decision-making information.
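
For context, the sketch below implements the classical matrix-based TOPSIS that the abstract takes as its starting point, not the tensorial extension the authors propose. The function name, weights and data are assumptions for illustration only.

```python
import math


def topsis(matrix, weights, benefit):
    """Classical TOPSIS closeness scores (higher is better).

    matrix:  rows are alternatives, columns are criteria
    weights: one weight per criterion
    benefit: True if larger is better for that criterion, else False
    Assumes no all-zero column and at least two distinct alternatives.
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Illustrative data: three alternatives, two benefit criteria.
scores = topsis([[1, 1], [2, 2], [3, 3]], [0.5, 0.5], [True, True])
```

One way to bring time-series information into this classical form would be to use features such as a criterion's trend as extra columns; the tensor formulation in the paper handles the third dimension directly instead.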

[241]  arXiv:2010.11721 [pdf, other]
Title: Multifaceted Context Representation using Dual Attention for Ontology Alignment
Comments: 8 pages
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB); Machine Learning (cs.LG)

Ontology Alignment is an important research problem that finds application in various fields such as data integration, data transfer, data preparation etc. State-of-the-art (SOTA) architectures in Ontology Alignment typically use naive domain-dependent approaches with handcrafted rules and manually assigned values, making them unscalable and inefficient. Deep Learning approaches for ontology alignment use domain-specific architectures that are not only in-extensible to other datasets and domains, but also typically perform worse than rule-based approaches due to various limitations including over-fitting of models, sparsity of datasets etc. In this work, we propose VeeAlign, a Deep Learning based model that uses a dual-attention mechanism to compute the contextualized representation of a concept in order to learn alignments. By doing so, not only does our approach exploit both syntactic and semantic structure of ontologies, it is also, by design, flexible and scalable to different domains with minimal effort. We validate our approach on various datasets from different domains and in multilingual settings, and show its superior performance over SOTA methods.

[242]  arXiv:2010.11722 [pdf]
Title: Prediction-Based GNSS Spoofing Attack Detection for Autonomous Vehicles
Comments: 16 pages, 9 figures, paper accepted for the presentation at the Transportation Research Board 100th Annual Meeting
Subjects: Robotics (cs.RO); Cryptography and Security (cs.CR); Machine Learning (cs.LG)

Global Navigation Satellite System (GNSS) provides Positioning, Navigation, and Timing (PNT) services for autonomous vehicles (AVs) using satellites and radio communications. Due to the lack of encryption, open access to the coarse acquisition (C/A) codes, and low strength of the signal, GNSS is vulnerable to spoofing attacks compromising the navigational capability of the AV. A spoofing attack is difficult to detect as a spoofer (an attacker who performs a spoofing attack) can mimic the GNSS signal and transmit inaccurate location coordinates to an AV. In this study, we have developed a prediction-based spoofing attack detection strategy using the long short-term memory (LSTM) model, a recurrent neural network model. The LSTM model is used to predict the distance traveled between two consecutive locations of an autonomous vehicle. In order to develop the LSTM prediction model, we have used the publicly available real-world comma2k19 driving dataset. The training dataset contains different features (i.e., acceleration, steering wheel angle, speed, and distance traveled between two consecutive locations) extracted from the controller area network (CAN), GNSS, and inertial measurement unit (IMU) sensors of AVs. Based on the predicted distance traveled between the current location and the immediate future location of an autonomous vehicle, a threshold value is established using the positioning error of the GNSS device and the prediction error (i.e., maximum absolute error) related to the distance traveled between the current location and the immediate future location. Our analysis revealed that the prediction-based spoofing attack detection strategy can successfully detect the attack in real time.
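
The thresholding idea can be sketched as follows. The abstract says the threshold is built from the GNSS positioning error and the maximum absolute prediction error but does not say how they are combined, so the additive rule below is an assumption, as are the function name and the numbers.

```python
def is_spoofed(gnss_distance, predicted_distance, gnss_error, max_prediction_error):
    """Flag a GNSS reading whose distance traveled deviates too far from the prediction.

    Hypothetical helper. All arguments are in the same length unit.
    Assumption: the threshold is the sum of the two error terms.
    """
    threshold = gnss_error + max_prediction_error
    return abs(gnss_distance - predicted_distance) > threshold


# Illustrative values: the model predicts 10.0 m between consecutive locations.
suspicious = is_spoofed(12.0, 10.0, gnss_error=0.5, max_prediction_error=0.5)
plausible = is_spoofed(10.4, 10.0, gnss_error=0.5, max_prediction_error=0.5)
```

In the paper the predicted distance comes from the LSTM model; here it is simply passed in as a number.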

[243]  arXiv:2010.11723 [pdf, other]
Title: Learning from Suboptimal Demonstration via Self-Supervised Reward Regression
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)

Learning from Demonstration (LfD) seeks to democratize robotics by enabling non-roboticist end-users to teach robots to perform a task by providing a human demonstration. However, modern LfD techniques, such as inverse reinforcement learning (IRL), assume users provide at least stochastically optimal demonstrations. This assumption fails to hold in all but the most isolated, controlled scenarios, reducing the ability to achieve the goal of empowering real end-users. Recent attempts to learn from sub-optimal demonstration leverage pairwise rankings through Preference-based Reinforcement Learning (PbRL) to infer a more optimal policy than the demonstration. However, we show that these approaches make incorrect assumptions and, consequently, suffer from brittle, degraded performance. In this paper, we overcome the limitations of prior work by developing a novel computational technique that infers an idealized reward function from suboptimal demonstration and bootstraps suboptimal demonstrations to synthesize optimality-parameterized training data for training our reward function. We empirically validate we can learn an idealized reward function with $\sim0.95$ correlation with the ground truth reward versus only $\sim 0.75$ for prior work. We can then train policies achieving $\sim 200\%$ improvement over the suboptimal demonstration and $\sim 90\%$ improvement over prior work. Finally, we present a real-world implementation for teaching a robot to hit a topspin shot in table tennis better than user demonstration.

[244]  arXiv:2010.11724 [pdf, other]
Title: LID 2020: The Learning from Imperfect Data Challenge Results
Comments: Summary of the 2nd Learning from Imperfect Data Workshop in conjunction with CVPR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Learning from imperfect data has become an issue in many industrial applications now that the research community has made profound progress in supervised learning from perfectly annotated datasets. The purpose of the Learning from Imperfect Data (LID) workshop is to inspire and facilitate research on novel approaches that harness imperfect data and improve data efficiency during training. A massive amount of user-generated data is nowadays available on multiple internet services. How to leverage this data to improve machine learning models is a high-impact problem. We organize the challenges in conjunction with the workshop. The goal of these challenges is to find the state-of-the-art approaches in the weakly supervised learning setting for object detection, semantic segmentation, and scene parsing. There are three tracks in the challenge, i.e., weakly supervised semantic segmentation (Track 1), weakly supervised scene parsing (Track 2), and weakly supervised object localization (Track 3). In Track 1, based on ILSVRC DET, we provide pixel-level annotations of 15K images from 200 categories for evaluation. In Track 2, we provide point-based annotations for the training set of ADE20K. In Track 3, based on ILSVRC CLS-LOC, we provide pixel-level annotations of 44,271 images for evaluation. Besides, we further introduce a new evaluation metric proposed by \cite{zhang2020rethinking}, i.e., IoU curve, to measure the quality of the generated object localization maps. This technical report summarizes the highlights from the challenge. The challenge submission server and the leaderboard will remain open to interested researchers. More details regarding the challenge and the benchmarks are available at https://lidchallenge.github.io

[245]  arXiv:2010.11725 [pdf, other]
Title: What do CNN neurons learn: Visualization & Clustering
Authors: Haoyue Dai
Comments: 9 pages, 10 figures, tech report
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In recent years, convolutional neural networks (CNNs) have shown striking progress in various tasks. However, despite the high performance, the training and prediction process remains a black box, and what neurons in a CNN learn is still a mystery. In this paper, we address the problem of interpreting a CNN from the aspects of the input image's focus and preference, and the neurons' domination, activation and contribution to a concrete final prediction. Specifically, we use two techniques - visualization and clustering - to tackle the problems above. Visualization refers to gradient descent on image pixels, and in the clustering section two algorithms are proposed to cluster over image categories and network neurons, respectively. Experiments and quantitative analyses have demonstrated the effectiveness of the two methods in answering the question: what do neurons learn.

[246]  arXiv:2010.11727 [pdf, other]
Title: Vision-Based Layout Detection from Scientific Literature using Recurrent Convolutional Neural Networks
Comments: 8 pages
Journal-ref: 25th International Conference on Pattern Recognition (ICPR2020)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

We present an approach for adapting convolutional neural networks for object recognition and classification to scientific literature layout detection (SLLD), a shared subtask of several information extraction problems. Scientific publications contain multiple types of information sought by researchers in various disciplines, organized into an abstract, bibliography, and sections documenting related work, experimental methods, and results; however, there is no effective way to extract this information due to their diverse layout. In this paper, we present a novel approach to developing an end-to-end learning framework to segment and classify major regions of a scientific document. We consider scientific document layout analysis as an object detection task over digital images, without any additional text features that need to be added into the network during the training process. Our technical objective is to implement transfer learning via fine-tuning of pre-trained networks and thereby demonstrate that this deep learning architecture is suitable for tasks that lack very large document corpora for training ab initio. As part of the experimental test bed for empirical evaluation of this approach, we created a merged multi-corpus data set for scientific publication layout detection tasks. Our results show good improvement with fine-tuning of a pre-trained base network using this merged data set, compared to the baseline convolutional neural network architecture.

[247]  arXiv:2010.11728 [pdf, other]
Title: A highly efficient and accurate exponential semi-implicit scalar auxiliary variable (ESI-SAV) approach for dissipative system
Comments: arXiv admin note: text overlap with arXiv:1912.09263, arXiv:2001.00812
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)

The scalar auxiliary variable (SAV) approach is a very popular and efficient method to simulate various phase field models. To save the computational cost, a new SAV approach is given by introducing a new variable $\theta$. The new SAV approach can be proved to save nearly half CPU time of the original SAV approach while keeping all its other advantages. In this paper, we propose a novel technique to construct an exponential semi-implicit scalar auxiliary variable (ESI-SAV) approach without introducing any extra variables. The new proposed method also only needs to solve one linear equation with constant coefficients at each time step. Furthermore, the constructed ESI-SAV method does not need the bounded below restriction of nonlinear free energy potential which is more reasonable and effective for various phase field models. Meanwhile it is easy to construct first-order, second-order and higher-order unconditionally energy stable time-stepping schemes. Other than that, the ESI-SAV approach can be proved to be effective to solve the non-gradient but dissipative system such as Navier-Stokes equations. Several numerical examples are provided to demonstrate the improved efficiency and accuracy of the proposed method.

[248]  arXiv:2010.11731 [pdf, other]
Title: Improving BERT Performance for Aspect-Based Sentiment Analysis
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)

Aspect-Based Sentiment Analysis (ABSA) studies the consumer opinion on the market products. It involves examining the type of sentiments as well as sentiment targets expressed in product reviews. Analyzing the language used in a review is a difficult task that requires a deep understanding of the language. In recent years, deep language models, such as BERT \cite{devlin2019bert}, have shown great progress in this regard. In this work, we propose two simple modules called Parallel Aggregation and Hierarchical Aggregation to be utilized on top of BERT for two main ABSA tasks namely Aspect Extraction (AE) and Aspect Sentiment Classification (ASC) in order to improve the model's performance. We show that applying the proposed models eliminates the need for further training of the BERT model. The source code is available on the Web for further research and reproduction of the results.

[249]  arXiv:2010.11732 [pdf, other]
Title: A Cluster-Matching-Based Method for Video Face Recognition
Comments: 13 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)

Face recognition systems are present in many modern solutions and thousands of applications in our daily lives. However, current solutions are not easily scalable, especially when it comes to the addition of new targeted people. We propose a cluster-matching-based approach for face recognition in video. In our approach, we use unsupervised learning to cluster the faces present in both the dataset and targeted videos selected for face recognition. Moreover, we design a cluster matching heuristic to associate clusters in both sets that is also capable of identifying when a face belongs to a non-registered person. Our method has achieved a recall of 99.435% and a precision of 99.131% in the task of video face recognition. Besides performing face recognition, it can also be used to determine the video segments where each person is present.

[250]  arXiv:2010.11733 [pdf, other]
Title: Multi-Radar Tracking Optimization for Collaborative Combat
Comments: Conference On Artificial Intelligence in Defense (CAID'2020), Nov 2020, Rennes, France
Subjects: Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)

Smart Grids of collaborative netted radars accelerate kill chains through more efficient cross-cueing over centralized command and control. In this paper, we propose two novel reward-based learning approaches to decentralized netted radar coordination based on black-box optimization and Reinforcement Learning (RL). To make the RL approach tractable, we use a simplification of the problem that we proved to be equivalent to the initial formulation. We apply these techniques on a simulation where radars can follow multiple targets at the same time and show they can learn implicit cooperation by comparing them to a greedy baseline.

[251]  arXiv:2010.11734 [pdf, other]
Title: Identification of deep breath while moving forward based on multiple body regions and graph signal analysis
Comments: 5 pages, 3 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Systems and Control (eess.SY)

This paper presents an unobtrusive solution that can automatically identify deep breath when a person is walking past the global depth camera. Existing non-contact breath assessments achieve satisfactory results under restricted conditions when the human body stays relatively still. When someone moves forward, the breath signals detected by the depth camera are hidden within signals of trunk displacement and deformation, and the signal length is short due to the short stay time, which makes modeling very challenging. To overcome these challenges, a signal extraction and selection method based on multiple regions of interest (ROIs) is proposed to automatically obtain the signals informative of breath from depth video. Subsequently, graph signal analysis (GSA) is adopted as a spatial-temporal filter to remove the components unrelated to breath. Finally, a classifier for identifying deep breath is established based on the selected breath-informative signal. In validation experiments, the proposed approach outperforms the comparative methods with an accuracy, precision, recall and F1 of 75.5%, 76.2%, 75.0% and 75.2%, respectively. This system can be extended to public places to provide timely and ubiquitous help for those who may have or are going through physical or mental trouble.

[252]  arXiv:2010.11735 [pdf, other]
Title: Self-Supervised Learning of Part Mobility from Point Cloud Sequence
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Part mobility analysis is a significant aspect required to achieve a functional understanding of 3D objects. It would be natural to obtain part mobility from the continuous part motion of 3D objects. In this study, we introduce a self-supervised method for segmenting motion parts and predicting their motion attributes from a point cloud sequence representing a dynamic object. To sufficiently utilize spatiotemporal information from the point cloud sequence, we generate trajectories by using correlations among successive frames of the sequence instead of directly processing the point clouds. We propose a novel neural network architecture called PointRNN to learn feature representations of trajectories along with their part rigid motions. We evaluate our method on various tasks including motion part segmentation, motion axis prediction and motion range estimation. The results demonstrate that our method outperforms previous techniques on both synthetic and real datasets. Moreover, our method has the ability to generalize to new and unseen objects. It is important to emphasize that it is not required to know any prior shape structure, prior shape category information, or shape orientation. To the best of our knowledge, this is the first study on deep learning to extract part mobility from point cloud sequence of a dynamic object.

[253]  arXiv:2010.11738 [pdf, other]
Title: Optimising Stochastic Routing for Taxi Fleets with Model Enhanced Reinforcement Learning
Comments: 24 pages, comments welcome
Subjects: Machine Learning (cs.LG); Adaptation and Self-Organizing Systems (nlin.AO); Physics and Society (physics.soc-ph)

The future of Mobility-as-a-Service (MaaS) should embrace an integrated system of ride-hailing, street-hailing and ride-sharing with optimised intelligent vehicle routing in response to a real-time, stochastic demand pattern. We aim to optimise routing policies for a large fleet of vehicles for street-hailing services, given a stochastic demand pattern in small to medium-sized road networks. A model-based dispatch algorithm, a high-performance model-free reinforcement learning based algorithm and a novel hybrid algorithm combining the benefits of both the top-down approach and the model-free reinforcement learning have been proposed to route the vacant vehicles. We design our reinforcement learning based routing algorithm using proximal policy optimisation and combined intrinsic and extrinsic rewards to strike a balance between exploration and exploitation. Using a large-scale agent-based microscopic simulation platform to evaluate our proposed algorithms, our model-free reinforcement learning and hybrid algorithm show excellent performance on both an artificial road network and a community-based Singapore road network with empirical demands, and our hybrid algorithm can significantly accelerate the model-free learner in the process of learning.

[254]  arXiv:2010.11740 [pdf, ps, other]
Title: Robust Low-tubal-rank Tensor Completion based on Tensor Factorization and Maximum Correntopy Criterion
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

The goal of tensor completion is to recover a tensor from a subset of its entries, often by exploiting its low-rank property. Among several useful definitions of tensor rank, the low-tubal-rank was shown to give a valuable characterization of the inherent low-rank structure of a tensor. While some low-tubal-rank tensor completion algorithms with favorable performance have been recently proposed, these algorithms utilize second-order statistics to measure the error residual, which may not work well when the observed entries contain large outliers. In this paper, we propose a new objective function for low-tubal-rank tensor completion, which uses correntropy as the error measure to mitigate the effect of the outliers. To efficiently optimize the proposed objective, we leverage a half-quadratic minimization technique whereby the optimization is transformed to a weighted low-tubal-rank tensor factorization problem. Subsequently, we propose two simple and efficient algorithms to obtain the solution and provide their convergence and complexity analysis. Numerical results using both synthetic and real data demonstrate the robust and superior performance of the proposed algorithms.

[255]  arXiv:2010.11742 [pdf, other]
Title: Learning Black-Box Attackers with Transferable Priors and Query Feedback
Comments: NeurIPS 2020. Code is available at this https URL
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

This paper addresses the challenging black-box adversarial attack problem, where only classification confidence of a victim model is available. Inspired by consistency of visual saliency between different vision models, a surrogate model is expected to improve the attack performance via transferability. By combining transferability-based and query-based black-box attack, we propose a surprisingly simple baseline approach (named SimBA++) using the surrogate model, which significantly outperforms several state-of-the-art methods. Moreover, to efficiently utilize the query feedback, we update the surrogate model in a novel learning scheme, named High-Order Gradient Approximation (HOGA). By constructing a high-order gradient computation graph, we update the surrogate model to approximate the victim model in both forward and backward pass. The SimBA++ and HOGA result in Learnable Black-Box Attack (LeBA), which surpasses previous state of the art by considerable margins: the proposed LeBA significantly reduces queries, while keeping higher attack success rates close to 100% in extensive ImageNet experiments, including attacking vision benchmarks and defensive models. Code is open source at https://github.com/TrustworthyDL/LeBA.

[256]  arXiv:2010.11743 [pdf, other]
Title: The Role of Machine Learning for Trajectory Prediction in Cooperative Driving
Comments: arXiv admin note: text overlap with arXiv:2010.10426
Journal-ref: Mobihoc 2020: Proceedings of the Twenty-First International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

In this paper, we study the role that machine learning can play in cooperative driving. Given the increasing rate of connectivity in modern vehicles and road infrastructure, cooperative driving is a promising first step in automated driving. The example scenario we explore in this paper is a coordinated lane merge, with data collection, testing and evaluation all conducted on an automotive test track. The assumption is that vehicles are a mix of those equipped with communication units on board, i.e. connected vehicles, and those that are not connected. However, roadside cameras are connected and can capture all vehicles including those without connectivity. We develop a Traffic Orchestrator that suggests trajectories based on these two sources of information, i.e. connected vehicles and connected roadside cameras. Recommended trajectories are built, which are then communicated back to the connected vehicles. We explore the use of different machine learning techniques for accurate and timely prediction of trajectories.

[257]  arXiv:2010.11744 [pdf]
Title: A Qualitative Analysis of Haptic Feedback in Music Focused Exercises
Comments: 6 pages
Journal-ref: Proceedings of the International Conference on New Interfaces for Musical Expression, 2017
Subjects: Human-Computer Interaction (cs.HC); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

We present the findings of a pilot-study that analysed the role of haptic feedback in a musical context. To examine the role of haptics in Digital Musical Instrument (DMI) design an experiment was formulated to measure the users' perception of device usability across four separate feedback stages: fully haptic (force and tactile combined), constant force only, vibrotactile only, and no feedback. The study was piloted over extended periods with the intention of exploring the application and integration of DMIs in real-world musical contexts. Applying a music orientated analysis of this type enabled the investigative process to not only take place over a comprehensive period, but allowed for the exploration of DMI integration in everyday compositional practices. As with any investigation that involves creativity, it was important that the participants did not feel rushed or restricted. That is, they were given sufficient time to explore and assess the different feedback types without constraint. This provided an accurate and representational set of qualitative data for validating the participants' experience with the different feedback types they were presented with.

[258]  arXiv:2010.11745 [pdf, ps, other]
Title: Rethinking Evaluation in ASR: Are Our Models Robust Enough?
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Is pushing numbers on a single benchmark valuable in automatic speech recognition? Research results in acoustic modeling are typically evaluated based on performance on a single dataset. While the research community has coalesced around various benchmarks, we set out to understand generalization performance in acoustic modeling across datasets -- in particular, if models trained on a single dataset transfer to other (possibly out-of-domain) datasets. Further, we demonstrate that when a large enough set of benchmarks is used, average word error rate (WER) performance over them provides a good proxy for performance on real-world data. Finally, we show that training a single acoustic model on the most widely-used datasets -- combined -- reaches competitive performance on both research and real-world benchmarks.

[259]  arXiv:2010.11746 [pdf, other]
Title: Iterative Decomposition of Joint Chance Constraints in OPF
Subjects: Systems and Control (eess.SY)

In chance-constrained OPF models, joint chance constraints (JCCs) offer a stronger guarantee on security compared to single chance constraints (SCCs). Using Boole's inequality or its improved versions to decompose JCCs into SCCs is popular, yet the conservativeness introduced is still significant. In this letter, a non-parametric iterative framework is proposed to achieve the decomposition of JCCs with negligible conservativeness. An adaptive risk allocation strategy is also proposed and embedded in the framework. Results on an IEEE test case show that the conservativeness using the framework is nearly eliminated, thereby reducing the generation cost considerably.
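
The Boole's-inequality decomposition mentioned above can be written out as a short derivation. The notation ($g_i$, $x$, $\xi$, $\epsilon$, $\epsilon_i$, $m$) is assumed for illustration and is not taken from the letter.

```latex
% Joint chance constraint (JCC) over m security constraints, risk level \epsilon:
\mathbb{P}\bigl( g_i(x,\xi) \le 0,\ i = 1,\dots,m \bigr) \ge 1 - \epsilon .

% Boole's inequality bounds the probability that any constraint is violated:
\mathbb{P}\Bigl( \bigcup_{i=1}^{m} \{ g_i(x,\xi) > 0 \} \Bigr)
  \le \sum_{i=1}^{m} \mathbb{P}\bigl( g_i(x,\xi) > 0 \bigr) .

% Hence the single chance constraints (SCCs)
\mathbb{P}\bigl( g_i(x,\xi) \le 0 \bigr) \ge 1 - \epsilon_i ,
\qquad \sum_{i=1}^{m} \epsilon_i \le \epsilon ,
% are sufficient for the JCC. A uniform allocation sets \epsilon_i = \epsilon / m.
```

Because the bound sums the individual violation probabilities without accounting for overlap between violation events, the resulting SCCs are generally stricter than the original JCC, which is the conservativeness the abstract refers to.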

[260]  arXiv:2010.11747 [pdf, other]
Title: CUNI Systems for the Unsupervised and Very Low Resource Translation Task in WMT20
Comments: WMT20
Subjects: Computation and Language (cs.CL)

This paper presents a description of CUNI systems submitted to the WMT20 task on unsupervised and very low-resource supervised machine translation between German and Upper Sorbian. We experimented with training on synthetic data and pre-training on a related language pair. In the fully unsupervised scenario, we achieved 25.5 and 23.7 BLEU translating from and into Upper Sorbian, respectively. Our low-resource systems relied on transfer learning from German-Czech parallel data and achieved 57.4 BLEU and 56.1 BLEU, which is an improvement of 10 BLEU points over the baseline trained only on the available small German-Upper Sorbian parallel corpus.

[261]  arXiv:2010.11749 [pdf, ps, other]
Title: How wireless queues benefit from motion: an analysis of the continuum between zero and infinite mobility
Comments: Preliminary version appeared in WiOPT 2020
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Probability (math.PR)

This paper considers the time evolution of a queue that is embedded in a Poisson point process of moving wireless interferers. The queue is driven by an external arrival process and is subject to a time-varying service process that is a function of the SINR that it sees. Static configurations of interferers result in an infinite queue workload with positive probability. In contrast, a generic stability condition is established for the queue in the case where interferers possess any non-zero mobility that results in displacements that are both independent across interferers and oblivious to interferer positions. The proof leverages the mixing property of the Poisson point process. The effect of an increase in mobility on queueing metrics is also studied. Convex ordering tools are used to establish that faster moving interferers result in a queue workload that is smaller for the increasing convex stochastic order. As a corollary, mean workload and mean delay improve as network mobility increases. Positive correlation between SINR level-crossing events at different time points is established and the autocorrelation function is determined. System behaviour is empirically analyzed using discrete-event simulation. The performance of various mobility models is evaluated using heavy-traffic approximations.

[262]  arXiv:2010.11752 [pdf, other]
Title: TurboLoRa: enhancing LoRaWAN data rate via device synchronization
Subjects: Networking and Internet Architecture (cs.NI)

Over the last few years we have witnessed an exponential growth in the adoption of LoRaWAN as an LPWAN technology for IoT. While LoRaWAN offers many advantages, one of its limitations is the paltry data rate. Most IoT applications don't require a high throughput, but there are some that would benefit from a higher data rate. In this paper, we present TurboLoRa, a system that retains the strengths of LoRaWAN while providing a higher data rate by synchronizing the transmission of multiple LoRaWAN devices. Our proposal allows cheap devices to be combined, making it a frugal solution to this kind of problem. We present some preliminary results obtained using a real prototype of TurboLoRa.

[263]  arXiv:2010.11754 [pdf, ps, other]
Title: Separation Results for Boolean Function Classes
Subjects: Computational Complexity (cs.CC); Cryptography and Security (cs.CR)

We show (almost) separation between certain important classes of Boolean functions. The technique that we use is to show that the total influence of functions in one class is less than the total influence of functions in the other class. In particular, we show (almost) separation of several classes of Boolean functions which have been studied in coding theory and cryptography from classes which have been studied in combinatorics and complexity theory.
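
The separating quantity named above, total influence, can be computed by brute force for small functions. The following is a minimal sketch using the standard definition (the sum over coordinates of the probability that flipping that coordinate changes the output); the example functions and the helper name `total_influence` are illustrative assumptions and are not taken from the paper.

```python
from itertools import product


def total_influence(f, n):
    """Total influence of f: {0,1}^n -> {0,1}.

    Sums, over every coordinate i, the fraction of inputs x for which
    flipping bit i changes f(x). Brute force, so only for small n.
    """
    pivotal = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(x) != f(flipped):
                pivotal += 1
    return pivotal / 2 ** n


# Illustrative functions (assumptions chosen for this sketch).
def parity(x):
    return sum(x) % 2


def majority(x):
    return int(sum(x) > len(x) / 2)


def and_all(x):
    return int(all(x))


if __name__ == "__main__":
    for name, f in [("parity", parity), ("majority", majority), ("and", and_all)]:
        print(name, total_influence(f, 3))
```

Comparing the printed values for different functions shows how total influence can tell classes apart: parity attains the maximum possible value n, while AND is far below it.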

[264]  arXiv:2010.11757 [pdf, ps, other]
Title: Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
Comments: Code and models are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years, a number of approaches based on 2D CNNs and 3D CNNs have emerged for video action recognition, achieving state-of-the-art results on several large-scale benchmark datasets. In this paper, we carry out an in-depth comparative analysis to better understand the differences between these approaches and the progress made by them. To this end, we develop a unified framework for both 2D-CNN and 3D-CNN action models, which enables us to remove bells and whistles and provides a common ground for a fair comparison. We then conduct a large-scale analysis involving over 300 action recognition models. Our comprehensive analysis reveals that a) a significant leap is made in efficiency for action recognition, but not in accuracy; b) 2D-CNN and 3D-CNN models behave similarly in terms of spatio-temporal representation abilities and transferability. Our analysis also shows that recent action models seem to be able to learn data-dependent temporality flexibly as needed. Our code and models are available at https://github.com/IBM/action-recognition-pytorch.

[265]  arXiv:2010.11761 [pdf, other]
Title: A Comparative Analysis of Industry Human-AI Interaction Guidelines
Comments: 8 pages, 3 figures, Presented at VIS2020 Workshop on TRust and EXpertise in Visual Analytics
Subjects: Human-Computer Interaction (cs.HC)

With the recent release of AI interaction guidelines from Apple, Google, and Microsoft, there is clearly interest in understanding the best practices in human-AI interaction. However, industry standards are not determined by a single company, but rather by the synthesis of knowledge from the whole community. We have surveyed all of the design guidelines from each of these major companies and developed a single, unified structure of guidelines, giving developers a centralized reference. We have then used this framework to compare each of the surveyed companies to find differences in areas of emphasis. Finally, we encourage people to contribute additional guidelines from other companies, academia, or individuals, to provide an open and extensible reference of AI design guidelines at https://ai-open-guidelines.readthedocs.io/.

[266]  arXiv:2010.11762 [pdf, ps, other]
Title: Ghost Signals: Verifying Termination of Busy-Waiting
Comments: 44 pages, 14 figures
Subjects: Logic in Computer Science (cs.LO); Programming Languages (cs.PL)

Programs for multiprocessor machines commonly perform busy-waiting for synchronization. We propose a separation logic using so-called ghost signals to modularly verify termination of such programs under fair scheduling. Intuitively, ghost signals lift the runtime concept of wait-notify synchronization to the verification level and allow a thread to busy-wait for an event $X$ while another thread promises to trigger $X$.

[267]  arXiv:2010.11764 [pdf, other]
Title: EIGEN: Event Influence GENeration using Pre-trained Language Models
Subjects: Computation and Language (cs.CL)

Reasoning about events and tracking their influences is fundamental to understanding processes. In this paper, we present EIGEN - a method to leverage pre-trained language models to generate event influences conditioned on a context, nature of their influence, and the distance in a reasoning chain. We also derive a new dataset for research and evaluation of methods for event influence generation. EIGEN outperforms strong baselines both in terms of automated evaluation metrics (by 10 ROUGE points) and human judgments on closeness to reference and relevance of generations. Furthermore, we show that the event influences generated by EIGEN improve the performance on a "what-if" Question Answering (WIQA) benchmark (over 3% F1), especially for questions that require background knowledge and multi-hop reasoning.

[268]  arXiv:2010.11769 [pdf]
Title: Deep learning prediction of patient response time course from early data via neural-pharmacokinetic/pharmacodynamic modeling
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

The longitudinal analysis of patient response time course following doses of therapeutics is currently performed using Pharmacokinetic/Pharmacodynamic (PK/PD) methodologies, which require significant human experience and expertise in the modeling of dynamical systems. By utilizing recent advancements in deep learning, we show that the governing differential equations can be learnt directly from longitudinal patient data. In particular, we propose a novel neural-PK/PD framework that combines key pharmacological principles with neural ordinary differential equations. We applied it to an analysis of drug concentration and platelet response from a clinical dataset consisting of over 600 patients. We show that the neural-PK/PD model improves upon a state-of-the-art model with respect to metrics for temporal prediction. Furthermore, by incorporating key PK/PD concepts into its architecture, the model can generalize and enable the simulation of patient responses to untested dosing regimens. These results demonstrate the potential of neural-PK/PD for automated predictive analytics of patient response time course.

[269]  arXiv:2010.11773 [pdf, other]
Title: On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks
Comments: Accepted at ICPR 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

We present two methods to reduce the complexity of Bayesian network (BN) classifiers. First, we introduce quantization-aware training using the straight-through gradient estimator to quantize the parameters of BNs to few bits. Second, we extend a recently proposed differentiable tree-augmented naive Bayes (TAN) structure learning approach by also considering the model size. Both methods are motivated by recent developments in the deep learning community, and they provide effective means to trade off between model size and prediction accuracy, which is demonstrated in extensive experiments. Furthermore, we contrast quantized BN classifiers with quantized deep neural networks (DNNs) for small-scale scenarios which have hardly been investigated in the literature. We show Pareto optimal models with respect to model size, number of operations, and test error and find that both model classes are viable options.
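
To illustrate the first technique mentioned above, the following is a minimal sketch of quantization-aware training with a straight-through estimator on a toy scalar regression. It is not a Bayesian network classifier and not the authors' code; the uniform quantizer, the range [-1, 1], the squared loss and all names (`quantize`, `train_ste`, `w_init`) are assumptions made for illustration.

```python
def quantize(w, bits, lo=-1.0, hi=1.0):
    """Clip w to [lo, hi] and snap it to a uniform grid with 2**bits levels."""
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    w = min(max(w, lo), hi)
    return lo + round((w - lo) / step) * step


def train_ste(xs, ys, bits, w_init=-0.9, lr=0.1, epochs=200):
    """Fit y ~ q * x, where q is the quantized version of a latent weight w.

    Forward pass: uses the quantized weight q.
    Backward pass: the straight-through estimator treats dq/dw as 1, so the
    gradient with respect to q is applied directly to the latent weight w.
    """
    w = w_init
    for _ in range(epochs):
        q = quantize(w, bits)
        # Gradient of the mean squared error with respect to q.
        grad_q = sum(2 * (q * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad_q  # straight-through: reuse grad_q as the gradient for w
    return w, quantize(w, bits)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0]
    ys = [x / 3 for x in xs]  # the true slope 1/3 lies on the 2-bit grid
    latent, quantized = train_ste(xs, ys, bits=2)
    print(latent, quantized)
```

The latent weight stays in full precision during training, and only its quantized value would be stored in a deployed model.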

[270]  arXiv:2010.11775 [pdf, other]
Title: Label-Aware Neural Tangent Kernel: Toward Better Generalization and Local Elasticity
Comments: 32 pages, 2 figures, 3 pages
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

As a popular approach to modeling the dynamics of training overparametrized neural networks (NNs), the neural tangent kernel (NTK) is known to fall behind real-world NNs in generalization ability. This performance gap is in part due to the label-agnostic nature of the NTK, which renders the resulting kernel not as locally elastic as NNs [he2019local]. In this paper, we introduce a novel approach from the perspective of label-awareness to reduce this gap for the NTK. Specifically, we propose two label-aware kernels that are each a superimposition of a label-agnostic part and a hierarchy of label-aware parts with increasing complexity of label dependence, using the Hoeffding decomposition. Through both theoretical and empirical evidence, we show that the models trained with the proposed kernels better simulate NNs in terms of generalization ability and local elasticity.

[271]  arXiv:2010.11780 [pdf, other]
Title: FasterRCNN Monitoring of Road Damages: Competition and Deployment
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Maintaining aging infrastructure is a challenge currently faced by local and national administrators all around the world. An important prerequisite for efficient infrastructure maintenance is to continuously monitor (i.e., quantify the level of safety and reliability) the state of very large structures. Meanwhile, computer vision has made impressive strides in recent years, mainly due to successful applications of deep learning models. These advances allow the automation of vision tasks that were previously impossible to automate, offering promising possibilities to assist administrators in optimizing their infrastructure maintenance operations. In this context, the IEEE 2020 global Road Damage Detection (RDD) Challenge gives deep learning and computer vision researchers an opportunity to get involved and help accurately track pavement damage on road networks. This paper makes two contributions to that topic: in the first part, we detail our solution to the RDD Challenge. In the second part, we present our efforts in deploying our model on a local road network, explaining the proposed methodology and the challenges encountered.

[272]  arXiv:2010.11782 [pdf, other]
Title: Adversarial Attacks on Binary Image Recognition Systems
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)

We initiate the study of adversarial attacks on models for binary (i.e. black and white) image classification. Although there has been a great deal of work on attacking models for colored and grayscale images, little is known about attacks on models for binary images. Models trained to classify binary images are used in text recognition applications such as check processing, license plate recognition, invoice processing, and many others. In contrast to colored and grayscale images, the search space of attacks on binary images is extremely restricted and noise cannot be hidden with minor perturbations in each pixel. Thus, the optimization landscape of attacks on binary images introduces new fundamental challenges.
In this paper we introduce a new attack algorithm called SCAR, designed to fool classifiers of binary images. We show that SCAR significantly outperforms existing $L_0$ attacks applied to the binary setting and use it to demonstrate the vulnerability of real-world text recognition systems. SCAR's strong performance in practice contrasts with the existence of classifiers that are provably robust to large perturbations. In many cases, altering a single pixel is sufficient to trick Tesseract, a popular open-source text recognition system, into misclassifying a word as a different word in the English dictionary. We also license software from providers of check processing systems to most of the major US banks and demonstrate the vulnerability of check recognition for mobile deposits. These systems are substantially harder to fool since they classify the handwritten amounts in both digits and letters, independently. Nevertheless, we generalize SCAR to design attacks that fool state-of-the-art check processing systems using unnoticeable perturbations that lead to misclassification of deposit amounts. Consequently, this is a powerful method to perform financial fraud.

[273]  arXiv:2010.11784 [pdf, other]
Title: Self-alignment Pre-training for Biomedical Entity Representations
Comments: 8 pages. work in progress
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Despite the widespread success of self-supervised learning via masked language models, learning representations directly from text to accurately capture complex and fine-grained semantic relationships in the biomedical domain remains a challenge. Addressing this is of paramount importance for tasks such as entity linking where complex relational knowledge is pivotal. We propose SapBERT, a pre-training scheme based on BERT. It self-aligns the representation space of biomedical entities with a metric learning objective function leveraging UMLS, a collection of biomedical ontologies with >4M concepts. Our experimental results on six medical entity linking benchmarking datasets demonstrate that SapBERT outperforms many domain-specific BERT-based variants such as BioBERT, BlueBERT and PubMedBERT, achieving state-of-the-art (SOTA) performance.

[274]  arXiv:2010.11786 [pdf, other]
Title: Spikyball sampling: Exploring large networks via an inhomogeneous filtered diffusion
Subjects: Social and Information Networks (cs.SI); Information Retrieval (cs.IR)

Studying real-world networks such as social networks or web networks is a challenge. These networks often combine a complex, highly connected structure with a large size. We propose a new approach for large-scale networks that is able to automatically sample user-defined relevant parts of a network. Starting from a few selected places in the network and a reduced set of expansion rules, the method adopts a filtered breadth-first search approach that expands through edges and nodes matching these properties. Moreover, the expansion is performed over a random subset of neighbors at each step to further mitigate the overwhelming number of connections that may exist in large graphs. This carries the image of a "spiky" expansion. We show that this approach generalizes and extends previous exploration sampling methods, such as Snowball or Forest Fire. We demonstrate its ability to capture groups of nodes with high interactions while discarding weakly connected nodes that are often numerous in social networks and may hide important structures.
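
The filtered, randomly thinned breadth-first expansion described above can be sketched as follows. This is a simplified illustration rather than the authors' algorithm: neighbors are drawn uniformly at random (the paper's diffusion is inhomogeneous), and the names `spiky_sample`, `keep_edge` and `frac` are hypothetical.

```python
import random


def spiky_sample(adj, seeds, keep_edge, frac, hops, rng=None):
    """Filtered breadth-first expansion over a random subset of neighbors.

    adj:       dict mapping each node to an iterable of neighbor nodes
    seeds:     starting nodes
    keep_edge: predicate (u, v) -> bool acting as the expansion rule
    frac:      fraction of the candidate nodes kept at each hop (assumption:
               uniform sampling without replacement)
    hops:      maximum number of expansion steps
    """
    rng = rng or random.Random(0)
    visited = set(seeds)
    frontier = list(seeds)
    for _ in range(hops):
        # Collect unvisited neighbors reachable through edges that pass the filter.
        candidates = set()
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in visited and keep_edge(u, v):
                    candidates.add(v)
        if not candidates:
            break
        # Keep only a random subset so the expansion stays "spiky".
        k = max(1, int(frac * len(candidates)))
        chosen = rng.sample(sorted(candidates), k)
        visited.update(chosen)
        frontier = chosen
    return visited


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
    print(spiky_sample(graph, [0], lambda u, v: True, frac=0.5, hops=3))
```

With `frac=1.0` and a filter that accepts every edge, the sketch reduces to plain breadth-first (Snowball-style) exploration up to `hops` steps.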

[275]  arXiv:2010.11787 [pdf, other]
Title: Prediction of Rainfall in Rajasthan, India using Deep and Wide Neural Network
Subjects: Machine Learning (cs.LG)

Rainfall is a natural process of utmost importance in various areas, including the water cycle, groundwater recharge, disaster management and the economic cycle. Accurate prediction of rainfall intensity is a challenging task, and accurate forecasts benefit all of these areas. In this paper, we propose a deep and wide rainfall prediction model (DWRPM) and evaluate its effectiveness in predicting rainfall in the Indian state of Rajasthan using historical time-series data. For the wide network, instead of using rainfall intensity values directly, we use features obtained after applying a convolutional layer. For the deep part, a multi-layer perceptron (MLP) is used. Information on geographical parameters (latitude and longitude) is included in a unique way. It gives the model a generalization ability, which helps a single model to make rainfall predictions in different geographical conditions. We compare our results with various deep-learning approaches like MLP, LSTM and CNN, which are observed to work well in sequence-based predictions. Experimental analysis and comparison show the applicability of our proposed method for rainfall prediction in Rajasthan.

[276]  arXiv:2010.11788 [pdf, ps, other]
Title: Equation satisfiability in solvable groups
Subjects: Computational Complexity (cs.CC); Group Theory (math.GR)

The study of the complexity of the equation satisfiability problem in finite groups was initiated by Goldmann and Russell (2002), who showed that this problem is solvable in polynomial time for nilpotent groups while it is NP-complete for non-solvable groups. Since then, several results have appeared showing that the problem can be solved in polynomial time in certain solvable groups $G$ having a nilpotent normal subgroup $H$ with nilpotent factor $G/H$. This paper shows that such a normal subgroup must exist in each finite group with equation satisfiability solvable in polynomial time, unless the Exponential Time Hypothesis fails.

[277]  arXiv:2010.11791 [pdf, other]
Title: ConVEx: Data-Efficient and Few-Shot Slot Labeling
Subjects: Computation and Language (cs.CL)

We propose ConVEx (Conversational Value Extractor), an efficient pretraining and fine-tuning neural approach for slot-labeling dialog tasks. Instead of relying on more general pretraining objectives from prior work (e.g., language modeling, response selection), ConVEx's pretraining objective, a novel pairwise cloze task using Reddit data, is well aligned with its intended usage on sequence labeling tasks. This enables learning domain-specific slot labelers by simply fine-tuning decoding layers of the pretrained general-purpose sequence labeling model, while the majority of the pretrained model's parameters are kept frozen. We report state-of-the-art performance of ConVEx across a range of diverse domains and data sets for dialog slot-labeling, with the largest gains in the most challenging, few-shot setups. We believe that ConVEx's reduced pretraining times (i.e., only 18 hours on 12 GPUs) and cost, along with its efficient fine-tuning and strong performance, promise wider portability and scalability for data-efficient sequence-labeling tasks in general.

[278]  arXiv:2010.11792 [pdf, other]
Title: Resource allocation for task-level speculative scientific applications: a proof of concept using Parallel Trajectory Splicing
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Materials Science (cond-mat.mtrl-sci)

The constant increase in parallelism available on large-scale distributed computers poses major scalability challenges to many scientific applications. A common strategy to improve scalability is to express the algorithm in terms of independent tasks that can be executed concurrently on a runtime system. In this manuscript, we consider a generalization of this approach where task-level speculation is allowed. In this context, a probability is attached to each task which corresponds to the likelihood that the product of the task will be consumed as part of the calculation. We consider the problem of optimal resource allocation to each of the possible tasks so as to maximize the expected overall computational throughput. The power of this approach is demonstrated by analyzing its application to Parallel Trajectory Splicing, a massively-parallel long-time-dynamics method for atomistic simulations.

[279]  arXiv:2010.11793 [pdf, other]
Title: Metapath- and Entity-aware Graph Neural Network for Recommendation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Due to their shallow structure, classic graph neural networks (GNNs) fail to model high-order graph structures that deliver critical insights into task-relevant relations. Neglecting those insights leads to insufficient distillation of collaborative signals in recommender systems. In this paper, we propose PEAGNN, a unified GNN framework tailored for recommendation tasks, which is capable of exploiting the rich semantics in metapaths. PEAGNN trains multilayer GNNs to perform metapath-aware information aggregation on collaborative subgraphs, $h$-hop subgraphs around the target user-item pairs. After the attentive fusion of aggregated information from different metapaths, a graph-level representation is then extracted for matching score prediction. To leverage the local structure of collaborative subgraphs, we present entity-awareness that regularizes node embedding with the presence of features in a contrastive manner. Moreover, PEAGNN is compatible with mainstream GNN structures such as GCN, GAT and GraphSage. The empirical analysis on three public datasets demonstrates that our model outperforms or is at least on par with other competitive baselines. Further analysis indicates that trained PEAGNN automatically derives meaningful metapath combinations from the given metapaths.

[280]  arXiv:2010.11796 [pdf, other]
Title: CryptoGRU: Low Latency Privacy-Preserving Text Analysis With GRU
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Billions of text analysis requests containing private emails, personal text messages, and sensitive online reviews are processed by recurrent neural networks (RNNs) deployed on public clouds every day. Although prior secure networks combine homomorphic encryption (HE) and garbled circuit (GC) to preserve users' privacy, naively adopting the HE and GC hybrid technique to implement RNNs suffers from long inference latency due to slow activation functions. In this paper, we present a HE and GC hybrid gated recurrent unit (GRU) network, CryptoGRU, for low-latency secure inferences. CryptoGRU replaces computationally expensive GC-based $tanh$ with fast GC-based $ReLU$, and then quantizes $sigmoid$ and $ReLU$ with a smaller bit length to accelerate activations in a GRU. We evaluate CryptoGRU with multiple GRU models trained on 4 public datasets. Experimental results show CryptoGRU achieves top-notch accuracy and improves the secure inference latency by up to $138\times$ over one of the state-of-the-art secure networks on the Penn Treebank dataset.

[281]  arXiv:2010.11797 [pdf, other]
Title: Should Graph Convolution Trust Neighbors? A Simple Causal Inference Method
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Recent studies on Graph Convolutional Networks (GCNs) reveal the usefulness of adaptive locality, which enables adjusting the contribution of a neighbor to the target node representation. Existing work typically achieves adaptive locality by introducing an additional module such as graph attention, which learns to weigh neighbor nodes. However, such a module may not work well in practice, since fitting training data well does not necessarily lead to reasonable adaptive locality, especially when labeled data are scarce. In an orthogonal direction, this work explores how to achieve adaptive locality in the model inference stage, a new perspective that has received little scrutiny. The main advantage of leaving the training stage unchanged is generality -- it can be applied to most GCNs and improve their inference accuracy. Given a trained GCN model, the idea is to make a counterfactual prediction by blocking the graph structure, i.e., forcing the model to use each node's own features to predict its label. By comparing the real prediction with the counterfactual prediction, we can assess the trustworthiness of neighbor nodes. Furthermore, we explore graph uncertainty that measures how the prediction would vary with changes in graph structure, and introduce edge dropout into the inference stage to estimate graph uncertainty. We conduct empirical studies on seven node classification datasets to validate the effectiveness of our methods.

[282]  arXiv:2010.11800 [pdf, other]
Title: Castle in the Sky: Dynamic Sky Replacement and Harmonization in Videos
Authors: Zhengxia Zou
Comments: project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)

This paper proposes a vision-based method for video sky replacement and harmonization, which can automatically generate realistic and dramatic sky backgrounds in videos with controllable styles. Unlike previous sky editing methods that either focus on static photos or require inertial measurement units integrated in smartphones when shooting videos, our method is purely vision-based, places no requirements on the capturing devices, and can be applied to both online and offline processing scenarios. Our method runs in real-time and is free of user interactions. We decompose this artistic creation process into a couple of proxy tasks including sky matting, motion estimation, and image blending. Experiments are conducted on videos diversely captured in the wild by handheld smartphones and dash cameras, and show high fidelity and good generalization of our method in both visual quality and lighting/motion dynamics. Our code and animated results are available at https://jiupinjia.github.io/skyar/.

[283]  arXiv:2010.11803 [pdf, other]
Title: Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

We propose a new method for speaker diarization that can handle overlapping speech with 2+ people. Our method is based on compositional embeddings [1]: Like standard speaker embedding methods such as x-vector [2], compositional embedding models contain a function f that separates speech from different speakers. In addition, they include a composition function g to compute set-union operations in the embedding space so as to infer the set of speakers within the input audio. In an experiment on multi-person speaker identification using synthesized LibriSpeech data, the proposed method outperforms traditional embedding methods that are only trained to separate single speakers (not speaker sets). In a speaker diarization experiment on the AMI Headset Mix corpus, we achieve state-of-the-art accuracy (DER=22.93%), slightly better than the previous best result (DER=23.82% from [3]).

[284]  arXiv:2010.11805 [pdf, other]
Title: Urban Sound Classification : striving towards a fair comparison
Comments: 7 pages, 1 figure
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Urban sound classification has made remarkable progress and is still an active research area in audio pattern recognition. In particular, it makes it possible to monitor noise pollution, which is a growing concern for large cities. The contribution of this paper is two-fold. First, we present our DCASE 2020 task 5 winning solution, which aims at helping the monitoring of urban noise pollution. It achieves a macro-AUPRC of 0.82 / 0.62 for the coarse / fine classification on the validation set. Moreover, it reaches accuracies of 89.7% and 85.41% on the ESC-50 and US8k datasets, respectively. Second, it is not easy to find a fair comparison and to reproduce the performance of existing models. Sometimes authors copy-paste the results of the original papers, which does not help reproducibility. As a result, we provide a fair comparison by using the same input representation, metrics and optimizer to assess performance. We preserve the data augmentation used by the original papers. We hope this framework could help evaluate new architectures in this field. For better reproducibility, the code is available on our GitHub repository.

[285]  arXiv:2010.11818 [pdf, other]
Title: Compositional Generalization via Semantic Tagging
Subjects: Computation and Language (cs.CL)

Although neural sequence-to-sequence models have been successfully applied to semantic parsing, they struggle to perform well on query-based data splits that require compositional generalization, the ability to systematically generalize to unseen compositions of seen components. Motivated by the explicitly built-in compositionality in traditional statistical semantic parsing, we propose a new decoding framework that preserves the expressivity and generality of sequence-to-sequence models while featuring explicit lexicon-style alignments and disentangled information processing. Specifically, we decompose decoding into two phases where an input utterance is first tagged with semantic symbols representing the meanings of its individual words, and then a sequence-to-sequence model is used to predict the final meaning representation conditioning on the utterance and the predicted tag sequence. Experimental results on three semantic parsing datasets with query-based splits show that the proposed approach consistently improves compositional generalization of sequence-to-sequence models across different model architectures, domains and semantic formalisms.

[286]  arXiv:2010.11820 [pdf, other]
Title: Posterior Re-calibration for Imbalanced Datasets
Comments: Accepted to NeurIPS 2020
Subjects: Machine Learning (cs.LG)

Neural networks can perform poorly when the training label distribution is heavily imbalanced, as well as when the testing data differs from the training distribution. In order to deal with the shift in the testing label distribution that imbalance causes, we motivate the problem from the perspective of an optimal Bayes classifier and derive a post-training prior rebalancing technique that can be solved through a KL-divergence based optimization. This method allows a flexible post-training hyper-parameter to be efficiently tuned on a validation set and effectively modifies the classifier margin to deal with this imbalance. We further combine this method with existing likelihood shift methods, re-interpreting them from the same Bayesian perspective, and demonstrating that our method can deal with both problems in a unified way. The resulting algorithm can be conveniently used on probabilistic classification problems agnostic to underlying architectures. Our results on six different datasets and five different architectures show state-of-the-art accuracy, including on large-scale imbalanced datasets such as iNaturalist for classification and Synthia for semantic segmentation. Please see https://github.com/GT-RIPL/UNO-IC.git for implementation.
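
As a rough illustration of post-training prior rebalancing, the sketch below divides each predicted class probability by the training prior raised to a tunable exponent and renormalizes. This is a generic prior-shift correction written under that assumption; it is not claimed to be the exact rule derived in the paper, and the names `rebalance` and `lam` are hypothetical.

```python
def rebalance(posterior, train_prior, lam=1.0):
    """Reweight class posteriors by the inverse training prior.

    posterior:   predicted class probabilities p(y | x) from a trained model
    train_prior: class frequencies in the training set
    lam:         post-training hyper-parameter (assumed form): 0 leaves the
                 prediction unchanged, 1 fully divides out the training prior
    """
    scores = [p / (q ** lam) for p, q in zip(posterior, train_prior)]
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    posterior = [0.6, 0.4]    # the model favors the majority class
    train_prior = [0.9, 0.1]  # heavily imbalanced training labels
    for lam in (0.0, 0.5, 1.0):
        print(lam, rebalance(posterior, train_prior, lam))
```

Because the correction is applied after training, `lam` can be tuned on a validation set without touching the model weights.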

[287]  arXiv:2010.11821 [pdf, other]
Title: Scalable Bottom-Up Hierarchical Clustering
Subjects: Machine Learning (cs.LG)

Bottom-up algorithms, such as the classic hierarchical agglomerative clustering, are highly effective for hierarchical as well as flat clustering. However, the large number of rounds and their sequential nature limit the scalability of agglomerative clustering. In this paper, we present an alternative round-based bottom-up hierarchical clustering, the Sub-Cluster Component Algorithm (SCC), that scales gracefully to massive datasets. Our method builds many sub-clusters in parallel in a given round and requires many fewer rounds -- usually an order of magnitude fewer than classic agglomerative clustering. Our theoretical analysis shows that, under a modest separability assumption, SCC will contain the optimal flat clustering. SCC also provides a 2-approximate solution to the DP-means objective, thereby introducing a novel application of hierarchical clustering methods. Empirically, SCC finds better hierarchies and flat clusterings even when the data does not satisfy the separability assumption. We demonstrate the scalability of our method by applying it to a dataset of 30 billion points and showing that SCC produces higher quality clusterings than the state-of-the-art.

[288]  arXiv:2010.11827 [pdf, other]
Title: Automated Metadata Harmonization Using Entity Resolution & Contextual Embedding
Comments: Submitted to Neurips 2020 Workshop on Data Curation
Subjects: Databases (cs.DB); Machine Learning (cs.LG)

The ML data curation process typically involves heterogeneous & federated source systems with varied schema structures, requiring the curation process to standardize metadata from different schemas to an inter-operable schema. This manual process of Metadata Harmonization & cataloging slows the ML-Ops lifecycle. We demonstrate automation of this step with the help of entity resolution methods & also by using Cognitive Database's Db2Vec embedding approach to capture hidden inter-column & intra-column relationships, which detect similarity of metadata and then predict the mapping of metadata columns from source schemas to any standardized schema. Apart from matching schemas, we demonstrate that it can also infer the correct ontological structure of the target data model.

[289]  arXiv:2010.11828 [pdf, other]
Title: Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness and Accuracy for Free
Comments: NeurIPS 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Adversarial training and its many variants substantially improve deep network robustness, yet at the cost of compromising standard accuracy. Moreover, the training process is heavy and hence it becomes impractical to thoroughly explore the trade-off between accuracy and robustness. This paper asks this new question: how to quickly calibrate a trained model in-situ, to examine the achievable trade-offs between its standard and robust accuracies, without (re-)training it many times? Our proposed framework, Once-for-all Adversarial Training (OAT), is built on an innovative model-conditional training framework, with a controlling hyper-parameter as the input. The trained model can be adjusted among different standard and robust accuracies "for free" at testing time. As an important knob, we exploit dual batch normalization to separate standard and adversarial feature statistics, so that they can be learned in one model without degrading performance. We further extend OAT to a Once-for-all Adversarial Training and Slimming (OATS) framework that allows for the joint trade-off among accuracy, robustness and runtime efficiency. Experiments show that, without any re-training or ensembling, OAT/OATS achieve similar or even superior performance compared to dedicatedly trained models at various configurations. Our code and pretrained models are available at: https://github.com/VITA-Group/Once-for-All-Adversarial-Training.

[290]  arXiv:2010.11833 [pdf, other]
Title: Shape related constraints aware generation of Mechanical Designs through Deep Convolutional GAN
Comments: "submitted to the Engineering Applications of Artificial Intelligence journal"
Subjects: Computational Engineering, Finance, and Science (cs.CE)

Mechanical product engineering often must comply with manufacturing or geometric constraints related to the shaping process. Mechanical design hence should rely on robust and fast tools to explore complex shapes, typically for design for additive manufacturing (DfAM). Topology optimization is such a powerful tool, yet integrating geometric constraints (shape-related) into it is hard. In this work, we leverage machine learning capability to handle complex geometric and spatial correlations to integrate into the mechanical design process geometry-related constraints at the conceptual level. More precisely, we explore the generative capabilities of recent Deep Learning architectures to enhance mechanical designs, typically for additive manufacturing. In this work, we build a generative Deep-Learning-based approach of topology optimization integrating mechanical conditions in addition to one typical manufacturing condition (the complexity of a design, i.e., a geometrical condition). The approach is a dual-discriminator GAN: a generator that takes as input the mechanical and geometrical conditions and outputs a 2D structure, and two discriminators, one to ensure that the generated structure follows the mechanical constraints and the other to assess the geometrical constraint. We also explore the generation of designs with a non-uniform material distribution and show promising results. Finally, we evaluate the generated designs with an objective evaluation of all wanted aspects: the mechanical as well as the geometrical constraints.

[291]  arXiv:2010.11835 [pdf, other]
Title: Multi-agent active perception with prediction rewards
Comments: 34th Conference on Neural Information Processing Systems (NeurIPS 2020)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)

Multi-agent active perception is a task where a team of agents cooperatively gathers observations to compute a joint estimate of a hidden variable. The task is decentralized and the joint estimate can only be computed after the task ends by fusing observations of all agents. The objective is to maximize the accuracy of the estimate. The accuracy is quantified by a centralized prediction reward determined by a centralized decision-maker who perceives the observations gathered by all agents after the task ends. In this paper, we model multi-agent active perception as a decentralized partially observable Markov decision process (Dec-POMDP) with a convex centralized prediction reward. We prove that by introducing individual prediction actions for each agent, the problem is converted into a standard Dec-POMDP with a decentralized prediction reward. The loss due to decentralization is bounded, and we give a sufficient condition for when it is zero. Our results allow application of any Dec-POMDP solution algorithm to multi-agent active perception problems, and enable planning to reduce uncertainty without explicit computation of joint estimates. We demonstrate the empirical usefulness of our results by applying a standard Dec-POMDP algorithm to multi-agent active perception problems, showing increased scalability in the planning horizon.

[292]  arXiv:2010.11838 [pdf, other]
Title: Blind Video Temporal Consistency via Deep Video Prior
Comments: NeurIPS 2020; github link: github.com/ChenyangLEI/deep-video-prior
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Applying image processing algorithms independently to each video frame often leads to temporal inconsistency in the resulting video. To address this issue, we present a novel and general approach for blind video temporal consistency. Our method is trained directly on a single pair of original and processed videos instead of a large dataset. Unlike most previous methods that enforce temporal consistency with optical flow, we show that temporal consistency can be achieved by training a convolutional network on a video with the Deep Video Prior. Moreover, a carefully designed iteratively reweighted training strategy is proposed to address the challenging multimodal inconsistency problem. We demonstrate the effectiveness of our approach on 7 computer vision tasks on videos. Extensive quantitative and perceptual experiments show that our approach outperforms state-of-the-art methods on blind video temporal consistency. Our source code is publicly available at github.com/ChenyangLEI/deep-video-prior.

[293]  arXiv:2010.11842 [pdf, other]
Title: Containment in Monadic Disjunctive Datalog, MMSNP, and Expressive Description Logics
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)

We study query containment in three closely related formalisms: monadic disjunctive Datalog (MDDLog), MMSNP (a logical generalization of constraint satisfaction problems), and ontology-mediated queries (OMQs) based on expressive description logics and unions of conjunctive queries. Containment in MMSNP was known to be decidable due to a result by Feder and Vardi, but its exact complexity has remained open. We prove 2NEXPTIME-completeness and extend this result to monadic disjunctive Datalog and to OMQs.

[294]  arXiv:2010.11844 [pdf, other]
Title: Spatio-temporal Features for Generalized Detection of Deepfake Videos
Comments: Submitted to Computer Vision and Image Understanding (CVIU)
Subjects: Computer Vision and Pattern Recognition (cs.CV)

For deepfake detection, video-level detectors have not been explored as extensively as image-level detectors, which do not exploit temporal data. In this paper, we empirically show that existing approaches based on image and sequence classifiers generalize poorly to new manipulation techniques. To this end, we propose spatio-temporal features, modeled by 3D CNNs, to extend the generalization capabilities to detect new sorts of deepfake videos. We show that spatial features learn distinct deepfake-method-specific attributes, while spatio-temporal features capture shared attributes between deepfake methods. We provide an in-depth analysis of how the sequential and spatio-temporal video encoders utilize temporal information using the DFDC dataset arXiv:2006.07397. Thus, we reveal that our approach captures local spatio-temporal relations and inconsistencies in the deepfake videos while existing sequence encoders are indifferent to them. Through large scale experiments conducted on the FaceForensics++ arXiv:1901.08971 and Deeper Forensics arXiv:2001.03024 datasets, we show that our approach outperforms existing methods in terms of generalization capabilities.

[295]  arXiv:2010.11848 [pdf, ps, other]
Title: From Conjunctive Queries to Instance Queries in Ontology-Mediated Querying
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)

We consider ontology-mediated queries (OMQs) based on expressive description logics of the ALC family and (unions of) conjunctive queries, studying the rewritability into OMQs based on instance queries (IQs). Our results include exact characterizations of when such a rewriting is possible and tight complexity bounds for deciding rewritability. We also give a tight complexity bound for the related problem of deciding whether a given MMSNP sentence is equivalent to a CSP.

[296]  arXiv:2010.11851 [pdf, other]
Title: Hawkes Process Classification through Discriminative Modeling of Text
Comments: 9 pages, 10 figures
Subjects: Social and Information Networks (cs.SI)

Social media has provided a platform for users to gather and share information and stay updated with the news. Such networks also provide a platform where users can engage in conversations. However, micro-blogging platforms such as Twitter restrict the length of text. Due to the paucity of word occurrences in such posts, classification of this information is a challenging task using standard tools of natural language processing (NLP). Moreover, the high complexity and dynamics of the posts in social media make text classification a challenging problem. However, considering additional cues in the form of past labels and times associated with the post can be potentially helpful for performing text classification in a better way. To address this problem, we propose models based on the Hawkes process (HP) which can naturally incorporate the temporal features and past labels along with textual features for improving short text classification. In particular, we propose a discriminative approach to model text in HP where the text features parameterize the base intensity and/or the triggering kernel. Another major contribution is to consider the kernel to be a function of both time and text, and further use a neural network to model the kernel. This enables modelling and effectively learning the text along with the historical influences for tweet classification. We demonstrate the advantages of the proposed techniques on standard benchmarks for rumour stance classification.

[297]  arXiv:2010.11852 [pdf, other]
Title: Efficient robust optimal transport: formulations and algorithms
Comments: Technical report
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

The problem of robust optimal transport (OT) aims at recovering the best transport plan with respect to the worst possible cost function. In this work, we study novel robust OT formulations where the cost function is parameterized by a symmetric positive semi-definite Mahalanobis metric. In particular, we study several different regularizations on the Mahalanobis metric -- element-wise $p$-norm, KL-divergence, or doubly-stochastic constraint -- and show that the resulting optimization formulations can be considerably simplified by exploiting the problem structure. For large-scale applications, we additionally propose a suitable low-dimensional decomposition of the Mahalanobis metric for the studied robust OT problems. Overall, we view the robust OT (min-max) optimization problems as non-linear OT (minimization) problems, which we solve using a Frank-Wolfe algorithm. We discuss the use of robust OT distance as a loss function in multi-class/multi-label classification problems. Empirical results on several real-world tag prediction and multi-class datasets show the benefit of our modeling approach.

[298]  arXiv:2010.11853 [pdf, other]
Title: STAR: A Schema-Guided Dialog Dataset for Transfer Learning
Comments: Equal contribution: Johannes E. M. Mosig, Shikib Mehri
Subjects: Computation and Language (cs.CL)

We present STAR, a schema-guided task-oriented dialog dataset consisting of 127,833 utterances and knowledge base queries across 5,820 task-oriented dialogs in 13 domains that is especially designed to facilitate task and domain transfer learning in task-oriented dialog. Furthermore, we propose a scalable crowd-sourcing paradigm to collect arbitrarily large datasets of the same quality as STAR. Moreover, we introduce novel schema-guided dialog models that use an explicit description of the task(s) to generalize from known to unknown tasks. We demonstrate the effectiveness of these models, particularly for zero-shot generalization across tasks and domains.

[299]  arXiv:2010.11855 [pdf, other]
Title: Detecting and Exorcising Statistical Demons from Language Models with Anti-Models of Negative Data
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

It's been said that "Language Models are Unsupervised Multitask Learners." Indeed, self-supervised language models trained on "positive" examples of English text generalize in desirable ways to many natural language tasks. But if such models can stray so far from an initial self-supervision objective, a wayward model might generalize in undesirable ways too, say to nonsensical "negative" examples of unnatural language. A key question in this work is: do language models trained on (positive) training data also generalize to (negative) test data? We use this question as a contrivance to assess the extent to which language models learn undesirable properties of text, such as n-grams, that might interfere with the learning of more desirable properties of text, such as syntax. We find that within a model family, as the number of parameters, training epochs, and data set size increase, so does a model's ability to generalize to negative n-gram data, indicating standard self-supervision generalizes too far. We propose a form of inductive bias that attenuates such undesirable signals with negative data distributions automatically learned from positive data. We apply the method to remove n-gram signals from LSTMs and find that doing so causes them to favor syntactic signals, as demonstrated by large error reductions (up to 46% on the hardest cases) on a syntactic subject-verb agreement task.

[300]  arXiv:2010.11856 [pdf, other]
Title: XOR QA: Cross-lingual Open-Retrieval Question Answering
Comments: Our data and code are available at this https URL
Subjects: Computation and Language (cs.CL)

Multilingual question answering tasks typically assume answers exist in the same language as the question. Yet in practice, many languages face both information scarcity---where languages have few reference articles---and information asymmetry---where questions reference concepts from other cultures. This work extends open-retrieval question answering to a cross-lingual setting enabling questions from one language to be answered via answer content from another language. We construct a large-scale dataset built on questions from TyDi QA lacking same-language answers. Our task formulation, called Cross-lingual Open Retrieval Question Answering (XOR QA), includes 40k information-seeking questions from across 7 diverse non-English languages. Based on this dataset, we introduce three new tasks that involve cross-lingual document retrieval using multi-lingual and English resources. We establish baselines with state-of-the-art machine translation systems and cross-lingual pretrained models. Experimental results suggest that XOR QA is a challenging task that will facilitate the development of novel techniques for multilingual question answering. Our data and code are available at https://nlp.cs.washington.edu/xorqa.

[301]  arXiv:2010.11858 [pdf, other]
Title: Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime
Comments: 22 pages, 1 figure
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

We study the problem of policy optimization for infinite-horizon discounted Markov Decision Processes with softmax policy and nonlinear function approximation trained with policy gradient algorithms. We concentrate on the training dynamics in the mean-field regime, modeling e.g., the behavior of wide single hidden layer neural networks, when exploration is encouraged through entropy regularization. The dynamics of these models is established as a Wasserstein gradient flow of distributions in parameter space. We further prove global optimality of the fixed points of this dynamics under mild conditions on their initialization.

[302]  arXiv:2010.11859 [pdf, other]
Title: Not all parameters are born equal: Attention is mostly what you need
Subjects: Computation and Language (cs.CL)

Transformers are widely used in state-of-the-art machine translation, but the key to their success is still unknown. To gain insight into this, we consider three groups of parameters: embeddings, attention, and feed forward neural network (FFN) layers. We examine the relative importance of each by performing an ablation study where we initialise them at random and freeze them, so that their weights do not change over the course of the training. Through this, we show that the attention and FFN are equally important and fulfil the same functionality in a model. We show that the decision about whether a component is frozen or allowed to train is at least as important for the final model performance as its number of parameters. At the same time, the number of parameters alone is not indicative of a component's importance. Finally, while the embedding layer is the least essential for machine translation tasks, it is the most important component for language modelling tasks.

[303]  arXiv:2010.11863 [pdf, other]
Title: Planning with Submodular Objective Functions
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

We study planning with submodular objective functions, where instead of maximizing the cumulative reward, the goal is to maximize the objective value induced by a submodular function. Our framework subsumes standard planning and submodular maximization with cardinality constraints as special cases, and thus many practical applications can be naturally formulated within our framework. Based on the notion of multilinear extension, we propose a novel and theoretically principled algorithmic framework for planning with submodular objective functions, which recovers classical algorithms when applied to the two special cases mentioned above. Empirically, our approach significantly outperforms baseline algorithms on synthetic environments and navigation tasks.
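
One of the two special cases named above, submodular maximization under a cardinality constraint, is classically handled by a greedy rule: repeatedly add the element with the largest marginal gain. The sketch below illustrates only that classical special case on a coverage function. It is not the multilinear-extension framework proposed in the paper, and the names `coverage` and `greedy_max` and the toy sets are assumptions made for illustration.

```python
def coverage(selected, sets):
    """Monotone submodular objective: number of distinct items covered."""
    covered = set()
    for index in selected:
        covered |= sets[index]
    return len(covered)


def greedy_max(sets, k):
    """Pick at most k set indices, each time taking the largest marginal gain."""
    selected = []
    for _ in range(k):
        base = coverage(selected, sets)
        best, best_gain = None, 0
        for index in range(len(sets)):
            if index in selected:
                continue
            gain = coverage(selected + [index], sets) - base
            if gain > best_gain:  # ties keep the earliest index
                best, best_gain = index, gain
        if best is None:  # nothing adds value any more
            break
        selected.append(best)
    return selected


# Hypothetical instance: choose 2 of 4 sensors to cover the most locations.
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 2}]
chosen = greedy_max(sets, 2)
```

Here the greedy rule first takes the largest set and then the set that adds the most uncovered items, which illustrates the diminishing-returns structure that submodularity captures.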

[304]  arXiv:2010.11868 [pdf, other]
Title: Parameter Reduction in Probabilistic Critical Time Evaluation Using Sensitivity Analysis and PCA
Comments: 5 pages
Subjects: Systems and Control (eess.SY)

In this paper, we discuss a method to find the power system parameters that most influence the probabilistic transient stability assessment problem---finding the probability distribution of the critical clearing time. We perform the parameter selection by employing a sensitivity analysis combined with a principal component analysis. First, we determine the sensitivity of the machine angles with respect to all system parameters. Second, we employ the principal component analysis algorithm to identify the most influential parameters in the transient stability problem. By identifying such parameters, we can reduce the number of uncertain parameters to only the influential ones in the probabilistic assessment of transient stability, providing a significant speed-up in the probabilistic analysis of large power systems. The proposed algorithm was tested on the IEEE 14-bus system, and the results obtained show that our method can effectively find the most influential parameters.

[305]  arXiv:2010.11869 [pdf, other]
Title: Rewriting Meaningful Sentences via Conditional BERT Sampling and an application on fooling text classifiers
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Most adversarial attack methods that are designed to deceive a text classifier change the text classifier's prediction by modifying a few words or characters. Few try to attack classifiers by rewriting a whole sentence, due to the difficulties inherent in sentence-level rephrasing as well as the problem of setting the criteria for legitimate rewriting.
In this paper, we explore the problem of creating adversarial examples with sentence-level rewriting. We design a new sampling method, named ParaphraseSampler, to efficiently rewrite the original sentence in multiple ways. Then we propose a new criterion for modification, called a sentence-level threaten model. This criterion allows for both word- and sentence-level changes, and can be adjusted independently in two dimensions: semantic similarity and grammatical quality. Experimental results show that many of these rewritten sentences are misclassified by the classifier. On all 6 datasets, our ParaphraseSampler achieves a better attack success rate than our baseline.

[306]  arXiv:2010.11870 [pdf, other]
Title: Strengthening SDN Security: Protocol Dialecting and Downgrade Attacks
Comments: 14 pages
Subjects: Networking and Internet Architecture (cs.NI)

Software-defined networking (SDN) has become a fundamental technology for data centers and 5G networks. In an SDN network, routing and traffic management decisions are made by a centralized controller and communicated to switches via a control channel. Transport Layer Security (TLS) has been proposed as its single security layer; however, use of TLS is optional and connections are still vulnerable to downgrade attacks. In this paper, we propose the strengthening of security assurance using a protocol dialecting approach to provide additional and customizable security. We consider and evaluate two dialecting approaches for OpenFlow protocol operation, adding per-message authentication to the SDN control channel that is independent of TLS and provides robustness against downgrade attacks in the optional case of TLS implementation. Furthermore, we measure the performance impact of using these dialecting primitives in a Mininet experiment. The results show a modest increase of communication latency of less than 22%.
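
To make the idea of per-message authentication independent of TLS concrete, the sketch below appends an HMAC-SHA256 tag and a sequence number to each control message and rejects anything that fails verification. It is a generic illustration only: the framing, the function names `protect` and `verify`, the key handling and the payload are all assumptions, and it does not reproduce either of the dialecting approaches evaluated in the paper or the OpenFlow wire format.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32  # size of a SHA-256 digest in bytes
SEQ_LEN = 8   # 64-bit sequence number, network byte order


def protect(key: bytes, seq: int, payload: bytes) -> bytes:
    """Return seq || payload || HMAC(key, seq || payload)."""
    header = struct.pack("!Q", seq)
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


def verify(key: bytes, expected_seq: int, message: bytes):
    """Return the payload if the tag and sequence number check out, else None."""
    if len(message) < SEQ_LEN + TAG_LEN:
        return None
    body, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected_tag = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected_tag):  # constant-time comparison
        return None
    (seq,) = struct.unpack("!Q", body[:SEQ_LEN])
    if seq != expected_seq:  # reject replayed or reordered messages
        return None
    return body[SEQ_LEN:]


# Hypothetical shared key and control payload.
key = b"k" * 32
wire = protect(key, 1, b"flow_mod")
payload = verify(key, 1, wire)
```

Because the tag is computed at the application layer, the check still applies when the underlying connection has been downgraded to plaintext; the cost is the extra tag computation and bytes per message.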

[307]  arXiv:2010.11871 [pdf, other]
Title: Towards Listening to 10 People Simultaneously: An Efficient Permutation Invariant Training of Audio Source Separation Using Sinkhorn's Algorithm
Comments: 5 pages, 8 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

In neural network-based monaural speech separation techniques, it has recently become common to evaluate the loss using the permutation invariant training (PIT) loss. However, the ordinary PIT requires trying all $N!$ permutations between $N$ ground truths and $N$ estimates. Since the factorial complexity explodes very rapidly as $N$ increases, PIT-based training works only when the number of source signals is small, such as $N = 2$ or $3$. To overcome this limitation, this paper proposes SinkPIT, a novel variant of the PIT losses, which is much more efficient than the ordinary PIT loss when $N$ is large. SinkPIT is based on Sinkhorn's matrix balancing algorithm, which efficiently finds a doubly stochastic matrix that approximates the best permutation in a differentiable manner. The author conducted an experiment to train a neural network model to decompose a single-channel mixture into 10 sources using SinkPIT, and obtained promising results.

[308]  arXiv:2010.11876 [pdf, other]
Title: Error Bounds of Imitating Policies and Environments
Authors: Tian Xu, Ziniu Li, Yang Yu
Comments: Appears in NeurIPS 2020
Subjects: Machine Learning (cs.LG)

Imitation learning trains a policy by mimicking expert demonstrations. Various imitation methods have been proposed and empirically evaluated, but their theoretical understanding needs further study. In this paper, we first analyze the value gap between the expert policy and the policies imitated by two imitation methods, behavioral cloning and generative adversarial imitation. The results support that generative adversarial imitation can reduce the compounding errors compared to behavioral cloning, and thus has a better sample complexity. Notice that, by considering the environment transition model as a dual agent, imitation learning can also be used to learn the environment model. Therefore, based on the bounds of imitating policies, we further analyze the performance of imitating environments. The results show that environment models can be more effectively imitated by generative adversarial imitation than behavioral cloning, suggesting a novel application of adversarial imitation for model-based reinforcement learning. We hope these results could inspire future advances in imitation learning and model-based reinforcement learning.

[309]  arXiv:2010.11879 [pdf, other]
Title: Tight two-level convergence of Linear Parareal and MGRIT: Extensions and implications in practice
Subjects: Numerical Analysis (math.NA)

Two of the most popular parallel-in-time methods are Parareal and multigrid-reduction-in-time (MGRIT). Recently, a general convergence theory was developed in Southworth (2019) for linear two-level MGRIT/Parareal that provides necessary and sufficient conditions for convergence, with tight bounds on worst-case convergence factors. This paper starts by providing a new and simplified analysis of linear error and residual propagation of Parareal, wherein the norm of error or residual propagation is given by one over the minimum singular value of a certain block bidiagonal operator. New discussion is then provided on the resulting necessary and sufficient conditions for convergence that arise by appealing to block Toeplitz theory as in Southworth (2019). Practical applications of the theory are discussed, and the convergence bounds are demonstrated to predict convergence in practice to high accuracy on two standard linear hyperbolic PDEs: the advection(-diffusion) equation, and the wave equation in first-order form.

[310]  arXiv:2010.11880 [pdf, ps, other]
Title: Fast Approximate CoSimRanks via Random Projections
Authors: Renchi Yang
Comments: 4 pages, manuscript
Subjects: Social and Information Networks (cs.SI); Data Structures and Algorithms (cs.DS)

Given a graph G with n nodes, and two nodes u,v in G, the CoSimRank value s(u,v) quantifies the similarity between u and v based on graph topology. Compared to SimRank, CoSimRank has been shown to be more accurate and effective in many real-world applications including synonym expansion, lexicon extraction, and entity relatedness in knowledge graphs. The computation of all-pair CoSimRank values in G is highly expensive and challenging. Existing methods all focus on devising approximate algorithms for the computation of all-pair CoSimRanks. To attain the desired absolute error delta, the state-of-the-art approximate algorithm for computing all-pair CoSimRank values requires O(n^3 log2(ln(1/delta))) time. In this paper, we propose RP-CoSim, a randomized algorithm for computing all-pair CoSimRank values. The basic idea of RP-CoSim is to reduce the n*n matrix multiplications into a k-dimensional (k<<n) subspace via a random projection such that the pairwise inner product is preserved within a certain error, and then iteratively approximate CoSimRank values in the k-dimensional subspace in O(n^2k) time. Theoretically, RP-CoSim runs in O(n^2*ln(n)*ln(1/delta)/delta^2) time, and meanwhile ensures an absolute error of at most delta in the CoSimRank value of every two nodes in G with a high probability.
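
The property that the random projection relies on, approximate preservation of pairwise inner products, can be checked numerically with a small sketch. It draws a k x n Gaussian matrix scaled by 1/sqrt(k) and compares inner products before and after projection. This illustrates the general projection idea only, not the RP-CoSim algorithm itself; the dimensions, the seed, and the helper names are assumptions, and k is chosen larger than n here purely to keep the estimate tight in a toy example (in practice k << n).

```python
import math
import random


def random_projection(n, k, rng):
    """k x n matrix with i.i.d. N(0, 1/k) entries."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(k)]


def project(matrix, vector):
    """Matrix-vector product: maps an n-dimensional vector to k dimensions."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical unit vectors: x is constant, y alternates sign, so <x, y> = 0.
rng = random.Random(0)
n, k = 20, 2000
x = [1.0 / math.sqrt(n)] * n
y = [(1.0 if i % 2 == 0 else -1.0) / math.sqrt(n) for i in range(n)]

R = random_projection(n, k, rng)
px, py = project(R, x), project(R, y)
estimate_xy = dot(px, py)  # should be close to dot(x, y) = 0
estimate_xx = dot(px, px)  # should be close to dot(x, x) = 1
```

The estimate is unbiased and its error shrinks roughly as 1/sqrt(k), which is where the dependence on 1/delta^2 in running-time bounds of this kind typically comes from.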

[311]  arXiv:2010.11882 [pdf, other]
Title: Learning Invariances in Neural Networks
Comments: NeurIPS 2020. Code available at this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Invariances to translations have imbued convolutional neural networks with powerful generalization properties. However, we often do not know a priori what invariances are present in the data, or to what extent a model should be invariant to a given symmetry group. We show how to \emph{learn} invariances and equivariances by parameterizing a distribution over augmentations and optimizing the training loss simultaneously with respect to the network parameters and augmentation parameters. With this simple procedure we can recover the correct set and extent of invariances on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations, on training data alone.

[312]  arXiv:2010.11884 [pdf]
Title: AEGIS: A real-time multimodal augmented reality computer vision based system to assist facial expression recognition for individuals with autism spectrum disorder
Comments: 4 pages, 1 figure
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)

The ability to interpret social cues comes naturally for most people, but some of those living with Autism Spectrum Disorder (ASD) experience a deficiency in this area. This paper presents the development of a multimodal augmented reality (AR) system which combines the use of computer vision and deep convolutional neural networks (CNN) in order to assist individuals with the detection and interpretation of facial expressions in social settings. The proposed system, which we call AEGIS (Augmented-reality Expression Guided Interpretation System), is an assistive technology deployable on a variety of user devices including tablets, smartphones, video conference systems, or smartglasses, showcasing its extreme flexibility and wide range of use cases, to allow integration into daily life with ease. Given a streaming video camera source, each real-world frame is passed into AEGIS, processed for facial bounding boxes, and then fed into our novel deep convolutional time windowed neural network (TimeConvNet). We leverage both spatial and temporal information in order to provide an accurate expression prediction, which is then converted into its corresponding visualization and drawn on top of the original video frame. The system runs in real-time, requires minimal set up and is simple to use. With the use of AEGIS, we can assist individuals living with ASD to learn to better identify expressions and thus improve their social experiences.

[313]  arXiv:2010.11886 [pdf, other]
Title: GAZED- Gaze-guided Cinematic Editing of Wide-Angle Monocular Video Recordings
Comments: 10 pages
Journal-ref: In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI '20). Association for Computing Machinery, New York, NY, USA, 1-11
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Multimedia (cs.MM)

We present GAZED- eye GAZe-guided EDiting for videos captured by a solitary, static, wide-angle and high-resolution camera. Eye-gaze has been effectively employed in computational applications as a cue to capture interesting scene content; we employ gaze as a proxy to select shots for inclusion in the edited video. Given the original video, scene content and user eye-gaze tracks are combined to generate an edited video comprising cinematically valid actor shots and shot transitions to generate an aesthetic and vivid representation of the original narrative. We model cinematic video editing as an energy minimization problem over shot selection, whose constraints capture cinematographic editing conventions. Gazed scene locations primarily determine the shots constituting the edited video. Effectiveness of GAZED against multiple competing methods is demonstrated via a psychophysical study involving 12 users and 12 performance videos.

[314]  arXiv:2010.11887 [pdf, ps, other]
Title: Conditional independence by typing
Subjects: Programming Languages (cs.PL); Machine Learning (cs.LG); Machine Learning (stat.ML)

A central goal of probabilistic programming languages (PPLs) is to separate modelling from inference. However, this goal is hard to achieve in practice. Users are often forced to re-write their models in order to improve efficiency of inference or meet restrictions imposed by the PPL. Conditional independence (CI) relationships among parameters are a crucial aspect of probabilistic models that captures a qualitative summary of the specified model and can facilitate more efficient inference.
We present an information flow type system for probabilistic programming that captures conditional independence (CI) relationships, and show that, for a well-typed program in our system, the distribution it implements is guaranteed to have certain CI-relationships. Further, by using type inference, we can statically deduce which CI-properties are present in a specified model.
As a practical application, we consider the problem of how to perform inference on models with mixed discrete and continuous parameters. Inference on such models is challenging in many existing PPLs, but can be improved through a workaround, where the discrete parameters are used implicitly, at the expense of manual model re-writing. We present a source-to-source semantics-preserving transformation, which uses our CI-type system to automate this workaround by eliminating the discrete parameters from a probabilistic program. The resulting program can be seen as a hybrid inference algorithm on the original program, where continuous parameters can be drawn using efficient gradient-based inference methods, while the discrete parameters are drawn using variable elimination.
We implement our CI-type system and its example application in SlicStan: a compositional variant of Stan.

[315]  arXiv:2010.11893 [pdf, other]
Title: ParaLarH: Parallel FPGA Router based upon Lagrange Heuristics
Comments: 10 pages, 3 Figures, and 5 Tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Routing of the nets in Field Programmable Gate Array (FPGA) design flow is one of the most time consuming steps. Although Versatile Place and Route (VPR), which is a commonly used algorithm for this purpose, routes effectively, it is slow in execution. One way to accelerate this design flow is to use parallelization. Since VPR is intrinsically sequential, a set of parallel algorithms have been recently proposed for this purpose (ParaLaR and ParaLarPD).
These algorithms formulate the routing process as a Linear Program (LP) and solve it using the Lagrange relaxation, the sub-gradient method, and the Steiner tree algorithm. Out of the many metrics available to check the effectiveness of routing, ParaLarPD, which is an improved version of ParaLaR, suffers from large violations in the constraints of the LP problem (which is related to the minimum channel width metric) as well as an easily measurable critical path delay metric that can be improved further.
In this paper, we introduce a set of novel Lagrange heuristics that improve the Lagrange relaxation process. When tested on the MCNC benchmark circuits, on average, this leads to halving of the constraints violation, up to 10% improvement in the minimum channel width, and up to 8% reduction in the critical path delay as obtained from ParaLarPD. We term our new algorithm ParaLarH. Due to the increased work in the Lagrange relaxation process, as compared to ParaLarPD, ParaLarH does slightly deteriorate the speedup obtained because of parallelization; however, this aspect is easily compensated by using more threads.

[316]  arXiv:2010.11895 [pdf, other]
Title: What are the Statistical Limits of Offline RL with Linear Function Approximation?
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)

Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies. The hope is that offline reinforcement learning coupled with function approximation methods (to deal with the curse of dimensionality) can provide a means to help alleviate the excessive sample complexity burden in modern sequential decision making problems. However, the extent to which this broader approach can be effective is not well understood, where the literature largely consists of sufficient conditions.
This work focuses on the basic question of what are necessary representational and distributional conditions that permit provable sample-efficient offline reinforcement learning. Perhaps surprisingly, our main result shows that even if: i) we have realizability in that the true value function of every policy is linear in a given set of features and ii) our off-policy data has good coverage over all features (under a strong spectral condition), then any algorithm still (information-theoretically) requires a number of offline samples that is exponential in the problem horizon in order to non-trivially estimate the value of any given policy. Our results highlight that sample-efficient offline policy evaluation is simply not possible unless significantly stronger conditions hold; such conditions include either having low distribution shift (where the offline data distribution is close to the distribution of the policy to be evaluated) or significantly stronger representational conditions (beyond realizability).

[317]  arXiv:2010.11897 [pdf]
Title: A Visual Analytics Based Decision Making Environment for COVID-19 Modeling and Visualization
Subjects: Human-Computer Interaction (cs.HC)

Public health officials dealing with pandemics like COVID-19 have to evaluate and prepare response plans. This planning phase requires not only looking into the spatiotemporal dynamics and impact of the pandemic using simulation models, but also planning for and ensuring the availability of resources under different spread scenarios. To this end, we have developed a visual analytics environment that enables public health officials to model, simulate, and explore the spread of COVID-19 by supplying county-level information such as population, demographics, and hospital beds. This environment helps users explore spatiotemporal model simulation data relevant to COVID-19 through a geospatial map with linked statistical views, apply different decision measures at different points in time, and understand their potential impact. Users can drill down to county-level details such as the number of sicknesses, deaths, needs for hospitalization, and variations in these statistics over time. We demonstrate the usefulness of this environment through a use case study and also provide feedback from domain experts. We also provide details about future extensions and potential applications of this work.

[318]  arXiv:2010.11904 [pdf, other]
Title: Transcription Is All You Need: Learning to Separate Musical Mixtures with Score as Supervision
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Most music source separation systems require large collections of isolated sources for training, which can be difficult to obtain. In this work, we use musical scores, which are comparatively easy to obtain, as a weak label for training a source separation system. In contrast with previous score-informed separation approaches, our system does not require isolated sources, and score is used only as a training target, not required for inference. Our model consists of a separator that outputs a time-frequency mask for each instrument, and a transcriptor that acts as a critic, providing both temporal and frequency supervision to guide the learning of the separator. A harmonic mask constraint is introduced as another way of leveraging score information during training, and we propose two novel adversarial losses for additional fine-tuning of both the transcriptor and the separator. Results demonstrate that using score information outperforms temporal weak-labels, and adversarial structures lead to further improvements in both separation and transcription performance.

[319]  arXiv:2010.11910 [pdf, other]
Title: Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning
Comments: Preprint, submitted to ICASSP 2021
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Most existing audio fingerprinting systems are limited in their use for high-specific audio retrieval at scale. In this work, we generate a low-dimensional representation from a short unit segment of audio, and couple this fingerprint with a fast maximum inner-product search. To this end, we present a contrastive learning framework that derives from the segment-level search objective. Each update in training uses a batch consisting of a set of pseudo labels, randomly selected original samples, and their augmented replicas. These replicas can simulate the degrading effects on original audio signals by applying small time offsets and various types of distortions, such as background noise and room/microphone impulse responses. In the segment-level search task, where the conventional audio fingerprinting systems used to fail, our system using 10x smaller storage has shown promising results. Code and dataset will be made available.
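
The retrieval step described above (a fingerprint vector matched by maximum inner-product search) can be illustrated with a minimal brute-force sketch. This is not the authors' system: the function names, the 3-dimensional toy vectors, and the exhaustive scan are assumptions made for illustration, and a system at scale would presumably use an approximate index instead.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so inner products behave like cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def max_inner_product_search(query, database):
    """Return (index, score) of the database fingerprint with the largest inner product.

    Hypothetical helper: an exhaustive scan over every stored fingerprint.
    """
    q = l2_normalize(query)
    best_index, best_score = -1, float("-inf")
    for index, fingerprint in enumerate(database):
        score = sum(a * b for a, b in zip(q, fingerprint))
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score


# Toy database of three unit-length "fingerprints" (illustrative values only).
toy_database = [l2_normalize(v) for v in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0])]
# A slightly distorted version of the third entry.
toy_query = [0.9, 1.1, 0.05]
match_index, match_score = max_inner_product_search(toy_query, toy_database)
```

With unit-length vectors, the inner product equals cosine similarity, so the distorted query is matched to the entry it points closest to.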

[320]  arXiv:2010.11911 [pdf, other]
Title: Source localization using particle filtering on FPGA for robotic navigation with imprecise binary measurement
Subjects: Robotics (cs.RO); Signal Processing (eess.SP)

Particle filtering is a recursive Bayesian estimation technique that has gained popularity recently for tracking and localization applications. It uses Monte Carlo simulation and has proven to be a very reliable technique to model non-Gaussian and non-linear elements of physical systems. Particle filters outperform various other traditional filters like Kalman filters in non-Gaussian and non-linear settings due to their non-analytical and non-parametric nature. However, a significant drawback of particle filters is their computational complexity, which inhibits their use in real-time applications with conventional CPU or DSP based implementation schemes. This paper proposes a modification to the existing particle filter algorithm and presents a high-speed and dedicated hardware architecture. The architecture incorporates pipelining and parallelization in the design to reduce execution time considerably. The design is validated for a source localization problem wherein we estimate the position of a source in real-time using the particle filter algorithm implemented on hardware. The validation setup relies on an Unmanned Ground Vehicle (UGV) with a photodiode housing on top to sense and localize a light source. We have prototyped the design using Artix-7 field-programmable gate array (FPGA), and resource utilization for the proposed system is presented. Further, we show the execution time and estimation accuracy of the high-speed architecture and observe a significant reduction in computational time. Our implementation of particle filters on FPGA is scalable and modular, with a low execution time of about 5.62 µs for processing 1024 particles and can be deployed for real-time applications.
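
For readers unfamiliar with the technique, the following is a minimal software sketch of a generic bootstrap particle filter localizing a static source from binary measurements. It is not the modified algorithm or the hardware architecture proposed in the paper: the 1-D world, the sensing radius, the 0.9 sensor reliability, the jitter, and all function names are assumptions chosen for illustration.

```python
import random


def likelihood(z, particle, robot_pos, radius=2.0, p_correct=0.9):
    """P(z | source at particle) for an imprecise binary sensor (assumed model)."""
    expected = 1 if abs(particle - robot_pos) <= radius else 0
    return p_correct if z == expected else 1.0 - p_correct


def resample(particles, weights, rng):
    """Systematic resampling: draw len(particles) samples in proportion to the weights."""
    n = len(particles)
    step = 1.0 / n
    u = rng.random() * step
    out, cumulative, i = [], weights[0], 0
    for _ in range(n):
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        out.append(particles[i])
        u += step
    return out


def pf_step(particles, z, robot_pos, rng, jitter=0.05):
    """One predict / weight / resample cycle for a static source."""
    moved = [p + rng.gauss(0.0, jitter) for p in particles]  # small jitter keeps diversity
    weights = [likelihood(z, p, robot_pos) for p in moved]
    total = sum(weights)
    weights = [w / total for w in weights]
    return resample(moved, weights, rng)


def estimate(particles):
    """Point estimate: the mean of the particle cloud."""
    return sum(particles) / len(particles)


def run_demo(seed=0, n_particles=1024, true_source=6.5):
    """Robot sweeps positions 0..10 and reports noise-free binary readings."""
    rng = random.Random(seed)
    particles = [rng.uniform(0.0, 10.0) for _ in range(n_particles)]
    for robot_pos in range(11):
        z = 1 if abs(true_source - robot_pos) <= 2.0 else 0
        particles = pf_step(particles, z, float(robot_pos), rng)
    return estimate(particles)
```

Each cycle touches every particle independently in the predict and weight stages, which is the part that lends itself to pipelining and parallelization; resampling is the stage that couples particles together.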

[321]  arXiv:2010.11913 [pdf, other]
Title: Mass-conservative and positivity preserving second-order semi-implicit methods for high-order parabolic equations
Subjects: Numerical Analysis (math.NA)

We consider a class of finite element approximations for fourth-order parabolic equations that can be written as a system of second-order equations by introducing an auxiliary variable. In our approach, we first solve a variational problem and then an optimization problem to satisfy the desired physical properties of the solution such as conservation of mass, positivity (non-negativity) of solution and dissipation of energy. Furthermore, we show existence and uniqueness of the solution to the optimization problem and we prove that the methods converge to the truncation schemes \cite{Berger1975}. We also propose new conservative truncation methods for high-order parabolic equations. A numerical convergence study is performed and a series of numerical tests are presented to show and compare the efficiency and robustness of the different schemes.

[322]  arXiv:2010.11915 [pdf, ps, other]
Title: Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval
Subjects: Computation and Language (cs.CL)

Recent progress in pretrained language models has "solved" many reading comprehension benchmark datasets. Yet information-seeking Question Answering (QA) datasets, where questions are written without the evidence document, remain unsolved. We analyze two such datasets (Natural Questions and TyDi QA) to identify remaining headroom: paragraph selection and answerability classification, i.e. determining whether the paired evidence document contains the answer to the query or not. In other words, given a gold paragraph and knowing whether it contains an answer or not, models easily outperform a single annotator in both datasets. After identifying unanswerability as a bottleneck, we further inspect what makes questions unanswerable. Our study points to avenues for future research, both for dataset creation and model development.

[323]  arXiv:2010.11917 [pdf, other]
Title: Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
Comments: 10 Pages, 9 Figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Learning from diverse offline datasets is a promising path towards learning general purpose robotic agents. However, a core challenge in this paradigm lies in collecting large amounts of meaningful data, while not depending on a human in the loop for data collection. One way to address this challenge is through task-agnostic exploration, where an agent attempts to explore without a task-specific reward function, and collect data that can be useful for any downstream task. While these approaches have shown some promise in simple domains, they often struggle to explore the relevant regions of the state space in more challenging settings, such as vision based robotic manipulation. This challenge stems from an objective that encourages exploring everything in a potentially vast state space. To mitigate this challenge, we propose to focus exploration on the important parts of the state space using weak human supervision. Concretely, we propose an exploration technique, Batch Exploration with Examples (BEE), that explores relevant regions of the state-space, guided by a modest number of human provided images of important states. These human provided images only need to be collected once at the beginning of data collection and can be collected in a matter of minutes, allowing us to scalably collect diverse datasets, which can then be combined with any batch RL algorithm. We find that BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot, and observe that compared to task-agnostic and weakly-supervised exploration techniques, it (1) interacts more than twice as often with relevant objects, and (2) improves downstream task performance when used in conjunction with offline RL.

[324]  arXiv:2010.11918 [pdf, other]
Title: AdapterDrop: On the Efficiency of Adapters in Transformers
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)

Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements. Recent approaches tackle these shortcomings by training smaller models, dynamically reducing the model size, and by training light-weight adapters. In this paper, we propose AdapterDrop, removing adapters from lower transformer layers during training and inference, which incorporates concepts from all three directions. We show that AdapterDrop can dynamically reduce the computational overhead when performing inference over multiple tasks simultaneously, with minimal decrease in task performances. We further prune adapters from AdapterFusion, which improves the inference efficiency while maintaining the task performances entirely.

[325]  arXiv:2010.11924 [pdf, other]
Title: In Search of Robust Measures of Generalization
Comments: 27 pages, 11 figures, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now trains networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories -- such as those based on the VC dimension of the class of predictors induced by modern neural network architectures -- are unable to explain empirical performance. A large volume of work aims to close this gap, primarily by developing bounds on generalization error, optimization error, and excess risk. When evaluated empirically, however, most of these bounds are numerically vacuous. Focusing on generalization bounds, this work addresses the question of how to evaluate such bounds empirically. Jiang et al. (2020) recently described a large-scale empirical study aimed at uncovering potential causal relationships between bounds/measures and generalization. Building on their study, we highlight where their proposed methods can obscure failures and successes of generalization measures in explaining generalization. We argue that generalization measures should instead be evaluated within the framework of distributional robustness.

[326]  arXiv:2010.11925 [pdf, ps, other]
Title: The Polynomial Method is Universal for Distribution-Free Correlational SQ Learning
Comments: 18 pages
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

We consider the problem of distribution-free learning for Boolean function classes in the PAC and agnostic models. Generalizing a recent beautiful work of Malach and Shalev-Shwartz \cite{malach2020hardness} who gave the first tight correlational SQ (CSQ) lower bounds for learning DNF formulas, we show that lower bounds on the threshold or approximate degree of any function class directly imply CSQ lower bounds for PAC or agnostic learning respectively. These match corresponding positive results using upper bounds on the threshold or approximate degree in the SQ model for PAC or agnostic learning, and in this sense our results show that the polynomial method is a universal, best-possible approach for distribution-free CSQ learning.
Our results imply the first exponential (in the dimension) CSQ lower bounds for PAC learning intersections of two halfspaces and constant depth circuits as well as the first exponential lower bounds for agnostically learning conjunctions and halfspaces.

[327]  arXiv:2010.11926 [pdf, ps, other]
Title: Neural-Symbolic Integration: A Compositional Perspective
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Despite significant progress in the development of neural-symbolic frameworks, the question of how to integrate a neural and a symbolic system in a compositional manner remains open. Our work seeks to fill this gap by treating these two systems as black boxes to be integrated as modules into a single architecture, without making assumptions on their internal structure and semantics. Instead, we expect only that each module exposes certain methods for accessing the functions that the module implements: the symbolic module exposes a deduction method for computing the function's output on a given input, and an abduction method for computing the function's inputs for a given output; the neural module exposes a deduction method for computing the function's output on a given input, and an induction method for updating the function given input-output training instances. We are, then, able to show that a symbolic module -- with any choice for syntax and semantics, as long as the deduction and abduction methods are exposed -- can be cleanly integrated with a neural module, and facilitate the latter's efficient training, achieving empirical performance that exceeds that of previous work.

[328]  arXiv:2010.11929 [pdf, other]
Title: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Comments: Fine-tuning code and pre-trained models are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
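
The phrase "sequences of image patches" can be made concrete with a small sketch of the patch-splitting step alone. This is an illustrative assumption-laden example, not the released code: it handles a single-channel image stored as nested lists, the function name is hypothetical, and the 224x224 input size is chosen only to show the resulting sequence length.

```python
def image_to_patch_sequence(image, patch_size):
    """Split an H x W single-channel image into flattened, non-overlapping patches.

    Patches are emitted in row-major order; each patch becomes one "token"
    of length patch_size * patch_size.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    sequence = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            sequence.append(patch)
    return sequence


# A 4x4 toy image with pixel values 0..15, split into 2x2 patches.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
toy_sequence = image_to_patch_sequence(toy_image, 2)

# An assumed 224x224 input with 16x16 patches gives a 14 * 14 token sequence.
blank = [[0] * 224 for _ in range(224)]
tokens = image_to_patch_sequence(blank, 16)
```

In a full model each flattened patch would then be linearly projected and combined with position information before entering the Transformer; those steps are omitted here.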

[329]  arXiv:2010.11930 [pdf, other]
Title: Scientific Claim Verification with VERT5ERINI
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)

This work describes the adaptation of a pretrained sequence-to-sequence model to the task of scientific claim verification in the biomedical domain. We propose VERT5ERINI that exploits T5 for abstract retrieval, sentence selection and label prediction, which are three critical sub-tasks of claim verification. We evaluate our pipeline on SCIFACT, a newly curated dataset that requires models to not just predict the veracity of claims but also provide relevant sentences from a corpus of scientific literature that support this decision. Empirically, our pipeline outperforms a strong baseline in each of the three steps. Finally, we show VERT5ERINI's ability to generalize to two new datasets of COVID-19 claims using evidence from the ever-expanding CORD-19 corpus.

[330]  arXiv:2010.11931 [pdf, other]
Title: Brain-Inspired Learning on Neuromorphic Substrates
Comments: All authors contributed equally
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)

Neuromorphic hardware strives to emulate brain-like neural networks and thus holds the promise for scalable, low-power information processing on temporal data streams. Yet, to solve real-world problems, these networks need to be trained. However, training on neuromorphic substrates creates significant challenges due to the offline character and the required non-local computations of gradient-based learning algorithms. This article provides a mathematical framework for the design of practical online learning algorithms for neuromorphic substrates. Specifically, we show a direct connection between Real-Time Recurrent Learning (RTRL), an online algorithm for computing gradients in conventional Recurrent Neural Networks (RNNs), and biologically plausible learning rules for training Spiking Neural Networks (SNNs). Further, we motivate a sparse approximation based on block-diagonal Jacobians, which reduces the algorithm's computational complexity, diminishes the non-local information requirements, and empirically leads to good learning performance, thereby improving its applicability to neuromorphic substrates. In summary, our framework bridges the gap between synaptic plasticity and gradient-based approaches from deep learning and lays the foundations for powerful information processing on future neuromorphic hardware systems.

[331]  arXiv:2010.11932 [pdf, other]
Title: Minimal Exposure Dubins Orienteering Problem
Subjects: Robotics (cs.RO)

Different applications, such as environmental monitoring and military operations, demand the observation of predefined target locations, and an autonomous mobile robot can assist in these tasks. In this context, the Orienteering Problem (OP) is a well-known routing problem, in which the goal is to maximize the objective function by visiting the most rewarding locations while respecting a limited travel budget (e.g., length, time, energy). However, traditional formulations for routing problems generally neglect some environment peculiarities, such as obstacles or threatening zones. In this paper, we tackle the OP considering Dubins vehicles in the presence of a known deployed sensor field. We propose a novel multi-objective formulation called Minimal Exposure Dubins Orienteering Problem (MEDOP), whose main objectives are: (i) maximize the collected reward, and (ii) minimize the exposure of the agent, i.e., the probability of being detected. The solution is based on an evolutionary algorithm that iteratively varies the subset and sequence of locations to be visited, the orientations on each location, and the turning radius used to determine the paths. Results show that our approach can efficiently find a diverse set of solutions that simultaneously optimize both objectives.

[332]  arXiv:2010.11934 [pdf, other]
Title: mT5: A massively multilingual pre-trained text-to-text transformer
Subjects: Computation and Language (cs.CL)

The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. All of the code and model checkpoints used in this work are publicly available.

[333]  arXiv:2010.11935 [pdf, other]
Title: Coded Data Rebalancing for Decentralized Distributed Databases
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

The performance of replication-based distributed databases is affected by non-uniform storage across storage nodes (also called data skew) and by reduction in the replication factor during operation, particularly due to node additions or removals. Data rebalancing refers to the communication involved between the nodes in correcting this data skew, while maintaining the replication factor. For carefully designed distributed databases, transmitting coded symbols during the rebalancing phase has been recently shown to reduce the communication load of rebalancing. In this work, we look at balanced distributed databases with random placement, in which each data segment is stored in a random subset of $r$ nodes in the system, where $r$ refers to the replication factor of the distributed database. We call these decentralized databases. For a natural class of such decentralized databases, we propose rebalancing schemes for correcting data skew and reinstating the replication factor arising due to a single node addition or removal. We give converse arguments which show that our proposed rebalancing schemes are optimal asymptotically in the size of the file.
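
The random-placement setting can be sketched in a few lines: each segment is stored on a random subset of $r$ nodes, and removing a node leaves some segments under-replicated. The sketch below only sets up the problem; it does not implement the paper's coded rebalancing schemes, and all function names and parameter values are assumptions for illustration.

```python
import random


def random_placement(num_segments, num_nodes, r, rng):
    """Store each segment on a uniformly random subset of r distinct nodes."""
    return [set(rng.sample(range(num_nodes), r)) for _ in range(num_segments)]


def node_loads(placement, num_nodes):
    """Number of segments held by each node; uneven loads are the data skew."""
    loads = [0] * num_nodes
    for nodes in placement:
        for node in nodes:
            loads[node] += 1
    return loads


def under_replicated(placement, removed_node, r):
    """Indices of segments left with fewer than r replicas after a node is removed."""
    return [
        index
        for index, nodes in enumerate(placement)
        if len(nodes - {removed_node}) < r
    ]


rng = random.Random(7)
placement = random_placement(num_segments=1000, num_nodes=10, r=3, rng=rng)
loads = node_loads(placement, num_nodes=10)
lost = under_replicated(placement, removed_node=0, r=3)
```

Every segment that lived on the removed node needs one new replica somewhere else, so the size of `lost` equals that node's load; rebalancing must restore those replicas while keeping the remaining nodes evenly loaded.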

[334]  arXiv:2010.11936 [pdf, other]
Title: UniCase -- Rethinking Casing in Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

In this paper, we introduce a new approach to dealing with the problem of case-sensitiveness in Language Modelling (LM). We propose a simple architecture modification to the RoBERTa language model, accompanied by a new tokenization strategy, which we named Unified Case LM (UniCase). We tested our solution on the GLUE benchmark, which led to increased performance by 0.42 points. Moreover, we prove that the UniCase model works much better when we have to deal with text data where all tokens are uppercased (+5.88 points).

[335]  arXiv:2010.11939 [pdf, other]
Title: Autoregressive Modeling is Misspecified for Some Sequence Distributions
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)

Should sequences be modeled autoregressively---one symbol at a time? How much computation is needed to predict the next symbol? While local normalization is cheap, this also limits its power. We point out that some probability distributions over discrete sequences cannot be well-approximated by any autoregressive model whose runtime and parameter size grow polynomially in the sequence length---even though their unnormalized sequence probabilities are efficient to compute exactly. Intuitively, the probability of the next symbol can be expensive to compute or approximate (even via randomized algorithms) when it marginalizes over exponentially many possible futures, which is in general $\mathrm{NP}$-hard. Our result is conditional on the widely believed hypothesis that $\mathrm{NP} \nsubseteq \mathrm{P/poly}$ (without which the polynomial hierarchy would collapse at the second level). This theoretical observation serves as a caution to the viewpoint that pumping up parameter size is a straightforward way to improve autoregressive models (e.g., in language modeling). It also suggests that globally normalized (energy-based) models may sometimes outperform locally normalized (autoregressive) models, as we demonstrate experimentally for language modeling.
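
The remark that the next-symbol probability "marginalizes over exponentially many possible futures" can be seen in a toy brute-force computation. The sketch below is illustrative only and proves nothing about hardness: the scoring function, the binary alphabet, and the function names are assumptions, and the point is simply that the exact conditional sums over every completion even though scoring a single full sequence is cheap.

```python
from itertools import product


def pair_score(sequence):
    """Toy unnormalized score: doubles for every adjacent pair of equal symbols."""
    matches = sum(1 for a, b in zip(sequence, sequence[1:]) if a == b)
    return 2.0 ** matches


def next_symbol_probs(prefix, length, alphabet, score):
    """Exact p(next symbol | prefix) under a globally normalized model.

    Sums the unnormalized score over all len(alphabet) ** remaining completions,
    so the cost grows exponentially with the remaining length.
    """
    if len(prefix) >= length:
        raise ValueError("prefix must be shorter than the full sequence length")
    remaining = length - len(prefix) - 1
    totals = {}
    for symbol in alphabet:
        totals[symbol] = sum(
            score(tuple(prefix) + (symbol,) + tail)
            for tail in product(alphabet, repeat=remaining)
        )
    normalizer = sum(totals.values())
    return {symbol: total / normalizer for symbol, total in totals.items()}


# Length-3 binary sequences, conditioning on the first symbol being 0.
probs = next_symbol_probs(prefix=(0,), length=3, alphabet=(0, 1), score=pair_score)
```

For this tiny case the completions starting with `0 0` score 4 + 2 and those starting with `0 1` score 1 + 2, giving conditionals of 2/3 and 1/3. An autoregressive model has to produce such conditionals directly, without enumerating the futures.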

[336]  arXiv:2010.11940 [pdf, other]
Title: Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments
Comments: Published at the Conference on Robot Learning (CoRL) 2020
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Deep reinforcement learning (RL) agents are able to learn contact-rich manipulation tasks by maximizing a reward signal, but require large amounts of experience, especially in environments with many obstacles that complicate exploration. In contrast, motion planners use explicit models of the agent and environment to plan collision-free paths to faraway goals, but suffer from inaccurate models in tasks that require contacts with the environment. To combine the benefits of both approaches, we propose motion planner augmented RL (MoPA-RL) which augments the action space of an RL agent with the long-horizon planning capabilities of motion planners. Based on the magnitude of the action, our approach smoothly transitions between directly executing the action and invoking a motion planner. We evaluate our approach on various simulated manipulation tasks and compare it to alternative action spaces in terms of learning efficiency and safety. The experiments demonstrate that MoPA-RL increases learning efficiency, leads to faster exploration, and results in safer policies that avoid collisions with the environment. Videos and code are available at https://clvrai.com/mopa-rl .
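
The idea of switching on action magnitude can be sketched as a simple dispatch rule. This is a hedged illustration and not the MoPA-RL implementation: the threshold value, the function names, and the straight-line stand-in planner (which performs no collision checking) are all assumptions.

```python
import math


def plan_path(start, goal, max_step):
    """Stand-in planner (assumption): straight-line waypoints, no collision checking."""
    distance = max(abs(g - s) for s, g in zip(start, goal))
    num_steps = max(1, math.ceil(distance / max_step))
    return [
        [s + (g - s) * k / num_steps for s, g in zip(start, goal)]
        for k in range(1, num_steps + 1)
    ]


def execute_action(state, action, direct_limit=0.25):
    """Small displacements run directly; large ones are handed to the planner.

    Returns the chosen mode and the list of waypoints that would be followed.
    """
    magnitude = max(abs(a) for a in action)
    goal = [s + a for s, a in zip(state, action)]
    if magnitude <= direct_limit:
        return "direct", [goal]
    return "planner", plan_path(state, goal, direct_limit)


small_mode, small_path = execute_action([0.0, 0.0], [0.125, -0.0625])
large_mode, large_path = execute_action([0.0, 0.0], [1.0, 0.0])
```

Under this rule the policy keeps fine-grained control for contact-rich motions, while a single large action expands into several planner waypoints toward a faraway goal.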

[337]  arXiv:2010.11943 [pdf, other]
Title: Few-Shot Adaptation of Generative Adversarial Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV)

Generative Adversarial Networks (GANs) have shown remarkable performance in image synthesis tasks, but typically require a large number of training samples to achieve high-quality synthesis. This paper proposes a simple and effective method, Few-Shot GAN (FSGAN), for adapting GANs in few-shot settings (fewer than 100 images). FSGAN repurposes component analysis techniques and learns to adapt the singular values of the pre-trained weights while freezing the corresponding singular vectors. This provides a highly expressive parameter space for adaptation while constraining changes to the pretrained weights. We validate our method in a challenging few-shot setting of 5-100 images in the target domain. We show that our method has significant visual quality gains compared with existing GAN adaptation methods. We report qualitative and quantitative results showing the effectiveness of our method. We additionally highlight a problem for few-shot synthesis in the standard quantitative metric used by data-efficient image synthesis works. Code and additional results are available at this http URL

[338]  arXiv:2010.11944 [pdf, other]
Title: Accelerating Reinforcement Learning with Learned Skill Priors
Comments: 4th Conference on Robot Learning (CoRL), 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Intelligent agents rely heavily on prior experience when learning a new task, yet most modern reinforcement learning (RL) approaches learn every task from scratch. One approach for leveraging prior knowledge is to transfer skills learned on prior tasks to the new task. However, as the amount of prior experience increases, the number of transferable skills grows too, making it challenging to explore the full set of available skills during downstream learning. Yet, intuitively, not all skills should be explored with equal probability; for example, information about the current state can hint at which skills are promising to explore. In this work, we propose to implement this intuition by learning a prior over skills. We propose a deep latent variable model that jointly learns an embedding space of skills and the skill prior from offline agent experience. We then extend common maximum-entropy RL approaches to use skill priors to guide downstream learning. We validate our approach, SPiRL (Skill-Prior RL), on complex navigation and robotic manipulation tasks and show that learned skill priors are essential for effective skill transfer from rich datasets. Videos and code are available at https://clvrai.com/spirl.

Cross-lists for Fri, 23 Oct 20

[339]  arXiv:2010.11189 (cross-list from quant-ph) [pdf, ps, other]
Title: Quantum Deformed Neural Networks
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG)

We develop a new quantum neural network layer designed to run efficiently on a quantum computer but that can be simulated on a classical computer when restricted in the way it entangles input states. We first ask how a classical neural network architecture, either fully connected or convolutional, can be executed on a quantum computer using quantum phase estimation. We then deform the classical layer into a quantum design which entangles activations and weights into quantum superpositions. While the full model would need the exponential speedups delivered by a quantum computer, a restricted class of designs represents interesting new classical network layers that still use quantum features. We show that these quantum deformed neural networks can be trained and executed on normal data such as images, and even classically deliver modest improvements over standard architectures.

[340]  arXiv:2010.11194 (cross-list from astro-ph.IM) [pdf, other]
Title: Anomaly Detection for Multivariate Time Series of Exotic Supernovae
Comments: 6 pages, 2 figures, written for non-astronomers, submitted to the NeurIPS workshop Machine Learning and the Physical Sciences. Comments welcome!!
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); High Energy Astrophysical Phenomena (astro-ph.HE); Machine Learning (cs.LG)

Supernovae mark the explosive deaths of stars and enrich the cosmos with heavy elements. Future telescopes will discover thousands of new supernovae nightly, creating a need to flag astrophysically interesting events rapidly for follow-up study. Ideally, such an anomaly detection pipeline would be independent of our current knowledge and be sensitive to unexpected phenomena. Here we present an unsupervised method to search for anomalous time series in real time for transient, multivariate, and aperiodic signals. We use an RNN-based variational autoencoder to encode supernova time series and an isolation forest to search for anomalous events in the learned encoded space. We apply this method to a simulated dataset of 12,159 supernovae, successfully discovering anomalous supernovae and objects with catastrophically incorrect redshift measurements. This work is the first anomaly detection pipeline for supernovae which works with online datastreams.

[341]  arXiv:2010.11213 (cross-list from stat.ML) [pdf, other]
Title: Precise Statistical Analysis of Classification Accuracies for Adversarial Training
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)

Despite the wide empirical success of modern machine learning algorithms and models in a multitude of applications, they are known to be highly susceptible to seemingly small, indiscernible perturbations to the input data, known as adversarial attacks. A variety of recent adversarial training procedures have been proposed to remedy this issue. Despite the success of such procedures at increasing accuracy on adversarially perturbed inputs (robust accuracy), these techniques often reduce accuracy on natural unperturbed inputs (standard accuracy). Complicating matters further, the effect and trend of adversarial training procedures on standard and robust accuracy are rather counterintuitive and radically dependent on a variety of factors including the perceived form of the perturbation during training, size/quality of data, model overparameterization, etc. In this paper we focus on binary classification problems where the data is generated according to the mixture of two Gaussians with general anisotropic covariance matrices and derive a precise characterization of the standard and robust accuracy for a class of minimax adversarially trained models. We consider a general norm-based adversarial model, where the adversary can add perturbations of bounded $\ell_p$ norm to each input data point, for an arbitrary $p\ge 1$. Our comprehensive analysis allows us to theoretically explain several intriguing empirical phenomena and provide a precise understanding of the role of different problem parameters on standard and robust accuracies.

[342]  arXiv:2010.11221 (cross-list from eess.AS) [pdf, other]
Title: Learning Speaker Embedding from Text-to-Speech
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

Zero-shot multi-speaker Text-to-Speech (TTS) generates target speaker voices given an input text and the corresponding speaker embedding. In this work, we investigate the effectiveness of the TTS reconstruction objective to improve representation learning for speaker verification. We jointly trained end-to-end Tacotron 2 TTS and speaker embedding networks in a self-supervised fashion. We hypothesize that the embeddings will contain minimal phonetic information since the TTS decoder will obtain that information from the textual input. TTS reconstruction can also be combined with speaker classification to enhance these embeddings further. Once trained, the speaker encoder computes representations for the speaker verification task, while the rest of the TTS blocks are discarded. We investigated training TTS from either manual or ASR-generated transcripts. The latter allows us to train embeddings on datasets without manual transcripts. We compared ASR transcripts and Kaldi phone alignments as TTS inputs, showing that the latter performed better due to their finer resolution. Unsupervised TTS embeddings improved EER by 2.06% absolute with regard to i-vectors for the LibriTTS dataset. TTS with speaker classification loss improved EER by 0.28% and 0.73% absolute over a model using only speaker classification loss in LibriTTS and VoxCeleb1, respectively.

[343]  arXiv:2010.11286 (cross-list from eess.AS) [pdf, other]
Title: Improving Audio Anomalies Recognition Using Temporal Convolutional Attention Network
Comments: 5 pages, submitted to ICASSP'2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Anomalous audio in speech recordings is often caused by speaker voice distortion, external noise, or even electrical interference. These obstacles have become a serious problem in some fields, such as recording high-quality music and speech processing. In this paper, a novel approach using a temporal convolutional attention network (TCAN) is proposed to address this problem. A temporal convolutional network (TCN) can capture long-range patterns using a hierarchy of temporal convolutional filters. To enhance the ability to tackle audio anomalies in different acoustic conditions, an attention mechanism is used in the TCN, where a self-attention block is added after each temporal convolutional layer. This aims to highlight the target-related features and mitigate interference from irrelevant information. To evaluate the performance of the proposed model, audio recordings are collected from the TIMIT dataset and then modified by adding five different types of audio distortion: Gaussian noise, magnitude drift, random dropout, reduction of temporal resolution, and time warping. Distortions are mixed at different signal-to-noise ratios (SNRs) (5 dB, 10 dB, 15 dB, 20 dB, 25 dB, 30 dB). The experimental results show that the proposed model yields good classification performance and outperforms some strong baseline methods, such as the LSTM and TCN based models, by about 3-10% relative.
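
Mixing a distortion at a target SNR can be sketched for the additive case as follows. This is only an illustration under stated assumptions: the helper names (`mean_power`, `mix_at_snr`, `measured_snr_db`) are hypothetical, the "noise" is a deterministic stand-in rather than Gaussian noise, and the abstract does not say how SNR is defined for the non-additive distortions such as time warping.

```python
import math


def mean_power(x):
    """Average power of a sampled signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(signal, distortion, target_snr_db):
    """Scale `distortion` so that 10*log10(P_signal / P_distortion)
    equals `target_snr_db`, then add it to `signal`."""
    gain = math.sqrt(
        mean_power(signal) / (mean_power(distortion) * 10 ** (target_snr_db / 10))
    )
    scaled = [gain * d for d in distortion]
    mixed = [s + d for s, d in zip(signal, scaled)]
    return mixed, scaled


def measured_snr_db(signal, distortion):
    """SNR in dB between a clean signal and the distortion added to it."""
    return 10 * math.log10(mean_power(signal) / mean_power(distortion))


# Toy clean signal and a deterministic stand-in for an additive distortion.
signal = [math.sin(0.1 * n) for n in range(1000)]
noise = [((37 * n) % 11 - 5) / 5 for n in range(1000)]

mixed, scaled_noise = mix_at_snr(signal, noise, 10.0)
print(round(measured_snr_db(signal, scaled_noise), 6))
```

Repeating the call with each of the listed SNR values would produce one distorted copy of the recording per condition.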

[344]  arXiv:2010.11292 (cross-list from math.OC) [pdf, ps, other]
Title: Decentralized optimization over noisy, rate-constrained networks: How to agree by talking about how we disagree
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)

In decentralized optimization, multiple nodes in a network collaborate to minimize the sum of their local loss functions. The information exchange between nodes required for the task is often limited by network connectivity. We consider a generalization of this setting in which communication is further hindered by (i) a finite data-rate constraint on the signal transmitted by any node, and (ii) an additive noise corruption of the signal received by any node. We develop a novel algorithm for this scenario: Decentralized Lazy Mirror Descent with Differential Exchanges (DLMD-DiffEx), which guarantees convergence of the local estimates to the optimal solution under the given communication constraints. A salient feature of DLMD-DiffEx is the careful design of the evolution of proxy variables which are maintained to account for the disagreement in estimates at the nodes due to channel noise and data-rate constraints. We investigate the performance of DLMD-DiffEx both from a theoretical perspective and through numerical evaluations.

[345]  arXiv:2010.11345 (cross-list from stat.ML) [pdf, other]
Title: Network topology change-point detection from graph signals with prior spectral signatures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Signal Processing (eess.SP)

We consider the problem of sequential graph topology change-point detection from graph signals. We assume that signals on the nodes of the graph are regularized by the underlying graph structure via a graph filtering model, which we then leverage to distill the graph topology change-point detection problem to a subspace detection problem. We demonstrate how prior information on the spectral signature of the post-change graph can be incorporated to implicitly denoise the observed sequential data, thus leading to a natural CUSUM-based algorithm for change-point detection. Numerical experiments illustrate the performance of our proposed approach, particularly underscoring the benefits of (potentially noisy) prior information.
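
For readers unfamiliar with CUSUM, the following is a minimal sketch of the generic one-sided CUSUM recursion on scalar samples. It is not the paper's graph-signal statistic: the Gaussian mean-shift model, the function names, and the threshold value are all assumptions chosen to keep the example small.

```python
def gaussian_llr(x, mu0, mu1, sigma):
    """Log-likelihood ratio log p1(x)/p0(x) for N(mu1, sigma^2)
    against N(mu0, sigma^2)."""
    return (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2)


def cusum_alarm(samples, mu0, mu1, sigma, threshold):
    """Return the index of the first sample at which the CUSUM statistic
    exceeds `threshold`, or None if it never does."""
    stat = 0.0
    for t, x in enumerate(samples):
        # Accumulate evidence for the post-change model, resetting at zero.
        stat = max(0.0, stat + gaussian_llr(x, mu0, mu1, sigma))
        if stat > threshold:
            return t
    return None


# Toy sequence: the mean shifts from 0 to 1 at index 20.
samples = [0.0] * 20 + [1.0] * 20
alarm = cusum_alarm(samples, mu0=0.0, mu1=1.0, sigma=1.0, threshold=2.0)
print(alarm)
```

Knowing the post-change model (here `mu1`) is what makes the log-likelihood ratio computable, which loosely mirrors the role the abstract gives to prior information about the post-change graph.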

[346]  arXiv:2010.11348 (cross-list from physics.flu-dyn) [pdf, ps, other]
Title: Deep Learning for Efficient Reconstruction of High-Resolution Turbulent DNS Data
Comments: 12 pages, 7 figures
Subjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG)

Within the domain of Computational Fluid Dynamics, Direct Numerical Simulation (DNS) is used to obtain highly accurate numerical solutions for fluid flows. However, this approach for numerically solving the Navier-Stokes equations is extremely computationally expensive, mostly due to the requirement of greatly refined grids. Large Eddy Simulation (LES) presents a more computationally efficient approach for solving fluid flows on lower-resolution (LR) grids but results in an overall reduction in solution fidelity. In this paper, we introduce a novel deep learning framework, SR-DNS Net, which aims to mitigate this inherent trade-off between solution fidelity and computational complexity by leveraging deep learning techniques used in image super-resolution. Using our model, we wish to learn the mapping from a coarser LR solution to a refined high-resolution (HR) DNS solution so as to eliminate the need for DNS simulations on highly refined grids. Our model efficiently reconstructs the high-fidelity DNS data from the LES-like low-resolution solutions while yielding good reconstruction metrics. Thus our implementation improves the solution accuracy of LR solutions while incurring only a marginal increase in the computational cost required for deploying the trained deep learning model.

[347]  arXiv:2010.11356 (cross-list from stat.ML) [pdf, ps, other]
Title: Beyond Lazy Training for Over-parameterized Tensor Decomposition
Comments: NeurIPS 2020; the first two authors contribute equally
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optima. In this paper we study a closely related tensor decomposition problem: given an $l$-th order tensor in $(R^d)^{\otimes l}$ of rank $r$ (where $r\ll d$), can variants of gradient descent find a rank $m$ decomposition where $m > r$? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$. Our results show that gradient descent on an over-parametrized objective could go beyond the lazy training regime and utilize certain low-rank structure in the data.

[348]  arXiv:2010.11366 (cross-list from stat.ML) [pdf, ps, other]
Title: Random Coordinate Underdamped Langevin Monte Carlo
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The Underdamped Langevin Monte Carlo (ULMC) is a popular Markov chain Monte Carlo sampling method. It requires the computation of the full gradient of the log-density at each iteration, an expensive operation if the dimension of the problem is high. We propose a sampling method called Random Coordinate ULMC (RC-ULMC), which selects a single coordinate at each iteration to be updated and leaves the other coordinates untouched. We investigate the computational complexity of RC-ULMC and compare it with the classical ULMC for strongly log-concave probability distributions. We show that RC-ULMC is always cheaper than the classical ULMC, with a significant cost reduction when the problem is highly skewed and high dimensional. Our complexity bound for RC-ULMC is also tight in terms of dimension dependence.

[349]  arXiv:2010.11370 (cross-list from q-bio.NC) [pdf, other]
Title: A Multi-Componential Approach to Emotion Recognition and the Effect of Personality
Comments: 13 pages
Journal-ref: IEEE Transactions on Affective Computing, 2020
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

Emotions are an inseparable part of human nature affecting our behavior in response to the outside world. Although most empirical studies have been dominated by two theoretical models including discrete categories of emotion and dichotomous dimensions, results from neuroscience approaches suggest a multi-processes mechanism underpinning emotional experience with a large overlap across different emotions. While these findings are consistent with the influential theories of emotion in psychology that emphasize a role for multiple component processes to generate emotion episodes, few studies have systematically investigated the relationship between discrete emotions and a full componential view. This paper applies a componential framework with a data-driven approach to characterize emotional experiences evoked during movie watching. The results suggest that differences between various emotions can be captured by a few (at least 6) latent dimensions, each defined by features associated with component processes, including appraisal, expression, physiology, motivation, and feeling. In addition, the link between discrete emotions and component model is explored and results show that a componential model with a limited number of descriptors is still able to predict the level of experienced discrete emotion(s) to a satisfactory level. Finally, as appraisals may vary according to individual dispositions and biases, we also study the relationship between personality traits and emotions in our computational framework and show that the role of personality on discrete emotion differences can be better justified using the component model.

[350]  arXiv:2010.11375 (cross-list from eess.IV) [pdf]
Title: Deep Learning for Distinguishing Normal versus Abnormal Chest Radiographs and Generalization to Unseen Diseases
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Chest radiography (CXR) is the most widely-used thoracic clinical imaging modality and is crucial for guiding the management of cardiothoracic conditions. The detection of specific CXR findings has been the main focus of several artificial intelligence (AI) systems. However, the wide range of possible CXR abnormalities makes it impractical to build specific systems to detect every possible condition. In this work, we developed and evaluated an AI system to classify CXRs as normal or abnormal. For development, we used a de-identified dataset of 248,445 patients from a multi-city hospital network in India. To assess generalizability, we evaluated our system using 6 international datasets from India, China, and the United States. Of these datasets, 4 focused on diseases that the AI was not trained to detect: 2 datasets with tuberculosis and 2 datasets with coronavirus disease 2019. Our results suggest that the AI system generalizes to new patient populations and abnormalities. In a simulated workflow where the AI system prioritized abnormal cases, the turnaround time for abnormal cases reduced by 7-28%. These results represent an important step towards evaluating whether AI can be safely used to flag cases in a general setting where previously unseen abnormalities exist.

[351]  arXiv:2010.11391 (cross-list from eess.SP) [pdf, ps, other]
Title: Unfolding Neural Networks for Compressive Multichannel Blind Deconvolution
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)

We propose a learned-structured unfolding neural network for the problem of compressive sparse multichannel blind-deconvolution. In this problem, each channel's measurements are given as convolution of a common source signal and sparse filter. Unlike prior works where the compression is achieved either through random projections or by applying a fixed structured compression matrix, this paper proposes to learn the compression matrix from data. Given the full measurements, the proposed network is trained in an unsupervised fashion to learn the source and estimate sparse filters. Then, given the estimated source, we learn a structured compression operator while optimizing for signal reconstruction and sparse filter recovery. The efficient structure of the compression allows its practical hardware implementation. The proposed neural network is an autoencoder constructed based on an unfolding approach: upon training, the encoder maps the compressed measurements into an estimate of sparse filters using the compression operator and the source, and the linear convolutional decoder reconstructs the full measurements. We demonstrate that our method is superior to classical structured compressive sparse multichannel blind-deconvolution methods in terms of accuracy and speed of sparse filter recovery.

[352]  arXiv:2010.11408 (cross-list from eess.AS) [pdf, ps, other]
Title: Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020
Comments: Accepted in INTERSPEECH 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

This paper describes our submission to Task 1 of the Short-duration Speaker Verification (SdSV) challenge 2020. Task 1 is a text-dependent speaker verification task, where both the speaker and phrase are required to be verified. The submitted systems were composed of TDNN-based and ResNet-based front-end architectures, in which the frame-level features were aggregated with various pooling methods (e.g., statistical, self-attentive, ghostVLAD pooling). Although the conventional pooling methods provide embeddings with a sufficient amount of speaker-dependent information, our experiments show that these embeddings often lack phrase-dependent information. To mitigate this problem, we propose new pooling and score compensation methods that leverage a CTC-based automatic speech recognition (ASR) model to take the lexical content into account. Both methods showed improvement over the conventional techniques, and the best performance was achieved by fusing all the systems tested, which showed 0.0785% MinDCF and 2.23% EER on the challenge's evaluation subset.

[353]  arXiv:2010.11423 (cross-list from eess.IV) [pdf, other]
Title: DeepCSR: A 3D Deep Learning Approach for Cortical Surface Reconstruction
Comments: Accepted in 2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

The study of neurodegenerative diseases relies on the reconstruction and analysis of the brain cortex from magnetic resonance imaging (MRI). Traditional frameworks for this task like FreeSurfer demand lengthy runtimes, while its accelerated variant FastSurfer still relies on a voxel-wise segmentation whose resolution limits its ability to capture narrow continuous objects such as cortical surfaces. Having these limitations in mind, we propose DeepCSR, a 3D deep learning framework for cortical surface reconstruction from MRI. Towards this end, we train a neural network model with hypercolumn features to predict implicit surface representations for points in a brain template space. After training, the cortical surface at a desired level of detail is obtained by evaluating surface representations at specific coordinates, and subsequently applying a topology correction algorithm and an isosurface extraction method. Thanks to the continuous nature of this approach and the efficacy of its hypercolumn features scheme, DeepCSR efficiently reconstructs cortical surfaces at high resolution, capturing fine details in the cortical folding. Moreover, DeepCSR is as accurate as, more precise than, and faster than the widely used FreeSurfer toolbox and its deep learning powered variant FastSurfer at reconstructing cortical surfaces from MRI, which should facilitate large-scale medical studies and new healthcare applications.

[354]  arXiv:2010.11428 (cross-list from eess.AS) [pdf, other]
Title: Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition
Comments: Submitted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG)

For various speech-related tasks, confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions. In traditional hidden Markov model-based automatic speech recognition (ASR) systems, confidence scores can be reliably obtained from word posteriors in decoding lattices. However, for an ASR system with an auto-regressive decoder, such as an attention-based sequence-to-sequence model, computing word posteriors is difficult. An obvious alternative is to use the decoder softmax probability as the model confidence. In this paper, we first examine how some commonly used regularisation methods influence the softmax-based confidence scores and study the overconfident behaviour of end-to-end models. Then we propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model. Experiments on LibriSpeech show that CEM can mitigate the overconfidence problem and can produce more reliable confidence scores with and without shallow fusion of a language model. Further analysis shows that CEM generalises well to speech from a moderately mismatched domain and can potentially improve downstream tasks such as semi-supervised learning.
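
The softmax-based baseline mentioned in the abstract can be sketched in a few lines: at each decoding step, the probability of the chosen token is taken as its confidence. The function names and the toy logits are assumptions for illustration, and the proposed CEM itself is not shown because the abstract does not describe its internals.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidences(step_logits):
    """For each decoding step, return (index of the most likely token,
    its softmax probability used as a confidence score)."""
    results = []
    for logits in step_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((best, probs[best]))
    return results


# Hypothetical decoder logits for two output steps over a 3-token vocabulary.
step_logits = [[4.0, 1.0, 0.5], [0.2, 0.1, 0.3]]
confs = token_confidences(step_logits)
for token, confidence in confs:
    print(token, round(confidence, 3))
```

A peaked logit vector gives a score near 1 regardless of whether the token is correct, which is one way to see how such scores can end up overconfident.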

[355]  arXiv:2010.11433 (cross-list from eess.AS) [pdf, other]
Title: Unsupervised Representation Learning for Speaker Recognition via Contrastive Equilibrium Learning
Comments: 5 pages, 1 figure, 4 tables
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

In this paper, we propose a simple but powerful unsupervised learning method for speaker recognition, namely Contrastive Equilibrium Learning (CEL), which increases the uncertainty on nuisance factors latent in the embeddings by employing the uniformity loss. In addition, a contrastive similarity loss function is used alongside it to preserve speaker discriminability. Experimental results showed that the proposed CEL significantly outperforms the state-of-the-art unsupervised speaker verification systems, and the best performing model achieved 8.01% and 4.01% EER on the VoxCeleb1 and VOiCES evaluation sets, respectively. On top of that, supervised speaker embedding networks trained with initial parameters pre-trained via CEL performed better than those trained with randomly initialized parameters.

[356]  arXiv:2010.11457 (cross-list from eess.AS) [pdf, other]
Title: Momentum Contrast Speaker Representation Learning
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Unsupervised representation learning has shown remarkable achievement by reducing the performance gap with supervised feature learning, especially in the image domain. In this study, to extend the technique of unsupervised learning to the speech domain, we propose Momentum Contrast for VoxCeleb (MoCoVox) as a learning mechanism. We pre-trained MoCoVox on VoxCeleb1 by implementing instance discrimination. Applying MoCoVox to speaker verification revealed that it outperforms the state-of-the-art metric learning-based approach by a large margin. We also empirically demonstrate the features of contrastive learning in the speech domain by analyzing the distribution of learned representations. Furthermore, we explored which pretext task is adequate for speaker verification. We expect that learning speaker representations without human supervision will help to address open-set speaker recognition.

[357]  arXiv:2010.11458 (cross-list from eess.AS) [pdf, other]
Title: Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020
Comments: 5 pages, 3 figures, 2 tables
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

This paper describes the Microsoft speaker diarization system for monaural multi-talker recordings in the wild, evaluated at the diarization track of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. We first explain our system design to address issues in handling real multi-talker recordings. We then present the details of the components, which include a Res2Net-based speaker embedding extractor, conformer-based continuous speech separation with leakage filtering, and a modified DOVER (short for Diarization Output Voting Error Reduction) method for system fusion. We evaluate the systems with the data set provided by the VoxSRC Challenge 2020, which contains real-life multi-talker audio collected from YouTube. Our best system achieves a diarization error rate (DER) of 3.71% on the development set and 6.23% on the evaluation set, ranking 1st in the diarization track of the challenge.

[358]  arXiv:2010.11481 (cross-list from eess.AS) [pdf, other]
Title: Similarity Analysis of Self-Supervised Speech Representations
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

Self-supervised speech representation learning has recently been a prosperous research topic. Many algorithms have been proposed for learning useful representations from large-scale unlabeled data, and their applications to a wide range of speech tasks have also been investigated. However, there has been little research focusing on understanding the properties of existing approaches. In this work, we aim to provide a comparative study of some of the most representative self-supervised algorithms. Specifically, we quantify the similarities between different self-supervised representations using existing similarity measures. We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations. In addition to showing how various self-supervised models behave differently given the same input, our study also finds that the training objective has a higher impact on representation similarity than architectural choices such as building blocks (RNN/Transformer/CNN) and directionality (uni/bidirectional). Our results also suggest that there exists a strong correlation between pre-training loss and downstream performance for some self-supervised algorithms.

[359]  arXiv:2010.11489 (cross-list from eess.AS) [pdf, other]
Title: The NTU-AISG Text-to-speech System for Blizzard Challenge 2020
Comments: 5 pages, Technical Report
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

We report our NTU-AISG Text-to-speech (TTS) entry systems for the Blizzard Challenge 2020 in this paper. There are two TTS tasks in this year's challenge: one is a Mandarin TTS task, the other is a Shanghai dialect TTS task. We participated in both. One of the main challenges is to build TTS systems under low-resource constraints, particularly for the Shanghai dialect, for which only about three hours of data are available to participants. To overcome the constraint, we adopt an average-speaker modeling method. That is, we first employ external Mandarin data to train both an end-to-end acoustic model and a WaveNet vocoder, then we use Shanghai dialect data to tune the acoustic model and the WaveNet vocoder respectively. Apart from this, we have no Shanghai dialect lexicon, although syllable transcripts are provided for the training data. Since we were not sure during the training stage whether similar syllable transcripts would be provided for the evaluation data, we use a Mandarin lexicon for the Shanghai dialect instead. With letters decomposed from the corresponding Mandarin syllables as input, the naturalness and original speaker similarity of the synthesized speech are good, but subjective evaluation results indicate that the intelligibility of the synthesized speech is deeply undermined for the Shanghai dialect TTS system.

[360]  arXiv:2010.11518 (cross-list from stat.ML) [pdf, other]
Title: Geometry-Aware Hamiltonian Variational Auto-Encoder
Authors: Clément Chadebec (CRC, Université de Paris), Clément Mantoux (ARAMIS), Stéphanie Allassonnière (CRC, Université de Paris)
Comments: 44 pages, 23 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Differential Geometry (math.DG); Statistics Theory (math.ST)

Variational auto-encoders (VAEs) have proven to be a well-suited tool for performing dimensionality reduction by extracting latent variables lying in a space of potentially much smaller dimension than the data. Their ability to capture meaningful information from the data can be easily appreciated when considering their capability to generate new realistic samples or perform potentially meaningful interpolations in a much smaller space. However, such generative models may perform poorly when trained on small data sets, which are abundant in many real-life fields such as medicine. This may, among other causes, come from the lack of structure of the latent space, the geometry of which is often under-considered. We thus propose in this paper to see the latent space as a Riemannian manifold endowed with a parametrized metric learned at the same time as the encoder and decoder networks. This metric is then used in what we call the Riemannian Hamiltonian VAE, which extends the Hamiltonian VAE introduced by arXiv:1805.11328 to better exploit the underlying geometry of the latent space. We argue that such latent space modelling provides useful information about its underlying structure, leading to far more meaningful interpolations, more realistic data generation and more reliable clustering.

[361]  arXiv:2010.11521 (cross-list from eess.IV) [pdf, other]
Title: Malaria detection from RBC images using shallow Convolutional Neural Networks
Comments: 8 pages, 4 figures, 1 table
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

The advent of Deep Learning models like VGG-16 and ResNet-50 has considerably revolutionized the field of image classification, and by using these Convolutional Neural Network (CNN) architectures, one can get a high classification accuracy on a wide variety of image datasets. However, these Deep Learning models have a very high computational complexity, so they are costly to run and their results are hard to interpret. In this paper, we present a shallow CNN architecture which gives the same classification accuracy as the VGG-16 and ResNet-50 models for thin blood smear RBC slide images for detection of malaria, while decreasing the computational run time by an order of magnitude. This can offer a significant advantage for commercial deployment of these algorithms, especially in poorer countries in Africa and some parts of the Indian subcontinent, where the menace of malaria is quite severe.

[362]  arXiv:2010.11537 (cross-list from math.ST) [pdf, ps, other]
Title: On Mean Estimation for Heteroscedastic Random Variables
Comments: 29 pages
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Machine Learning (cs.LG)

We study the problem of estimating the common mean $\mu$ of $n$ independent symmetric random variables with different and unknown standard deviations $\sigma_1 \le \sigma_2 \le \cdots \le\sigma_n$. We show that, under some mild regularity assumptions on the distribution, there is a fully adaptive estimator $\widehat{\mu}$ such that it is invariant to permutations of the elements of the sample and satisfies that, up to logarithmic factors, with high probability, \[ |\widehat{\mu} - \mu| \lesssim \min\left\{\sigma_{m^*}, \frac{\sqrt{n}}{\sum_{i = \sqrt{n}}^n \sigma_i^{-1}} \right\}~, \] where the index $m^* \lesssim \sqrt{n}$ satisfies $m^* \approx \sqrt{\sigma_{m^*}\sum_{i = m^*}^n\sigma_i^{-1}}$.

[363]  arXiv:2010.11543 (cross-list from eess.AS) [pdf, other]
Title: Graph Attention Networks for Speaker Verification
Comments: 5 pages, 1 figure, 2 tables, submitted to ICASSP 2021 as a conference paper
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

This work presents a novel back-end framework for speaker verification using graph attention networks. Segment-wise speaker embeddings extracted from multiple crops within an utterance are interpreted as node representations of a graph. The proposed framework takes segment-wise speaker embeddings from an enrollment and a test utterance as input and directly outputs a similarity score. We first construct a graph using segment-wise speaker embeddings and then input these to graph attention networks. After a few graph attention layers with residual connections, each node is projected into a one-dimensional space using an affine transform, followed by a readout operation resulting in a scalar similarity score. To enable successful adaptation for speaker verification, we propose techniques such as separating trainable weights for attention map calculations between segment-wise speaker embeddings from different utterances. The effectiveness of the proposed framework is validated using three different speaker embedding extractors trained with different architectures and objective functions. Experimental results demonstrate consistent improvement over various baseline back-end classifiers, with an average equal error rate improvement of 20% over the cosine similarity back-end without test time augmentation.
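
For orientation, the cosine similarity back-end named as a baseline above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, and averaging the segment-wise embeddings of each utterance before scoring is one common choice that the abstract does not confirm.

```python
import math

def mean_embedding(segments):
    """Average a list of equal-length segment embeddings (assumed pooling step)."""
    dim = len(segments[0])
    return [sum(seg[d] for seg in segments) / len(segments) for d in range(dim)]

def cosine_score(enroll_segments, test_segments):
    """Cosine similarity between the pooled enrollment and test embeddings."""
    a = mean_embedding(enroll_segments)
    b = mean_embedding(test_segments)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot score a zero embedding")
    return dot / (norm_a * norm_b)
```

A verification decision would then compare the returned score against a threshold tuned on held-out trials.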

[364]  arXiv:2010.11549 (cross-list from eess.AS) [pdf, other]
Title: How Similar or Different Is Rakugo Speech Synthesizer to Professional Performers?
Comments: Submitted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

We have been working on speech synthesis for rakugo (a traditional Japanese form of verbal entertainment similar to one-person stand-up comedy) toward speech synthesis that authentically entertains audiences. In this paper, we propose a novel evaluation methodology using synthesized rakugo speech and real rakugo speech uttered by professional performers of three different ranks. The naturalness of the synthesized speech was comparable to that of the human speech, but the synthesized speech entertained listeners less than the performers of any rank. However, we obtained some interesting insights into challenges to be solved in order to achieve a truly entertaining rakugo synthesizer. For example, naturalness was not the most important factor, even though it has generally been emphasized as the most important point to be evaluated in the conventional speech synthesis field. More important factors were the understandability of the content and the distinguishability of the characters in the rakugo story, in both of which the synthesized rakugo speech was inferior to the professional performers. We also found that fundamental frequency (fo) modeling should be further improved to better entertain audiences. These results show important steps toward authentically entertaining speech synthesis.

[365]  arXiv:2010.11557 (cross-list from eess.SP) [pdf, ps, other]
Title: Denoising Atmospheric Temperature Measurements Taken by the Mars Science Laboratory on the Martian Surface
Comments: 10 pages, 7 figures, accepted for publication at IEEE Transactions on Instrumentation and Measurement
Subjects: Signal Processing (eess.SP); Earth and Planetary Astrophysics (astro-ph.EP); Information Retrieval (cs.IR); Data Analysis, Statistics and Probability (physics.data-an)

In the present article we analyze data from two temperature sensors of the Mars Science Laboratory, which has been active on Mars since August 2012. Temperature measurements received from the rover are noisy and must be processed and validated before being delivered to the scientific community. Currently, a simple Moving Average (MA) filter is used to perform signal denoising. The application of this basic method relies on the assumption that the noise is stationary and statistically independent from the underlying structure of the signal, an arguable assumption in this kind of harsh environment. In this paper, we analyze the application of two alternative methods to process the temperature sensor measurements: the Discrete Wavelet Transform (DWT) and the Hilbert-Huang Transform (HHT). We consider two different datasets, one belonging to the current Martian measurement campaigns, and the other to the Thermal Vacuum Tests. Processing these datasets makes it possible to separate the random noise from the interference created by other systems. The experiments show that the MA filter may provide useful results under given circumstances. However, the proposed methods provide a better fit in all the realistic scenarios, while making it possible to identify and analyze other interesting signal features and artifacts that could later be studied and classified. The large amount of data to be processed makes computational efficiency an important requirement in this mission. Considering the computational cost and the filtering performance, we propose the DWT-based method as the more suitable one for this application.
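
The Moving Average baseline mentioned above can be illustrated with a short sketch. The centered window and the edge handling below are assumptions made for illustration; the abstract does not specify how the mission's filter is configured, and `moving_average` is a hypothetical name.

```python
def moving_average(samples, window):
    """Centered moving average over an odd-length window.

    Near the edges only the part of the window that overlaps the
    signal is averaged (an assumed convention, not the mission's).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        segment = samples[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed
```

Because every sample in the window receives the same weight, the filter implicitly treats the noise as stationary and unrelated to the signal structure, which is the assumption the abstract calls into question.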

[366]  arXiv:2010.11566 (cross-list from eess.AS) [pdf, other]
Title: DBNET: DOA-driven beamforming network for end-to-end farfield sound source separation
Comments: 5 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)

Many deep learning techniques are available to perform source separation and reduce background noise. However, designing an end-to-end multi-channel source separation method using deep learning and conventional acoustic signal processing techniques still remains challenging. In this paper we propose a direction-of-arrival-driven beamforming network (DBnet) consisting of direction-of-arrival (DOA) estimation and beamforming layers for end-to-end source separation. We propose to train DBnet using loss functions that are solely based on the distances between the separated speech signals and the target speech signals, without a need for the ground-truth DOAs of speakers. To improve the source separation performance, we also propose end-to-end extensions of DBnet which incorporate post masking networks. We evaluate the proposed DBnet and its extensions on a very challenging dataset, targeting realistic far-field sound source separation in reverberant and noisy environments. The experimental results show that the proposed extended DBnet using a convolutional-recurrent post masking network outperforms state-of-the-art source separation methods.

[367]  arXiv:2010.11595 (cross-list from stat.ML) [pdf, other]
Title: Early Anomaly Detection in Time Series: A Hierarchical Approach for Predicting Critical Health Episodes
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The early detection of anomalous events in time series data is essential in many domains of application. In this paper we deal with critical health events, which represent a significant cause of mortality in intensive care units of hospitals. The timely prediction of these events is crucial for mitigating their consequences and improving healthcare. One of the most common approaches to tackle early anomaly detection problems is standard classification methods. In this paper we propose a novel method that uses a layered learning architecture to address these tasks. One key contribution of our work is the idea of pre-conditional events, which denote arbitrary but computable relaxed versions of the event of interest. We leverage this idea to break the original problem into two hierarchical layers, which we hypothesize are easier to solve. The results suggest that the proposed approach leads to a better performance relative to state of the art approaches for critical health episode prediction.

[368]  arXiv:2010.11620 (cross-list from math.AT) [pdf, other]
Title: From trees to barcodes and back again: theoretical and statistical perspectives
Subjects: Algebraic Topology (math.AT); Data Structures and Algorithms (cs.DS)

Methods of topological data analysis have been successfully applied in a wide range of fields to provide useful summaries of the structure of complex data sets in terms of topological descriptors, such as persistence diagrams. While there are many powerful techniques for computing topological descriptors, the inverse problem, i.e., recovering the input data from topological descriptors, has proved to be challenging. In this article we study in detail the Topological Morphology Descriptor (TMD), which assigns a persistence diagram to any tree embedded in Euclidean space, and a sort of stochastic inverse to the TMD, the Topological Neuron Synthesis (TNS) algorithm, gaining both theoretical and computational insights into the relation between the two. We propose a new approach to classify barcodes using symmetric groups, which provides a concrete language to formulate our results. We investigate to what extent the TNS recovers a geometric tree from its TMD and describe the effect of different types of noise on the process of tree generation from persistence diagrams. We prove moreover that the TNS algorithm is stable with respect to specific types of noise.

[369]  arXiv:2010.11634 (cross-list from eess.SP) [pdf, ps, other]
Title: Online Time-Varying Topology Identification via Prediction-Correction Algorithms
Comments: Submitted to IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
Subjects: Signal Processing (eess.SP); Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)

Signal processing and machine learning algorithms for data supported over graphs require knowledge of the graph topology. Unless this information is given by the physics of the problem (e.g., water supply networks, power grids), the topology has to be learned from data. Topology identification is a challenging task, as the problem is often ill-posed, and becomes even harder when the graph structure is time-varying. In this paper, we address the problem of dynamic topology identification by building on recent results from time-varying optimization, devising a general-purpose online algorithm operating in non-stationary environments. Because of its iteration-constrained nature, the proposed approach exhibits an intrinsic temporal regularization of the graph topology without explicitly enforcing it. As a case study, we specialize our method to the Gaussian graphical model (GGM) problem and corroborate its performance.

[370]  arXiv:2010.11637 (cross-list from math.OC) [pdf, other]
Title: Competitive Control with Delayed Imperfect Information
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Dynamical Systems (math.DS)

This paper studies the impact of imperfect information in online control with adversarial disturbances. In particular, we consider both delayed state feedback and inexact predictions of future disturbances. We introduce a greedy, myopic policy that yields a constant competitive ratio against the offline optimal policy with delayed feedback and inexact predictions. A special case of our result is a constant competitive policy for the case of exact predictions and no delay, a previously open problem. We also analyze the fundamental limits of online control with limited information by showing that our competitive ratio bounds for the greedy, myopic policy in the adversarial setting match (up to lower-order terms) lower bounds in the stochastic setting.

[371]  arXiv:2010.11642 (cross-list from stat.ML) [pdf, other]
Title: The Role of Mutual Information in Variational Classifiers
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Overfitting data is a well-known phenomenon related to the generation of a model that mimics too closely (or exactly) a particular instance of data, and may therefore fail to predict future observations reliably. In practice, this behaviour is controlled by various (sometimes heuristic) regularization techniques, which are motivated by developing upper bounds to the generalization error. In this work, we study the generalization error of classifiers relying on stochastic encodings trained on the cross-entropy loss, which is often used in deep learning for classification problems. We derive bounds to the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information between input features and the corresponding representations in the latent space, which are randomly generated according to the encoding distribution. Our bounds provide an information-theoretic understanding of generalization in the so-called class of variational classifiers, which are regularized by a Kullback-Leibler (KL) divergence term. These results give theoretical grounds for the highly popular KL term in variational inference methods that was already recognized to act effectively as a regularization penalty. We further observe connections with well-studied notions such as Variational Autoencoders, Information Dropout, Information Bottleneck and Boltzmann Machines. Finally, we perform numerical experiments on MNIST and CIFAR datasets and show that mutual information is indeed highly representative of the behaviour of the generalization error.

[372]  arXiv:2010.11658 (cross-list from quant-ph) [pdf, other]
Title: On the Compressed-Oracle Technique, and Post-Quantum Security of Proofs of Sequential Work
Subjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Cryptography and Security (cs.CR)

We revisit the so-called compressed oracle technique, introduced by Zhandry for analyzing quantum algorithms in the quantum random oracle model (QROM). To start off with, we offer a concise exposition of the technique, which easily extends to the parallel-query QROM, where in each query-round the considered algorithm may make several queries to the QROM in parallel. This variant of the QROM allows for a more fine-grained query-complexity analysis.
Our main technical contribution is a framework that simplifies the use of (the parallel-query generalization of) the compressed oracle technique for proving query complexity results. With our framework in place, whenever applicable, it is possible to prove quantum query complexity lower bounds by means of purely classical reasoning. More than that, for typical examples the crucial classical observations that give rise to the classical bounds are sufficient to conclude the corresponding quantum bounds.
We demonstrate this on a few examples, recovering known results (like the optimality of parallel Grover), but also obtaining new results (like the optimality of parallel BHT collision search). Our main target is the hardness of finding a $q$-chain with fewer than $q$ parallel queries, i.e., a sequence $x_0, x_1,\ldots, x_q$ with $x_i = H(x_{i-1})$ for all $1 \leq i \leq q$.
The above problem of finding a hash chain is of fundamental importance in the context of proofs of sequential work. Indeed, as a concrete cryptographic application of our techniques, we prove that the "Simple Proofs of Sequential Work" proposed by Cohen and Pietrzak remains secure against quantum attacks. Such an analysis is not simply a matter of plugging in our new bound; the entire protocol needs to be analyzed in the light of a quantum attack. Thanks to our framework, this can now be done with purely classical reasoning.
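
To make the $q$-chain definition above concrete, the following sketch builds and checks a chain $x_0, x_1, \ldots, x_q$ with $x_i = H(x_{i-1})$. SHA-256 stands in for the random oracle $H$ purely as an illustrative assumption, and the function names are hypothetical; nothing here reflects the quantum analysis itself.

```python
import hashlib

def H(x):
    """Stand-in for the random oracle H (assumption: SHA-256)."""
    return hashlib.sha256(x).digest()

def build_chain(x0, q):
    """Return [x_0, ..., x_q] with x_i = H(x_{i-1}), using q sequential calls."""
    chain = [x0]
    for _ in range(q):
        chain.append(H(chain[-1]))
    return chain

def is_q_chain(chain, q):
    """Check the defining relation x_i = H(x_{i-1}) for all 1 <= i <= q."""
    if len(chain) != q + 1:
        return False
    return all(H(chain[i - 1]) == chain[i] for i in range(1, q + 1))
```

Each link depends on the previous output, so the honest construction needs $q$ sequential evaluations of $H$; the hardness result discussed above concerns doing it in fewer than $q$ rounds of parallel queries.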

[373]  arXiv:2010.11665 (cross-list from stat.ML) [pdf, ps, other]
Title: Spike and slab variational Bayes for high dimensional logistic regression
Comments: NeurIPS 2020
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME)

Variational Bayes (VB) is a popular scalable alternative to Markov chain Monte Carlo for Bayesian inference. We study a mean-field spike and slab VB approximation of widely used Bayesian model selection priors in sparse high-dimensional logistic regression. We provide non-asymptotic theoretical guarantees for the VB posterior in both $\ell_2$ and prediction loss for a sparse truth, giving optimal (minimax) convergence rates. Since the VB algorithm does not depend on the unknown truth to achieve optimality, our results shed light on effective prior choices. We confirm the improved performance of our VB algorithm over common sparse VB approaches in a numerical study.

[374]  arXiv:2010.11682 (cross-list from eess.IV) [pdf]
Title: Lung Nodule Classification Using Biomarkers, Volumetric Radiomics and 3D CNNs
Comments: This paper has been submitted to the Journal of Digital Imaging (JDI 2020). The poster of this paper has received the 2nd prize for the Research Poster Award. Link: this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

We present a hybrid algorithm to estimate lung nodule malignancy that combines imaging biomarkers from radiologists' annotations with image classification of CT scans. Our algorithm employs a 3D Convolutional Neural Network (CNN) as well as a Random Forest in order to combine CT imagery with biomarker annotation and volumetric radiomic features. We analyze and compare the performance of the algorithm using only imagery, only biomarkers, combined imagery + biomarkers, combined imagery + volumetric radiomic features and finally the combination of imagery + biomarkers + volumetric features in order to classify the suspicion level of nodule malignancy. The National Cancer Institute (NCI) Lung Image Database Consortium (LIDC) IDRI dataset is used to train and evaluate the classification task. We show that the incorporation of semi-supervised learning by means of K-Nearest-Neighbors (KNN) can increase the available training sample size of the LIDC-IDRI, thereby further improving the accuracy of malignancy estimation for most of the models tested. However, there is no significant improvement from KNN semi-supervised learning when image classification with CNNs and volumetric features is combined with descriptive biomarkers. Unexpectedly, we also show that a model using image biomarkers alone is more accurate than one that combines biomarkers with volumetric radiomics, 3D CNNs, and semi-supervised learning. We discuss the possibility that this result may be influenced by cognitive bias in LIDC-IDRI, because malignancy estimates were recorded by the same radiologist panel as the biomarkers, as well as future work to incorporate pathology information over a subset of study participants.

[375]  arXiv:2010.11687 (cross-list from eess.IV) [pdf, other]
Title: PlenoptiCam v1.0: A light-field imaging framework
Comments: preprint
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Light-field cameras play a vital role for rich 3-D information retrieval in narrow range depth sensing applications. The key obstacle in composing light-fields from exposures taken by a plenoptic camera is to computationally calibrate, re-align and rearrange four-dimensional image data. Several attempts have been made to enhance the overall image quality by tailoring pipelines dedicated to particular plenoptic cameras and improving the color consistency across viewpoints at the expense of high computational loads. The framework presented herein advances prior outcomes thanks to its cost-effective color equalization from parallax-invariant probability distribution transfers and a novel micro image scale-space analysis for generic camera calibration independent of the lens specifications. Our framework compensates for hot pixels, resampling artifacts and micro image grid rotations, as well as vignetting, in an innovative way to enable superior quality in sub-aperture image extraction, computational refocusing and Scheimpflug rendering with sub-sampling capabilities. Benchmark comparisons using established image metrics suggest that our proposed pipeline outperforms state-of-the-art tool chains in the majority of cases. The software described in this paper is released under an open-source license offering cross-platform compatibility, few dependencies and a lean graphical user interface to make the reproduction of results and the experimentation with plenoptic camera technology convenient for peer researchers, developers, photographers, data scientists and everyone else working in this field.

[376]  arXiv:2010.11695 (cross-list from eess.IV) [pdf, other]
Title: Automatic Data Augmentation for 3D Medical Image Segmentation
Comments: 10 pages
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Data augmentation is an effective and universal technique for improving the generalization performance of deep neural networks. It can enrich the diversity of training samples, which is essential in medical image segmentation tasks because 1) the scale of medical image datasets is typically smaller, which may increase the risk of overfitting; 2) the shape and modality of different objects such as organs or tumors are unique, thus requiring a customized data augmentation policy. However, most data augmentation implementations are hand-crafted and suboptimal in medical image processing. To fully exploit the potential of data augmentation, we propose an efficient algorithm to automatically search for the optimal augmentation strategies. We formulate the coupled optimization w.r.t. network weights and augmentation parameters into a differentiable form by means of stochastic relaxation. This formulation allows us to apply alternative gradient-based methods to solve it, i.e. the stochastic natural gradient method with adaptive step-size. To the best of our knowledge, this is the first time that differentiable automatic data augmentation has been employed in medical image segmentation tasks. Our numerical experiments demonstrate that the proposed approach significantly outperforms the existing built-in data augmentation of state-of-the-art models.

[377]  arXiv:2010.11698 (cross-list from eess.IV) [pdf, other]
Title: OCT-GAN: Single Step Shadow and Noise Removal from Optical Coherence Tomography Images of the Human Optic Nerve Head
Comments: 20 pages, 7 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)

Speckle noise and retinal shadows within OCT B-scans occlude important edges, fine textures and deep tissues, preventing accurate and robust diagnosis by algorithms and clinicians. We developed a single process that successfully removed both noise and retinal shadows from unseen single-frame B-scans within 10.4 ms. Mean average gradient magnitude (AGM) for the proposed algorithm was 57.2% higher than current state-of-the-art, while mean peak signal to noise ratio (PSNR), contrast to noise ratio (CNR), and structural similarity index metric (SSIM) increased by 11.1%, 154% and 187% respectively compared to single-frame B-scans. Mean intralayer contrast (ILC) improvement for the retinal nerve fiber layer (RNFL), photoreceptor layer (PR) and retinal pigment epithelium (RPE) layers decreased from 0.362 ± 0.133 to 0.142 ± 0.102, 0.449 ± 0.116 to 0.0904 ± 0.0769, 0.381 ± 0.100 to 0.0590 ± 0.0451 respectively. The proposed algorithm reduces the necessity for long image acquisition times, minimizes expensive hardware requirements and reduces motion artifacts in OCT images.

[378]  arXiv:2010.11718 (cross-list from eess.AS) [pdf, ps, other]
Title: Analysis of the BUT Diarization System for VoxConverse Challenge
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

This paper describes the system developed by the BUT team for the fourth track of the VoxCeleb Speaker Recognition Challenge, focusing on diarization on the VoxConverse dataset. The system consists of signal pre-processing, voice activity detection, speaker embedding extraction, an initial agglomerative hierarchical clustering followed by diarization using a Bayesian hidden Markov model, a reclustering step based on per-speaker global embeddings and overlapped speech detection and handling. We provide comparisons for each of the steps and share the implementation of the most relevant modules of our system. Our system scored second in the challenge in terms of the primary metric (diarization error rate) and first according to the secondary metric (Jaccard error rate).

[379]  arXiv:2010.11737 (cross-list from math.OC) [pdf, other]
Title: Efficient Projection-Free Algorithms for Saddle Point Problems
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)

The Frank-Wolfe algorithm is a classic method for constrained optimization problems. It has recently become popular in many machine learning applications because its projection-free property leads to more efficient iterations. In this paper, we study projection-free algorithms for convex-strongly-concave saddle point problems with complicated constraints. Our method combines Conditional Gradient Sliding with Mirror-Prox, and we show that it requires only $\tilde{O}(1/\sqrt{\epsilon})$ gradient evaluations and $\tilde{O}(1/\epsilon^2)$ linear optimizations in the batch setting. We also extend our method to the stochastic setting and propose the first stochastic projection-free algorithms for saddle point problems. Experimental results demonstrate the effectiveness of our algorithms and verify our theoretical guarantees.
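
As background on the projection-free property mentioned above, here is a minimal sketch of the classic Frank-Wolfe iteration. It is plain Frank-Wolfe on a toy problem, not the paper's combination of Conditional Gradient Sliding with Mirror-Prox. The objective $\frac{1}{2}\|x - b\|^2$ over the probability simplex, the step size $2/(t+2)$ and the function name are assumptions chosen for illustration.

```python
def frank_wolfe_simplex(b, iterations=2000):
    """Minimize 0.5 * ||x - b||^2 over the probability simplex.

    Each step calls a linear minimization oracle instead of a projection:
    over the simplex, the minimizer of <grad, s> is the vertex whose
    gradient coordinate is smallest.
    """
    n = len(b)
    x = [1.0] + [0.0] * (n - 1)          # start at a vertex of the simplex
    for t in range(iterations):
        grad = [xi - bi for xi, bi in zip(x, b)]
        j = min(range(n), key=lambda k: grad[k])   # oracle: best vertex e_j
        gamma = 2.0 / (t + 2.0)                    # standard open-loop step
        x = [(1.0 - gamma) * xi for xi in x]       # move toward e_j
        x[j] += gamma
    return x
```

Every iterate is a convex combination of simplex vertices, so it stays feasible without any projection step; the cost per iteration is one gradient evaluation and one linear optimization, which are the two quantities counted in the complexity bounds above.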

[380]  arXiv:2010.11741 (cross-list from eess.AS) [pdf, other]
Title: Ultra-low power on-chip learning of speech commands with phase-change memories
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Audio and Speech Processing (eess.AS); Disordered Systems and Neural Networks (cond-mat.dis-nn); Hardware Architecture (cs.AR); Machine Learning (cs.LG); Sound (cs.SD)

Embedding artificial intelligence at the edge (edge-AI) is an elegant solution to tackle the power and latency issues in the rapidly expanding Internet of Things. As edge devices typically spend most of their time in sleep mode and only wake up infrequently to collect and process sensor data, non-volatile in-memory computing (NVIMC) is a promising approach to design the next generation of edge-AI devices. Recently, we proposed an NVIMC-based neuromorphic accelerator using phase change memories (PCMs), which we call Raven. In this work, we demonstrate ultra-low-power on-chip training and inference of speech commands using Raven. We show that Raven can be trained on-chip with power consumption as low as 30 µW, which is suitable for edge applications. Furthermore, we show that at iso-accuracies, Raven needs 70.36x and 269.23x fewer computations than a deep neural network (DNN) during inference and training, respectively. Owing to such low power and computational requirements, Raven provides a promising pathway towards ultra-low-power training and inference at the edge.

[381]  arXiv:2010.11748 (cross-list from stat.ML) [pdf, other]
Title: Classification with Rejection Based on Cost-sensitive Classification
Comments: 34 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

The goal of classification with rejection is to avoid risky misclassification in error-critical applications such as medical diagnosis and product inspection. In this paper, based on the relationship between classification with rejection and cost-sensitive classification, we propose a novel method of classification with rejection by learning an ensemble of cost-sensitive classifiers, which is the first to satisfy all of the following properties: (i) it can avoid estimating class-posterior probabilities, resulting in improved classification accuracy, (ii) it allows a flexible choice of losses including non-convex ones, (iii) it does not require complicated modifications when using different losses, (iv) it is applicable to both binary and multiclass cases, and (v) it is theoretically justifiable for any classification-calibrated loss. Experimental results demonstrate the usefulness of our proposed approach in clean-labeled, noisy-labeled, and positive-unlabeled classification.

[382]  arXiv:2010.11750 (cross-list from stat.ML) [pdf, other]
Title: Sharp Bias-variance Tradeoffs of Hard Parameter Sharing in High-dimensional Linear Regression
Comments: 44 pages, 3 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Hard parameter sharing for multi-task learning is widely used in empirical research despite the fact that its generalization properties have not been well established in many cases. This paper studies its generalization properties in a fundamental setting: How does hard parameter sharing work given multiple linear regression tasks? We develop new techniques and establish a number of new results in the high-dimensional setting, where the sample size and feature dimension increase at a fixed ratio. First, we show a sharp bias-variance decomposition of hard parameter sharing, given multiple tasks with the same features. Second, we characterize the asymptotic bias-variance limit for two tasks, even when they have arbitrarily different sample size ratios and covariate shifts. We also demonstrate that these limiting estimates for the empirical loss are incredibly accurate in moderate dimensions. Finally, we explain an intriguing phenomenon where increasing one task's sample size helps another task initially by reducing variance but hurts eventually due to increasing bias. This suggests progressively adding data for optimizing hard parameter sharing, and we validate its efficiency in text classification tasks.

[383]  arXiv:2010.11751 (cross-list from physics.comp-ph) [pdf, other]
Title: Cross-platform programming model for many-core lattice Boltzmann simulations
Comments: The STLBM library is available at this https URL
Subjects: Computational Physics (physics.comp-ph); Distributed, Parallel, and Cluster Computing (cs.DC)

We present a novel, hardware-agnostic implementation strategy for lattice Boltzmann (LB) simulations, which yields massive performance on homogeneous and heterogeneous many-core platforms. Based solely on C++17 Parallel Algorithms, our approach does not rely on any language extensions, external libraries, vendor-specific code annotations, or pre-compilation steps. Thanks in particular to a recently proposed GPU back-end to C++17 Parallel Algorithms, it is shown that a single code can compile and reach state-of-the-art performance on both many-core CPU and GPU environments for the solution of a given nontrivial fluid dynamics problem. The proposed strategy is tested with six different, commonly used implementation schemes to test the performance impact of memory access patterns on different platforms. Nine different LB collision models are included in the tests and exhibit good performance, demonstrating the versatility of our parallel approach. This work shows that it is less than ever necessary to draw a distinction between research and production software, as a concise and generic LB implementation yields performance comparable to that achievable in a hardware-specific programming language. The results also highlight the performance gains achieved by modern many-core CPUs and their apparent capability to narrow the gap with the traditionally massively faster GPU platforms. All code is made available to the community in the form of the open-source project stlbm, which serves both as stand-alone simulation software and as a collection of reusable patterns for the acceleration of pre-existing LB codes.

[384]  arXiv:2010.11765 (cross-list from q-bio.NC) [pdf, other]
Title: Identifying Learning Rules From Neural Network Observables
Comments: NeurIPS 2020 Camera Ready Version, 21 pages including supplementary information, 13 figures
Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG); Machine Learning (stat.ML)

The brain modifies its synaptic strengths during learning in order to better adapt to its environment. However, the underlying plasticity rules that govern learning are unknown. Many proposals have been suggested, including Hebbian mechanisms, explicit error backpropagation, and a variety of alternatives. It is an open question as to what specific experimental measurements would need to be made to determine whether any given learning rule is operative in a real biological system. In this work, we take a "virtual experimental" approach to this problem. Simulating idealized neuroscience experiments with artificial neural networks, we generate a large-scale dataset of learning trajectories of aggregate statistics measured in a variety of neural network architectures, loss functions, learning rule hyperparameters, and parameter initializations. We then take a discriminative approach, training linear and simple non-linear classifiers to identify learning rules from features based on these observables. We show that different classes of learning rules can be separated solely on the basis of aggregate statistics of the weights, activations, or instantaneous layer-wise activity changes, and that these results generalize to limited access to the trajectory and held-out architectures and learning curricula. We identify the statistics of each observable that are most relevant for rule identification, finding that statistics from network activities across training are more robust to unit undersampling and measurement noise than those obtained from the synaptic strengths. Our results suggest that activation patterns, available from electrophysiological recordings of post-synaptic activities on the order of several hundred units, frequently measured at wider intervals over the course of learning, may provide a good basis on which to identify learning rules.

[385]  arXiv:2010.11789 (cross-list from math.DS) [pdf, other]
Title: Travelling wave solutions for fully discrete FitzHugh-Nagumo type equations with infinite-range interactions
Subjects: Dynamical Systems (math.DS); Numerical Analysis (math.NA)

We investigate the impact of spatial-temporal discretisation schemes on the dynamics of a class of reaction-diffusion equations that includes the FitzHugh-Nagumo system. For the temporal discretisation we consider the family of six backward differentiation formula (BDF) methods, which includes the well-known backward-Euler scheme. The spatial discretisations can feature infinite-range interactions, allowing us to consider neural field models. We construct travelling wave solutions to these fully discrete systems in the small time-step limit by viewing them as singular perturbations of the corresponding spatially discrete system. In particular, we refine the previous approach by Hupkes and Van Vleck for scalar fully discretised systems, which is based on a spectral convergence technique that was developed by Bates, Chen and Chmaj.

[386]  arXiv:2010.11807 (cross-list from cond-mat.mtrl-sci) [pdf, other]
Title: Validation of non-negative matrix factorization for assessment of atomic pair-distribution function (PDF) data in a real-time streaming context
Subjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)

We validate the use of matrix factorization for the automatic identification of relevant components from atomic pair distribution function (PDF) data. We also present a newly developed software infrastructure for analyzing PDF data arriving in a streaming manner. We then apply two matrix factorization techniques, Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF), to study simulated and experimental datasets in the context of in situ experiments.
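
As a rough illustration of the NMF technique named in this abstract, the sketch below factorizes a non-negative data matrix V (one measured curve per column) into non-negative components W and weights H using the classic multiplicative-update rule for the Frobenius objective. This is a generic textbook variant and not the authors' software; the function name `nmf`, its parameters, and the matrix layout are assumptions made for illustration only.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-12):
    """Approximate V (m x n, non-negative) as W @ H with W, H >= 0.

    Hypothetical helper: multiplicative updates for the squared
    Frobenius error ||V - W H||^2. Not taken from the listed paper.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio,
        # so W and H stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In this layout the columns of W play the role of candidate components and each column of H gives the mixing weights for one measurement. The choice of `rank` is left to the user here; how to select it for real PDF data is not something this sketch addresses.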

[387]  arXiv:2010.11810 (cross-list from q-bio.NC) [pdf, other]
Title: Factorized Neural Processes for Neural Processes: $K$-Shot Prediction of Neural Responses
Comments: 14 pages, 5 figures, NeurIPS 2020 conference paper
Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)

In recent years, artificial neural networks have achieved state-of-the-art performance for predicting the responses of neurons in the visual cortex to natural stimuli. However, they require a time-consuming parameter optimization process for accurately modeling the tuning function of newly observed neurons, which prohibits many applications including real-time, closed-loop experiments. We overcome this limitation by formulating the problem as $K$-shot prediction to directly infer a neuron's tuning function from a small set of stimulus-response pairs using a Neural Process. This required us to develop a Factorized Neural Process, which embeds the observed set into a latent space partitioned into the receptive field location and the tuning function properties. We show on simulated responses that the predictions and reconstructed receptive fields from the Factorized Neural Process approach ground truth with increasing number of trials. Critically, the latent representation that summarizes the tuning function of a neuron is inferred in a quick, single forward pass through the network. Finally, we validate this approach on real neural data from visual cortex and find that the predictive accuracy is comparable to -- and for small $K$ even greater than -- optimization-based approaches, while being substantially faster. We believe this novel deep learning systems identification framework will facilitate better real-time integration of artificial neural network modeling into neuroscience experiments.

[388]  arXiv:2010.11825 (cross-list from stat.ML) [pdf, other]
Title: Model identification and local linear convergence of coordinate descent
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

For composite nonsmooth optimization problems, the Forward-Backward algorithm achieves model identification (e.g. support identification for the Lasso) after a finite number of iterations, provided the objective function is regular enough. Results concerning coordinate descent are scarcer and model identification has only been shown for specific estimators, the support-vector machine for instance. In this work, we show that cyclic coordinate descent achieves model identification in finite time for a wide class of functions. In addition, we prove explicit local linear convergence rates for coordinate descent. Extensive experiments on various estimators and on real datasets demonstrate that these rates match empirical results well.
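
To make "model identification" concrete for the Lasso case mentioned above, here is a minimal sketch of cyclic coordinate descent that records the support (the set of nonzero coefficients) after every pass over the coordinates. It assumes the objective (1/2n)||y - Xw||^2 + lam*||w||_1; the names `lasso_cd` and `soft_threshold` are hypothetical, and the code is a standard textbook update rather than the authors' implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding: the proximal operator of t * |.|."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_epochs=200):
    """Cyclic coordinate descent for (1/2n)||y - Xw||^2 + lam * ||w||_1.

    Returns the final coefficients and the support after each epoch,
    so one can observe when the support stops changing.
    """
    n, p = X.shape
    w = np.zeros(p)
    r = y.astype(float)             # residual y - X @ w (w starts at zero)
    col_sq = (X ** 2).sum(axis=0)   # squared column norms
    supports = []
    for _ in range(n_epochs):
        for j in range(p):          # fixed cyclic order over coordinates
            if col_sq[j] == 0.0:
                continue
            old = w[j]
            # Correlation of column j with the residual that excludes w[j].
            rho = X[:, j] @ r + col_sq[j] * old
            w[j] = soft_threshold(rho, n * lam) / col_sq[j]
            if w[j] != old:
                r -= X[:, j] * (w[j] - old)  # keep the residual in sync
        supports.append(frozenset(int(j) for j in np.flatnonzero(w)))
    return w, supports
```

Comparing consecutive entries of `supports` shows the epoch after which the nonzero pattern is fixed; from that point on the iteration only refines the values on the identified support, which is the regime where the abstract's local linear rates apply.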

[389]  arXiv:2010.11841 (cross-list from econ.GN) [pdf]
Title: Does it Pay Off to Learn a New Skill? Revealing the Economic Benefits of Cross-Skilling
Authors: Fabian Stephany
Comments: 23 pages, 5 figures, 2 tables
Subjects: General Economics (econ.GN); Social and Information Networks (cs.SI); Applications (stat.AP)

This work examines the economic benefits of learning a new skill from a different domain: cross-skilling. To assess this, a network of skills from the job profiles of 4,810 online freelancers is constructed. Based on this skill network, relationships between 3,525 different skills are revealed and marginal effects of learning a new skill can be calculated via workers' wages. The results indicate that the added economic value of learning a new skill strongly depends on the already existing skill bundle but that acquiring a skill from a different domain is often beneficial. As technological and social transformation is reshuffling jobs' task profiles at a fast pace, the findings of this study help to clarify skill sets required for mastering new technologies and designing individual training pathways. This can help to increase employability and reduce labour market shortages.

[390]  arXiv:2010.11860 (cross-list from eess.AS) [pdf, other]
Title: Perceptual Loss based Speech Denoising with an ensemble of Audio Pattern Recognition and Self-Supervised Models
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Deep learning based speech denoising still suffers from the challenge of improving perceptual quality of enhanced signals. We introduce a generalized framework called Perceptual Ensemble Regularization Loss (PERL) built on the idea of perceptual losses. Perceptual loss discourages distortion to certain speech properties and we analyze it using six large-scale pre-trained models: speaker classification, acoustic model, speaker embedding, emotion classification, and two self-supervised speech encoders (PASE+, wav2vec 2.0). We first build a strong baseline (w/o PERL) using Conformer Transformer Networks on the popular enhancement benchmark called VCTK-DEMAND. Using auxiliary models one at a time, we find the acoustic event model and the self-supervised model PASE+ to be most effective. Our best model (PERL-AE) uses only the acoustic event model (utilizing AudioSet) to outperform state-of-the-art methods on major perceptual metrics. To explore whether denoising can leverage the full framework, we use all networks but find that our seven-loss formulation suffers from the challenges of Multi-Task Learning. Finally, we report a critical observation that state-of-the-art Multi-Task weight learning methods cannot outperform hand tuning, perhaps due to challenges of domain mismatch and weak complementarity of losses.

[391]  arXiv:2010.11875 (cross-list from eess.AS) [pdf, other]
Title: Position-Agnostic Multi-Microphone Speech Dereverberation
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

Neural networks (NNs) have been widely applied in speech processing tasks, and, in particular, those employing microphone arrays. Nevertheless, most of the existing NN architectures can only deal with fixed and position-specific microphone arrays. In this paper, we present an NN architecture that can cope with microphone arrays on which no prior knowledge is presumed, and demonstrate its applicability on the speech dereverberation problem. To this end, our approach harnesses recent advances in the Deep Sets framework to design an architecture that enhances the reverberant log-spectrum. We provide a setup for training and testing such a network. Our experiments, using REVERB challenge datasets, show that the proposed position-agnostic setup performs comparably with the position-aware framework and sometimes slightly better, even with fewer microphones. In addition, it substantially improves performance over a single microphone architecture.

[392]  arXiv:2010.11909 (cross-list from eess.SP) [pdf, other]
Title: Contrastive Self-Supervised Learning for Wireless Power Control
Comments: Code available at this https URL
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)

We propose a new approach for power control in wireless networks using self-supervised learning. We partition a multi-layer perceptron that takes as input the channel matrix and outputs the power control decisions into a backbone and a head, and we show how we can use contrastive learning to pre-train the backbone so that it produces similar embeddings at its output for similar channel matrices and vice versa, where similarity is defined in an information-theoretic sense by identifying the interference links that can be optimally treated as noise. The backbone and the head are then fine-tuned using a limited number of labeled samples. Simulation results show the effectiveness of the proposed approach, demonstrating significant gains over pure supervised learning methods in both sum-throughput and sample efficiency.

[393]  arXiv:2010.11914 (cross-list from math.ST) [pdf, ps, other]
Title: Gaussoids are two-antecedental approximations of Gaussian conditional independence structures
Authors: Tobias Boege
Comments: 17 pages
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)

The gaussoid axioms are conditional independence inference rules which characterize regular Gaussian CI structures over a three-element ground set. It is known that no finite set of inference rules completely describes regular Gaussian CI as the ground set grows. In this article we show that the gaussoid axioms logically imply every inference rule of at most two antecedents which is valid for regular Gaussians over any ground set. The proof is accomplished by exhibiting for each inclusion-minimal gaussoid extension of at most two CI statements a regular Gaussian realization. Moreover we prove that all those gaussoids have rational positive-definite realizations inside every $\varepsilon$-ball around the identity matrix. For the proof we introduce the concept of algebraic Gaussians over arbitrary fields and of positive Gaussians over ordered fields and obtain the same two-antecedental completeness of the gaussoid axioms for algebraic and positive Gaussians over all fields of characteristic zero as a byproduct.

Replacements for Fri, 23 Oct 20

[394]  arXiv:1602.03420 (replaced) [pdf, ps, other]
Title: Relative Perturbation Theory for Quadratic Hermitian Eigenvalue Problem
Comments: 29 pages
Subjects: Numerical Analysis (math.NA)
[395]  arXiv:1612.02904 (replaced) [pdf, other]
Title: GOTM: a Goal-Oriented Framework for Capturing Uncertainty of Medical Treatments
Comments: Idea Paper
Subjects: Artificial Intelligence (cs.AI)
[396]  arXiv:1806.05464 (replaced) [pdf, ps, other]
Title: Event-triggered controllers based on the supremum norm of sampling-induced error
Subjects: Systems and Control (eess.SY)
[397]  arXiv:1807.03488 (replaced) [pdf, other]
Title: Dynamics of Taxi-like Logistics Systems: Theory and Microscopic Simulations
Authors: Bo Yang, Qianxiao Li
Comments: 12 pages, comments very welcome
Journal-ref: Transportmetrica B: Transport Dynamics, 8:1, 129-149 (2020)
Subjects: Adaptation and Self-Organizing Systems (nlin.AO); Multiagent Systems (cs.MA); Chaotic Dynamics (nlin.CD)
[398]  arXiv:1807.07648 (replaced) [pdf, ps, other]
Title: An improved uncertainty principle for functions with symmetry
Comments: 29 pages
Subjects: Classical Analysis and ODEs (math.CA); Information Theory (cs.IT); Number Theory (math.NT)
[399]  arXiv:1809.01506 (replaced) [pdf, other]
Title: VLSTM: Very Long Short-Term Memory Networks for High-Frequency Trading
Comments: 4 pages + 1 page references
Subjects: Machine Learning (cs.LG); Trading and Market Microstructure (q-fin.TR); Machine Learning (stat.ML)
[400]  arXiv:1810.12482 (replaced) [pdf, other]
Title: Using Large Ensembles of Control Variates for Variational Inference
Comments: Neural Information Processing Systems (NIPS 2018)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[401]  arXiv:1812.07026 (replaced) [pdf, ps, other]
Title: State Leakage and Coordination with Causal State Knowledge at the Encoder
Comments: preliminary draft
Subjects: Information Theory (cs.IT)
[402]  arXiv:1812.09324 (replaced) [pdf, other]
Title: End-to-End Classification of Reverberant Rooms using DNNs
Comments: Accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[403]  arXiv:1812.09808 (replaced) [pdf, other]
Title: Wasserstein Distributionally Robust Stochastic Control: A Data-Driven Approach
Authors: Insoon Yang
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
[404]  arXiv:1903.05878 (replaced) [pdf, other]
Title: A Functional (Monadic) Second-Order Theory of Infinite Trees
Subjects: Logic in Computer Science (cs.LO)
[405]  arXiv:1903.06733 (replaced) [pdf, other]
Title: Dying ReLU and Initialization: Theory and Numerical Examples
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Probability (math.PR)
[406]  arXiv:1905.04149 (replaced) [pdf, other]
Title: A Survey on Deep Learning-based Non-Invasive Brain Signals: Recent Advances and New Frontiers
Comments: Accepted by Journal of Neural Engineering. Summarized more than 200+ brain signal-related papers, systematically covering 8 Brain-Computer Interface (BCI) categories and 10+ deep learning models
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Signal Processing (eess.SP); Neurons and Cognition (q-bio.NC)
[407]  arXiv:1905.09517 (replaced) [pdf, other]
Title: IN2LAAMA: INertial Lidar Localisation Autocalibration And MApping
Comments: This work is accepted for publication in IEEE Transactions on Robotics. Please find IEEE's copyright statement at the bottom of the first page
Subjects: Robotics (cs.RO)
[408]  arXiv:1905.11503 (replaced) [pdf, other]
Title: Body Shape Privacy in Images: Understanding Privacy and Preventing Automatic Shape Extraction
Journal-ref: Proc. of the IEEE European Conference on Computer Vision Workshops (ECCVW), CV-COPS@ECCV2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[409]  arXiv:1905.13462 (replaced) [pdf, other]
Title: Neural Markov Logic Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[410]  arXiv:1906.01852 (replaced) [pdf, other]
Title: Variational Inference for Graph Convolutional Networks in the Absence of Graph Data and Adversarial Settings
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[411]  arXiv:1906.05046 (replaced) [pdf, other]
Title: Torus computed tomography
Comments: 26 pages, 12 figures, 3 tables; final version
Journal-ref: SIAM J. Appl. Math., 80(4), 1947--1976
Subjects: Functional Analysis (math.FA); Numerical Analysis (math.NA)
[412]  arXiv:1906.10197 (replaced) [pdf, other]
Title: Mutual exclusivity as a challenge for deep neural networks
Comments: Published in Advances in Neural Information Processing Systems (NeurIPS) 33
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[413]  arXiv:1906.11026 (replaced) [pdf, other]
Title: A numerical scheme for stochastic differential equations with distributional drift
Comments: 36 pages, 8 figures. Global rate obtained and structure of the paper changed slightly
Subjects: Probability (math.PR); Numerical Analysis (math.NA)
[414]  arXiv:1907.00801 (replaced) [pdf, ps, other]
Title: On Characterizations for Subclasses of Directed Co-Graphs
Comments: 26 pages
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
[415]  arXiv:1909.01801 (replaced) [pdf, other]
Title: Soft Triangles for Expert Aggregation
Authors: Paul B. Kantor
Comments: 10 pp. 5 figures. 1 Table. Research Technical Report. This is really about elicitation -- the MSC class is not a very good fit. This revision corrects an error in Eq 14, and adds one missing citation
Subjects: Artificial Intelligence (cs.AI)
[416]  arXiv:1909.02729 (replaced) [pdf, other]
Title: A Baseline for Few-Shot Image Classification
Journal-ref: International Conference on Learning Representations (ICLR), 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[417]  arXiv:1909.05507 (replaced) [pdf, other]
Title: Effective training of deep convolutional neural networks for hyperspectral image classification through artificial labeling
Journal-ref: Remote Sens. 2020, 12, 2653
Subjects: Neural and Evolutionary Computing (cs.NE); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[418]  arXiv:1909.05557 (replaced) [pdf, other]
Title: Modular Meta-Learning with Shrinkage
Comments: Accepted by NeurIPS 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[419]  arXiv:1909.07294 (replaced) [pdf, other]
Title: Selective Network Discovery via Deep Reinforcement Learning on Embedded Spaces
Comments: Submitted for review to the journal of applied network science (2020). This adds several new sections from the original complex networks submission which can be viewed under v1 or at this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[420]  arXiv:1909.12863 (replaced) [pdf, ps, other]
Title: Pivot Rules for Circuit-Augmentation Algorithms in Linear Optimization
Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Optimization and Control (math.OC)
[421]  arXiv:1910.02008 (replaced) [pdf, other]
Title: Nonasymptotic estimates for Stochastic Gradient Langevin Dynamics under local conditions in nonconvex optimization
Comments: 35 pages
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Probability (math.PR); Machine Learning (stat.ML)
[422]  arXiv:1910.02015 (replaced) [pdf, other]
Title: Reach Out and Help: Assisted Remote Collaboration through a Handheld Robot
Comments: 11 pages, 6 figures
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
[423]  arXiv:1910.02029 (replaced) [pdf, other]
Title: Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory
Comments: Accepted to IJCV 2020, 20 pages, 10 Figures, Demo Video: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Robotics (cs.RO)
[424]  arXiv:1910.02822 (replaced) [pdf, other]
Title: A mathematical theory of cooperative communication
Journal-ref: Advances in Neural Information Processing Systems 33 (NIPS 2020)
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
[425]  arXiv:1910.06358 (replaced) [pdf, other]
Title: Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability
Comments: To appear in NeurIPS 2020; 9 pages, 2 figures, 2 appendices
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[426]  arXiv:1910.08639 (replaced) [pdf, other]
Title: OffWorld Gym: open-access physical robotics environment for real-world reinforcement learning benchmark and research
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
[427]  arXiv:1910.10873 (replaced) [pdf, ps, other]
Title: Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition
Comments: First two authors contributed equally. Accepted to NeurIPS 2020
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[428]  arXiv:1910.12179 (replaced) [pdf, other]
Title: BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
Comments: 27 pages (15 pages for appendix); Published in 34th Conference on Neural Information Processing Systems (NeurIPS 2020)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[429]  arXiv:1911.00696 (replaced) [pdf, other]
Title: Statistical EL is ExpTime-complete
Comments: Minor revision of the previous version
Subjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI)
[430]  arXiv:1911.01387 (replaced) [pdf, other]
Title: Understanding Static Code Warnings: an Incremental AI Approach
Comments: Accepted to Expert Systems with Applications
Subjects: Software Engineering (cs.SE)
[431]  arXiv:1911.05930 (replaced) [pdf, other]
Title: FAQ-based Question Answering via Knowledge Anchors
Comments: 12 pages, accepted by NLPCC-2020
Journal-ref: NLPCC-2020
Subjects: Computation and Language (cs.CL)
[432]  arXiv:1911.06465 (replaced) [pdf, other]
Title: Fourier Spectrum Discrepancies in Deep Network Generated Images
Comments: 11 pages, 7 figures
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Machine Learning (stat.ML)
[433]  arXiv:1911.07101 (replaced) [pdf, other]
Title: Glyph: Fast and Accurately Training Deep Neural Networks on Encrypted Data
Comments: 10 pages, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[434]  arXiv:1911.07954 (replaced) [pdf, ps, other]
Title: ISP4ML: Understanding the Role of Image Signal Processing in Efficient Deep Learning Vision Systems
Comments: 13 pages, 11 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[435]  arXiv:1911.08010 (replaced) [pdf]
Title: Convolutional Neural Network and decision support in medical imaging: case study of the recognition of blood cell subtypes
Comments: 7 pages, 6 figures, 1 table
Journal-ref: CEUR-WS.org/Vol-2647 (2019), pp. 128-140
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[436]  arXiv:1911.09058 (replaced) [pdf, other]
Title: Fine-grained Synthesis of Unrestricted Adversarial Examples
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[437]  arXiv:1911.10335 (replaced) [pdf]
Title: Attention Deep Model with Multi-Scale Deep Supervision for Person Re-Identification
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[438]  arXiv:1912.00825 (replaced) [pdf, other]
Title: A novel iterative penalty method to enforce boundary conditions in Finite Volume POD-Galerkin reduced order models for fluid dynamics problems
Comments: 31 pages, 14 figures, 3 tables
Subjects: Numerical Analysis (math.NA)
[439]  arXiv:1912.08335 (replaced) [pdf, other]
Title: Learning under Model Misspecification: Applications to Variational and Ensemble methods
Comments: Camera-Ready Version. NeurIPS 2020. Minor changes
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
[440]  arXiv:2001.07631 (replaced) [pdf, other]
Title: HRFA: High-Resolution Feature-based Attack
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[441]  arXiv:2001.10290 (replaced) [pdf, other]
Title: Discrete Signal Processing with Set Functions
Comments: 16 pages, submitted for publication
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
[442]  arXiv:2001.10741 (replaced) [pdf, ps, other]
Title: Extreme Algorithm Selection With Dyadic Feature Representation
Comments: Published at Discovery Science 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[443]  arXiv:2001.11316 (replaced) [pdf, other]
Title: Adversarial Training for Aspect-Based Sentiment Analysis with BERT
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
[444]  arXiv:2002.00874 (replaced) [pdf, ps, other]
Title: Finite-Sample Analysis of Contractive Stochastic Approximation Using Smooth Convex Envelopes
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
[445]  arXiv:2002.02782 (replaced) [pdf, other]
Title: Inverse Learning of Symmetries
Comments: Accepted for publication at NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[446]  arXiv:2002.03762 (replaced) [pdf, ps, other]
Title: Extensional proofs in a propositional logic modulo isomorphisms
Comments: 26 pages
Subjects: Logic in Computer Science (cs.LO)
[447]  arXiv:2002.05100 (replaced) [src]
Title: Fundamental Limits of Biometric Identification Systems with Strong Secrecy
Comments: Some critical errors are found in the proofs of the main results, so we want to withdraw this paper. Thank you
Subjects: Information Theory (cs.IT)
[448]  arXiv:2002.05909 (replaced) [pdf, other]
Title: Deep reconstruction of strange attractors from time series
Authors: William Gilpin
Comments: 9 pages, 6 figures, plus appendices
Journal-ref: NeurIPS (Neural Information Processing Systems) 2020
Subjects: Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD); Data Analysis, Statistics and Probability (physics.data-an); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
[449]  arXiv:2002.06093 (replaced) [pdf, other]
Title: Docking Haptics: Extending the Reach of Haptics by Dynamic Combinations of Grounded and Worn Devices
Subjects: Human-Computer Interaction (cs.HC)
[450]  arXiv:2002.06873 (replaced) [pdf, other]
Title: $π$VAE: Encoding stochastic process priors with variational autoencoders
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[451]  arXiv:2002.07217 (replaced) [pdf, other]
Title: Decision-Making with Auto-Encoding Variational Bayes
Journal-ref: Advances in Neural Information Processing Systems 2020
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[452]  arXiv:2002.07686 (replaced) [pdf, other]
Title: Robust Quantization: One Model to Rule Them All
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[453]  arXiv:2002.08541 (replaced) [pdf, other]
Title: Simple and Scalable Sparse k-means Clustering via Feature Ranking
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[454]  arXiv:2002.11369 (replaced) [pdf, other]
Title: Lipschitz standardization for multivariate learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[455]  arXiv:2002.11661 (replaced) [pdf, other]
Title: Data Structures & Algorithms for Exact Inference in Hierarchical Clustering
Comments: 27 pages, 12 figures
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an); Machine Learning (stat.ML)
[456]  arXiv:2003.03581 (replaced) [pdf, other]
Title: StyleGAN2 Distillation for Feed-forward Image Manipulation
Comments: Camera ready ECCV 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[457]  arXiv:2003.04627 (replaced) [pdf, ps, other]
Title: SCL with Theory Constraints
Comments: 22 pages
Subjects: Logic in Computer Science (cs.LO)
[458]  arXiv:2003.07664 (replaced) [pdf, other]
Title: CinemAirSim: A Camera-Realistic Robotics Simulator for Cinematographic Purposes
Comments: Accepted preprint for IROS 2020
Subjects: Robotics (cs.RO)
[459]  arXiv:2003.08476 (replaced) [pdf, other]
Title: Visual link retrieval and knowledge discovery in painting datasets
Comments: Published in Multimedia Tools and Applications. Modified references. Corrected typos. Added observations according to reviewers
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[460]  arXiv:2003.08597 (replaced) [pdf, other]
Title: Deep convolutional embedding for digitized painting clustering
Comments: Accepted at ICPR2020. Added references. Corrected typos. Added new results and observations according to reviewers
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[461]  arXiv:2003.08744 (replaced) [pdf, other]
Title: PLOP: Probabilistic poLynomial Objects trajectory Planning for autonomous driving
Comments: Accepted at CoRL 2020 (matching camera-ready version)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
[462]  arXiv:2003.10152 (replaced) [pdf, other]
Title: SOLOv2: Dynamic, Faster and Stronger
Comments: Accepted to Proc. Advances in Neural Information Processing Systems (NeurIPS'20). Code is available at: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[463]  arXiv:2003.12309 (replaced) [pdf, other]
Title: COVID-19 on Social Media: Analyzing Misinformation in Twitter Conversations
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Machine Learning (cs.LG)
[464]  arXiv:2003.13924 (replaced) [pdf, other]
Title: EvolveGraph: Multi-Agent Trajectory Prediction with Dynamic Relational Reasoning
Comments: NeurIPS 2020. Website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Robotics (cs.RO)
[465]  arXiv:2004.01486 (replaced) [pdf, other]
Title: Cell Segmentation and Tracking using CNN-Based Distance Predictions and a Graph-Based Matching Strategy
Comments: 25 pages, 14 figures, methods of the team KIT-Sch-GE for the IEEE ISBI 2020 Cell Tracking Challenge
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[466]  arXiv:2004.01806 (replaced) [pdf, other]
Title: On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type PDEs
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG)
[467]  arXiv:2004.02881 (replaced) [pdf, other]
Title: Estimate of the Neural Network Dimension Using Algebraic Topology and Lie Theory
Comments: The title of this article was formerly "Parameterization of Neural Networks with Connected Abelian Lie Groups as Data Manifold"
Subjects: Machine Learning (stat.ML); Computational Geometry (cs.CG); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[468]  arXiv:2004.04071 (replaced) [pdf, other]
Title: Multilevel Asymptotic-Preserving Monte Carlo for Particle Simulations
Subjects: Numerical Analysis (math.NA)
[469]  arXiv:2004.07211 (replaced) [pdf, other]
Title: Dark Experience for General Continual Learning: a Strong, Simple Baseline
Comments: 24 pages, 4 figures. Accepted at 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[470]  arXiv:2004.08681 (replaced) [pdf, other]
Title: Effective gaps are not effective: quasipolynomial classical simulation of obstructed stoquastic Hamiltonians
Comments: 12 pages, 6 figures, added footnotes (v2), updated following peer review (v3)
Subjects: Quantum Physics (quant-ph); Strongly Correlated Electrons (cond-mat.str-el); Data Structures and Algorithms (cs.DS)
[471]  arXiv:2004.09347 (replaced) [pdf, other]
Title: WHALETRANS: E2E WHisper to nAturaL spEech conversion using modified TRANSformer network
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[472]  arXiv:2004.11497 (replaced) [pdf, other]
Title: Causal Modeling with Stochastic Confounders
Comments: preprint, work in progress
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[473]  arXiv:2004.13889 (replaced) [pdf, other]
Title: LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction Through Non-Linear Mapping in Latent Space
Comments: EMNLP 2020 accepted paper
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[474]  arXiv:2004.14309 (replaced) [pdf, other]
Title: How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
[475]  arXiv:2004.14928 (replaced) [pdf, other]
Title: Language Model Prior for Low-Resource Neural Machine Translation
Comments: Accepted at EMNLP 2020. Camera-ready version
Subjects: Computation and Language (cs.CL)
[476]  arXiv:2005.00094 (replaced) [pdf, other]
Title: A simple geometric method for navigating the energy landscape of centroidal Voronoi tessellations
Comments: 27 pages, 72 figures
Subjects: Numerical Analysis (math.NA)
[477]  arXiv:2005.00687 (replaced) [pdf, other]
Title: Open Graph Benchmark: Datasets for Machine Learning on Graphs
Comments: NeurIPS 2020 camera-ready long version
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
[478]  arXiv:2005.01038 (replaced) [pdf, other]
Title: SEPAR: Towards Regulating Future of Work Multi-Platform Crowdworking Environments with Privacy Guarantees
Subjects: Databases (cs.DB); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
[479]  arXiv:2005.01580 (replaced) [pdf, ps, other]
Title: How Many Modes Can a Mixture of Gaussians with Uniformly Bounded Means Have?
Comments: 11 pages, 1 figure; this version is currently under review at Information and Inference: A Journal of the IMA
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)
[480]  arXiv:2005.01856 (replaced) [pdf, other]
Title: Selecting Data Augmentation for Simulating Interventions
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[481]  arXiv:2005.05195 (replaced) [pdf, ps, other]
Title: Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality
Comments: Submitted to JMLR
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Statistics Theory (math.ST); Computation (stat.CO)
[482]  arXiv:2005.07551 (replaced) [pdf, other]
Title: Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression
Comments: Accepted by Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[483]  arXiv:2005.08545 (replaced) [pdf, ps, other]
Title: Joint Index Coding and Incentive Design for Selfish Clients
Comments: 37 pages (single column), submitted for possible journal publication
Subjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
[484]  arXiv:2005.08772 (replaced) [pdf, other]
Title: Color Visual Illusions: A Statistics-based Computational Model
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[485]  arXiv:2005.09964 (replaced) [pdf, other]
Title: Iterative Network for Image Super-Resolution
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[486]  arXiv:2005.10298 (replaced) [pdf, ps, other]
Title: Sensor Networks TDOA Self-Calibration: 2D Complexity Analysis and Solutions
Subjects: Signal Processing (eess.SP); Networking and Internet Architecture (cs.NI)
[487]  arXiv:2005.11110 (replaced) [pdf, other]
Title: Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties
Comments: 12 pages main text, 20 pages appendix. v2: changes due to NeurIPS review process. Camera-ready version to be published at NeurIPS 2020
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[488]  arXiv:2005.12531 (replaced) [pdf, other]
Title: Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[489]  arXiv:2005.12685 (replaced) [pdf, other]
Title: Integrated Model-Driven Engineering of Blockchain Applications for Business Processes and Asset Management
Comments: to appear in SPE
Subjects: Software Engineering (cs.SE)
[490]  arXiv:2005.14436 (replaced) [pdf, other]
Title: Machine learning in spectral domain
Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Signal Processing (eess.SP); Machine Learning (stat.ML)
[491]  arXiv:2006.02361 (replaced) [pdf, other]
Title: Optimizing Neural Networks via Koopman Operator Theory
Comments: 11 main content pages (7 supplementary pages), 3 main content figures (3 supplementary figures), 2 main content Tables (5 supplementary Tables). 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada
Subjects: Neural and Evolutionary Computing (cs.NE); Signal Processing (eess.SP); Dynamical Systems (math.DS); Computational Physics (physics.comp-ph)
[492]  arXiv:2006.03875 (replaced) [pdf, other]
Title: Coresets via Bilevel Optimization for Continual Learning and Streaming
Comments: NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[493]  arXiv:2006.04115 (replaced) [pdf, ps, other]
Title: DiffGCN: Graph Convolutional Networks via Differential Operators and Algebraic Multigrid Pooling
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[494]  arXiv:2006.04152 (replaced) [pdf, other]
Title: BERT Loses Patience: Fast and Robust Inference with Early Exit
Comments: NeurIPS 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
[495]  arXiv:2006.04176 (replaced) [pdf, ps, other]
Title: Deep active inference agents using Monte-Carlo methods
Comments: To appear in NeurIPS 2020
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[496]  arXiv:2006.04376 (replaced) [pdf, other]
Title: Speaker Diarization as a Fully Online Learning Problem in MiniVox
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
[497]  arXiv:2006.05065 (replaced) [pdf, other]
Title: Self-Distillation as Instance-Specific Label Smoothing
Journal-ref: 34th Conference on Neural Information Processing Systems (NeurIPS 2020)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[498]  arXiv:2006.05327 (replaced) [pdf, other]
Title: mEBAL: A Multimodal Database for Eye Blink Detection and Attention Level Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
[499]  arXiv:2006.05849 (replaced) [pdf, other]
Title: Self-Supervised Relational Reasoning for Representation Learning
Comments: Advances in Neural Information Processing Systems (NeurIPS 2020, Spotlight)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[500]  arXiv:2006.06068 (replaced) [pdf, ps, other]
Title: Variance reduction for Random Coordinate Descent-Langevin Monte Carlo
Authors: Zhiyan Ding, Qin Li
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
[501]  arXiv:2006.06177 (replaced) [pdf, other]
Title: COVID-19-CT-CXR: a freely accessible and weakly labeled chest X-ray and CT image collection on COVID-19 from biomedical literature
Comments: Accepted by IEEE Transactions on Big Data
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[502]  arXiv:2006.06547 (replaced) [pdf, other]
Title: Avoiding Side Effects in Complex Environments
Comments: Accepted as spotlight paper at NeurIPS 2020. 10 pages main paper; 19 pages with appendices
Subjects: Artificial Intelligence (cs.AI)
[503]  arXiv:2006.06574 (replaced) [pdf, other]
Title: Dynamically Stable Infinite-Width Limits of Neural Classifiers
Comments: 26 pages, 7 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[504]  arXiv:2006.06643 (replaced) [pdf, other]
Title: Smoothed Geometry for Robust Attribution
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[505]  arXiv:2006.06743 (replaced) [pdf, other]
Title: Faster DBSCAN via subsampled similarity queries
Comments: 34th Conference on Neural Information Processing Systems (NeurIPS 2020)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[506]  arXiv:2006.07207 (replaced) [pdf, other]
Title: On topology optimization of large deformation contact-aided shape morphing compliant mechanisms
Comments: 19 pages, 12 figures, 5 tables
Journal-ref: Mechanism and Machine Theory, 2021
Subjects: Computational Engineering, Finance, and Science (cs.CE)
[507]  arXiv:2006.07242 (replaced) [pdf, other]
Title: Ensemble Distillation for Robust Model Fusion in Federated Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[508]  arXiv:2006.07364 (replaced) [pdf, other]
Title: Residual Force Control for Agile Human Behavior Imitation and Extended Motion Synthesis
Authors: Ye Yuan, Kris Kitani
Comments: NeurIPS 2020. Code: this https URL Project page: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Machine Learning (stat.ML)
[509]  arXiv:2006.07571 (replaced) [pdf, other]
Title: $γ$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator
Comments: 46 pages, 15 figures, adding the experimental results of simulation errors
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO); Methodology (stat.ME)
[510]  arXiv:2006.07882 (replaced) [pdf, other]
Title: Uncovering the Topology of Time-Varying fMRI Data using Cubical Persistence
Comments: Accepted at the Conference on Neural Information Processing Systems (NeurIPS) 2020; camera-ready version
Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Algebraic Topology (math.AT); Machine Learning (stat.ML)
[511]  arXiv:2006.08571 (replaced) [pdf, other]
Title: COT-GAN: Generating Sequential Data via Causal Optimal Transport
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[512]  arXiv:2006.09239 (replaced) [pdf, other]
Title: Posterior Network: Uncertainty Estimation without OOD Samples via Density-Based Pseudo-Counts
Comments: NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[513]  arXiv:2006.10016 (replaced) [pdf, other]
Title: Regularized ERM on random subspaces
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[514]  arXiv:2006.10160 (replaced) [pdf, other]
Title: Matérn Gaussian processes on Riemannian manifolds
Journal-ref: Advances in Neural Information Processing Systems, 2020
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[515]  arXiv:2006.10325 (replaced) [pdf, other]
Title: When OT meets MoM: Robust estimation of Wasserstein Distance
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[516]  arXiv:2006.10800 (replaced) [pdf, other]
Title: Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
[517]  arXiv:2006.10815 (replaced) [pdf, other]
Title: Automatically Learning Compact Quality-aware Surrogates for Optimization Problems
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[518]  arXiv:2006.11477 (replaced) [pdf, other]
Title: wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[519]  arXiv:2006.11597 (replaced) [pdf, other]
Title: Using Fault Injection to Assess Blockchain Systems in Presence of Faulty Smart Contracts
Comments: Authors' manuscript. Published in IEEE Access 2020. The final publication is available at IEEE via this http URL
Subjects: Software Engineering (cs.SE)
[520]  arXiv:2006.11650 (replaced) [pdf, ps, other]
Title: On the Theory of Transfer Learning: The Importance of Task Diversity
Comments: NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[521]  arXiv:2006.11727 (replaced) [pdf, other]
Title: Affine symmetries and neural network identifiability
Comments: 59 pages, 9 figures
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
[522]  arXiv:2006.12147 (replaced) [pdf, other]
Title: Optimization of NB QC-LDPC Block Codes and Their Performance Analysis
Comments: 26 pages
Subjects: Information Theory (cs.IT)
[523]  arXiv:2006.12226 (replaced) [pdf, other]
Title: Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[524]  arXiv:2006.13913 (replaced) [pdf, other]
Title: Generative causal explanations of black-box classifiers
Comments: Camera-ready version to appear at NeurIPS 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[525]  arXiv:2006.14748 (replaced) [pdf, other]
Title: Proper Network Interpretability Helps Adversarial Robustness in Classification
Comments: 22 pages, 9 figures, Published at ICML 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[526]  arXiv:2006.14769 (replaced) [pdf, other]
Title: Supermasks in Superposition
Comments: NeurIPS 2020 Camera Ready
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[527]  arXiv:2006.14785 (replaced) [pdf, other]
Title: On Regret with Multiple Best Arms
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
[528]  arXiv:2006.14827 (replaced) [pdf, other]
Title: Memory-efficient Embedding for Recommendations
Subjects: Information Retrieval (cs.IR)
[529]  arXiv:2006.16736 (replaced) [pdf, other]
Title: Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency
Comments: Accepted to NeurIPS 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)
[530]  arXiv:2007.00323 (replaced) [pdf, other]
Title: Future Urban Scenes Generation Through Vehicles Synthesis
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computational Geometry (cs.CG)
[531]  arXiv:2007.00720 (replaced) [pdf, other]
Title: Adversarial Example Games
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[532]  arXiv:2007.01069 (replaced) [pdf, ps, other]
Title: Joint Passive Beamforming and User Association Optimization for IRS-assisted mmWave Systems
Comments: 6 pages, 5 figures
Subjects: Signal Processing (eess.SP); Networking and Internet Architecture (cs.NI)
[533]  arXiv:2007.01579 (replaced) [pdf, ps, other]
Title: Noise-Robust Adaptation Control for Supervised Acoustic System Identification Exploiting A Noise Dictionary
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[534]  arXiv:2007.02529 (replaced) [pdf, other]
Title: Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning
Comments: NeurIPS 2020 Camera Ready; first two authors contributed equally
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
[535]  arXiv:2007.02817 (replaced) [pdf, other]
Title: Faster Graph Embeddings via Coarsening
Comments: 18 pages, 2 figures, to appear in the Proceedings of the 37th International Conference on Machine Learning (ICML 2020)
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
[536]  arXiv:2007.02842 (replaced) [pdf, other]
Title: Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting
Comments: NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[537]  arXiv:2007.02914 (replaced) [pdf, other]
Title: Node Classification on Graphs with Few-Shot Novel Labels via Meta Transformed Network Embedding
Comments: Accepted to NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[538]  arXiv:2007.03210 (replaced) [pdf, ps, other]
Title: Estimation and Inference with Trees and Forests in High Dimensions
Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2020
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG)
[539]  arXiv:2007.05033 (replaced) [pdf, other]
Title: Adversarially-learned Inference via an Ensemble of Discrete Undirected Graphical Models
Comments: 17 pages, 5 figures, 5 tables. NeurIPS 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
[540]  arXiv:2007.06049 (replaced) [pdf, other]
Title: An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay
Comments: NeurIPS 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
[541]  arXiv:2007.07011 (replaced) [pdf, other]
Title: Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting
Comments: To appear in Advances in Neural Information Processing Systems 33 (NeurIPS-20)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
[542]  arXiv:2007.07803 (replaced) [pdf, other]
Title: Fine-Tune Longformer for Jointly Predicting Rumor Stance and Veracity
Authors: Anant Khandelwal
Comments: 10 pages, 2 figures, 6 tables; Accepted at ACM CoDS-COMAD 2021
Subjects: Computation and Language (cs.CL)
[543]  arXiv:2007.08095 (replaced) [pdf, other]
Title: Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis
Comments: Published in NeurIPS 2020
Subjects: Machine Learning (cs.LG); Software Engineering (cs.SE); Machine Learning (stat.ML)
[544]  arXiv:2007.09033 (replaced) [pdf, other]
Title: Region-based Non-local Operation for Video Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
[545]  arXiv:2007.09552 (replaced) [pdf, other]
Title: Progressive Multi-Scale Residual Network for Single Image Super-Resolution
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
[546]  arXiv:2007.11078 (replaced) [pdf, other]
Title: The Complete Lasso Tradeoff Diagram
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)
[547]  arXiv:2007.11301 (replaced) [pdf, other]
Title: DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation
Comments: Accepted to NeurIPS 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)